The Bagging Algorithm
- Bootstrap sampling: let D be the full training set and N the number of sub-datasets; draw N bootstrap samples from D, each formed by sampling |D| points from D with replacement (not by partitioning D evenly — a point may appear in several samples, or in none)
- Aggregating: train N models on the N bootstrap samples, then aggregate their outputs (voting for classification and averaging for regression)
Bagging gives the largest improvement when the base learner is unstable (large variance, low bias)
The base learner should be sensitive to small changes in the data, so some over-fitting is acceptable
A good choice for base learner: Decision Tree
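The steps above can be sketched in code. This is a minimal illustration, not a production implementation: it uses a toy one-feature threshold stump as the base learner (a hypothetical stand-in for a decision tree), and the function names are my own.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) points from data with replacement."""
    return [rng.choice(data) for _ in data]

class Stump:
    """Toy base learner (illustration stand-in for a decision tree):
    predict 1 when x exceeds a learned threshold."""
    def fit(self, data):                      # data: list of (x, y), y in {0, 1}
        best_acc, self.t = -1, None
        for t in sorted({x for x, _ in data}):
            acc = sum((x > t) == (y == 1) for x, y in data)
            if acc > best_acc:
                best_acc, self.t = acc, t
        return self

    def predict(self, x):
        return int(x > self.t)

def bagging_fit(data, n_models=11, seed=0):
    """Train one base learner per bootstrap sample."""
    rng = random.Random(seed)
    return [Stump().fit(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    # Aggregate by majority vote (classification); use the mean for regression.
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]

data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
models = bagging_fit(data)
print(bagging_predict(models, 0.05), bagging_predict(models, 0.95))  # → 0 1
```

Note that the individual stumps are deliberately allowed to fit their bootstrap sample closely; the majority vote averages away their variance.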

Random Forest:
Use decision trees as the basic unit in bagging.

RF additionally incorporates randomized feature selection at each split step.

3 Rules:
- The base learner must be sensitive to small changes in the data; some overfitting is allowable.
- Each base learner is trained not on all the data, but on a random bootstrap sample of it.
- Each base learner does not use all features, but only a randomly selected subset of them.
There are two layers of randomness: random data (bootstrap sampling) and random features.
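The feature-level randomness can be sketched as follows. This is a hedged illustration of one common convention (considering about sqrt(d) of the d features at each split, a typical default for classification); the function name is my own.

```python
import math
import random

def split_feature_candidates(n_features, rng):
    """At each split, consider only a random subset of features;
    sqrt(d) candidates is a common default for classification."""
    k = max(1, int(math.sqrt(n_features)))
    return rng.sample(range(n_features), k)

rng = random.Random(42)
# With 16 features, each split sees a random subset of 4 of them.
print(sorted(split_feature_candidates(16, rng)))
```

In a full random forest, this selection is redone at every split of every tree, on top of the bootstrap sampling of the rows.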
Boosting
