The Bagging Algorithm
- Bootstrap sampling: let D be the full training set and N the number of sub-datasets; draw N bootstrap samples from D, each formed by sampling |D| points from D with replacement (not by partitioning D evenly — a point may appear in several samples, or in none)
- Aggregating: train N models on the N bootstrap samples, then aggregate their outputs (voting for classification and averaging for regression)
Bagging gives the largest improvement when the base learner is unstable (large variance, low bias)
The base learner should be sensitive to small changes in the data, so some over-fitting is acceptable
A good choice for base learner: Decision Tree
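The steps above can be sketched in code. This is a minimal illustration, not a production implementation: it uses a toy one-feature threshold stump as the base learner (a hypothetical stand-in for a decision tree), and the function names are my own.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) points from data with replacement."""
    return [rng.choice(data) for _ in data]

class Stump:
    """Toy base learner (illustration stand-in for a decision tree):
    predict 1 when x exceeds a learned threshold."""
    def fit(self, data):                      # data: list of (x, y), y in {0, 1}
        best_acc, self.t = -1, None
        for t in sorted({x for x, _ in data}):
            acc = sum((x > t) == (y == 1) for x, y in data)
            if acc > best_acc:
                best_acc, self.t = acc, t
        return self

    def predict(self, x):
        return int(x > self.t)

def bagging_fit(data, n_models=11, seed=0):
    """Train one base learner per bootstrap sample."""
    rng = random.Random(seed)
    return [Stump().fit(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    # Aggregate by majority vote (classification); use the mean for regression.
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]

data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
models = bagging_fit(data)
print(bagging_predict(models, 0.05), bagging_predict(models, 0.95))  # → 0 1
```

Note that the individual stumps are deliberately allowed to fit their bootstrap sample closely; the majority vote averages away their variance.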

Random Forest:
Use decision trees as the basic unit in bagging.

RF additionally incorporates randomized feature selection at each split step.

3 Rules:
- The base learner must be sensitive to small changes in the data; some overfitting is allowable.
- Each base learner is trained not on all the data, but on a random bootstrap sample of it.
- Each base learner does not use all features, but only a randomly selected subset of them.
There are two layers of randomness: random data (bootstrap sampling) and random features.
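The feature-level randomness can be sketched as follows. This is a hedged illustration of one common convention (considering about sqrt(d) of the d features at each split, a typical default for classification); the function name is my own.

```python
import math
import random

def split_feature_candidates(n_features, rng):
    """At each split, consider only a random subset of features;
    sqrt(d) candidates is a common default for classification."""
    k = max(1, int(math.sqrt(n_features)))
    return rng.sample(range(n_features), k)

rng = random.Random(42)
# With 16 features, each split sees a random subset of 4 of them.
print(sorted(split_feature_candidates(16, rng)))
```

In a full random forest, this selection is redone at every split of every tree, on top of the bootstrap sampling of the rows.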
Boosting
