Batch Normalization

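Exploding gradient: the gradient grows too large as it is propagated back through the layers
Vanishing gradient: the gradient shrinks too small as it is propagated back through the layers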
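
A rough numeric illustration of both effects, under the simplifying assumption that each layer scales the backpropagated gradient by one constant factor (the depth and the factors 0.5 and 1.5 are made up for this sketch):

```python
# Assumption: every layer multiplies the gradient by the same constant factor.
depth = 50

vanishing = 0.5 ** depth   # factor < 1 at every layer: gradient shrinks toward 0
exploding = 1.5 ** depth   # factor > 1 at every layer: gradient blows up

print(f"factor 0.5 over {depth} layers: {vanishing:.3e}")
print(f"factor 1.5 over {depth} layers: {exploding:.3e}")
```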

$x^{(k)}$ is the feature at the $k$-th layer of the network

Normalize this layer so that every feature has mean 0 and variance 1 (the values are not confined to the range (0, 1))

Batch normalization on its own limits the expressive power of the DNN, so a learned scale and shift are applied to correct for this

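E and Var are statistics, not learned parameters: they are computed from the mini-batch during training and held constant at inference. $\gamma,~\beta$ are learned parameters, updated by gradient descent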
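
Assuming the standard formulation of batch normalization, the normalization step and the corrective scale and shift can be written as follows ($\hat{x}^{(k)}$ and $y^{(k)}$ are names introduced here for illustration):

$$
\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}}, \qquad
y^{(k)} = \gamma\,\hat{x}^{(k)} + \beta
$$

With $\gamma = \sqrt{\mathrm{Var}[x^{(k)}]}$ and $\beta = \mathrm{E}[x^{(k)}]$ the layer can recover the original activations, which is why the scale and shift restore the lost expressiveness.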

Compute:
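
A minimal NumPy sketch of the training-time computation, assuming a mini-batch of shape (batch, features). The function name `batch_norm_forward` and the small constant `eps` (added for numerical stability) are assumptions of this sketch, not taken from the notes:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x of shape (batch, features), then scale and shift."""
    mean = x.mean(axis=0)                     # E[x] per feature over the mini-batch
    var = x.var(axis=0)                       # Var[x] per feature (biased estimate)
    x_hat = (x - mean) / np.sqrt(var + eps)   # mean 0, variance ~1 (eps is an assumption)
    return gamma * x_hat + beta               # learned scale (gamma) and shift (beta)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))   # made-up activations
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma` set to ones and `beta` to zeros, each column of `y` has mean close to 0 and variance close to 1.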

Data Augmentation:

The lowest-cost way to increase the amount of training data
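
A small sketch of the idea for image data, assuming arrays of shape (H, W, C). The specific transformations (random horizontal flip, small wrap-around shift) and the helper name `augment` are illustrative choices, not ones stated in the notes:

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped and shifted copy of an (H, W, C) image."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]              # horizontal flip
    shift = int(rng.integers(-2, 3))       # shift by -2..2 pixels (wraps around)
    return np.roll(out, shift, axis=1)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))              # made-up image
augmented = [augment(image, rng) for _ in range(4)]   # 4 extra samples from one image
```

Each call produces a new training sample from an existing one, so no additional data has to be collected or labeled.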

Neural Network Pruning

Resources:

Storage resources: storing the parameters of neural networks
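
To get a feel for the storage cost, the sketch below counts the parameters of a fully connected network and converts the count to bytes. The layer sizes and the 4 bytes per parameter (32-bit floats) are assumptions chosen for illustration:

```python
def parameter_storage_bytes(layer_sizes, bytes_per_param=4):
    """Bytes needed to store the weights and biases of a fully connected network."""
    n_params = sum(
        n_in * n_out + n_out               # weight matrix plus bias vector
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )
    return n_params * bytes_per_param

size = parameter_storage_bytes([784, 512, 10])   # hypothetical layer sizes
print(f"{size / 1e6:.2f} MB")
```

Pruning removes parameters, so it reduces this figure directly.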