**[ICLRW 2018] To prune, or not to prune: exploring the efficacy of pruning for model compression (Michael Zhu, Suyog Gupta)**

– large-sparse models consistently outperform small-dense models

– propose gradual pruning: sparsity is increased gradually over training, implemented as a mask-based method in TensorFlow

. in higher-variance models, a layerwise-constant pruning rate seems to mitigate the accuracy degradation from pruning
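The gradual schedule in the paper ramps sparsity with a cubic polynomial: pruning is aggressive early, when redundant weights are plentiful, and slows as the final sparsity is approached. A minimal sketch of that schedule (the default values for `s_i`, `s_f`, `n`, and `dt` below are illustrative, not from these notes):

```python
def sparsity_at_step(t, s_i=0.0, s_f=0.9, t0=0, n=100, dt=1):
    """Cubic sparsity schedule in the style of Zhu & Gupta (2017).

    s_t = s_f + (s_i - s_f) * (1 - (t - t0)/(n*dt))**3
    Sparsity rises quickly at first and flattens out near s_f.
    """
    if t < t0:
        return s_i
    if t > t0 + n * dt:
        return s_f
    frac = (t - t0) / (n * dt)
    return s_f + (s_i - s_f) * (1.0 - frac) ** 3
```

At each pruning step, weights with the smallest magnitudes are masked until the layer reaches the sparsity given by this schedule.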

**[CoRR abs 2017] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications**

– built from depthwise convolutions followed by pointwise (1×1) convolutions, reducing model size and compute

. pointwise convolution can be directly mapped to GEMM without im2col
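Because a 1×1 kernel covers a single pixel, flattening the spatial dimensions already gives the GEMM operands, with no im2col patch extraction. A NumPy sketch:

```python
import numpy as np

def pointwise_conv(x, w):
    """1x1 (pointwise) convolution expressed as a plain GEMM.

    x: feature map of shape (H, W, C_in)
    w: pointwise kernel of shape (C_in, C_out)
    Reshaping x to (H*W, C_in) makes the convolution a single
    matrix multiply; no im2col is needed.
    """
    h, w_, c_in = x.shape
    out = x.reshape(h * w_, c_in) @ w  # the GEMM
    return out.reshape(h, w_, -1)
```

Each output pixel is simply the input pixel's channel vector times the weight matrix, which is why highly tuned GEMM kernels apply directly.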

**[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices**

– push-button compression pipeline that determines the pruning rate for each layer using DDPG

. penalizing accuracy loss while encouraging model shrinking and speedup

. the actor-critic structure helps reduce variance, facilitating more stable training

– structured pruning: the pruned weights form a regular pattern

– result:

. ResNet: 1×1 convolutions have less redundancy and can be pruned less; 3×3 convolutions have more redundancy and can be pruned more
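The "penalize accuracy loss while encouraging shrinking" idea can be made concrete with the FLOPs-aware reward form reported for AMC, roughly R = -Error · log(FLOPs); a hedged sketch (the function name and argument units are my own):

```python
import math

def amc_reward(error, flops):
    """Sketch of an AMC-style reward: R = -error * log(flops).

    Lower error and fewer remaining FLOPs both increase the reward,
    so the DDPG agent is pushed toward compressed-yet-accurate models.
    """
    return -error * math.log(flops)
```

For example, at equal error a model with fewer FLOPs scores higher, and at equal FLOPs a more accurate model scores higher.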

**[CoRR abs 2015] Distilling the Knowledge in a Neural Network**

– using a higher value for the temperature T produces a softer probability distribution over classes. train the distilled model with this high T, then use a temperature of 1 at deployment.

– if correct labels are known, modify the soft targets or use a weighted average of the two objectives…
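The two notes above combine into the standard distillation loss: cross-entropy against the teacher's temperature-softened distribution, averaged with cross-entropy against the hard label, with the soft term scaled by T² so its gradients stay comparable across temperatures. A self-contained sketch (T=4 and alpha=0.5 are illustrative values, not from these notes):

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T gives a softer distribution."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.5):
    """Weighted average of soft-target and hard-target cross-entropy.

    The soft term is multiplied by T**2 to keep its gradient magnitude
    roughly constant as T changes (as noted by Hinton et al.).
    """
    p_teacher = softmax(teacher_logits, T)       # softened teacher targets
    p_student_t = softmax(student_logits, T)     # student at the same T
    p_student = softmax(student_logits, T=1.0)   # student at T=1 for hard loss
    soft = -np.sum(p_teacher * np.log(p_student_t))
    hard = -np.log(p_student[hard_label])
    return alpha * (T ** 2) * soft + (1 - alpha) * hard
```

At inference the student uses plain softmax (T = 1); the high temperature only shapes the training signal.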

– works well for MNIST, ASR…

– soft targets can act as a regularizer: training on soft targets for only a subset of the images still seems to prevent overfitting
