[Papers] Summary

[ICLRW 2018] To prune, or not to prune: exploring the efficacy of pruning for model compression (Michael Zhu, Suyog Gupta)
– large-sparse models consistently outperform small-dense models at the same memory footprint
– proposes gradual pruning: sparsity is increased gradually over training, implemented as a mask-based method for TensorFlow
  . in higher-variance models, a layerwise constant sparsity schedule seems to mitigate the accuracy degradation from pruning
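The paper's gradual schedule raises sparsity from an initial value s_i to a final value s_f over n pruning steps with a cubic curve, so pruning is aggressive early (when the network can recover) and gentle near the end. A minimal sketch of that schedule (parameter names are mine, not the paper's):

```python
def sparsity_at_step(t, s_i=0.0, s_f=0.9, t0=0, n=100, dt=1):
    """Cubic sparsity schedule from Zhu & Gupta:
    s_t = s_f + (s_i - s_f) * (1 - (t - t0) / (n * dt)) ** 3,
    clamped to [t0, t0 + n * dt]."""
    frac = min(max((t - t0) / (n * dt), 0.0), 1.0)
    return s_f + (s_i - s_f) * (1.0 - frac) ** 3
```

At each pruning step the smallest-magnitude weights are masked to zero until the layer reaches `sparsity_at_step(t)`.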

[CoRR abs 2017] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
– built from depthwise separable convolutions (a depthwise convolution followed by a 1×1 pointwise convolution), reducing parameters and computation
  . pointwise convolution can be directly mapped to GEMM without im2col
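The savings are easy to see by counting multiply-accumulates: a standard D_k×D_k conv with M input and N output channels on a D_f×D_f feature map costs D_k²·M·N·D_f², while the separable version splits this into a depthwise pass plus a pointwise pass, giving a ratio of 1/N + 1/D_k². A quick sketch of the two counts (function names are mine):

```python
def conv_macs(dk, m, n, df):
    """Multiply-accumulates for a standard dk x dk convolution: dk^2 * M * N * D_f^2."""
    return dk * dk * m * n * df * df

def separable_macs(dk, m, n, df):
    """Depthwise separable: depthwise (dk^2 * M * D_f^2) + pointwise (M * N * D_f^2)."""
    return dk * dk * m * df * df + m * n * df * df
```

For a typical 3×3 layer the separable form costs roughly 1/9 of the standard convolution.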

[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
– push-the-button compression pipeline for determining pruning rate for each layer using DDPG
  . penalizing accuracy loss while encouraging model shrinking and speedup
  . actor-critic structure helps reduce variance, facilitating more stable training
– uses structured pruning, so the pruned weights form a regular pattern
– result:
  . ResNet: 1×1 have less redundancy and can be pruned less, 3×3 have more redundancy and can be pruned more
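The "penalize accuracy loss while encouraging shrinking/speedup" idea can be captured in a single scalar reward for the DDPG agent; a hedged sketch in the spirit of AMC's FLOPs-aware reward (R = -Error · log(FLOPs); treat the exact form as an assumption from my reading, not the only variant in the paper):

```python
import math

def amc_reward(error, flops):
    """Reward shaping in the spirit of AMC: lower error and lower FLOPs
    both increase the reward, with log(FLOPs) scaling the error penalty."""
    return -error * math.log(flops)
```

With this shape, the agent is rewarded both for keeping validation error low and for pruning layers down to fewer FLOPs.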

[CoRR abs 2015] Distilling the Knowledge in a Neural Network
– a higher softmax temperature (T) produces a softer probability distribution over classes; train the distilled model against these soft targets, then use T = 1 at inference
– if correct labels are known, modify the soft targets or use a weighted average of two objective functions (soft-target and hard-label cross-entropy)…
– works well for MNIST, ASR…
– soft targets can act as a regularizer: training on soft targets from only a subset of the images still seems to prevent overfitting
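The temperature trick above is just a scaled softmax: dividing the logits by T > 1 flattens the distribution so the "dark knowledge" in the small logits is visible to the student. A minimal sketch:

```python
import numpy as np

def softmax_T(logits, T=1.0):
    """Softmax with temperature: p_i = exp(z_i / T) / sum_j exp(z_j / T).
    Higher T yields a softer (higher-entropy) distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

For example, `softmax_T([2, 1, 0], T=5)` is much flatter than `softmax_T([2, 1, 0], T=1)`, which is exactly what lets the student learn the teacher's inter-class similarities.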
