[Deep Learning] What I’ve learned about neural network quantization


There is support for “fake quantization operators” in TensorFlow. Including them where quantization is expected to occur will round the float values to specified number of levels to simulate quantization + Gives recalculated min/max ranges for the 32-bit to 8-bit downscaling.

For most numbers, quantizing numbers is like adding noise, but in the case of zero, this is not the case. Zero shows up a lot in neural network calculations. If zero is not represented well, these zeros will contribute disproportionately to overall result.

Not much principle, but evidence states that avoiding -128 may be helpful.

Lower bit depths are promising, but unproven

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.