[Deep Learning] What I’ve learned about neural network quantization


TensorFlow supports “fake quantization operators”. Placing them wherever quantization is expected to occur rounds the float values to a specified number of levels, which simulates quantization. They also provide recalculated min/max ranges for the 32-bit to 8-bit downscaling.
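
To make the rounding step concrete, here is a minimal sketch of the idea in plain Python. It is not TensorFlow's implementation: the names `fake_quantize` and `observed_range` are made up for illustration, and the min/max range is simply taken from the data.

```python
def observed_range(values):
    """Return the (min, max) range seen in the data (illustrative helper)."""
    return min(values), max(values)


def fake_quantize(values, min_val, max_val, num_bits=8):
    """Snap each float to one of 2**num_bits evenly spaced levels.

    The result is still a list of floats, so the rest of the computation
    stays in floating point while seeing quantization-like rounding.
    """
    steps = 2 ** num_bits - 1                 # number of intervals between levels
    scale = (max_val - min_val) / steps       # width of one interval
    out = []
    for v in values:
        clamped = min(max(v, min_val), max_val)     # keep v inside the range
        index = round((clamped - min_val) / scale)  # nearest level index
        out.append(min_val + index * scale)         # back to a float value
    return out


activations = [-1.0, -0.3, 0.0, 0.42, 1.0]
lo, hi = observed_range(activations)
simulated = fake_quantize(activations, lo, hi)
step = (hi - lo) / (2 ** 8 - 1)
```

Each simulated value differs from the original by at most half a step, which is the "noise" the network has to tolerate.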

For most values, quantization acts like added noise, but zero is a special case. Zero shows up a lot in neural network calculations, so if it is not represented well, the errors on those zeros contribute disproportionately to the overall result.
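
A small sketch of what "representing zero well" can look like. The scheme below, a scale plus an integer zero point, is an assumption on my part and not something spelled out above, and all function names are hypothetical. It compares a naive mapping, where zero falls between two levels, with one where zero sits exactly on a level.

```python
def naive_roundtrip(v, min_val, max_val, num_bits=8):
    """Quantize and dequantize using only the raw (min, max) range."""
    steps = 2 ** num_bits - 1
    scale = (max_val - min_val) / steps
    index = round((min(max(v, min_val), max_val) - min_val) / scale)
    return min_val + index * scale


def choose_params(min_val, max_val, num_bits=8):
    """Pick a scale and an integer zero point so 0.0 maps to a whole level."""
    steps = 2 ** num_bits - 1
    scale = (max_val - min_val) / steps
    zero_point = round(-min_val / scale)          # level closest to real zero
    zero_point = min(max(zero_point, 0), steps)   # keep it inside the code range
    return scale, zero_point


def quantize(v, scale, zero_point, num_bits=8):
    steps = 2 ** num_bits - 1
    q = round(v / scale) + zero_point
    return min(max(q, 0), steps)


def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale


# An asymmetric range where zero does not line up with a level.
min_val, max_val = -0.9, 1.7
naive_zero = naive_roundtrip(0.0, min_val, max_val)

scale, zero_point = choose_params(min_val, max_val)
exact_zero = dequantize(quantize(0.0, scale, zero_point), scale, zero_point)
```

With the naive mapping, every zero comes back as a small nonzero value, and that same error repeats for every zero in the tensor. With an integer zero point, zero round-trips exactly.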

There is not much theory behind it, but the evidence suggests that avoiding -128 may be helpful.
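
One way to read this, and it is only my reading, is to restrict signed 8-bit codes to the symmetric range [-127, 127] so that -128 is never produced. The sketch below does that with a hypothetical `quantize_symmetric` helper whose scale is chosen from the largest absolute value.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integer codes in [-qmax, qmax], skipping -qmax - 1."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = []
    for v in values:
        q = round(v / scale)
        codes.append(max(-qmax, min(qmax, q)))  # clamp, so -128 never appears
    return codes, scale


weights = [-2.0, -0.5, 0.0, 0.75, 2.0]
codes, scale = quantize_symmetric(weights)
```

A side effect of this choice is that the code range is symmetric around zero, so a value and its negation get codes of equal magnitude.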

Lower bit depths are promising, but unproven.

