TensorFlow supports “fake quantization operators”. Inserting them wherever quantization is expected to occur rounds the float values to a specified number of levels, which simulates the effect of quantization. They also provide recalculated min/max ranges for the 32-bit to 8-bit downscaling.
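The rounding behaviour described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not TensorFlow's implementation: the names `fake_quantize` and `observed_range` are hypothetical, and the evenly spaced levels between `min_val` and `max_val` are an assumption about how the levels are laid out.

```python
def observed_range(values):
    """Return the (min, max) of the observed float values.

    Hypothetical helper: stands in for the recalculated min/max range.
    """
    return min(values), max(values)


def fake_quantize(values, min_val, max_val, num_bits=8):
    """Snap floats to one of 2**num_bits evenly spaced levels.

    The result is still a list of floats, so the rest of the computation
    stays in floating point while seeing the quantization error.
    """
    if max_val <= min_val:
        raise ValueError("max_val must be greater than min_val")
    steps = 2 ** num_bits - 1              # number of intervals between levels
    scale = (max_val - min_val) / steps    # width of one interval
    result = []
    for v in values:
        clamped = min(max(v, min_val), max_val)          # saturate out-of-range values
        index = int((clamped - min_val) / scale + 0.5)   # round to nearest level
        result.append(min_val + index * scale)
    return result


activations = [0.0, 0.26, 1.0, 2.0]
lo, hi = observed_range(activations)
quantized = fake_quantize(activations, 0.0, 1.0, num_bits=2)
```

With `num_bits=2` there are only four levels (0, 1/3, 2/3, 1), so 0.26 snaps to 1/3 and the out-of-range 2.0 saturates at 1.0.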
For most values, quantization acts like added noise, but zero is a special case. Zero shows up a lot in neural network calculations. If zero is not represented exactly, the error on all of these zeros will contribute disproportionately to the overall result.
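One common way to make sure zero is represented exactly is to pick an integer "zero point" so that 0.0 lands on a quantized level. The sketch below assumes an unsigned affine scheme (scale plus zero point); the notes do not say which scheme is meant, and the function names are hypothetical.

```python
def choose_params(min_val, max_val, num_bits=8):
    """Pick (scale, zero_point) so that 0.0 maps exactly to an integer code."""
    # Assumption: widen the range so it always contains zero.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    qmax = 2 ** num_bits - 1
    if max_val == min_val:
        return 1.0, 0                      # degenerate all-zero range
    scale = (max_val - min_val) / qmax
    # Rounding here is what forces zero onto an exact level.
    zero_point = int(round(-min_val / scale))
    zero_point = max(0, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map a float to an integer code in [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    q = int(round(x / scale)) + zero_point
    return max(0, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to its float value."""
    return (q - zero_point) * scale


scale, zero_point = choose_params(-0.7, 1.3)
code_for_zero = quantize(0.0, scale, zero_point)
roundtrip_zero = dequantize(code_for_zero, scale, zero_point)
```

Here 0.0 round-trips to exactly 0.0, while other values only round-trip to within about half a step of `scale`.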
There is not much principle behind it, but empirical evidence suggests that avoiding the value -128 in signed 8-bit representations may be helpful.
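One reading of this advice, and it is only an assumption here, is to restrict signed 8-bit codes to the symmetric range [-127, 127] so that -128 is never produced. A minimal sketch under that assumption, with a hypothetical function name:

```python
def quantize_symmetric_int8(values, max_abs):
    """Quantize floats to signed 8-bit codes in [-127, 127], never -128."""
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    scale = max_abs / 127.0                # float value of one integer step
    codes = []
    for v in values:
        q = int(round(v / scale))
        codes.append(max(-127, min(127, q)))   # clamp symmetrically
    return codes, scale


codes, scale = quantize_symmetric_int8([-1.0, 0.0, 1.0, -5.0], max_abs=1.0)
```

With this clamp the positive and negative ranges have the same size, and 0.0 maps to the code 0.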
Bit depths lower than 8 are promising, but unproven.