Finally, I am working with ImageNet…
MNIST and CIFAR-10 give good insight into how networks can be trained and generalize over variations within each class, but they are far too small to offer real-world deep learning experience!
Dataset Input Pipeline
First, I will begin with the details of AlexNet’s pipeline. Before training and testing, images were down-sampled to 256×256: each original image was scaled so that its shorter side is 256, and then a central 256×256 crop was taken. (TensorFlow’s central_crop does not do this!)
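To make the difference concrete, here is a minimal numpy sketch of that two-step preprocessing (nearest-neighbor resize for brevity; a real pipeline would use proper interpolation):

```python
import numpy as np

def resize_shorter_side(img, target=256):
    """Scale so the shorter side equals `target` (nearest-neighbor for brevity)."""
    h, w = img.shape[:2]
    scale = target / min(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    rows = (np.arange(nh) * h / nh).astype(int)
    cols = (np.arange(nw) * w / nw).astype(int)
    return img[rows][:, cols]

def center_crop(img, size=256):
    """Cut a size×size patch out of the middle of the (already resized) image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

# dummy 300×500 landscape image
out = center_crop(resize_shorter_side(np.zeros((300, 500, 3), dtype=np.uint8)))
print(out.shape)  # (256, 256, 3)
```

Note that simply calling a fractional central crop on the raw image would change the crop's size with the image's size, which is exactly why the resize step has to come first.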
VGGNet and ResNet both used “scale jittering”, which simply draws the resize target from [256, 480] instead of using a fixed 256. Beyond this, they use the principal components of the dataset to augment the data, and the mean to normalize it. In my case, I will add a few more augmentation schemes, such as random contrast. I referred to the following link.
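A sketch of the scale-jittering idea in numpy (again nearest-neighbor resize for brevity; the 224 crop size matches VGG/ResNet training):

```python
import numpy as np

rng = np.random.default_rng(0)

def jittered_crop(img, smin=256, smax=480, crop=224):
    """VGG-style scale jittering: resize the shorter side to a random S in
    [smin, smax], then take a random crop for training."""
    h, w = img.shape[:2]
    s = int(rng.integers(smin, smax + 1))
    scale = s / min(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    rows = (np.arange(nh) * h / nh).astype(int)   # nearest-neighbor resize
    cols = (np.arange(nw) * w / nw).astype(int)
    img = img[rows][:, cols]
    top = int(rng.integers(0, nh - crop + 1))
    left = int(rng.integers(0, nw - crop + 1))
    return img[top:top + crop, left:left + crop]

patch = jittered_crop(np.zeros((300, 500, 3), dtype=np.uint8))
print(patch.shape)  # (224, 224, 3)
```

Because the network always sees a fixed-size crop of a variably-scaled image, it effectively trains on objects at many apparent sizes.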
Since this dataset is much bigger than CIFAR, I thought I should use aggressive prefetching to the GPUs. However, it does not seem to work that well with my implementation. Related discussion is in the following link.
Some say that I should use copy_to_device, but that actually seems to cause a leak or some other problem behind the scenes, because I see a major slowdown after 500 iterations when I use it. Therefore, I decided to just prefetch on the CPU and let the TensorFlow scheduler do all the work for me. (I am not really sure about this though; it could also be thermal throttling due to poor cooling… I have to look into this.)
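For intuition, CPU-side prefetching (what `tf.data`'s `prefetch` gives you) is just a bounded producer/consumer buffer: a background thread keeps a few batches ready so the training step rarely waits on input. A pure-Python sketch of that mechanism (not the TensorFlow implementation, just the idea):

```python
import queue
import threading

def prefetching_loader(batches, buffer_size=4):
    """Background thread fills a bounded queue with batches; the consumer
    (the training loop) pulls from the queue instead of loading synchronously."""
    q = queue.Queue(maxsize=buffer_size)
    DONE = object()  # sentinel marking the end of the stream

    def worker():
        for b in batches:
            q.put(b)          # blocks when the buffer is full (backpressure)
        q.put(DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is DONE:
            return
        yield item

out = list(prefetching_loader(range(10)))
print(out)  # the ten batches, in order
```

The bounded buffer is the important part: an unbounded one would happily load the whole dataset into RAM while the GPU falls behind.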
This is just to note a few things that made me waste a lot of time.
First, some images in ILSVRC 2012 are in CMYK… these need to be made consistent up front, or dealt with later.
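With Pillow the usual fix is to check `img.mode == "CMYK"` and call `img.convert("RGB")`. If you ever need to do it by hand, the naive channel formula looks like this (a sketch; proper decoders apply ICC color profiles, but this is enough to keep channel counts consistent):

```python
import numpy as np

def cmyk_to_rgb(cmyk):
    """Convert a CMYK image (floats in [0, 1], channels last) to RGB uint8
    using the naive formula R = 255*(1-C)*(1-K), and similarly for G and B."""
    c, m, y, k = np.moveaxis(cmyk, -1, 0)
    r = 255 * (1 - c) * (1 - k)
    g = 255 * (1 - m) * (1 - k)
    b = 255 * (1 - y) * (1 - k)
    return np.stack([r, g, b], axis=-1).astype(np.uint8)

# pure black in CMYK (K = 1) should map to RGB (0, 0, 0)
black = np.array([[[0.0, 0.0, 0.0, 1.0]]])
print(cmyk_to_rgb(black)[0, 0])  # [0 0 0]
```

The real danger is silent: a 4-channel decode where 3 channels are expected will crash the pipeline mid-epoch, hours into training.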
Second, there is no reason to waste time calculating the eigenvalues/eigenvectors of the train set yourself.
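Values for the ImageNet train set are already circulated in public implementations (the constants below are the ones commonly copied around, e.g. in fb.resnet.torch; treat them as reused, not recomputed here). With them, AlexNet-style PCA color augmentation is a few lines:

```python
import numpy as np

# Commonly circulated RGB eigenvalues/eigenvectors of the ImageNet train set
# (e.g. from fb.resnet.torch) -- no need to recompute them yourself.
EIGVAL = np.array([0.2175, 0.0188, 0.0045])
EIGVEC = np.array([[-0.5675,  0.7192,  0.4009],
                   [-0.5808, -0.0045, -0.8140],
                   [-0.5836, -0.6948,  0.4203]])

def pca_color_jitter(img, rng, std=0.1):
    """AlexNet-style lighting noise: add sum_i alpha_i * lambda_i * v_i
    (one RGB offset per image) to every pixel; assumes roughly normalized
    pixel values."""
    alpha = rng.normal(0.0, std, size=3)
    offset = EIGVEC @ (alpha * EIGVAL)
    return img + offset

rng = np.random.default_rng(0)
jittered = pca_color_jitter(np.zeros((224, 224, 3)), rng)
print(jittered.shape)  # (224, 224, 3)
```

Note the offset is sampled once per image, not per pixel, so it shifts the overall lighting/color rather than adding pixel noise.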
GPUs sometimes show non-deterministic results, and this is just due to how the software is implemented. (I have no idea about the details.)
Since I learnt a lot from my former project training ResNet-20 on CIFAR, I managed to build the network in one shot! 🙂 However, one choice that had to be made was which shortcut option to use among those mentioned in the paper. (A, B, or C… I will try to add information on this.)
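For reference, the options in the ResNet paper are: (A) parameter-free identity shortcuts, zero-padding the extra channels when dimensions increase; (B) projection shortcuts only where dimensions change, identity elsewhere; and (C) projections on every shortcut. A minimal numpy sketch of A and B, with stride-2 subsampling standing in for the strided convolution:

```python
import numpy as np

def shortcut_a(x, out_ch, stride=2):
    """Option A: identity shortcut -- subsample spatially, zero-pad the extra
    channels. Parameter-free, so it adds no weights to the model."""
    x = x[::stride, ::stride, :]
    pad = out_ch - x.shape[-1]
    return np.pad(x, ((0, 0), (0, 0), (0, pad)))

def shortcut_b(x, w, stride=2):
    """Option B: projection shortcut -- a 1x1 convolution (here, a per-pixel
    matmul) with learned weights w of shape (in_ch, out_ch)."""
    return x[::stride, ::stride, :] @ w

x = np.ones((8, 8, 16))
print(shortcut_a(x, 32).shape)                  # (4, 4, 32)
print(shortcut_b(x, np.ones((16, 32))).shape)   # (4, 4, 32)
```

Option B is the usual default: it costs a few extra parameters only at the handful of stages where the channel count changes.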
This is an interesting blog on how certain changes affect accuracy on ResNet.
Everything was perfect, except that training is taking way too long. One epoch takes around an hour (maybe worse due to throttling), so 120 epochs will take 5 days. I will leave it running and see what I get.
I originally thought that my implementation had a serious problem, but after scaling the timings reported in the following link, every 10 steps with a mini-batch size of 256 should take around 5–6 s, which is a little less than mine; I will deal with this problem later. (The link is about PyTorch, but the amount of computation should not be far off.)
On a different note, implementing this in PyTorch may be a good project for later. PyTorch is an imperative framework, so it is much easier to use.
The following is just an interesting blog about training a network on Azure. What is especially interesting is the matrix.
Apparently, much of the literature reports accuracy using 10-crop testing. According to the TF-Slim page, this gives somewhat better performance, but they did not implement it. I think it is unnecessarily complicated to implement in TensorFlow, so I am skipping it too. It seems to account for around 1–2% accuracy.
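For completeness, 10-crop testing is conceptually simple even if wiring it into a TensorFlow graph is awkward: four corner crops plus the center crop, each with its horizontal flip, and the ten predictions are averaged. A numpy sketch:

```python
import numpy as np

def ten_crop(img, size=224):
    """10-crop evaluation: four corners + center, each with its horizontal
    flip. At test time, predictions are averaged over the ten crops."""
    h, w = img.shape[:2]
    offsets = [(0, 0), (0, w - size), (h - size, 0),
               (h - size, w - size), ((h - size) // 2, (w - size) // 2)]
    crops = [img[t:t + size, l:l + size] for t, l in offsets]
    crops += [c[:, ::-1] for c in crops]        # horizontal flips
    return np.stack(crops)

crops = ten_crop(np.zeros((256, 256, 3)))
print(crops.shape)  # (10, 224, 224, 3)
```

The cost is ten forward passes per test image, which is why single-crop numbers are what most people report day to day.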
A good blog on HW (CPU, GPU) regarding Deep Learning: http://timdettmers.com/2015/03/09/deep-learning-hardware-guide/
Running TF on CPU: https://stackoverflow.com/questions/37660312/how-to-run-tensorflow-on-cpu