[Papers] HyperNetworks

A hypernetwork is a network that generates the weights for another network (analogous to a genotype and its phenotype). The authors train the whole system end-to-end using backpropagation.

The hypernetwork here is linear: its output is the result of matrix multiplications and bias additions. Its input is an embedding vector z^j with N_z elements. In the paper's equations, d denotes the size of the hidden layer in the hypernetwork.
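To make the linear map concrete, here is a minimal NumPy sketch. It assumes the two-layer form as I recall it from the paper: each input channel gets its own projection of z^j to a d-dimensional hidden vector, and a shared output projection turns that vector into a slice of the kernel. The names `generate_kernel`, `W_in`, `B_in`, `W_out`, `B_out`, the sizes, and the way the output axis is split into (N_out, f) are my own illustrative choices, not the authors' code.

```python
import numpy as np

def generate_kernel(z, W_in, B_in, W_out, B_out):
    """Map an embedding z of shape (N_z,) to a kernel of shape (f, f, N_in, N_out).

    W_in:  (N_in, d, N_z)    first linear layer, one slice per input channel
    B_in:  (N_in, d)
    W_out: (d, f, N_out * f) second linear layer, shared across input channels
    B_out: (f, N_out * f)
    """
    # Hidden vectors: a[i] = W_in[i] @ z + B_in[i], shape (N_in, d)
    a = np.einsum("idz,z->id", W_in, z) + B_in
    # Kernel slices: K[i] = sum_k a[i, k] * W_out[k] + B_out, shape (N_in, f, N_out * f)
    K = np.einsum("id,dfo->ifo", a, W_out) + B_out
    n_in, f, _ = K.shape
    # Assumed layout: split the last axis into (N_out, f), then reorder axes
    return K.reshape(n_in, f, -1, f).transpose(1, 3, 0, 2)

# Illustrative sizes (assumptions, chosen to match a 7x7 kernel with 16 -> 16 channels)
N_z, d, N_in, N_out, f = 4, 4, 16, 16, 7
rng = np.random.default_rng(0)
W_in = rng.normal(size=(N_in, d, N_z))
B_in = rng.normal(size=(N_in, d))
W_out = rng.normal(size=(d, f, N_out * f))
B_out = rng.normal(size=(f, N_out * f))
z = rng.normal(size=N_z)

kernel = generate_kernel(z, W_in, B_in, W_out, B_out)
n_hyper = W_in.size + B_in.size + W_out.size + B_out.size
print(kernel.shape, n_hyper, kernel.size)
```

With these sizes the hypernetwork holds 4,240 weights (plus the 4-element embedding), against 12,544 weights in the kernel it generates. Since there is no nonlinearity, the kernel is an affine function of z^j.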

For MNIST, the main network was:
conv1: 28x28x1 –> (16x7x7x1) –> 28x28x16
conv2: 14x14x16 –> (16x7x7x16) –> 14x14x16
fc: 7x7x16 –> (784×10) –> 10
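The shapes in this listing can be traced with a few lines of plain Python. This is only a sketch: it assumes stride-1 convolutions with "same" padding and a 2x2 max pooling step after each convolution, which is what would explain the 28 to 14 to 7 reduction. The function name `trace_shapes` and its parameters are hypothetical. Each kernel shape is written as (out channels, height, width, in channels), so it follows from the layer's input and output channel counts.

```python
def trace_shapes(height=28, width=28, channels=1, filters=(16, 16), ksize=7, classes=10):
    """Return (name, input shape, weight shape, output shape) for each layer.

    Assumes stride-1 'same' convolutions, each followed by 2x2 max pooling.
    """
    rows = []
    h, w, c = height, width, channels
    for i, n_out in enumerate(filters, start=1):
        # 'same' padding keeps the spatial size; only the channel count changes
        rows.append((f"conv{i}", (h, w, c), (n_out, ksize, ksize, c), (h, w, n_out)))
        # 2x2 max pooling halves the spatial size before the next layer
        h, w, c = h // 2, w // 2, n_out
    # The fully connected layer flattens whatever is left
    rows.append(("fc", (h, w, c), (h * w * c, classes), (classes,)))
    return rows

rows = trace_shapes()
for name, inp, weight, out in rows:
    print(name, inp, weight, out)
```

Under these assumptions the second convolution has 7 x 7 x 16 x 16 = 12,544 weights, far more than the first layer's 784, which makes it the natural layer to generate with a hypernetwork.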

Using a hypernetwork this way seems to give results comparable to the baseline on MNIST, and on CIFAR-10 with a wide residual network (WRN), but I would really need to check…

Only RNNs in the final paper

Interestingly, the authors pulled the CNN part out of the paper in the final draft through a revision… I believe this may be because it did not give the best result…

Reference

Original paper: https://arxiv.org/pdf/1609.09106.pdf
ICLR 2017 Review: https://openreview.net/forum?id=rkpACe1lx
ICLR 2017 Final paper: https://openreview.net/pdf?id=rkpACe1lx
Author’s blog: http://blog.otoro.net/2016/09/28/hyper-networks/
