A hypernetwork is a network that generates the weights for another network (analogous to a genotype producing a phenotype). The author trains both networks end-to-end with backpropagation.
The hypernetwork itself is linear: a matrix multiplication plus a bias. Its input is an embedding z^j with N_z elements; d in the above equations denotes the size of the hidden layer of the hypernetwork.
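As a minimal sketch of this idea (all names and initializations here are illustrative, not the paper's exact parameterization): an embedding z^j of size N_z passes through two linear layers (matmul + bias) to produce a conv kernel, e.g. the 16x7x7x1 kernel of the MNIST conv1 layer.

```python
import numpy as np

N_z, d = 4, 16            # embedding size, hypernetwork hidden size (assumed values)
out_ch, in_ch, k = 16, 1, 7  # target kernel: 16 filters of 7x7x1

rng = np.random.default_rng(0)
W_in = rng.standard_normal((d * in_ch, N_z)) * 0.01     # first linear layer
B_in = np.zeros(d * in_ch)
W_out = rng.standard_normal((out_ch * k * k, d)) * 0.01  # second linear layer
B_out = np.zeros(out_ch * k * k)

def hypernet(z):
    # Hidden activations: one d-vector per input channel of the target kernel.
    a = (W_in @ z + B_in).reshape(in_ch, d)
    # Project each hidden vector to a slice of the kernel.
    K = np.stack([W_out @ a_i + B_out for a_i in a])
    return K.reshape(in_ch, out_ch, k, k).transpose(1, 0, 2, 3)

z = rng.standard_normal(N_z)
K = hypernet(z)
print(K.shape)  # (16, 1, 7, 7)
```

Because the mapping from z^j to the kernel is linear, gradients flow through it to the embedding, so both the hypernetwork and the embeddings are trainable by ordinary backpropagation.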
For MNIST, the model network was:
conv1: 28x28x1 -> (16x7x7x1 kernel) -> 28x28x16
conv2: 14x14x16 -> (16x7x7x16 kernel) -> 14x14x16
fc: 7x7x16 (= 784) -> (784x10) -> 10
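The shape flow above can be checked with a toy forward pass. This sketch assumes 'same' padding and 2x2 max pooling after each conv (the notes list only the shapes, so pooling placement is my assumption), and uses zero weights since only shapes matter here:

```python
import numpy as np

def conv_same(x, kernel):
    # Naive 'same' convolution: x is (in_ch, h, w), kernel is (out_ch, in_ch, k, k).
    out_ch, in_ch, k, _ = kernel.shape
    h, w = x.shape[1:]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((out_ch, h, w))
    for o in range(out_ch):
        for i in range(in_ch):
            for r in range(h):
                for c in range(w):
                    out[o, r, c] += np.sum(xp[i, r:r+k, c:c+k] * kernel[o, i])
    return out

def pool2(x):
    # 2x2 max pooling.
    ch, h, w = x.shape
    return x.reshape(ch, h // 2, 2, w // 2, 2).max(axis=(2, 4))

x = np.random.default_rng(0).standard_normal((1, 28, 28))
h1 = pool2(conv_same(x, np.zeros((16, 1, 7, 7))))    # conv1: 28x28x16, pooled to 14x14x16
h2 = pool2(conv_same(h1, np.zeros((16, 16, 7, 7))))  # conv2: 14x14x16, pooled to 7x7x16
logits = h2.reshape(-1) @ np.zeros((784, 10))        # fc: 7*7*16 = 784 -> 10
print(h1.shape, h2.shape, logits.shape)
```

This confirms the fc input size: 7 * 7 * 16 = 784, matching the (784x10) weight matrix.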
Using this seems to give comparable results on MNIST and on Wide ResNets with CIFAR-10, but I would really need to check…
Only the RNN experiments remain in the final paper.
Interestingly, the author pulled the CNN part out of the final draft in a revision… I suspect this may be because it did not give the best results…
Original paper: https://arxiv.org/pdf/1609.09106.pdf
ICLR 2017 Review: https://openreview.net/forum?id=rkpACe1lx
ICLR 2017 Final paper: https://openreview.net/pdf?id=rkpACe1lx
Author’s blog: http://blog.otoro.net/2016/09/28/hyper-networks/