Although substantial advances have been made in training deep neural networks, one problem persists: the vanishing gradient. The very strength of deep neural networks, their depth, is also their weakness, because the vanishing gradient makes the deeper layers difficult to train thoroughly. This paper proposes "Phylogenetic Replay Learning", a learning methodology that substantially alleviates the vanishing gradient problem. Unlike residual learning methods, it does not restrict the structure of the model; instead, it leverages elements of neuroevolution, transfer learning, and layer-by-layer training. We demonstrate that this new approach produces a better-performing model and, by calculating the Shannon entropy of the weights, we show that the deeper layers are trained much more thoroughly and contain statistically significantly more information than when a model is trained in the traditional brute-force manner.
Key words: Neural Networks, Neuroevolution, Phylogenetic Replay Learning, Deep Learning, Vanishing Gradient
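For concreteness, the entropy measurement referred to above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a layer's weights are discretized into bins over a fixed support, and the Shannon entropy of the resulting empirical bin distribution is computed. The bin count and support range here are assumed values; the intuition is that weights that have barely moved from a tight initialization concentrate in a few bins and therefore carry low entropy, while thoroughly trained weights spread out and carry more.

```python
import numpy as np

def weight_entropy(weights: np.ndarray, n_bins: int = 256,
                   support: tuple = (-1.0, 1.0)) -> float:
    """Estimate the Shannon entropy (in bits) of a weight tensor by binning
    the weights over a fixed support and computing the entropy of the
    empirical bin distribution. n_bins and support are assumed settings."""
    counts, _ = np.histogram(weights.ravel(), bins=n_bins, range=support)
    probs = counts[counts > 0] / counts.sum()  # drop empty bins: 0*log 0 := 0
    return float(-np.sum(probs * np.log2(probs)))

# Illustration: weights spread across the support (thoroughly trained) yield
# higher entropy than weights still clustered near their initialization.
rng = np.random.default_rng(0)
spread = rng.normal(0.0, 0.3, size=10_000)        # broadly distributed weights
clustered = rng.normal(0.0, 0.005, size=10_000)   # barely moved from init
print(f"entropy of spread-out weights: {weight_entropy(spread):.2f} bits")
print(f"entropy of clustered weights:  {weight_entropy(clustered):.2f} bits")
```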