Initialization of ReLUs for Dynamical Isometry
Deep learning relies on good initialization schemes and hyperparameter
choices prior to training a neural network. Random weight initializations
induce random network ensembles, which give rise to the trainability, training
speed, and sometimes also generalization ability of an instance. In addition,
such ensembles provide theoretical insights into the space of candidate models
of which one is selected during training. The results obtained so far rely on
mean field approximations that assume infinite layer width and that study
average squared signals. We derive the joint signal output distribution
exactly, without mean field assumptions, for fully-connected networks with
Gaussian weights and biases, and analyze deviations from the mean field
results. For rectified linear units, we further discuss limitations of the
standard initialization scheme, such as its lack of dynamical isometry, and
propose a simple alternative that overcomes these by initial parameter sharing.
Comment: NeurIPS 201
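A minimal sketch of one way to realize such initial parameter sharing: the mirrored ("looks-linear") construction W = [W0; -W0] with orthogonal W0, which makes a ReLU layer exactly norm-preserving on the first forward pass, since relu(W0 x) and relu(-W0 x) together recover W0 x. This illustrates the general idea and is not necessarily the paper's exact scheme.

```python
import numpy as np

def looks_linear_init(n_in, n_out, rng=np.random.default_rng(0)):
    """Mirrored ("looks-linear") ReLU initialization (illustrative sketch).

    Pairs each unit with a negated copy, W = [W0; -W0], so the layer is
    exactly isometric at initialization: ||relu(W x)|| = ||W0 x|| = ||x||
    when W0 is orthogonal. Not necessarily the paper's exact scheme.
    """
    assert n_out == 2 * n_in, "square mirrored case, for simplicity"
    a = rng.standard_normal((n_in, n_in))
    q, _ = np.linalg.qr(a)                   # orthogonal base matrix W0
    return np.concatenate([q, -q], axis=0)   # halves share parameters up to sign

W = looks_linear_init(64, 128)
x = np.random.default_rng(1).standard_normal(64)
h = np.maximum(W @ x, 0.0)                   # ReLU layer at initialization
assert np.isclose(np.linalg.norm(h), np.linalg.norm(x))  # exact isometry
```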
Coupled Ensembles of Neural Networks
We investigate in this paper the architecture of deep convolutional networks.
Building on existing state of the art models, we propose a reconfiguration of
the model parameters into several parallel branches at the global network
level, with each branch being a standalone CNN. We show that this arrangement
is an efficient way to significantly reduce the number of parameters without
losing performance, or to significantly improve the performance for the same
parameter budget. The use of branches brings an additional form of
regularization. In addition to the split into parallel branches, we propose a
tighter coupling of these branches by placing the "fuse (averaging) layer"
before the Log-Likelihood and SoftMax layers during training. This gives
another significant performance improvement, the tighter coupling favouring the
learning of better representations, even at the level of the individual
branches. We refer to this branched architecture as "coupled ensembles". The
approach is very generic and can be applied with almost any DCNN architecture.
With coupled ensembles of DenseNet-BC networks and a parameter budget of 25M,
we obtain error rates of 2.92%, 15.68% and 1.50% respectively on the CIFAR-10,
CIFAR-100 and SVHN tasks. For the same budget, a single DenseNet-BC has error
rates of 3.46%, 17.18% and 1.80% respectively. With ensembles of coupled
ensembles of DenseNet-BC networks, with 50M total parameters, we obtain error
rates of 2.72%, 15.13% and 1.42% respectively on these tasks.
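A minimal PyTorch sketch of the described coupling, assuming the "fuse (averaging) layer" averages branch log-probabilities before the negative log-likelihood criterion; the toy branch below is a placeholder for a full CNN such as DenseNet-BC:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledEnsemble(nn.Module):
    """Parallel branches fused by averaging before the NLL ("Log-Likelihood")
    criterion, so all branches are trained jointly through one loss."""
    def __init__(self, make_branch, n_branches=4):
        super().__init__()
        self.branches = nn.ModuleList([make_branch() for _ in range(n_branches)])

    def forward(self, x):
        # Fuse layer placed before the loss: average the per-branch
        # log-probabilities instead of ensembling only after training.
        logps = torch.stack([F.log_softmax(b(x), dim=1) for b in self.branches])
        return logps.mean(dim=0)

def make_branch(n_classes=10):
    # Stand-in for a standalone CNN branch (illustration only).
    return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(8, n_classes))

model = CoupledEnsemble(make_branch)
x, y = torch.randn(2, 3, 32, 32), torch.tensor([1, 7])
loss = F.nll_loss(model(x), y)   # one loss couples all branches
loss.backward()
```

Averaging log-probabilities amounts to taking a geometric mean of the branch predictions, so each branch receives gradients that depend on the others, which is the coupling the abstract credits for better per-branch representations.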
TreeGrad: Transferring Tree Ensembles to Neural Networks
Gradient Boosted Decision Trees (GBDT) are popular machine learning
algorithms, with implementations such as LightGBM and those found in popular
machine learning toolkits like Scikit-Learn. Many implementations can only
produce trees in an offline and greedy manner. We explore ways to convert
existing GBDT implementations to known neural network architectures with
minimal performance loss, in order to allow decision splits to be updated in an
online manner, and provide extensions that allow split points to be altered as
a neural architecture search problem. We provide learning bounds for our neural
network.
Comment: Technical Report on Implementation of Deep Neural Decision Forests
Algorithm. To accompany implementation here:
https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019).
"Transferring Tree Ensembles to Neural Networks". International Conference on
Neural Information Processing. Springer, 2019. arXiv admin note: text overlap
with arXiv:1909.1179
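A minimal sketch of the underlying transfer, in the spirit of the soft decision routing of Deep Neural Decision Forests referenced above: each hard split of a fitted tree becomes a sigmoid gate, so thresholds become trainable network parameters. Function names and the fixed depth-2 layout are illustrative, not TreeGrad's actual API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_predict(x, feat, thresh, leaf, temp=10.0):
    """Differentiable re-encoding of a depth-2 decision tree (sketch).

    The hard rule  x[feat] < thresh  becomes a sigmoid gate, so split
    thresholds are trainable by gradient descent, i.e. updatable online.
    feat/thresh describe the 3 internal nodes; leaf holds the 4 leaf values.
    """
    # Probability of routing left at each internal node (root, left, right).
    p = sigmoid(temp * (thresh - x[feat]))
    # Probability of reaching each leaf = product of gates along its path.
    reach = np.array([p[0] * p[1],               # left, left
                      p[0] * (1 - p[1]),         # left, right
                      (1 - p[0]) * p[2],         # right, left
                      (1 - p[0]) * (1 - p[2])])  # right, right
    return reach @ leaf                          # soft mixture of leaf values

# A fitted GBDT tree "transferred": splits on features 0, 1, 1 (hypothetical).
feat, thresh = np.array([0, 1, 1]), np.array([0.5, -1.0, 2.0])
leaf = np.array([0.1, -0.3, 0.8, 0.05])          # leaf values from boosting
print(soft_tree_predict(np.array([0.2, 3.0]), feat, thresh, leaf))
```

As temp grows the gates harden and the soft tree recovers the original splits exactly, which is why such a transfer can come with minimal performance loss.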