Proximal Mean Field Learning in Shallow Neural Networks
We propose a custom learning algorithm for shallow over-parameterized neural
networks, i.e., networks with a single hidden layer of infinite width. The
infinite width of the hidden layer serves as an abstraction for the
over-parameterization. Building on the recent mean field interpretations of
learning dynamics in shallow neural networks, we realize mean field learning as
a computational algorithm, rather than as an analytical tool. Specifically, we
design a Sinkhorn regularized proximal algorithm to approximate the
distributional flow for the learning dynamics over weighted point clouds. In
this setting, a contractive fixed point recursion computes the time-varying
weights, numerically realizing the interacting Wasserstein gradient flow of the
parameter distribution supported over the neuronal ensemble. An appealing
aspect of the proposed algorithm is that the measure-valued recursions allow
meshless computation. We demonstrate the proposed computational framework of
interacting weighted particle evolution on binary and multi-class
classification. Our algorithm performs gradient descent of the free energy
associated with the risk functional.
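The contractive fixed-point recursion mentioned above builds on entropic optimal transport between weighted point clouds. A minimal sketch of that underlying primitive, standard Sinkhorn iterations, is given below; this is not the paper's full Sinkhorn-regularized proximal algorithm, and all names and parameter values are illustrative:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iter=1000):
    """Entropic-OT fixed-point (Sinkhorn) iterations between weighted
    point clouds with marginal weights a, b and cost matrix C."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # match column marginal
        u = a / (K @ v)                  # match row marginal
    return u[:, None] * K * v[None, :]   # entropic transport plan

rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 2)), rng.normal(size=(6, 2))
a, b = np.full(5, 1 / 5), np.full(6, 1 / 6)
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared distances
P = sinkhorn(a, b, C)   # marginals of P recover the prescribed weights
```

Each half-step rescales one set of potentials so the plan matches the corresponding marginal; the alternation is a contraction in the Hilbert projective metric, which is what makes the fixed-point recursion converge.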
Model Fusion via Optimal Transport
Combining different models is a widely used paradigm in machine learning
applications. While the most common approach is to form an ensemble of models
and average their individual predictions, this approach is often rendered
infeasible by given resource constraints in terms of memory and computation,
which grow linearly with the number of models. We present a layer-wise model
fusion algorithm for neural networks that utilizes optimal transport to (soft-)
align neurons across the models before averaging their associated parameters.
We show that this can successfully yield "one-shot" knowledge transfer (i.e.,
without requiring any retraining) between neural networks trained on
heterogeneous non-i.i.d. data. In both i.i.d. and non-i.i.d. settings, we
illustrate that our approach significantly outperforms vanilla averaging, as
well as how it can serve as an efficient replacement for the ensemble with
moderate fine-tuning, for standard convolutional networks (like VGG11),
residual networks (like ResNet18), and multi-layer perceptrons on CIFAR10,
CIFAR100, and MNIST. Finally, our approach also provides a principled way to
combine the parameters of neural networks with different widths, and we explore
its application for model compression. The code is available at the following
link, https://github.com/sidak/otfusion.
Comment: NeurIPS 2020 conference proceedings (an early version was featured in
the Optimal Transport & Machine Learning workshop, NeurIPS 2019).
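The neuron-alignment idea can be illustrated with a toy sketch: treat each layer's neurons as points (their incoming weight vectors), match model B's neurons to model A's, and only then average. For brevity this uses brute-force hard matching over permutations rather than the paper's soft OT alignment, and all names are ours:

```python
from itertools import permutations

import numpy as np

def fuse_layers(Wa, Wb):
    """Fuse two layers by first matching model B's neurons (rows of Wb)
    to model A's, then averaging the aligned weights. Brute-force hard
    matching stands in for the paper's soft OT alignment."""
    n = len(Wa)
    C = ((Wa[:, None, :] - Wb[None, :, :]) ** 2).sum(-1)  # pairwise cost
    best = min(permutations(range(n)),
               key=lambda p: sum(C[i, p[i]] for i in range(n)))
    return 0.5 * (Wa + Wb[list(best)])

Wa = np.array([[1.0, 0.0], [0.0, 1.0]])
Wb = np.array([[0.1, 0.9], [0.9, 0.1]])  # same two neurons, order permuted
fused = fuse_layers(Wa, Wb)
```

Vanilla averaging of `Wa` and `Wb` would blur both rows toward `[0.5, 0.5]`; aligning first preserves each neuron's identity, which is the effect the abstract describes.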
DEFEG: Deep Ensemble with Weighted Feature Generation
With the significant breakthrough of Deep Neural Networks in recent years, multi-layer architecture has influenced other sub-fields of machine learning, including ensemble learning. In 2017, Zhou and Feng introduced a deep random forest called gcForest that involves several layers of Random Forest-based classifiers. Although gcForest has outperformed several benchmark algorithms on specific datasets in terms of classification accuracy and model complexity, its input features do not ensure better performance when going deeply through the layer-by-layer architecture. We address this limitation by introducing a deep ensemble model with a novel feature generation module. Unlike gcForest, where the original features are concatenated to the outputs of classifiers to generate the input features for the subsequent layer, we integrate weights on the classifiers’ outputs as augmented features to grow the deep model. The usage of weights in the feature generation process can adjust the input data of each layer, leading to better results for the deep model. We encode the weights using variable-length encoding and develop a variable-length Particle Swarm Optimisation method to search for the optimal values of the weights by maximizing the classification accuracy on the validation data. Experiments on a number of UCI datasets confirm the benefit of the proposed method compared to some well-known benchmark algorithms.
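The weighted feature generation module can be sketched directly: where gcForest concatenates raw class-probability outputs to the original features, DEFEG scales each classifier's output by a learned weight first. In the paper those weights come from variable-length PSO; here they are fixed constants, and the function name is ours:

```python
import numpy as np

def augment_features(X, proba_list, weights):
    """DEFEG-style layer input: the original features concatenated with
    each classifier's class-probability output scaled by its weight
    (gcForest would concatenate the raw probabilities instead)."""
    parts = [X] + [w * P for w, P in zip(weights, proba_list)]
    return np.hstack(parts)

X = np.array([[0.2, 0.8], [0.5, 0.5]])    # 2 samples, 2 original features
P1 = np.array([[0.9, 0.1], [0.4, 0.6]])   # classifier 1 class probabilities
P2 = np.array([[0.7, 0.3], [0.2, 0.8]])   # classifier 2 class probabilities
Xa = augment_features(X, [P1, P2], weights=[0.5, 1.5])
# the next layer sees 2 + 2 + 2 = 6 features per sample
```

Setting a classifier's weight near zero suppresses its contribution to the next layer's input, which is the per-layer adjustment mechanism the abstract describes.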
Modern Problems in Mathematical Signal Processing: Quantized Compressed Sensing and Randomized Neural Networks
We study two problems from mathematical signal processing. First, we consider the problem of approximately recovering signals on a smooth, compact manifold from one-bit linear measurements drawn from either a Gaussian ensemble, a partial circulant ensemble, or a bounded orthonormal ensemble, and quantized using ΣΔ or distributed noise-shaping schemes. We construct a convex optimization algorithm for signal recovery that, given a Geometric Multi-Resolution Analysis approximation of the manifold, guarantees signal recovery with high probability. We prove an upper bound on the recovery error which outperforms prior works that use memoryless scalar quantization, requires a simpler analysis, and extends the class of measurements beyond Gaussians.
Second, we consider the problem of approximating continuous functions on compact domains using neural networks. The learning speed of feed-forward neural networks is notoriously slow and has presented a bottleneck in deep learning applications for several decades. For instance, gradient-based learning algorithms, which are used extensively to train neural networks, tend to work slowly when all of the network parameters must be iteratively tuned. To counter this, both researchers and practitioners have tried introducing randomness to reduce the learning requirement. Based on the original construction of B. Igelnik and Y. H. Pao, single-layer neural networks with random input-to-hidden-layer weights and biases have seen success in practice, but the necessary theoretical justification is lacking. We begin to fill this theoretical gap by providing a (corrected) rigorous proof that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with L2-error convergence rate inversely proportional to the number of network nodes; we then extend this result to the non-asymptotic setting using a concentration inequality for Monte-Carlo integral approximations.
We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space, providing theoretical guarantees in both the asymptotic and non-asymptotic cases.
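The Igelnik-Pao construction discussed above can be sketched as a random-features network: hidden weights and biases are drawn at random and frozen, and only the output layer is trained. The least-squares fit and parameter ranges below are our illustrative choices, not the dissertation's exact construction:

```python
import numpy as np

def fit_random_net(x, y, n_nodes=200, seed=0):
    """Single-layer net in the spirit of Igelnik-Pao: input-to-hidden
    weights W and biases b are random and never trained; only the
    output weights c are fit, here by ordinary least squares."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-5.0, 5.0, size=(n_nodes, x.shape[1]))
    b = rng.uniform(-5.0, 5.0, size=n_nodes)
    H = np.tanh(x @ W.T + b)                    # random feature matrix
    c, *_ = np.linalg.lstsq(H, y, rcond=None)   # trained output layer
    return lambda z: np.tanh(z @ W.T + b) @ c

x = np.linspace(0.0, 1.0, 200)[:, None]
y = np.sin(2.0 * np.pi * x[:, 0])
f = fit_random_net(x, y)
err = np.max(np.abs(f(x) - y))   # training-grid approximation error
```

Because the random features are fixed, training reduces to a single linear solve, which is exactly the learning-speed advantage the abstract attributes to this family of networks.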
Neuroevolutionary learning in nonstationary environments
This work presents a new neuro-evolutionary model, called NEVE (Neuroevolutionary Ensemble), based on an ensemble of Multi-Layer Perceptron (MLP) neural networks for learning in nonstationary environments. NEVE makes use of quantum-inspired evolutionary models to automatically configure the ensemble members and combine their output. The quantum-inspired evolutionary models identify the most appropriate topology for each MLP network, select the most relevant input variables, determine the neural network weights and calculate the voting weight of each ensemble member. Four different approaches to NEVE are developed, varying the mechanism for detecting and treating concept drift, including proactive drift detection approaches. The proposed models were evaluated on real and artificial datasets, comparing the results obtained with other consolidated models in the literature. The results show that the accuracy of NEVE is higher in most cases and the best configurations are obtained using some mechanism for drift detection. These results reinforce that the neuroevolutionary ensemble approach is a robust choice for situations in which the datasets are subject to sudden changes in behaviour.
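Two of NEVE's ingredients, weighted voting across ensemble members and drift detection, can be sketched in simplified form. The paper evolves voting weights with quantum-inspired models and uses dedicated (including proactive) drift detectors; the windowed error check below is only a toy stand-in, and all names are ours:

```python
def weighted_vote(preds, weights):
    """NEVE-style combination: each ensemble member casts its predicted
    class with an individual voting weight; the ensemble returns the
    class with the largest total weight."""
    scores = {}
    for p, w in zip(preds, weights):
        scores[p] = scores.get(p, 0.0) + w
    return max(scores, key=scores.get)

def drift_detected(errors, window=30, threshold=0.15):
    """Toy drift check: flag drift when the recent error rate exceeds
    the long-run error rate by a fixed margin."""
    recent = sum(errors[-window:]) / min(len(errors), window)
    overall = sum(errors) / len(errors)
    return recent - overall > threshold

# member 2 carries the largest voting weight, so its class-0 vote wins
label = weighted_vote([1, 0, 1], weights=[0.5, 0.9, 0.3])
# a stream that was clean and then starts failing trips the detector
drift = drift_detected([0] * 100 + [1] * 30)
```

In NEVE, a positive drift signal would trigger reconfiguration of the ensemble members, which is the "detecting and treating" mechanism the abstract varies across its four approaches.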