Proximal Mean Field Learning in Shallow Neural Networks
We propose a custom learning algorithm for shallow over-parameterized neural
networks, i.e., networks with a single hidden layer of infinite width. The
infinite width of the hidden layer serves as an abstraction for the
over-parameterization. Building on the recent mean field interpretations of
learning dynamics in shallow neural networks, we realize mean field learning as
a computational algorithm, rather than as an analytical tool. Specifically, we
design a Sinkhorn regularized proximal algorithm to approximate the
distributional flow for the learning dynamics over weighted point clouds. In
this setting, a contractive fixed point recursion computes the time-varying
weights, numerically realizing the interacting Wasserstein gradient flow of the
parameter distribution supported over the neuronal ensemble. An appealing
aspect of the proposed algorithm is that the measure-valued recursions allow
meshless computation. We demonstrate the proposed computational framework of
interacting weighted particle evolution on binary and multi-class
classification. Our algorithm performs gradient descent of the free energy
associated with the risk functional.
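The contractive fixed-point recursion mentioned above builds on entropic optimal transport between weighted point clouds. A minimal sketch of that underlying primitive, standard Sinkhorn iterations, is given below; this is not the paper's full Sinkhorn-regularized proximal algorithm, and all names and parameter values are illustrative:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iter=1000):
    """Entropic-OT fixed-point (Sinkhorn) iterations between weighted
    point clouds with marginal weights a, b and cost matrix C."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # match column marginal
        u = a / (K @ v)                  # match row marginal
    return u[:, None] * K * v[None, :]   # entropic transport plan

rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 2)), rng.normal(size=(6, 2))
a, b = np.full(5, 1 / 5), np.full(6, 1 / 6)
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared distances
P = sinkhorn(a, b, C)   # marginals of P recover the prescribed weights
```

Each half-step rescales one set of potentials so the plan matches the corresponding marginal; the alternation is a contraction in the Hilbert projective metric, which is what makes the fixed-point recursion converge.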
Model Fusion via Optimal Transport
Combining different models is a widely used paradigm in machine learning
applications. While the most common approach is to form an ensemble of models
and average their individual predictions, this approach is often rendered
infeasible by given resource constraints in terms of memory and computation,
which grow linearly with the number of models. We present a layer-wise model
fusion algorithm for neural networks that utilizes optimal transport to (soft-)
align neurons across the models before averaging their associated parameters.
We show that this can successfully yield "one-shot" knowledge transfer (i.e.,
without requiring any retraining) between neural networks trained on
heterogeneous non-i.i.d. data. In both i.i.d. and non-i.i.d. settings, we
illustrate that our approach significantly outperforms vanilla averaging, as
well as how it can serve as an efficient replacement for the ensemble with
moderate fine-tuning, for standard convolutional networks (like VGG11),
residual networks (like ResNet18), and multi-layer perceptrons on CIFAR10,
CIFAR100, and MNIST. Finally, our approach also provides a principled way to
combine the parameters of neural networks with different widths, and we explore
its application for model compression. The code is available at the following
link, https://github.com/sidak/otfusion.
Comment: NeurIPS 2020 conference proceedings (an early version was featured in
the Optimal Transport & Machine Learning workshop, NeurIPS 2019).
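The neuron-alignment idea can be illustrated with a toy sketch: treat each layer's neurons as points (their incoming weight vectors), match model B's neurons to model A's, and only then average. For brevity this uses brute-force hard matching over permutations rather than the paper's soft OT alignment, and all names are ours:

```python
from itertools import permutations

import numpy as np

def fuse_layers(Wa, Wb):
    """Fuse two layers by first matching model B's neurons (rows of Wb)
    to model A's, then averaging the aligned weights. Brute-force hard
    matching stands in for the paper's soft OT alignment."""
    n = len(Wa)
    C = ((Wa[:, None, :] - Wb[None, :, :]) ** 2).sum(-1)  # pairwise cost
    best = min(permutations(range(n)),
               key=lambda p: sum(C[i, p[i]] for i in range(n)))
    return 0.5 * (Wa + Wb[list(best)])

Wa = np.array([[1.0, 0.0], [0.0, 1.0]])
Wb = np.array([[0.1, 0.9], [0.9, 0.1]])  # same two neurons, order permuted
fused = fuse_layers(Wa, Wb)
```

Vanilla averaging of `Wa` and `Wb` would blur both rows toward `[0.5, 0.5]`; aligning first preserves each neuron's identity, which is the effect the abstract describes.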
DEFEG: Deep Ensemble with Weighted Feature Generation
With the significant breakthrough of Deep Neural Networks in recent years, multi-layer architecture has influenced other sub-fields of machine learning, including ensemble learning. In 2017, Zhou and Feng introduced a deep random forest called gcForest that involves several layers of Random Forest-based classifiers. Although gcForest has outperformed several benchmark algorithms on specific datasets in terms of classification accuracy and model complexity, its input features do not ensure better performance when going deeply through the layer-by-layer architecture. We address this limitation by introducing a deep ensemble model with a novel feature generation module. Unlike gcForest, where the original features are concatenated to the outputs of classifiers to generate the input features for the subsequent layer, we integrate weights on the classifiers’ outputs as augmented features to grow the deep model. The usage of weights in the feature generation process can adjust the input data of each layer, leading to better results for the deep model. We encode the weights using variable-length encoding and develop a variable-length Particle Swarm Optimisation method to search for the optimal values of the weights by maximizing the classification accuracy on the validation data. Experiments on a number of UCI datasets confirm the benefit of the proposed method compared to some well-known benchmark algorithms.
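The weighted feature generation module can be sketched directly: where gcForest concatenates raw class-probability outputs to the original features, DEFEG scales each classifier's output by a learned weight first. In the paper those weights come from variable-length PSO; here they are fixed constants, and the function name is ours:

```python
import numpy as np

def augment_features(X, proba_list, weights):
    """DEFEG-style layer input: the original features concatenated with
    each classifier's class-probability output scaled by its weight
    (gcForest would concatenate the raw probabilities instead)."""
    parts = [X] + [w * P for w, P in zip(weights, proba_list)]
    return np.hstack(parts)

X = np.array([[0.2, 0.8], [0.5, 0.5]])    # 2 samples, 2 original features
P1 = np.array([[0.9, 0.1], [0.4, 0.6]])   # classifier 1 class probabilities
P2 = np.array([[0.7, 0.3], [0.2, 0.8]])   # classifier 2 class probabilities
Xa = augment_features(X, [P1, P2], weights=[0.5, 1.5])
# the next layer sees 2 + 2 + 2 = 6 features per sample
```

Setting a classifier's weight near zero suppresses its contribution to the next layer's input, which is the per-layer adjustment mechanism the abstract describes.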
Modern Problems in Mathematical Signal Processing: Quantized Compressed Sensing and Randomized Neural Networks
We study two problems from mathematical signal processing. First, we consider the problem of approximately recovering signals on a smooth, compact manifold from one-bit linear measurements drawn from either a Gaussian ensemble, a partial circulant ensemble, or a bounded orthonormal ensemble, and quantized using ΣΔ or distributed noise-shaping schemes. We construct a convex optimization algorithm for signal recovery that, given a Geometric Multi-Resolution Analysis approximation of the manifold, guarantees signal recovery with high probability. We prove an upper bound on the recovery error which outperforms prior works that use memoryless scalar quantization, requires a simpler analysis, and extends the class of measurements beyond Gaussians.
Second, we consider the problem of approximating continuous functions on compact domains using neural networks. The learning speed of feed-forward neural networks is notoriously slow and has presented a bottleneck in deep learning applications for several decades. For instance, gradient-based learning algorithms, which are used extensively to train neural networks, tend to work slowly when all of the network parameters must be iteratively tuned. To counter this, both researchers and practitioners have tried introducing randomness to reduce the learning requirement. Based on the original construction of B. Igelnik and Y. H. Pao, single-layer neural networks with random input-to-hidden-layer weights and biases have seen success in practice, but the necessary theoretical justification is lacking. We begin to fill this theoretical gap by providing a (corrected) rigorous proof that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with L2-error convergence rate inversely proportional to the number of network nodes; we then extend this result to the non-asymptotic setting using a concentration inequality for Monte-Carlo integral approximations.
We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space, providing theoretical guarantees in both the asymptotic and non-asymptotic cases.
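The Igelnik-Pao construction discussed above can be sketched as a random-features network: hidden weights and biases are drawn at random and frozen, and only the output layer is trained. The least-squares fit and parameter ranges below are our illustrative choices, not the dissertation's exact construction:

```python
import numpy as np

def fit_random_net(x, y, n_nodes=200, seed=0):
    """Single-layer net in the spirit of Igelnik-Pao: input-to-hidden
    weights W and biases b are random and never trained; only the
    output weights c are fit, here by ordinary least squares."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-5.0, 5.0, size=(n_nodes, x.shape[1]))
    b = rng.uniform(-5.0, 5.0, size=n_nodes)
    H = np.tanh(x @ W.T + b)                    # random feature matrix
    c, *_ = np.linalg.lstsq(H, y, rcond=None)   # trained output layer
    return lambda z: np.tanh(z @ W.T + b) @ c

x = np.linspace(0.0, 1.0, 200)[:, None]
y = np.sin(2.0 * np.pi * x[:, 0])
f = fit_random_net(x, y)
err = np.max(np.abs(f(x) - y))   # training-grid approximation error
```

Because the random features are fixed, training reduces to a single linear solve, which is exactly the learning-speed advantage the abstract attributes to this family of networks.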
Neuroevolutionary learning in nonstationary environments
This work presents a new neuro-evolutionary model, called NEVE (Neuroevolutionary Ensemble), based on an ensemble of Multi-Layer Perceptron (MLP) neural networks for learning in nonstationary environments. NEVE makes use of quantum-inspired evolutionary models to automatically configure the ensemble members and combine their output. The quantum-inspired evolutionary models identify the most appropriate topology for each MLP network, select the most relevant input variables, determine the neural network weights and calculate the voting weight of each ensemble member. Four different approaches to NEVE are developed, varying the mechanism for detecting and treating concept drift, including proactive drift detection approaches. The proposed models were evaluated on real and artificial datasets, comparing the results obtained with other consolidated models in the literature. The results show that the accuracy of NEVE is higher in most cases and the best configurations are obtained using some mechanism for drift detection. These results reinforce that the neuroevolutionary ensemble approach is a robust choice for situations in which the datasets are subject to sudden changes in behaviour.
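Two of NEVE's ingredients, weighted voting across ensemble members and drift detection, can be sketched in simplified form. The paper evolves voting weights with quantum-inspired models and uses dedicated (including proactive) drift detectors; the windowed error check below is only a toy stand-in, and all names are ours:

```python
def weighted_vote(preds, weights):
    """NEVE-style combination: each ensemble member casts its predicted
    class with an individual voting weight; the ensemble returns the
    class with the largest total weight."""
    scores = {}
    for p, w in zip(preds, weights):
        scores[p] = scores.get(p, 0.0) + w
    return max(scores, key=scores.get)

def drift_detected(errors, window=30, threshold=0.15):
    """Toy drift check: flag drift when the recent error rate exceeds
    the long-run error rate by a fixed margin."""
    recent = sum(errors[-window:]) / min(len(errors), window)
    overall = sum(errors) / len(errors)
    return recent - overall > threshold

# member 2 carries the largest voting weight, so its class-0 vote wins
label = weighted_vote([1, 0, 1], weights=[0.5, 0.9, 0.3])
# a stream that was clean and then starts failing trips the detector
drift = drift_detected([0] * 100 + [1] * 30)
```

In NEVE, a positive drift signal would trigger reconfiguration of the ensemble members, which is the "detecting and treating" mechanism the abstract varies across its four approaches.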