Generalisation under gradient descent via deterministic PAC-Bayes
We establish disintegrated PAC-Bayesian generalisation bounds for models
trained with gradient descent methods or continuous gradient flows. Contrary to
standard practice in the PAC-Bayesian setting, our result applies to
optimisation algorithms that are deterministic, without requiring any
de-randomisation step. Our bounds are fully computable, depending on the
density of the initial distribution and the Hessian of the training objective
over the trajectory. We show that our framework can be applied to a variety of
iterative optimisation algorithms, including stochastic gradient descent (SGD),
momentum-based schemes, and damped Hamiltonian dynamics.
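As a point of reference, a representative disintegrated PAC-Bayes bound from the broader literature (not necessarily the paper's exact statement) has the following shape: with probability at least \(1-\delta\) over the sample \(S\) and a single draw \(h \sim Q_S\),

```latex
f(h, S) \;\le\; \ln \frac{\mathrm{d}Q_S}{\mathrm{d}P}(h)
\;+\; \ln\frac{1}{\delta}
\;+\; \ln \mathbb{E}_{S'}\, \mathbb{E}_{h' \sim P}\, e^{f(h', S')}
```

Here \(P\) is the prior, \(Q_S\) the data-dependent posterior, and \(f\) a generalisation-gap functional; the density ratio \(\mathrm{d}Q_S/\mathrm{d}P\) is what makes such bounds "fully computable" when \(Q_S\) is induced by a deterministic map of the initial distribution, as in the abstract above.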
On the impact of selected modern deep-learning techniques to the performance and celerity of classification models in an experimental high-energy physics use case
Beginning from a basic neural-network architecture, we test the potential
benefits offered by a range of advanced techniques for machine learning, in
particular deep learning, in the context of a typical classification problem
encountered in the domain of high-energy physics, using a well-studied dataset:
the 2014 Higgs ML Kaggle dataset. The advantages are evaluated in terms of both
performance metrics and the time required to train and apply the resulting
models. Techniques examined include domain-specific data-augmentation, learning
rate and momentum scheduling, (advanced) ensembling in both model-space and
weight-space, and alternative architectures and connection methods. Following
the investigation, we arrive at a model which achieves equal performance to the
winning solution of the original Kaggle challenge, whilst being significantly
quicker to train and apply, and being suitable for use with both GPU and CPU
hardware setups. These reductions in timing and hardware requirements
potentially allow the use of more powerful algorithms in HEP analyses, where
models must be retrained frequently, sometimes at short notice, by small groups
of researchers with limited hardware resources. Additionally, a new wrapper
library for PyTorch called LUMIN is presented, which incorporates all of the
techniques studied.
Comment: Preprint V4: Fixing typographical error and correcting two plots.
Mach. Learn.: Sci. Technol. (2020)
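The learning-rate and momentum scheduling the abstract mentions is typically a 1-cycle-style policy. A minimal sketch, assuming a linear warm-up over the first 10% of steps followed by cosine annealing (function and parameter names here are illustrative, not LUMIN's API):

```python
import math

def one_cycle(step, total_steps, lr_max=1e-2, lr_min=1e-4):
    """Linear warm-up to lr_max, then cosine annealing down to lr_min."""
    warmup = max(1, total_steps // 10)
    if step < warmup:
        # linear ramp from lr_min up to lr_max
        return lr_min + (lr_max - lr_min) * step / warmup
    # cosine decay from lr_max back to lr_min over the remaining steps
    t = (step - warmup) / max(1, total_steps - warmup)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

schedule = [one_cycle(s, 100) for s in range(100)]
```

In 1-cycle training the momentum is usually cycled inversely to the learning rate; the same function shape applies with the ramp direction reversed.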
Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data
Real-world datasets exhibit imbalances of varying types and degrees. Several
techniques based on re-weighting and margin adjustment of loss are often used
to enhance the performance of neural networks, particularly on minority
classes. In this work, we analyze the class-imbalanced learning problem by
examining the loss landscape of neural networks trained with re-weighting and
margin-based techniques. Specifically, we examine the spectral density of
Hessian of class-wise loss, through which we observe that the network weights
converge to a saddle point in the loss landscapes of minority classes.
Following this observation, we also find that optimization methods designed to
escape from saddle points can be effectively used to improve generalization on
minority classes. We further theoretically and empirically demonstrate that
Sharpness-Aware Minimization (SAM), a recent technique that encourages
convergence to flat minima, can be effectively used to escape saddle points
for minority classes. Using SAM results in a 6.2\% increase in accuracy on the
minority classes over the state-of-the-art Vector Scaling Loss, leading to an
overall average increase of 4\% across imbalanced datasets. The code is
available at: https://github.com/val-iisc/Saddle-LongTail.
Comment: NeurIPS 2022.
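The SAM update the abstract relies on has a simple two-step form: ascend to the worst-case point within a radius rho of the current weights, then descend using the gradient computed there. A minimal NumPy sketch on a toy quadratic loss (the real method applies this per minibatch inside an optimizer):

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step."""
    g = grad_fn(w)
    # adversarial perturbation of norm rho along the gradient direction
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # descend from w using the gradient taken at the perturbed point
    return w - lr * grad_fn(w + eps)

# toy quadratic loss L(w) = 0.5 * ||w||^2, so grad L(w) = w
w = sam_step(np.array([1.0, -2.0]), lambda w: w)
```

Near a saddle the plain gradient vanishes, but the gradient at the perturbed point w + eps does not, which is the intuition for why SAM helps minority classes escape the saddle points observed above.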
Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward.
Is Evolution an Algorithm? Effects of local entropy in unsupervised learning and protein evolution
The abstract is in the attachment.