Generalized Approximate Survey Propagation for High-Dimensional Estimation
In Generalized Linear Estimation (GLE) problems, we seek to estimate a signal
that is observed through a linear transform followed by a component-wise,
possibly nonlinear and noisy, channel. In the Bayesian optimal setting,
Generalized Approximate Message Passing (GAMP) is known to achieve optimal
performance for GLE. However, its performance can significantly degrade
whenever there is a mismatch between the assumed and the true generative model,
a situation frequently encountered in practice. In this paper, we propose a new
algorithm, named Generalized Approximate Survey Propagation (GASP), for solving
GLE in the presence of prior or model mis-specifications. As a prototypical
example, we consider the phase retrieval problem, where we show that GASP
outperforms the corresponding GAMP, reducing the reconstruction threshold and,
for certain choices of its parameters, approaching Bayesian optimal
performance. Furthermore, we present a set of State Evolution equations that
exactly characterize the dynamics of GASP in the high-dimensional limit.
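
For concreteness, the sketch below shows the generic GAMP iteration on the
simplest GLE instance: a Gaussian prior with a purely Gaussian channel, where
both denoisers have closed forms. The dimensions, noise level, and step count
are illustrative assumptions; neither the phase-retrieval channel nor the GASP
modifications introduced in the paper are implemented here. It is only a
minimal reference point for the baseline algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy GLE instance: Gaussian i.i.d. prior x ~ N(0, 1) and a Gaussian
# channel y = A x + noise; the closed-form denoisers below are specific
# to this toy model (an illustrative assumption, not the paper's setting).
n, m, delta = 400, 800, 0.05
A = rng.standard_normal((m, n)) / np.sqrt(n)   # A_ij ~ N(0, 1/n)
x_true = rng.standard_normal(n)
y = A @ x_true + np.sqrt(delta) * rng.standard_normal(m)

x_hat, tau_x = np.zeros(n), 1.0   # initialize at the prior mean/variance
s = np.zeros(m)
for _ in range(30):
    # output (channel) half-step
    tau_p = tau_x                      # sum_j A_aj^2 tau_x ~= tau_x here
    p = A @ x_hat - tau_p * s          # Onsager-corrected estimate of A x
    s = (y - p) / (delta + tau_p)      # g_out for the Gaussian channel
    tau_s = 1.0 / (delta + tau_p)
    # input (prior) half-step
    tau_r = n / (m * tau_s)            # 1 / sum_a A_aj^2 tau_s
    r = x_hat + tau_r * (A.T @ s)
    x_hat = r / (1.0 + tau_r)          # posterior mean under the N(0,1) prior
    tau_x = tau_r / (1.0 + tau_r)      # posterior variance

print("reconstruction MSE:", np.mean((x_hat - x_true) ** 2))
```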
Out of equilibrium Statistical Physics of learning
In the study of hard optimization problems, it is often infeasible to achieve
full analytic control over the dynamics of the algorithmic processes that
find solutions efficiently. In many cases, a static approach can provide
considerable insight into the dynamical properties of these algorithms: indeed,
the geometrical structures found in the energy landscape can strongly affect
the stationary states and the optimal configurations reached by the solvers.
In this context, a classical Statistical Mechanics approach, relying on the
assumption that a Boltzmann-Gibbs equilibrium is asymptotically reached,
can yield misleading predictions when the studied algorithms include
stochastic components that effectively drive these processes out of equilibrium.
Thus, it becomes necessary to develop intuition about the relevant features
of the studied phenomena and to build an ad hoc Large Deviation analysis,
providing a more targeted and richer description of the geometrical properties
of the landscape. The present thesis focuses on the study of learning processes
in Artificial Neural Networks, with the aim of introducing an out-of-equilibrium
statistical physics framework, based on a local entropy potential, that can
support and inspire algorithmic improvements in the field of Deep Learning and
inform models of neural computation of both biological and engineering
interest.
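
To give a rough sense of what a local entropy potential measures, the sketch
below finds one solution of a toy spherical perceptron and estimates, by plain
Monte Carlo, the fraction of nearby configurations that remain solutions; a
flatter, denser region yields a larger fraction. The model, sizes, and sampling
scheme are illustrative assumptions and capture only the underlying intuition,
not the thesis's Large Deviation analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy spherical perceptron: P random patterns in N dimensions, separable
# with high probability since P < N (illustrative assumption).
N, P = 200, 80
X = rng.standard_normal((P, N))
y = rng.choice([-1.0, 1.0], size=P)

# find one solution with the classical perceptron rule
w = np.zeros(N)
while True:
    wrong = np.where(y * (X @ w) <= 0)[0]
    if wrong.size == 0:
        break
    w += y[wrong[0]] * X[wrong[0]]   # update on a violated pattern
w /= np.linalg.norm(w)

def flat_fraction(w, radius, trials=2000):
    """Monte Carlo proxy for local entropy: the fraction of weights
    perturbed to the given radius that still classify every pattern."""
    hits = 0
    for _ in range(trials):
        u = rng.standard_normal(N)
        v = w + radius * u / np.linalg.norm(u)
        v /= np.linalg.norm(v)
        hits += bool(np.all(y * (X @ v) > 0))
    return hits / trials

for r in (0.05, 0.1, 0.2, 0.4):
    print(f"radius {r}: still-solution fraction {flat_fraction(w, r):.3f}")
```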
Solvable Model for Inheriting the Regularization through Knowledge Distillation
In recent years the empirical success of transfer learning with neural
networks has stimulated an increasing interest in obtaining a theoretical
understanding of its core properties. Knowledge distillation, where a smaller
neural network is trained using the outputs of a larger neural network, is a
particularly interesting case of transfer learning. In the present work, we
introduce a statistical physics framework that allows an analytic
characterization of the properties of knowledge distillation (KD) in shallow
neural networks. Focusing the analysis on a solvable model that exhibits a
non-trivial generalization gap, we investigate the effectiveness of KD. We are
able to show that, through KD, the regularization properties of the larger
teacher model can be inherited by the smaller student, and that the resulting
generalization performance is closely linked to, and limited by, the optimality
of the teacher. Finally, we analyze the double descent phenomenology that can
arise in the considered KD setting.
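
As a minimal illustration of the distillation setup, the sketch below trains a
linear student on the temperature-softened outputs of a fixed linear teacher
(Hinton-style soft targets). The architecture, temperature, and training
schedule are illustrative assumptions, not the solvable shallow-network model
analyzed in the work.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical setup: a fixed linear teacher and a linear student
# trained on the teacher's temperature-softened outputs.
n, d, k = 500, 20, 3
X = rng.standard_normal((n, d))
W_teacher = rng.standard_normal((d, k))
T = 4.0                                    # distillation temperature
soft_targets = softmax(X @ W_teacher, T)   # teacher's soft labels

W_student = np.zeros((d, k))
lr = 5.0
for _ in range(300):
    p = softmax(X @ W_student, T)
    # gradient of the soft cross-entropy w.r.t. the student's weights
    grad = X.T @ (p - soft_targets) / (n * T)
    W_student -= lr * grad

agreement = np.mean(
    np.argmax(X @ W_student, axis=1) == np.argmax(X @ W_teacher, axis=1)
)
print("student/teacher hard-label agreement:", agreement)
```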