15 research outputs found
Statistical physics of neural systems
The ability to process and store information is considered a characteristic
trait of intelligent systems. In biological neural networks, learning is strongly
believed to take place at the synaptic level, through the modulation of synaptic
efficacy. It can thus be interpreted as the expression of a collective phenomenon,
emerging when neurons connect to one another and form a complex network of
interactions. In this work, we represent learning as an optimization problem,
implemented as a local search, in the synaptic space, for specific configurations, known
as solutions, that make a neural network able to accomplish a series of different
tasks. For instance, we would like the network to adapt the strength of its synaptic
connections so as to classify a series of objects, assigning to
each object its corresponding class label. A series of experiments
has suggested that synapses may exploit only a small number of synaptic states
for encoding information, a feature known to make learning in neural
networks a challenging task. Extending the large deviation analysis performed in
the extreme case of binary synaptic couplings, we prove in this work the existence
of regions of the phase space where solutions are organized in extremely dense
clusters. This picture turns out to be invariant under the tuning of all the parameters of
the model. Solutions within the clusters are more robust to noise, which enhances the
learning performance. This observation has inspired the design of new learning algorithms
and has clarified the effectiveness of previously proposed ones. We further
provide quantitative evidence that the gain achievable by increasing the
number of available synaptic states rapidly vanishes beyond the first
few bits, in line with the above-mentioned experimental
results. Besides the challenge of low-precision synaptic connections, it is
also known that the neuronal environment is extremely noisy. Whether stochasticity
enhances or worsens learning performance is currently a matter of debate. In
this work, we consider a neural network model where the synaptic connections are random variables, sampled according to a parametrized probability distribution.
We prove that this source of stochasticity naturally drives the system towards regions of the
phase space with a high density of solutions. These regions are directly accessible by
means of gradient descent strategies over the parameters of the distribution of the
synaptic couplings. We further set up a statistical physics analysis, through which we
show that solutions in the dense regions are characterized by robustness and good
generalization performance. Stochastic neural networks are also capable of building
abstract representations of input stimuli and then generating new input samples,
according to the inferred statistics of the input signal. In this regard, we propose a
new learning rule, called Delayed Correlation Matching (DCM), that, by relying on the
matching between time-delayed activity correlations, makes a neural network able
to store patterns of neuronal activity. When hidden neuronal states are considered, the
DCM learning rule is also able to train Restricted Boltzmann Machines as generative
models. In this work, we further require the DCM learning rule to fulfil biological
constraints such as locality, sparseness of the neural coding, and Dale's
principle. While retaining all these biological requirements, the DCM learning
rule has proven effective for different network topologies, both in on-line
learning regimes and in the presence of correlated patterns. We further show that it
also prevents the creation of spurious attractor states.
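To convey the flavour of a correlation-matching update for pattern storage, here is a minimal numerical sketch. It is a hypothetical concretisation, not the thesis's exact DCM rule: the synaptic change is taken as the difference between a clamped delayed correlation (fixed by the pattern) and a free-running one (a single step of the network dynamics); all sizes and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 80, 10                        # neurons, patterns to store
xi = rng.choice([-1.0, 1.0], size=(P, N))

# Illustrative correlation-matching update (not the thesis's exact DCM rule):
# Delta w_ij ~ <s_i(t+1) s_j(t)>_clamped - <s_i(t+1) s_j(t)>_free,
# where the clamped term is xi_i * xi_j and the free term comes from one
# deterministic step of the dynamics started at the pattern.
W = np.zeros((N, N))
lr = 0.1
for sweep in range(200):
    for mu in range(P):
        s = xi[mu]
        free = np.sign(W @ s)        # one free-running step from the pattern
        delta = np.outer(s - free, s)
        np.fill_diagonal(delta, 0.0) # no self-couplings
        W += lr * delta / N

# each stored pattern should now be a fixed point of the dynamics
stable = all(np.array_equal(np.sign(W @ p), p) for p in xi)
print("all patterns are fixed points:", stable)
```

Note that the update is local (each synapse only sees the activity of the two neurons it connects), which is the kind of biological constraint emphasised in the abstract.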
On the role of synaptic stochasticity in training low-precision neural networks
Stochasticity and limited precision of synaptic weights in neural network
models are key aspects of both biological and hardware modeling of learning
processes. Here we show that a neural network model with stochastic binary
weights naturally gives prominence to exponentially rare dense regions of
solutions with a number of desirable properties such as robustness and good
generalization performance, while typical solutions are isolated and hard to
find. Binary solutions of the standard perceptron problem are obtained from a
simple gradient descent procedure on a set of real values parametrizing a
probability distribution over the binary synapses. Both analytical and
numerical results are presented. An algorithmic extension aimed at training
discrete deep neural networks is also investigated.
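A minimal numerical sketch of this idea for a single perceptron follows. It is not the authors' exact procedure: each binary synapse is a ±1 random variable whose mean is m_i = tanh(theta_i), and plain gradient descent is run on the real parameters theta of the distribution, using a hinge-type surrogate on the mean pre-activation; all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 101, 40                        # synapses and random patterns (load ~ 0.4)
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)

# each binary synapse w_i = +-1 has mean m_i = tanh(theta_i) under its
# parametrized distribution; we descend a surrogate loss over theta
theta = np.zeros(N)
lr = 0.5
for epoch in range(1000):
    m = np.tanh(theta)                # mean synaptic values
    margin = y * (X @ m)              # margins of the "mean" network
    viol = margin < 1.0               # patterns not yet robustly classified
    if not viol.any():
        break
    grad = -(y[viol, None] * X[viol]).sum(axis=0) * (1.0 - m ** 2)
    theta -= lr * grad / P

w = np.where(np.tanh(theta) >= 0, 1.0, -1.0)   # most probable binary configuration
train_err = float(np.mean(np.sign(X @ w) != y))
print("training error of the binarized weights:", train_err)
```

The point of the sketch is that the search happens entirely in the continuous parameters theta, yet the binary configuration read off at the end classifies the training set well, consistent with the dense regions described in the abstract.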
Learning may need only a few bits of synaptic precision
Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic
states. The choice of discrete synapses is motivated by biological reasoning and experiments, and possibly by
hardware implementation considerations as well. In this paper we extend a previous large deviations analysis
which unveiled the existence of peculiar dense regions in the space of synaptic states which accounts for the
possibility of learning efficiently in networks with binary synapses. We extend the analysis to synapses with
multiple states and generally more plausible biological features. The results clearly indicate that the overall
qualitative picture is unchanged with respect to the binary case, and very robust to variation of the details of
the model. We also provide quantitative results which suggest that the advantages of increasing the synaptic
precision (i.e., the number of internal synaptic states) rapidly vanish after the first few bits, and therefore that,
for practical applications, only few bits may be needed for near-optimal performance, consistent with recent
biological findings. Finally, we demonstrate how the theoretical analysis can be exploited to design efficient
algorithmic search strategies.
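As a toy illustration of this rapid saturation (not the paper's large-deviation computation), one can train a real-valued perceptron on a task generated by a binary teacher and then quantize the learned weights to 2^b levels; the setup and all parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 200, 100
X = rng.standard_normal((P, N))
w_teacher = rng.choice([-1.0, 1.0], size=N)   # a low-precision solution exists by construction
y = np.sign(X @ w_teacher)

# stabilized perceptron learning for a real-valued student
w = np.zeros(N)
for _ in range(5000):
    viol = y * (X @ w) < 1.0
    if not viol.any():
        break
    w += (y[viol, None] * X[viol]).sum(axis=0) / P

def quantize(w, bits):
    """Round weights onto 2**bits symmetric levels spanning [-max|w|, +max|w|]."""
    levels = 2 ** bits
    if levels == 2:
        return np.sign(w)
    step = 2.0 * np.abs(w).max() / (levels - 1)
    return np.round(w / step) * step

errs = {b: float(np.mean(np.sign(X @ quantize(w, b)) != y)) for b in (1, 2, 3, 4, 8)}
print(errs)   # training error saturates after the first few bits
```

The dictionary of errors typically flattens out after 2-3 bits, mirroring the quantitative claim of the abstract in a much cruder setting.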
From inverse problems to learning: A Statistical Mechanics approach
No abstract available.
Inducing bias is simpler than you think
Machine learning may be oblivious to human bias but it is not immune to its
perpetuation. Marginalisation and iniquitous group representation are often
traceable in the very data used for training, and may be reflected or even
enhanced by the learning models. To counter this, some of the model accuracy
can be traded off for a secondary objective that helps prevent a specific type
of bias. Multiple notions of fairness have been proposed to this end but recent
studies show that some fairness criteria often stand in mutual competition.
In the present work, we introduce a solvable high-dimensional model of data
imbalance, where parametric control over the many bias-inducing factors allows
for an extensive exploration of the bias inheritance mechanism. Through the
tools of statistical physics, we analytically characterise the typical
behaviour of learning models trained in our synthetic framework and find
similar unfairness behaviours as those observed on more realistic data.
However, we also identify a positive transfer effect between the different
subpopulations within the data. This suggests that mixing data with different
statistical properties could be helpful, provided the learning model is made
aware of this structure.
Finally, we analyse the issue of bias mitigation: by reweighing the various
terms in the training loss, we indirectly minimise standard unfairness metrics
and highlight their incompatibilities. Leveraging the insights on positive
transfer, we also propose a theory-informed mitigation strategy, based on the
introduction of coupled learning models. By allowing each model to specialise
on a different community within the data, we find that multiple fairness
criteria and high accuracy can be achieved simultaneously.
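A toy version of the reweighing idea can be run in a few lines. This is a hypothetical two-group Gaussian mixture with plain logistic regression, not the solvable high-dimensional model of the paper; all means, fractions, and learning rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 4000, 2
frac_B = 0.05                                  # group B is the minority subpopulation
g = rng.random(n) < frac_B                     # group membership mask
y = rng.choice([-1.0, 1.0], size=n)
# the two groups carry their class signal along different directions
mu = np.where(g[:, None], [0.0, 3.0], [3.0, 0.0])
X = y[:, None] * mu + rng.standard_normal((n, d))

def train(sample_w):
    """Weighted logistic regression by gradient ascent on the log-likelihood."""
    w = np.zeros(d)
    for _ in range(2000):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # model's P(y = +1 | x)
        grad = X.T @ (sample_w * ((y + 1) / 2 - p)) / n
        w += 0.5 * grad
    return w

def group_acc(w, mask):
    return float(np.mean(np.sign(X[mask] @ w) == y[mask]))

w_plain = train(np.ones(n))                    # standard empirical risk
w_rw = train(np.where(g, 1.0 / frac_B, 1.0))   # reweigh the minority's loss terms
print("minority acc:", group_acc(w_plain, g), "->", group_acc(w_rw, g))
```

Upweighting the minority's loss terms pulls the decision boundary toward the direction that carries the minority's signal, the crude analogue of the indirect minimisation of unfairness metrics discussed above.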
Generalisation error in learning with random features and the hidden manifold model
We study generalised linear regression and classification for a synthetically generated dataset encompassing different problems of interest, such as learning with random features, neural networks in the lazy training regime, and the hidden manifold model. We consider the high-dimensional regime and, using the replica method from statistical physics, we provide a closed-form expression for the asymptotic generalisation performance in these problems, valid in both the under- and over-parametrised regimes and for a broad choice of generalised linear model loss functions. In particular, we show how to obtain analytically the so-called double descent behaviour for logistic regression with a peak at the interpolation threshold, we illustrate the superiority of orthogonal against random Gaussian projections in learning with random features, and discuss the role played by correlations in the data generated by the hidden manifold model. Beyond the interest in these particular problems, the theoretical formalism introduced in this manuscript provides a path to further extensions to more complex tasks.
Critical initialisation in continuous approximations of binary neural networks
The training of stochastic neural network models with binary (±1) weights
and activations via continuous surrogate networks is investigated. We derive
new surrogates using a novel derivation based on writing the stochastic neural
network as a Markov chain. This derivation also encompasses existing variants
of the surrogates presented in the literature. Following this, we theoretically
study the surrogates at initialisation. We derive, using mean field theory, a
set of scalar equations describing how input signals propagate through the
randomly initialised networks. The equations reveal whether so-called critical
initialisations exist for each surrogate network, where the network can be
trained to arbitrary depth. Moreover, we predict theoretically and confirm
numerically, that common weight initialisation schemes used in standard
continuous networks, when applied to the mean values of the stochastic binary
weights, yield poor training performance. This study shows that, contrary to
common intuition, the means of the stochastic binary weights should be
initialised close to ±1 for deeper networks to be trainable.
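The flavour of such scalar signal-propagation equations can be illustrated for a plain deterministic tanh network (a generic mean-field recursion, not the binary-surrogate equations derived in the paper; the Monte Carlo estimator and all constants are illustrative assumptions).

```python
import numpy as np

def variance_map(q, sigma_w2, sigma_b2):
    """One step of the mean-field recursion
    q_{l+1} = sigma_w^2 * E_z[ tanh(sqrt(q_l) z)^2 ] + sigma_b^2 (Monte Carlo)."""
    z = np.random.default_rng(0).standard_normal(200_000)
    return sigma_w2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sigma_b2

def fixed_point(sigma_w2, sigma_b2=0.0, q0=1.0, depth=200):
    q = q0
    for _ in range(depth):                  # propagate through `depth` layers
        q = variance_map(q, sigma_w2, sigma_b2)
    return q

q_sub = fixed_point(0.5)    # slope at q=0 is sigma_w^2 < 1: signals die out with depth
q_super = fixed_point(2.0)  # slope at q=0 is sigma_w^2 > 1: finite fixed point q* > 0
print(q_sub, q_super)
```

The slope of the variance map at its fixed point is what separates ordered from chaotic signal propagation, and the critical initialisations discussed above are the weight scales that sit exactly at that boundary.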
Gaussian Universality of Perceptrons with Random Labels
While classical in many theoretical settings - and in particular in statistical physics-inspired works - the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in the context of statistics and machine learning. In this study, we redeem this line of work in the case of generalized linear classification, a.k.a. the perceptron model, with random labels. We argue that there is a large universality class of high-dimensional input data for which we obtain the same minimum training loss as for Gaussian data with corresponding data covariance. In the limit of vanishing regularization, we further demonstrate that the training loss is independent of the data covariance. On the theoretical side, we prove this universality for an arbitrary mixture of homogeneous Gaussian clouds. Empirically, we show that the universality holds also for a broad range of real datasets
Probing transfer learning with a model of synthetic correlated datasets
Transfer learning can significantly improve the sample efficiency of neural networks, by exploiting
the relatedness between a data-scarce target task and a data-abundant source task. Despite years of
successful applications, transfer learning practice often relies on ad-hoc solutions, while theoretical
understanding of these procedures is still limited. In the present work, we re-think a solvable model
of synthetic data as a framework for modeling correlation between data-sets. This setup allows for
an analytic characterization of the generalization performance obtained when transferring the
learned feature map from the source to the target task. Focusing on the problem of training
two-layer networks in a binary classification setting, we show that our model can capture a range of
salient features of transfer learning with real data. Moreover, by exploiting parametric control over
the correlation between the two data-sets, we systematically investigate under which conditions the
transfer of features is beneficial for generalization.