15 research outputs found

    Statistical physics of neural systems

    The ability to process and store information is considered a characteristic trait of intelligent systems. In biological neural networks, learning is strongly believed to take place at the synaptic level, through the modulation of synaptic efficacy. It can thus be interpreted as the expression of a collective phenomenon, emerging when neurons connect to each other to form a complex network of interactions. In this work, we represent learning as an optimization problem: a local search, in the synaptic space, for specific configurations, known as solutions, that make a neural network able to accomplish a series of different tasks. For instance, we would like the network to adapt the strength of its synaptic connections so that it can classify a series of objects by assigning to each object its corresponding class label. A series of experiments suggests that synapses may exploit only a very small number of synaptic states for encoding information. This feature is known to make learning in neural networks a challenging task. Extending the large deviation analysis performed in the extreme case of binary synaptic couplings, in this work we prove the existence of regions of the phase space where solutions are organized in extremely dense clusters. This picture turns out to be invariant to the tuning of all the parameters of the model. Solutions within the clusters are more robust to noise, which enhances learning performance. This has inspired the design of new learning algorithms and has clarified the effectiveness of previously proposed ones. We further provide quantitative evidence that the gain achievable by considering a greater number of available synaptic states for encoding information is substantial only up to a very small number of bits. This is in line with the above-mentioned experimental results.
Besides the challenge of low-precision synaptic connections, the neuronal environment is also known to be extremely noisy. Whether stochasticity enhances or worsens learning performance is currently a matter of debate. In this work, we consider a neural network model where the synaptic connections are random variables, sampled according to a parametrized probability distribution. We prove that this source of stochasticity naturally drives the search towards regions of the phase space with a high density of solutions. These regions are directly accessible by means of gradient descent strategies over the parameters of the synaptic coupling distribution. We further set up a statistical physics analysis, through which we show that solutions in the dense regions are characterized by robustness and good generalization performance. Stochastic neural networks are also capable of building abstract representations of input stimuli and then generating new input samples according to the inferred statistics of the input signal. In this regard, we propose a new learning rule, called Delayed Correlation Matching (DCM), that, by relying on the matching between time-delayed activity correlations, makes a neural network able to store patterns of neuronal activity. When hidden neuronal states are considered, the DCM learning rule is also able to train Restricted Boltzmann Machines as generative models. In this work, we further require the DCM learning rule to fulfil some biological constraints, such as locality, sparseness of the neural coding and Dale's principle. While retaining all these biological requirements, the DCM learning rule has been shown to be effective for different network topologies, in on-line learning regimes, and in the presence of correlated patterns. We further show that it is also able to prevent the creation of spurious attractor states.

    On the role of synaptic stochasticity in training low-precision neural networks

    Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties such as robustness and good generalization performance, while typical solutions are isolated and hard to find. Binary solutions of the standard perceptron problem are obtained from a simple gradient descent procedure on a set of real values parametrizing a probability distribution over the binary synapses. Both analytical and numerical results are presented. An algorithmic extension aimed at training discrete deep neural networks is also investigated. Comment: 7 pages + 14 pages of supplementary material.
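
    The gradient descent procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the parametrization (each binary weight has mean m_i = tanh(theta_i)), the Gaussian approximation of the pre-activation, the learning rate and the function names are all assumptions made for illustration.

```python
import math
import numpy as np

# Element-wise complementary error function (NumPy has no built-in erfc).
_erfc = np.vectorize(math.erfc)

def train_stochastic_binary_perceptron(xi, sigma, lr=0.1, epochs=200, eps=1e-6):
    """Gradient descent on real parameters theta of a distribution over +/-1 weights.

    Assumption: weight i is +1 or -1 with mean m_i = tanh(theta_i), and the
    pre-activation is treated as Gaussian (central-limit approximation).
    xi: (P, N) array of +/-1 inputs; sigma: (P,) array of +/-1 labels.
    """
    P, N = xi.shape
    theta = np.zeros(N)
    for _ in range(epochs):
        m = np.tanh(theta)
        var = np.sum(1.0 - m**2) + eps            # field variance for +/-1 inputs
        z = sigma * (xi @ m) / np.sqrt(var)       # signed, normalized mean field
        z = np.clip(z, -10.0, 10.0)               # keeps log-probabilities finite
        Phi = 0.5 * _erfc(-z / math.sqrt(2.0))    # P(pattern correctly classified)
        phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
        dL_dz = -phi / Phi                        # derivative of -log Phi(z)
        # Chain rule: dz/dm_i = sigma * xi_i / sqrt(var) + z * m_i / var
        grad_m = (dL_dz * sigma) @ xi / np.sqrt(var) + np.sum(dL_dz * z) * m / var
        theta -= lr * grad_m * (1.0 - m**2)       # dm/dtheta = 1 - m^2
    return np.where(theta >= 0, 1, -1)            # binarize the learned means

def training_accuracy(w, xi, sigma):
    """Fraction of patterns whose label matches sign(xi . w)."""
    return float(np.mean(np.sign(xi @ w) == sigma))
```

    The loss here is the negative log-probability that a weight vector sampled from the distribution classifies each pattern correctly; the final binary solution is read off from the signs of the learned parameters.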

    Learning may need only a few bits of synaptic precision

    Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic states. The choice of discrete synapses is motivated by biological reasoning and experiments, and possibly by hardware implementation considerations as well. In this paper we extend a previous large deviations analysis which unveiled the existence of peculiar dense regions in the space of synaptic states which account for the possibility of learning efficiently in networks with binary synapses. We extend the analysis to synapses with multiple states and generally more plausible biological features. The results clearly indicate that the overall qualitative picture is unchanged with respect to the binary case, and very robust to variation of the details of the model. We also provide quantitative results which suggest that the advantages of increasing the synaptic precision (i.e., the number of internal synaptic states) rapidly vanish after the first few bits, and therefore that, for practical applications, only a few bits may be needed for near-optimal performance, consistent with recent biological findings. Finally, we demonstrate how the theoretical analysis can be exploited to design efficient algorithmic search strategies.

    Inducing bias is simpler than you think

    Machine learning may be oblivious to human bias but it is not immune to its perpetuation. Marginalisation and iniquitous group representation are often traceable in the very data used for training, and may be reflected or even enhanced by the learning models. To counter this, some of the model accuracy can be traded off for a secondary objective that helps prevent a specific type of bias. Multiple notions of fairness have been proposed to this end, but recent studies show that some fairness criteria often stand in mutual competition. In the present work, we introduce a solvable high-dimensional model of data imbalance, where parametric control over the many bias-inducing factors allows for an extensive exploration of the bias inheritance mechanism. Through the tools of statistical physics, we analytically characterise the typical behaviour of learning models trained in our synthetic framework and find unfairness behaviours similar to those observed on more realistic data. However, we also identify a positive transfer effect between the different subpopulations within the data. This suggests that mixing data with different statistical properties could be helpful, provided the learning model is made aware of this structure. Finally, we analyse the issue of bias mitigation: by reweighing the various terms in the training loss, we indirectly minimise standard unfairness metrics and highlight their incompatibilities. Leveraging the insights on positive transfer, we also propose a theory-informed mitigation strategy, based on the introduction of coupled learning models. By allowing each model to specialise on a different community within the data, we find that multiple fairness criteria and high accuracy can be achieved simultaneously. Comment: 9 pages, 7 figures + appendix.

    Table of contents

    Generalisation error in learning with random features and the hidden manifold model

    We study generalised linear regression and classification for a synthetically generated dataset encompassing different problems of interest, such as learning with random features, neural networks in the lazy training regime, and the hidden manifold model. We consider the high-dimensional regime and, using the replica method from statistical physics, provide a closed-form expression for the asymptotic generalisation performance in these problems, valid in both the under- and over-parametrised regimes and for a broad choice of generalised linear model loss functions. In particular, we show how to obtain analytically the so-called double descent behaviour for logistic regression with a peak at the interpolation threshold, we illustrate the superiority of orthogonal against random Gaussian projections in learning with random features, and discuss the role played by correlations in the data generated by the hidden manifold model. Beyond the interest in these particular problems, the theoretical formalism introduced in this manuscript provides a path to further extensions to more complex tasks.
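
    The kind of synthetic setup this abstract refers to can be sketched numerically. The sketch below is an illustration only, not the paper's construction or its replica calculation: the data model (inputs obtained by a tanh nonlinearity applied to a random projection of Gaussian latent vectors, labels set by a linear teacher acting on the latents), the ridge estimator and all names are assumptions.

```python
import numpy as np

def hidden_manifold_data(n, F, teacher, rng):
    """Draw n samples whose inputs lie on a d-dimensional manifold in R^p.

    Assumed data model (illustrative): x = tanh(c F / sqrt(d)) and
    y = sign(c . teacher / sqrt(d)), with Gaussian latent vectors c.
    """
    d = F.shape[0]
    c = rng.standard_normal((n, d))            # latent Gaussian vectors
    x = np.tanh(c @ F / np.sqrt(d))            # observed inputs, shape (n, p)
    y = np.sign(c @ teacher / np.sqrt(d))      # labels depend only on the latents
    return x, y

def ridge_fit(x, y, lam):
    """Closed-form ridge regression on the inputs (square loss, L2 penalty)."""
    p = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(p), x.T @ y)

def classification_error(w, x, y):
    """Fraction of samples where sign(x . w) disagrees with the label."""
    return float(np.mean(np.sign(x @ w) != y))

def generalisation_error(n_train, d=50, p=200, lam=0.1, n_test=2000, seed=0):
    """Train on n_train samples and estimate the error on fresh samples."""
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((d, p))            # fixed random projection
    teacher = rng.standard_normal(d)           # fixed teacher on the latent space
    x_tr, y_tr = hidden_manifold_data(n_train, F, teacher, rng)
    x_te, y_te = hidden_manifold_data(n_test, F, teacher, rng)
    w = ridge_fit(x_tr, y_tr, lam)
    return classification_error(w, x_te, y_te)
```

    Sweeping `n_train` across values below and above `p` with a small `lam` is one way to probe how the test error behaves near the interpolation threshold in this toy setting.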

    Critical initialisation in continuous approximations of binary neural networks

    The training of stochastic neural network models with binary (±1) weights and activations via continuous surrogate networks is investigated. We derive new surrogates using a novel derivation based on writing the stochastic neural network as a Markov chain. This derivation also encompasses existing variants of the surrogates presented in the literature. Following this, we theoretically study the surrogates at initialisation. We derive, using mean field theory, a set of scalar equations describing how input signals propagate through the randomly initialised networks. The equations reveal whether so-called critical initialisations exist for each surrogate network, where the network can be trained to arbitrary depth. Moreover, we predict theoretically and confirm numerically that common weight initialisation schemes used in standard continuous networks, when applied to the mean values of the stochastic binary weights, yield poor training performance. This study shows that, contrary to common intuition, the means of the stochastic binary weights should be initialised close to ±1 for deeper networks to be trainable.

    Gaussian Universality of Perceptrons with Random Labels

    While classical in many theoretical settings - and in particular in statistical physics-inspired works - the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in the context of statistics and machine learning. In this study, we redeem this line of work in the case of generalized linear classification, a.k.a. the perceptron model, with random labels. We argue that there is a large universality class of high-dimensional input data for which we obtain the same minimum training loss as for Gaussian data with corresponding data covariance. In the limit of vanishing regularization, we further demonstrate that the training loss is independent of the data covariance. On the theoretical side, we prove this universality for an arbitrary mixture of homogeneous Gaussian clouds. Empirically, we show that the universality also holds for a broad range of real datasets.
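
    A small numerical check conveys the flavour of the universality claim. It is a sketch under assumptions that are not taken from the paper: square loss with a ridge penalty, i.i.d. Rademacher inputs as the non-Gaussian example, identity covariance, and the mean squared residual at the ridge minimizer as the reported training loss. All names are hypothetical.

```python
import numpy as np

def ridge_training_loss(X, Y, lam):
    """Mean squared residual at the ridge minimizer.

    X: (n, d) inputs; Y: (n, k) matrix whose columns are k independent
    random-label vectors, so the result is averaged over label draws.
    """
    n, d = X.shape
    W = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ Y / n)
    residual = Y - X @ W
    return float(np.mean(residual**2))

def compare_gaussian_vs_rademacher(n=400, d=200, lam=0.5, n_label_draws=50, seed=0):
    """Training loss on random labels for two input ensembles with equal covariance."""
    rng = np.random.default_rng(seed)
    Y = rng.choice([-1.0, 1.0], size=(n, n_label_draws))   # random +/-1 labels
    X_gauss = rng.standard_normal((n, d))                  # Gaussian inputs
    X_rad = rng.choice([-1.0, 1.0], size=(n, d))           # assumed non-Gaussian example
    return (ridge_training_loss(X_gauss, Y, lam),
            ridge_training_loss(X_rad, Y, lam))
```

    Calling `compare_gaussian_vs_rademacher()` returns the two losses side by side; in this toy setting they are expected to agree up to finite-size fluctuations.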

    Probing transfer learning with a model of synthetic correlated datasets

    Transfer learning can significantly improve the sample efficiency of neural networks by exploiting the relatedness between a data-scarce target task and a data-abundant source task. Despite years of successful applications, transfer learning practice often relies on ad-hoc solutions, while theoretical understanding of these procedures is still limited. In the present work, we re-think a solvable model of synthetic data as a framework for modeling correlation between data-sets. This setup allows for an analytic characterization of the generalization performance obtained when transferring the learned feature map from the source to the target task. Focusing on the problem of training two-layer networks in a binary classification setting, we show that our model can capture a range of salient features of transfer learning with real data. Moreover, by exploiting parametric control over the correlation between the two data-sets, we systematically investigate under which conditions the transfer of features is beneficial for generalization.