Efficiency of quantum versus classical annealing in non-convex learning problems
Quantum annealers aim at solving non-convex optimization problems by
exploiting cooperative tunneling effects to escape local minima. The underlying
idea consists in designing a classical energy function whose ground states are
the sought optimal solutions of the original optimization problem and adding a
controllable quantum transverse field to generate tunneling processes. A key
challenge is to identify classes of non-convex optimization problems for which
quantum annealing remains efficient while thermal annealing fails. We show that
this happens for a wide class of problems which are central to machine
learning. Their energy landscapes are dominated by local minima that cause an
exponential slowdown of classical thermal annealers, while simulated quantum
annealing converges efficiently to rare dense regions of optimal solutions.
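The contrast between thermal and quantum annealing described above can be illustrated with a toy experiment. The sketch below (illustrative only, not the paper's setup: problem sizes, schedules, temperature and the Trotter number P are assumptions) runs classical simulated annealing and path-integral simulated quantum annealing on a small binary perceptron that learns random patterns, using the number of misclassified patterns as the classical energy.

```python
# Illustrative sketch only: classical simulated annealing (SA) vs. simulated
# quantum annealing (SQA, path-integral Monte Carlo over Suzuki-Trotter
# replicas) on a toy binary perceptron with +/-1 weights and random patterns.
# All sizes, schedules and the Trotter number P are assumptions for the demo.
import numpy as np

rng = np.random.default_rng(0)
N, M = 101, 40                                  # weights, random patterns
xi = rng.choice([-1, 1], size=(M, N))           # input patterns
y = rng.choice([-1, 1], size=M)                 # random labels

def energy(w):
    """Classical energy: number of misclassified patterns (error loss)."""
    return int(np.sum(np.sign(xi @ w) != y))

def classical_sa(steps=20000, T0=1.5, T1=0.01):
    """Metropolis single-spin flips with a geometric cooling schedule."""
    w = rng.choice([-1, 1], size=N)
    for t in range(steps):
        T = T0 * (T1 / T0) ** (t / steps)
        i = rng.integers(N)
        w2 = w.copy(); w2[i] = -w2[i]
        dE = energy(w2) - energy(w)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            w = w2
    return energy(w)

def simulated_qa(steps=20000, P=16, T=0.05, G0=2.0, G1=1e-3):
    """Path-integral MC: P coupled replicas, transverse field G annealed to 0."""
    beta = 1.0 / T
    w = rng.choice([-1, 1], size=(P, N))        # spin i of Trotter replica k
    E = np.array([energy(w[k]) for k in range(P)], dtype=float)
    for t in range(steps):
        G = G0 * (G1 / G0) ** (t / steps)
        Jp = 0.5 * np.log(1.0 / np.tanh(beta * G / P))   # ferromagnetic coupling
        k, i = rng.integers(P), rng.integers(N)
        w2 = w[k].copy(); w2[i] = -w2[i]
        dE = energy(w2) - E[k]
        up, dn = w[(k + 1) % P, i], w[(k - 1) % P, i]
        dS = (beta / P) * dE + 2.0 * Jp * w[k, i] * (up + dn)  # action change
        if dS <= 0 or rng.random() < np.exp(-dS):
            w[k], E[k] = w2, E[k] + dE
    # the lowest-energy replica is the candidate solution
    return int(E.min())

print("SA  final number of errors:", classical_sa())
print("SQA final number of errors:", simulated_qa())
```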
Reward sharpens orientation coding independently of attention
Reward improves performance. Is this due to modulations of the output modules of the neural system, or are there mechanisms favoring more 'generous' inputs? Some recent studies included V1 in the circuitry of reward-based modulations, but the effects of reward can easily be confused with those of attention. Here we address this issue with a psychophysical dual task that controls attention while orientation sensitivity is measured on targets associated with different levels of reward. We found that different reward rates improve orientation discrimination and sharpen the internal response distributions. The data are unaffected by changes in attentional load or by dissociating the feature of the reward cue from the feature relevant for the task. This suggests that reward may act independently of attention by modulating the activity of early sensory stages, perhaps V1, through an SNR improvement of task-relevant channels. Reward acts like attention, but through separate channels.
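A one-line signal-detection calculation makes the claimed mechanism concrete. The sketch below (an illustrative assumption, not the authors' model or data) shows how an SNR improvement, i.e. narrower internal response distributions for a fixed signal difference, raises the discriminability d' and hence the predicted proportion of correct responses in a two-alternative task.

```python
# Illustrative signal-detection sketch (not the authors' model or data): an SNR
# improvement (smaller internal noise sigma at fixed signal difference) raises
# d' = delta_mu / sigma and the predicted 2AFC proportion correct Phi(d'/sqrt(2)).
import numpy as np
from scipy.stats import norm

delta_mu = 1.0                      # internal response difference between orientations
for label, sigma in [("no reward", 1.0), ("reward", 0.7)]:   # hypothetical noise levels
    d_prime = delta_mu / sigma
    p_correct = norm.cdf(d_prime / np.sqrt(2))
    print(f"{label:9s}  sigma={sigma:.1f}  d'={d_prime:.2f}  P(correct)={p_correct:.3f}")
```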
Shaping the learning landscape in neural networks around wide flat minima
Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex
high-dimensional loss function, typically by a stochastic gradient descent
(SGD) strategy. The learning process is observed to find good minimizers
without getting stuck in local critical points, and such minimizers are often
satisfactory at avoiding overfitting. How these two features can be kept under
control in nonlinear devices composed of millions of tunable connections is a
profound and far-reaching open question. In this paper
we study basic non-convex one- and two-layer neural network models which learn
random patterns, and derive a number of basic geometrical and algorithmic
features which suggest some answers. We first show that the error loss function
presents few extremely wide flat minima (WFM) which coexist with narrower
minima and critical points. We then show that the minimizers of the
cross-entropy loss function overlap with the WFM of the error loss. We also
show examples of learning devices for which WFM do not exist. From the
algorithmic perspective we derive entropy driven greedy and message passing
algorithms which focus their search on wide flat regions of minimizers. In the
case of SGD and cross-entropy loss, we show that a slow reduction of the norm
of the weights along the learning process also leads to WFM. We corroborate the
results by a numerical study of the correlations between the volumes of the
minimizers, their Hessian and their generalization performance on real data.
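A crude way to visualize the wide-flat-minima picture, not the entropy-driven or message-passing algorithms of the paper, is to train a small model and probe how its training error degrades under random weight perturbations of increasing radius. The sketch below does this for a single perceptron trained by SGD on the cross-entropy (logistic) loss over random patterns; all sizes and perturbation radii are illustrative assumptions.

```python
# Illustrative sketch (not the paper's algorithms): train a single perceptron
# with SGD on the cross-entropy (logistic) loss over random +/-1 patterns, then
# probe the local flatness of the minimizer by measuring the training error
# after random weight perturbations of increasing relative radius.
import numpy as np

rng = np.random.default_rng(1)
N, M = 200, 150                                  # inputs, random patterns
X = rng.choice([-1.0, 1.0], size=(M, N))
y = rng.choice([-1.0, 1.0], size=M)

def train_sgd(epochs=200, lr=0.05):
    w = rng.normal(scale=1.0 / np.sqrt(N), size=N)
    for _ in range(epochs):
        for mu in rng.permutation(M):
            z = y[mu] * (X[mu] @ w)
            w += lr * y[mu] * X[mu] / (1.0 + np.exp(z))  # -grad of log(1+e^-z)
    return w

def error_rate(w):
    return float(np.mean(np.sign(X @ w) != y))

def flatness_profile(w, radii=(0.05, 0.1, 0.2, 0.4), trials=200):
    """Mean training error after perturbing w by a given relative radius."""
    profile = {}
    for r in radii:
        errs = []
        for _ in range(trials):
            d = rng.normal(size=N)
            d *= r * np.linalg.norm(w) / np.linalg.norm(d)
            errs.append(error_rate(w + d))
        profile[r] = float(np.mean(errs))
    return profile

w = train_sgd()
print("training error at the minimizer:", error_rate(w))
print("error under perturbation:", flatness_profile(w))
```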
Systematically and efficiently improving existing k-means initialization algorithms by pairwise-nearest-neighbor smoothing
We present a meta-method for initializing (seeding) the k-means clustering
algorithm called PNN-smoothing. It consists in splitting a given dataset into
random subsets, clustering each of them individually, and merging the
resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a
meta-method in the sense that when clustering the individual subsets any
seeding algorithm can be used. If the computational complexity of that seeding
algorithm is linear in the size of the data and the number of clusters k,
PNN-smoothing is also almost linear with an appropriate choice of the number of
subsets, and
quite competitive in practice. We show empirically, using several existing
seeding methods and testing on several synthetic and real datasets, that this
procedure results in systematically better costs. Our implementation is
publicly available at https://github.com/carlobaldassi/KMeansPNNSmoothing.jl.
Systematically and efficiently improving k-means initialization by pairwise-nearest-neighbor smoothing
We present a meta-method for initializing (seeding) the k-means clustering algorithm called PNN-smoothing. It consists in splitting a given dataset into random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a meta-method in the sense that when clustering the individual subsets any seeding algorithm can be used. If the computational complexity of that seeding algorithm is linear in the size of the data and the number of clusters k, PNN-smoothing is also almost linear with an appropriate choice of the number of subsets, and quite competitive in practice. We show empirically, using several existing seeding methods and testing on several synthetic and real datasets, that this procedure results in systematically better costs. In particular, our method of enhancing k-means++ seeding proves superior in both effectiveness and speed compared to the popular "greedy" k-means++ variant. Our implementation is publicly available at https://github.com/carlobaldassi/KMeansPNNSmoothing.jl.
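A compact illustration of the PNN-smoothing procedure described above (the reference implementation is the Julia package linked in the abstracts; the sketch below is a simplified Python rendition using scikit-learn's k-means++ as the inner seeder, with the number of subsets s and all data sizes chosen arbitrarily):

```python
# Simplified illustration of PNN-smoothing (the reference implementation is the
# Julia package linked above): split the data into s random subsets, cluster
# each with k-means++ seeding, pool the weighted centroids, and merge them down
# to k centroids with pairwise-nearest-neighbor (Ward-style) merging, which then
# seed the final k-means run. The choice s=8 and the data sizes are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def pnn_merge(centroids, weights, k):
    """Greedily merge the cheapest pair (Ward cost) until k centroids remain."""
    C = [np.asarray(c, dtype=float) for c in centroids]
    w = [float(x) for x in weights]
    while len(C) > k:
        best, pair = np.inf, None
        for a in range(len(C)):
            for b in range(a + 1, len(C)):
                cost = w[a] * w[b] / (w[a] + w[b]) * np.sum((C[a] - C[b]) ** 2)
                if cost < best:
                    best, pair = cost, (a, b)
        a, b = pair
        C[a] = (w[a] * C[a] + w[b] * C[b]) / (w[a] + w[b])
        w[a] += w[b]
        del C[b], w[b]
    return np.array(C)

def pnn_smoothing_seeds(X, k, s=8, seed=0):
    rng = np.random.default_rng(seed)
    cents, wts = [], []
    for idx in np.array_split(rng.permutation(len(X)), s):
        km = KMeans(n_clusters=k, init="k-means++", n_init=1,
                    random_state=seed).fit(X[idx])
        cents.extend(km.cluster_centers_)
        wts.extend(np.bincount(km.labels_, minlength=k).tolist())
    return pnn_merge(cents, wts, k)

# Usage: seed the final k-means run with the smoothed centroids.
X = np.random.default_rng(2).normal(size=(2000, 10))
k = 20
seeds = pnn_smoothing_seeds(X, k)
final = KMeans(n_clusters=k, init=seeds, n_init=1).fit(X)
print("final k-means cost (inertia):", final.inertia_)
```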
A three-threshold learning rule approaches the maximal capacity of recurrent neural networks
Understanding the theoretical foundations of how memories are encoded and
retrieved in neural populations is a central challenge in neuroscience. A
popular theoretical scenario for modeling memory function is the attractor
neural network scenario, whose prototype is the Hopfield model. The model has a
poor storage capacity, compared with the capacity achieved with perceptron
learning algorithms. Here, by transforming the perceptron learning rule, we
present an online learning rule for a recurrent neural network that achieves
near-maximal storage capacity without an explicit supervisory error signal,
relying only upon locally accessible information. The fully-connected network
consists of excitatory binary neurons with plastic recurrent connections and
non-plastic inhibitory feedback stabilizing the network dynamics; the memory
patterns are presented online as strong afferent currents, producing a bimodal
distribution for the neuron synaptic inputs. Synapses corresponding to active
inputs are modified as a function of the value of the local fields with respect
to three thresholds. Above the highest threshold, and below the lowest
threshold, no plasticity occurs. In between these two thresholds,
potentiation/depression occurs when the local field is above/below an
intermediate threshold. We simulated and analyzed a network of binary neurons
implementing this rule and measured its storage capacity for different sizes of
the basins of attraction. The storage capacity obtained through numerical
simulations is shown to be close to the value predicted by analytical
calculations. We also measured the dependence of capacity on the strength of
external inputs. Finally, we quantified the statistics of the resulting
synaptic connectivity matrix, and found that both the fraction of zero weight
synapses and the degree of symmetry of the weight matrix increase with the
number of stored patterns.
Input-driven unsupervised learning in recurrent neural networks
Understanding the theoretical foundations of how memories are encoded and retrieved in neural populations is a central challenge in neuroscience. A popular theoretical scenario for modeling memory function is an attractor neural network with Hebbian learning (e.g. the Hopfield model). The model's simplicity and the locality of the synaptic update rules come at the cost of a limited storage capacity, compared with the capacity achieved with supervised learning algorithms, whose biological plausibility is questionable. Here, we present an on-line learning rule for a recurrent neural network that achieves near-optimal performance without an explicit supervisory error signal and using only locally accessible information, and which is therefore biologically plausible. The fully connected network consists of excitatory units with plastic recurrent connections and non-plastic inhibitory feedback stabilizing the network dynamics; the patterns to be memorized are presented on-line as strong afferent currents, producing a bimodal distribution for the neuron synaptic inputs ('local fields'). Synapses corresponding to active inputs are modified as a function of the position of the local field with respect to three thresholds. Above the highest threshold, and below the lowest threshold, no plasticity occurs. In between these two thresholds, potentiation/depression occurs when the local field is above/below an intermediate threshold. An additional parameter of the model makes it possible to trade storage capacity for robustness, i.e. increased size of the basins of attraction. We simulated a network of 1001 excitatory neurons implementing this rule and measured its storage capacity for different sizes of the basins of attraction: our results show that, for any given basin size, our network more than doubles the storage capacity, compared with a standard Hopfield network. Our learning rule is consistent with available experimental data documenting how plasticity depends on firing rate. It predicts that at high enough firing rates, no potentiation should occur.
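The two abstracts above describe the same three-threshold rule; the sketch below illustrates its core update (not the authors' simulation: the coding level, external current, thresholds, learning rate and the simplified recall dynamics are assumptions, and the paper's inhibitory feedback is omitted).

```python
# Illustrative sketch of a three-threshold plasticity rule of the kind described
# above (not the authors' simulation: coding level, external current, thresholds,
# learning rate and the simplified recall dynamics are assumptions, and the
# paper's inhibitory feedback is omitted).
import numpy as np

rng = np.random.default_rng(3)
N, P, f = 200, 10, 0.5                 # neurons, stored patterns, coding level
x_ext = 20.0                           # strong afferent current to active units
th_low, th_mid, th_high = 2.0, 15.0, 24.0   # the three thresholds
lr = 0.005                             # change per plastic synapse

patterns = (rng.random((P, N)) < f).astype(float)   # 0/1 memory patterns
W = np.zeros((N, N))                                # excitatory recurrent weights

# Learning: each pattern is imposed on-line by strong afferent currents; only
# synapses from active inputs are plastic, and the direction of the change is
# set by where the postsynaptic local field falls relative to the thresholds.
for _ in range(200):
    for xi in patterns:
        h = W @ xi + x_ext * xi                     # local fields (bimodal)
        pot = (h > th_mid) & (h < th_high)          # potentiation window
        dep = (h > th_low) & (h <= th_mid)          # depression window
        W += lr * (np.outer(pot, xi) - np.outer(dep, xi))
        np.fill_diagonal(W, 0.0)
        W = np.clip(W, 0.0, None)                   # keep the weights excitatory

# Recall: iterate a simple threshold dynamics from a corrupted cue; the retrieval
# threshold is placed between the learned OFF and ON recurrent field levels.
th_ret = 0.5 * (th_low + (th_high - x_ext))
def recall(cue, steps=20):
    s = cue.copy()
    for _ in range(steps):
        s = (W @ s > th_ret).astype(float)
    return s

target = patterns[0]
cue = target.copy()
flip = rng.choice(N, size=N // 10, replace=False)   # corrupt 10% of the units
cue[flip] = 1.0 - cue[flip]
print("fraction of correctly recalled units:", float((recall(cue) == target).mean()))
```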
DNA, Discrimination and the Definition of Family Class: M.A.O. v. Canada (Minister of Citizenship and Immigration)
Simultaneous identification of specifically interacting paralogs and inter-protein contacts by Direct-Coupling Analysis
Understanding protein-protein interactions is central to our understanding of
almost all complex biological processes. Computational tools exploiting rapidly
growing genomic databases to characterize protein-protein interactions are
urgently needed. Such methods should connect multiple scales, from
evolutionarily conserved interactions between families of homologous proteins,
through the identification of specifically interacting proteins in the case of
multiple paralogs inside a species, down to the prediction of residues in physical
contact across interaction interfaces. Statistical inference methods detecting
residue-residue coevolution have recently triggered considerable progress in
using sequence data for quaternary protein structure prediction; they require,
however, large joint alignments of homologous protein pairs known to interact.
The generation of such alignments is a complex computational task on its own;
application of coevolutionary modeling has in turn been restricted to proteins
without paralogs, or to bacterial systems with the corresponding coding genes
being co-localized in operons. Here we show that the Direct-Coupling Analysis
of residue coevolution can be extended to connect the different scales, and
simultaneously to match interacting paralogs, to identify inter-protein
residue-residue contacts and to discriminate interacting from noninteracting
families in a multiprotein system. Our results extend the potential
applications of coevolutionary analysis far beyond cases treatable so far.
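For orientation, the sketch below shows the core mechanics of a plain mean-field Direct-Coupling Analysis on an already-matched joint alignment: couplings from the inverse of a regularized covariance matrix, and APC-corrected Frobenius scores whose inter-protein block ranks candidate inter-protein contacts. It is an illustration only, not the paper's extended paralog-matching pipeline; the random alignment at the bottom is a placeholder for a real joint MSA, and q = 21 assumes 20 amino acids plus a gap.

```python
# Illustrative mean-field DCA sketch (not the paper's extended pipeline): given
# a joint alignment whose rows already concatenate matched interacting pairs,
# estimate couplings by inverting a pseudocount-regularized covariance matrix
# and score residue pairs with an APC-corrected Frobenius norm. q = 21 assumes
# 20 amino acids plus a gap; the random MSA at the bottom is a mere placeholder.
import numpy as np

def one_hot(msa, q):
    """Encode an (M, L) integer alignment, dropping one state per column (gauge)."""
    M, L = msa.shape
    X = np.zeros((M, L * (q - 1)))
    for i in range(L):
        for a in range(q - 1):
            X[:, i * (q - 1) + a] = (msa[:, i] == a)
    return X

def mfdca_scores(msa, q=21, lam=0.5):
    M, L = msa.shape
    X = one_hot(msa, q)
    f1 = (1 - lam) * X.mean(axis=0) + lam / q                 # single-site freqs
    f2 = (1 - lam) * (X.T @ X) / M + lam / q ** 2             # pair freqs
    for i in range(L):                                        # same-site blocks
        sl = slice(i * (q - 1), (i + 1) * (q - 1))
        f2[sl, sl] = (1 - lam) * (X[:, sl].T @ X[:, sl]) / M + (lam / q) * np.eye(q - 1)
    C = f2 - np.outer(f1, f1)                                 # connected correlations
    J = -np.linalg.inv(C)                                     # mean-field couplings
    F = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            B = J[i * (q - 1):(i + 1) * (q - 1), j * (q - 1):(j + 1) * (q - 1)]
            B = B - B.mean(axis=0) - B.mean(axis=1, keepdims=True) + B.mean()
            F[i, j] = F[j, i] = np.sqrt(np.sum(B ** 2))       # Frobenius norm
    apc = np.outer(F.sum(axis=0), F.sum(axis=0)) / F.sum()    # average-product corr.
    S = F - apc
    np.fill_diagonal(S, 0.0)
    return S

# Placeholder usage: columns 0..LA-1 belong to protein A, the rest to protein B;
# candidate inter-protein contacts are the highest-scoring entries of S[:LA, LA:].
rng = np.random.default_rng(4)
LA, LB, q = 12, 10, 21
msa = rng.integers(0, q, size=(500, LA + LB))
S = mfdca_scores(msa, q=q)
inter = S[:LA, LA:]
print("top-scoring inter-protein residue pair:", np.unravel_index(inter.argmax(), inter.shape))
```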