Statistical Mechanics of Soft Margin Classifiers
We study the typical learning properties of the recently introduced Soft
Margin Classifiers (SMCs), learning realizable and unrealizable tasks, with the
tools of Statistical Mechanics. We derive analytically the behaviour of the
learning curves in the regime of very large training sets. We obtain
exponential and power laws for the decay of the generalization error towards
the asymptotic value, depending on the task and on general characteristics of
the distribution of stabilities of the patterns to be learned. The optimal
learning curves of the SMCs, which give the minimal generalization error, are
obtained by tuning the coefficient controlling the trade-off between the error
and the regularization terms in the cost function. If the task is realizable by
the SMC, the optimal performance is better than that of a hard margin Support
Vector Machine and is very close to that of a Bayesian classifier.
Comment: 26 pages, 12 figures, submitted to Physical Review
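For orientation, the trade-off between the error and regularization terms can be written in the textbook soft-margin form shown below. This is an assumed, standard formulation given only for illustration; the paper's own cost function may differ, and the symbols w, ξ_μ, C, P, x_μ and y_μ do not appear in the abstract.

```latex
E(\mathbf{w}, \boldsymbol{\xi})
  = \frac{1}{2}\,\lVert \mathbf{w} \rVert^{2} + C \sum_{\mu=1}^{P} \xi_\mu ,
\qquad
\text{subject to}\quad
y_\mu\, \mathbf{w}\cdot\mathbf{x}_\mu \ge 1 - \xi_\mu ,
\quad \xi_\mu \ge 0 .
```

In this form, C is the tunable coefficient: a large C penalizes margin violations heavily and, for separable data, approaches the hard-margin Support Vector Machine, while a small C favours the regularization term.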
Statistical Mechanics Approach to Inverse Problems on Networks
Statistical Mechanics has gained a central role in modern Inference and Computer Science. Many optimization and inference problems can be cast in a Statistical Mechanics framework, and concepts and methods developed in this area of Physics are helpful not only for theoretical analysis but also as practical tools for solving single instances of hard inference and computational tasks. In this work, I address several inverse problems on networks, from models of epidemic spreading to learning in neural networks. I apply methods developed in the context of Disordered Systems, namely the Replica and Cavity methods on the theoretical side and their algorithmic incarnation, Belief Propagation, to solve hard inverse problems that can be formulated in a Bayesian framework.
Out of equilibrium Statistical Physics of learning
In the study of hard optimization problems, it is often infeasible to achieve
full analytic control over the dynamics of the algorithmic processes that
find solutions efficiently. In many cases, a static approach is able to provide
considerable insight into the dynamical properties of these algorithms: in fact,
the geometrical structures found in the energetic landscape can strongly affect
the stationary states and the optimal configurations reached by the solvers.
In this context, a classical Statistical Mechanics approach, relying on the
assumption of the asymptotic realization of a Boltzmann-Gibbs equilibrium,
can yield misleading predictions when the studied algorithms comprise some
stochastic components that effectively drive these processes out of equilibrium.
Thus, it becomes necessary to develop some intuition on the relevant features
of the studied phenomena and to build an ad hoc Large Deviation analysis,
providing a more targeted and richer description of the geometrical properties
of the landscape. The present thesis focuses on the study of learning processes
in Artificial Neural Networks, with the aim of introducing an out of equilibrium
statistical physics framework, based on the introduction of a local entropy
potential, for supporting and inspiring algorithmic improvements in the field
of Deep Learning, and for developing models of neural computation that can
carry both biological and engineering interest.
Combined optimization algorithms applied to pattern classification
Accurate classification by minimizing the error on test samples is the main
goal in pattern classification. Combinatorial optimization is a well-known
method for solving minimization problems; however, only a few examples of
classifiers are described in the literature where combinatorial optimization is
used in pattern classification. Recently, there has been a growing interest
in combining classifiers and improving the consensus of results for a greater
accuracy. In the light of the "No Free Lunch Theorems", we analyse the combination
of simulated annealing, a powerful combinatorial optimization method
that produces high quality results, with the classical perceptron algorithm.
This combination is called the LSA machine. Our analysis aims at finding paradigms
for problem-dependent parameter settings that ensure high classification
results. Our computational experiments on a large number of benchmark
problems lead to results that either outperform or are at least competitive with
results published in the literature. Apart from parameter settings, our analysis
focuses on a difficult problem in computation theory, namely the network
complexity problem. The depth vs size problem of neural networks is one of
the hardest problems in theoretical computing, with very little progress over
the past decades. In order to investigate this problem, we introduce a new
recursive learning method for training hidden layers in constant depth circuits.
Our findings make contributions to a) the field of Machine Learning, as the
proposed method is applicable in training feedforward neural networks, and to
b) the field of circuit complexity by proposing an upper bound for the number
of hidden units sufficient to achieve a high classification rate. One of the major
findings of our research is that the size of the network can be bounded by
the input size of the problem and an approximate upper bound of 8 + √(2^n/n)
threshold gates as being sufficient for a small error rate, where n := log|S_L|
and S_L is the training set.
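To make the idea of combining simulated annealing with the perceptron rule concrete, here is a minimal generic sketch. It is not the thesis's LSA machine: the function names, the logarithmic cooling schedule, the neighbour move and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def predict(w, x):
    """Output of a single threshold unit: +1 if w . x >= 0, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def errors(w, samples):
    """Number of samples misclassified by the threshold unit with weights w."""
    return sum(1 for x, y in samples if predict(w, x) != y)


def anneal_perceptron(samples, steps=2000, t0=1.0, seed=0):
    """Simulated annealing over weights using perceptron-style moves.

    Hypothetical helper: each candidate is obtained by a perceptron
    update on a randomly chosen misclassified sample, and is accepted
    with the Metropolis rule under an assumed logarithmic cooling schedule.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    cur = errors(w, samples)
    best_w, best = list(w), cur
    for k in range(1, steps + 1):
        if best == 0:
            break  # a perfect classifier has been found
        temp = t0 / math.log(k + 1)  # assumed cooling schedule
        wrong = [(x, y) for x, y in samples if predict(w, x) != y]
        x, y = rng.choice(wrong)
        cand = [wi + y * xi for wi, xi in zip(w, x)]  # perceptron update
        c = errors(cand, samples)
        # Metropolis rule: always accept improvements, sometimes accept worse
        if c <= cur or rng.random() < math.exp((cur - c) / temp):
            w, cur = cand, c
            if cur < best:
                best_w, best = list(w), cur
    return best_w, best


if __name__ == "__main__":
    # Toy data: logical AND with a constant bias feature appended.
    data = [((0, 0, 1), -1), ((0, 1, 1), -1), ((1, 0, 1), -1), ((1, 1, 1), 1)]
    weights, n_errors = anneal_perceptron(data)
    print(weights, n_errors)
```

The sketch keeps the best weight vector seen so far, so the returned error count never exceeds that of the initial all-zero weights. Accepting occasional worsening moves is what distinguishes it from the plain perceptron rule.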
- …