Learning curves for Soft Margin Classifiers
Typical learning curves for Soft Margin Classifiers (SMCs) learning both
realizable and unrealizable tasks are determined using the tools of Statistical
Mechanics. We derive the analytical behaviour of the learning curves in the
regimes of small and large training sets. The generalization errors present
different decay laws towards the asymptotic values as a function of the
training set size, depending on general geometrical characteristics of the rule
to be learned. Optimal generalization curves are deduced through a fine tuning
of the hyperparameter controlling the trade-off between the error and the
regularization terms in the cost function. Even if the task is realizable, the
optimal performance of the SMC is better than that of a hard margin Support
Vector Machine (SVM) learning the same rule, and is very close to that of the
Bayesian classifier.
Comment: 26 pages, 10 figures
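For reference, the cost function mentioned in the abstract, with its hyperparameter trading off the error term against the regularization term, can be written in the standard soft-margin form (notation assumed here, not taken from the paper):

```latex
\min_{w,\,b,\,\xi} \;\; \frac{1}{2}\,\lVert w \rVert^{2} \;+\; C \sum_{i=1}^{p} \xi_i
\qquad \text{subject to} \qquad
y_i\,(w \cdot x_i + b) \;\ge\; 1 - \xi_i, \quad \xi_i \ge 0,
```

where $C$ plays the role of the trade-off hyperparameter whose fine tuning yields the optimal generalization curves; the limit $C \to \infty$ recovers the hard margin SVM against which the SMC is compared.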
A Widely Applicable Bayesian Information Criterion
A statistical model or a learning machine is called regular if the map taking
a parameter to a probability distribution is one-to-one and if its Fisher
information matrix is always positive definite. If otherwise, it is called
singular. In regular statistical models, the Bayes free energy, which is
defined by the minus logarithm of Bayes marginal likelihood, can be
asymptotically approximated by the Schwarz Bayes information criterion (BIC),
whereas in singular models such approximation does not hold.
Recently, it was proved that the Bayes free energy of a singular model is
asymptotically given by a generalized formula using a birational invariant, the
real log canonical threshold (RLCT), instead of half the number of parameters
in BIC. Theoretical values of RLCTs in several statistical models are now being
discovered based on algebraic geometrical methodology. However, it has been
difficult to estimate the Bayes free energy using only training samples,
because an RLCT depends on an unknown true distribution.
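The generalized formula referred to above can be sketched as follows (standard notation from singular learning theory, assumed here rather than quoted from the paper):

```latex
F_n \;=\; -\log \int \prod_{i=1}^{n} p(X_i \mid \theta)\,\varphi(\theta)\,d\theta
\;=\; n S_n \;+\; \lambda \log n \;+\; O_p(\log\log n),
```

where $S_n$ is the empirical entropy of the true distribution and $\lambda$ is the real log canonical threshold. For a regular model $\lambda = d/2$, with $d$ the number of parameters, which recovers Schwarz's BIC.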
In the present paper, we define a widely applicable Bayesian information
criterion (WBIC) by the average log likelihood function over the posterior
distribution with the inverse temperature beta = 1/log(n), where n is the number
of training samples. We mathematically prove that WBIC has the same asymptotic
expansion as the Bayes free energy, even if the statistical model is singular or
the true distribution is unrealizable by it. Since WBIC can be numerically
calculated without any information about a true distribution, it is a
generalized version of BIC for singular statistical models.
Comment: 30 pages
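To illustrate how WBIC can be computed from posterior samples alone, here is a minimal sketch on a toy one-parameter Gaussian model; the model, the grid-based "posterior", and all names are assumptions made for illustration, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(loc=1.0, scale=1.0, size=n)   # training samples

# Parameter grid for the mean of a unit-variance Gaussian model
mu_grid = np.linspace(-3, 5, 2001)

def log_likelihood(mu):
    # For each grid value of mu: sum over data of log N(x_i | mu, 1)
    return (-0.5 * np.sum((x[None, :] - mu[:, None]) ** 2, axis=1)
            - 0.5 * n * np.log(2 * np.pi))

beta = 1.0 / np.log(n)        # the inverse temperature used by WBIC
ll = log_likelihood(mu_grid)

# Tempered posterior weights (flat prior on the grid; the prior constant
# cancels in the normalized average)
w = np.exp(beta * (ll - ll.max()))
w /= w.sum()

# WBIC: minus the posterior average of the log likelihood at temperature beta
wbic = -np.sum(w * ll)
```

No knowledge of the true distribution enters the computation: only the training samples and the model's likelihood are used, which is the point of the criterion.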
Learning and generalization in radial basis function networks
The aim of supervised learning is to approximate an unknown target function
by adjusting the parameters of a learning model in response to possibly noisy
examples generated by the target function. The performance of the learning model
at this task can be quantified by examining its generalization ability. Initially the
concept of generalization is reviewed, and various methods of measuring it, such as
generalization error, prediction error, PAC learning and the evidence, are discussed
and the relations between them examined. Some of these relations are dependent
on the architecture of the learning model.

Two architectures are prevalent in practical supervised learning: the multi-layer
perceptron (MLP) and the radial basis function network (RBF). While the RBF
has previously been examined from a worst-case perspective, this gives little insight
into the performance and phenomena that can be expected in the typical case.
This thesis focusses on the properties of learning and generalization that can be
expected on average in the RBF.

There are two methods in use for training the RBF. The basis functions can be
fixed in advance, utilising an unsupervised learning algorithm, or can adapt during
the training process. For the case in which the basis functions are fixed, the
typical generalization error given a data set of particular size is calculated by
employing the Bayesian framework. The effects of noisy data and regularization
are examined, the optimal settings of the parameters that control the learning
process are calculated, and the consequences of a mismatch between the learning
model and the data-generating mechanism are demonstrated.

The second case, in which the basis functions are adapted, is studied utilising the
on-line learning paradigm. The average evolution of generalization error is calculated in a manner which allows the phenomena of the learning process, such as the
specialization of the basis functions, to be elucidated. The three most important
stages of training: the symmetric phase, the symmetry-breaking phase and the
convergence phase, are analyzed in detail; the convergence phase analysis allows
the derivation of maximal and optimal learning rates. Noise on both the inputs
and outputs of the data-generating mechanism is introduced, and the consequences
examined. Regularization via weight decay is also studied, as are the effects of the
learning model being poorly matched to the data generator.
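The fixed-basis case described above can be sketched in a few lines: Gaussian basis functions chosen in advance, output weights fit by regularized least squares (weight decay), and generalization error measured on fresh inputs. The target rule, noise level, and all parameter choices below are assumptions for illustration, not the thesis's setup:

```python
import numpy as np

rng = np.random.default_rng(1)

def target(x):
    # Hypothetical target rule generating the data
    return np.sin(3 * x)

# Noisy examples generated by the target function
n = 50
X = rng.uniform(-1, 1, size=n)
y = target(X) + 0.1 * rng.normal(size=n)

# Fixed basis: centres chosen in advance (e.g. by an unsupervised method);
# here simply an even grid, with a common width
centres = np.linspace(-1, 1, 12)
width = 0.3

def design(x):
    # Design matrix of Gaussian basis functions phi_j(x) = exp(-(x - c_j)^2 / (2 w^2))
    return np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2 * width ** 2))

Phi = design(X)
lam = 1e-3   # regularization (weight decay) strength

# Ridge solution: w = (Phi^T Phi + lam I)^{-1} Phi^T y
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centres)), Phi.T @ y)

# Generalization error estimated against the noiseless target on fresh inputs
X_test = np.linspace(-1, 1, 200)
err = np.mean((design(X_test) @ w - target(X_test)) ** 2)
```

Tuning `lam` against the noise level mirrors the optimal-parameter calculation described in the abstract.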
Analysis of Natural Gradient Descent for Multilayer Neural Networks
Natural gradient descent is a principled method for adapting the parameters
of a statistical model on-line using an underlying Riemannian parameter space
to redefine the direction of steepest descent. The algorithm is examined via
methods of statistical physics which accurately characterize both transient and
asymptotic behavior. A solution of the learning dynamics is obtained for the
case of multilayer neural network training in the limit of large input
dimension. We find that natural gradient learning leads to optimal asymptotic
performance and outperforms gradient descent in the transient, significantly
shortening or even removing plateaus in the transient generalization
performance which typically hamper gradient descent training.
Comment: 14 pages including figures. To appear in Physical Review
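The idea of preconditioning the gradient by the inverse Fisher information can be shown on a toy statistical model; the paper analyses multilayer networks, but a single Gaussian with unknown mean and log standard deviation makes the mechanics explicit, since its Fisher matrix is known in closed form (this toy model and all names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=2.0, scale=0.5, size=1000)

# Parameters of the model N(mu, sigma^2), with sigma = exp(log_sigma)
mu, log_sigma = 0.0, 0.0
lr = 0.1

for _ in range(200):
    sigma2 = np.exp(2 * log_sigma)
    # Gradients of the mean negative log-likelihood over the data
    g_mu = -(np.mean(data) - mu) / sigma2
    g_ls = 1.0 - np.mean((data - mu) ** 2) / sigma2
    # Fisher information in (mu, log sigma) coordinates: F = diag(1/sigma^2, 2),
    # so the natural gradient is F^{-1} times the ordinary gradient
    mu -= lr * sigma2 * g_mu
    log_sigma -= lr * 0.5 * g_ls
```

The natural-gradient update for `mu` becomes a fixed-rate contraction toward the sample mean regardless of the current `sigma`, a small-scale analogue of the parameterization-independence that removes plateaus in the network setting.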
Heuristics for the refinement of assumptions in generalized reactivity formulae
Reactive synthesis is concerned with automatically generating implementations from formal specifications. These specifications are typically written in the language of generalized reactivity (GR(1)), a subset of linear temporal logic capable of expressing the most common industrial specification patterns, and describe the requirements about the behavior of a system under assumptions about the environment where the system is to be deployed. Oftentimes no implementation exists which guarantees the required behavior under all possible environments, typically due to missing assumptions (this is usually referred to as unrealizability). To address this issue, new assumptions need to be added to complete the specification, a problem known as assumptions refinement. Since the space of candidate assumptions is intractably large, searching for the best solutions is inherently hard. In particular, new methods are needed to (i) increase the effectiveness of the search procedures, measured as the ratio between the number of solutions found and of refinements explored; and (ii) improve the results' quality, defined as the weakness of the solutions. In this thesis we propose a set of heuristics to meet these goals, and a methodology to assess and compare assumptions refinement methods based on quantitative metrics. The heuristics are in the form of algorithms to generate candidate refinements during the search, and quantitative measures to assess the quality of the candidates.
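As background, a GR(1) specification takes the standard assume-guarantee implication form (notation assumed here, not quoted from the thesis):

```latex
\Big( \theta^{e} \;\wedge\; \mathbf{G}\,\rho^{e} \;\wedge\; \bigwedge_{i=1}^{m} \mathbf{G}\mathbf{F}\, J^{e}_{i} \Big)
\;\rightarrow\;
\Big( \theta^{s} \;\wedge\; \mathbf{G}\,\rho^{s} \;\wedge\; \bigwedge_{j=1}^{k} \mathbf{G}\mathbf{F}\, J^{s}_{j} \Big)
```

where the $\theta$ are initial conditions, the $\rho$ are safety (transition) constraints, and the $\mathbf{G}\mathbf{F}\,J$ are fairness conditions, with superscripts $e$ and $s$ for environment and system. Assumptions refinement searches for new conjuncts on the left-hand side that make the specification realizable.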
We first discuss a heuristic method to generate assumptions that target the cause of unrealizability. This is done by selecting candidate refinement formulas based on Craig's interpolation. We provide a formal underpinning of the technique and evaluate it in terms of our new metric of effectiveness, as defined above, whose value is improved with respect to the state of the art. We demonstrate this on a set of popular benchmarks of embedded software.
We then provide a formal, quantitative characterization of the permissiveness of environment assumptions in the form of a weakness measure. We prove that the partial order induced by this measure is consistent with the one induced by implication. The key advantage of this measure is that it allows for prioritizing candidate solutions, as we show experimentally.
Lastly, we propose a notion of minimal refinements with respect to the observed counterstrategies. We demonstrate that exploring minimal refinements produces weaker solutions, and reduces the amount of computations needed to explore each refinement. However, this may come at the cost of reducing the effectiveness of the search. To counteract this effect, we propose a hybrid search approach in which both minimal and non-minimal refinements are explored.