
    Learning curves for Soft Margin Classifiers

    Typical learning curves for Soft Margin Classifiers (SMCs) learning both realizable and unrealizable tasks are determined using the tools of Statistical Mechanics. We derive the analytical behaviour of the learning curves in the regimes of small and large training sets. The generalization errors exhibit different decay laws towards their asymptotic values as a function of the training set size, depending on general geometrical characteristics of the rule to be learned. Optimal generalization curves are deduced through a fine tuning of the hyperparameter controlling the trade-off between the error and the regularization terms in the cost function. Even if the task is realizable, the optimal performance of the SMC is better than that of a hard margin Support Vector Machine (SVM) learning the same rule, and is very close to that of the Bayesian classifier. Comment: 26 pages, 10 figures
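    The trade-off described above can be illustrated with a small numerical sketch: a soft-margin SVM whose hyperparameter C is tuned on held-out data, compared against a hard margin approximated by a very large C. The task, noise level, and candidate C values below are illustrative assumptions, not taken from the paper.

    ```python
    # Sketch (assumed setup): soft-margin C tuning vs. a hard-margin SVM
    # on a noisy linear teacher rule.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Synthetic realizable rule: sign of a linear teacher, plus 5% label noise.
    n, d = 400, 10
    X = rng.standard_normal((n, d))
    w_teacher = rng.standard_normal(d)
    y = np.sign(X @ w_teacher)
    flip = rng.random(n) < 0.05
    y[flip] *= -1

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    # "Hard margin" approximated by a very large C; soft margin sweeps C.
    hard_err = 1.0 - SVC(kernel="linear", C=1e6).fit(X_tr, y_tr).score(X_te, y_te)
    best_c, best_err = None, 1.0
    for C in [0.01, 0.1, 1.0, 10.0]:
        err = 1.0 - SVC(kernel="linear", C=C).fit(X_tr, y_tr).score(X_te, y_te)
        if err < best_err:
            best_c, best_err = C, err

    print(f"hard-margin test error: {hard_err:.3f}")
    print(f"best soft-margin C={best_c}, test error: {best_err:.3f}")
    ```

    On noisy data a moderate C typically matches or beats the hard-margin limit, mirroring the paper's observation that fine tuning the trade-off hyperparameter improves generalization.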

    A Widely Applicable Bayesian Information Criterion

    A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and if its Fisher information matrix is always positive definite. Otherwise, it is called singular. In regular statistical models, the Bayes free energy, which is defined as minus the logarithm of the Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayes information criterion (BIC), whereas in singular models such an approximation does not hold. Recently, it was proved that the Bayes free energy of a singular model is asymptotically given by a generalized formula using a birational invariant, the real log canonical threshold (RLCT), instead of half the number of parameters in BIC. Theoretical values of RLCTs in several statistical models are now being discovered based on algebraic geometrical methodology. However, it has been difficult to estimate the Bayes free energy using only training samples, because an RLCT depends on an unknown true distribution. In the present paper, we define a widely applicable Bayesian information criterion (WBIC) by the average log likelihood function over the posterior distribution with the inverse temperature 1/log n, where n is the number of training samples. We mathematically prove that WBIC has the same asymptotic expansion as the Bayes free energy, even if the statistical model is singular for, or unrealizable by, the true distribution. Since WBIC can be numerically calculated without any information about a true distribution, it is a generalization of BIC to singular statistical models. Comment: 30 pages
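    The definition above is directly computable: draw samples from the posterior tempered to inverse temperature 1/log n and average the (minus) log likelihood. The sketch below does this for a deliberately simple regular model (Gaussian with unknown mean, known unit variance, standard normal prior) using a basic Metropolis sampler; the model, prior, and sampler settings are all illustrative assumptions, not from the paper.

    ```python
    # Sketch (assumed toy model): estimating WBIC as the tempered-posterior
    # average of minus the log likelihood, with beta = 1/log(n).
    import numpy as np

    rng = np.random.default_rng(1)

    n = 200
    data = rng.normal(loc=0.5, scale=1.0, size=n)   # training samples
    beta = 1.0 / np.log(n)                           # inverse temperature

    def log_lik(mu):
        # Sum of log N(x_i | mu, 1) over the training set.
        return -0.5 * np.sum((data - mu) ** 2) - 0.5 * n * np.log(2 * np.pi)

    def log_prior(mu):
        # Standard normal prior on mu (an assumption for this sketch).
        return -0.5 * mu ** 2 - 0.5 * np.log(2 * np.pi)

    # Metropolis sampling from the tempered posterior ~ exp(beta*loglik + logprior)
    mu, samples = 0.0, []
    cur = beta * log_lik(mu) + log_prior(mu)
    for step in range(20000):
        prop = mu + rng.normal(scale=0.3)
        cand = beta * log_lik(prop) + log_prior(prop)
        if np.log(rng.random()) < cand - cur:
            mu, cur = prop, cand
        if step >= 5000:                 # discard burn-in
            samples.append(mu)

    # WBIC: tempered-posterior average of the minus log likelihood.
    wbic = -np.mean([log_lik(m) for m in samples])
    print(f"WBIC estimate: {wbic:.1f}")
    ```

    Note that, as the abstract states, nothing in this computation uses the true data-generating distribution, only the training samples.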

    Learning and generalization in radial basis function networks

    The aim of supervised learning is to approximate an unknown target function by adjusting the parameters of a learning model in response to possibly noisy examples generated by the target function. The performance of the learning model at this task can be quantified by examining its generalization ability. Initially the concept of generalization is reviewed, and various methods of measuring it, such as generalization error, prediction error, PAC learning and the evidence, are discussed and the relations between them examined. Some of these relations are dependent on the architecture of the learning model. Two architectures are prevalent in practical supervised learning: the multi-layer perceptron (MLP) and the radial basis function network (RBF). While the RBF has previously been examined from a worst-case perspective, this gives little insight into the performance and phenomena that can be expected in the typical case. This thesis focusses on the properties of learning and generalization that can be expected on average in the RBF. There are two methods in use for training the RBF. The basis functions can be fixed in advance, utilising an unsupervised learning algorithm, or can adapt during the training process. For the case in which the basis functions are fixed, the typical generalization error given a data set of particular size is calculated by employing the Bayesian framework. The effects of noisy data and regularization are examined, the optimal settings of the parameters that control the learning process are calculated, and the consequences of a mismatch between the learning model and the data-generating mechanism are demonstrated. The second case, in which the basis functions are adapted, is studied utilising the on-line learning paradigm. The average evolution of generalization error is calculated in a manner which allows the phenomena of the learning process, such as the specialization of the basis functions, to be elucidated.
    The three most important stages of training: the symmetric phase, the symmetry-breaking phase and the convergence phase, are analyzed in detail; the convergence phase analysis allows the derivation of maximal and optimal learning rates. Noise on both the inputs and outputs of the data-generating mechanism is introduced, and the consequences examined. Regularization via weight decay is also studied, as are the effects of the learning model being poorly matched to the data generator.
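    The first of the two training routes mentioned above, fixed basis functions followed by a fit of the output weights, can be sketched in a few lines. The target function, Gaussian basis widths, centre placement (a grid rather than an unsupervised algorithm), and weight-decay strength below are all assumptions made for illustration.

    ```python
    # Sketch (assumed setup): RBF network with fixed Gaussian basis functions,
    # output weights fitted by weight-decay-regularized least squares.
    import numpy as np

    rng = np.random.default_rng(2)

    # Noisy 1-D target: y = sin(2*pi*x) + Gaussian noise.
    n = 100
    x = rng.uniform(0, 1, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=n)

    centres = np.linspace(0, 1, 10)     # fixed centres (grid, for simplicity)
    width = 0.1
    lam = 1e-3                          # weight-decay regularizer

    def design(xs):
        # Gaussian basis functions evaluated at each input.
        return np.exp(-(xs[:, None] - centres[None, :]) ** 2 / (2 * width ** 2))

    Phi = design(x)
    # Regularized least squares: w = (Phi^T Phi + lam*I)^-1 Phi^T y
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centres)), Phi.T @ y)

    # Generalization error measured against the clean target function.
    x_test = rng.uniform(0, 1, size=200)
    gen_err = np.mean((design(x_test) @ w - np.sin(2 * np.pi * x_test)) ** 2)
    print(f"generalization error (MSE vs clean target): {gen_err:.4f}")
    ```

    Varying `lam` and the noise scale in this sketch reproduces, qualitatively, the interplay of noise and regularization that the thesis analyzes within the Bayesian framework.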

    Analysis of Natural Gradient Descent for Multilayer Neural Networks

    Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line, using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods of statistical physics which accurately characterize both transient and asymptotic behavior. A solution of the learning dynamics is obtained for the case of multilayer neural network training in the limit of large input dimension. We find that natural gradient learning leads to optimal asymptotic performance and outperforms gradient descent in the transient, significantly shortening or even removing plateaus in the transient generalization performance which typically hamper gradient descent training. Comment: 14 pages including figures. To appear in Physical Review
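    The core idea, preconditioning the gradient by the inverse Fisher information matrix, can be shown on a much simpler model than the multilayer networks analyzed in the paper. The sketch below uses on-line logistic regression, where the Fisher matrix has a known closed form; the task, step sizes, and running-average Fisher estimate are illustrative assumptions.

    ```python
    # Sketch (assumed setup): on-line natural gradient vs. plain gradient
    # descent for logistic regression with a linear teacher.
    import numpy as np

    rng = np.random.default_rng(3)
    d = 5
    w_true = rng.standard_normal(d)

    def sample():
        x = rng.standard_normal(d)
        p = 1.0 / (1.0 + np.exp(-x @ w_true))
        return x, float(rng.random() < p)

    def run(natural, steps=3000, eta=0.1):
        w = np.zeros(d)
        F = np.eye(d)                  # running Fisher estimate
        for _ in range(steps):
            x, y = sample()
            p = 1.0 / (1.0 + np.exp(-x @ w))
            g = (p - y) * x            # on-line gradient of the log loss
            if natural:
                # Fisher matrix for logistic regression: E[p(1-p) x x^T],
                # tracked here by an exponential moving average.
                F = 0.99 * F + 0.01 * p * (1 - p) * np.outer(x, x)
                g = np.linalg.solve(F + 1e-6 * np.eye(d), g)
            w -= eta * g
        return np.linalg.norm(w - w_true)

    print(f"plain GD  distance to teacher: {run(False):.3f}")
    print(f"natural GD distance to teacher: {run(True):.3f}")
    ```

    Riemannian preconditioning rescales the update in directions the model is insensitive to, which is what shortens the plateaus the abstract refers to.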

    Statistical mechanics of Bayesian model selection


    Heuristics for the refinement of assumptions in generalized reactivity formulae

    Reactive synthesis is concerned with automatically generating implementations from formal specifications. These specifications are typically written in the language of generalized reactivity (GR(1)), a subset of linear temporal logic capable of expressing the most common industrial specification patterns, and describe the requirements about the behavior of a system under assumptions about the environment where the system is to be deployed. Oftentimes no implementation exists which guarantees the required behavior under all possible environments, typically due to missing assumptions (this is usually referred to as unrealizability). To address this issue, new assumptions need to be added to complete the specification, a problem known as assumptions refinement. Since the space of candidate assumptions is intractably large, searching for the best solutions is inherently hard. In particular, new methods are needed to (i) increase the effectiveness of the search procedures, measured as the ratio between the number of solutions found and of refinements explored; and (ii) improve the results' quality, defined as the weakness of the solutions. In this thesis we propose a set of heuristics to meet these goals, and a methodology to assess and compare assumptions refinement methods based on quantitative metrics. The heuristics are in the form of algorithms to generate candidate refinements during the search, and quantitative measures to assess the quality of the candidates. We first discuss a heuristic method to generate assumptions that target the cause of unrealizability. This is done by selecting candidate refinement formulas based on Craig's interpolation. We provide a formal underpinning of the technique and evaluate it in terms of our new metric of effectiveness, as defined above, whose value is improved with respect to the state of the art. We demonstrate this on a set of popular benchmarks of embedded software. 
    We then provide a formal, quantitative characterization of the permissiveness of environment assumptions in the form of a weakness measure. We prove that the partial order induced by this measure is consistent with the one induced by implication. The key advantage of this measure is that it allows for prioritizing candidate solutions, as we show experimentally. Lastly, we propose a notion of minimal refinements with respect to the observed counterstrategies. We demonstrate that exploring minimal refinements produces weaker solutions, and reduces the amount of computation needed to explore each refinement. However, this may come at the cost of reducing the effectiveness of the search. To counteract this effect, we propose a hybrid search approach in which both minimal and non-minimal refinements are explored.
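    The consistency property claimed above, that the order induced by a weakness measure agrees with the order induced by implication, can be illustrated on a deliberately simplified case. The sketch below is not the thesis's measure: it treats assumptions as purely propositional formulas and takes "weakness" to be the fraction of environment valuations satisfying the formula, so that implication (model-set inclusion) entails a weakness ordering. The variable names and formulas are invented for illustration.

    ```python
    # Sketch (assumed, propositional-only stand-in for the thesis's measure):
    # weakness = fraction of satisfying valuations; A -> B implies
    # weakness(A) <= weakness(B).
    from itertools import product

    VARS = ("req", "grant", "busy")

    def models(formula):
        # formula: function from a valuation dict to bool.
        return {v for v in product([False, True], repeat=len(VARS))
                if formula(dict(zip(VARS, v)))}

    def weakness(formula):
        return len(models(formula)) / 2 ** len(VARS)

    A = lambda e: e["req"] and not e["busy"]   # stronger assumption
    B = lambda e: e["req"]                     # weaker assumption

    implies = models(A) <= models(B)           # A -> B as set inclusion
    print(implies, weakness(A), weakness(B))
    ```

    In the GR(1) setting the thesis addresses, assumptions constrain infinite environment behaviors rather than single valuations, so the actual measure is necessarily more involved; the ordering property it must satisfy, however, is the one checked here.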