
    Models of Noise and Robust Estimates

    Given $n$ noisy observations $g_i$ of the same quantity $f$, it is common practice to estimate $f$ by minimizing the function $\sum_{i=1}^{n}(g_i-f)^2$. From a statistical point of view this corresponds to computing the maximum likelihood estimate under the assumption of Gaussian noise. However, it is well known that this choice leads to results that are very sensitive to the presence of outliers in the data. For this reason it has been proposed to minimize functions of the form $\sum_{i=1}^{n}V(g_i-f)$, where $V$ is a function that increases less rapidly than the square. Several choices for $V$ have been proposed and successfully used to obtain "robust" estimates. In this paper we show that, for a class of functions $V$, using these robust estimators corresponds to assuming that the data are corrupted by Gaussian noise whose variance fluctuates according to some given probability distribution that uniquely determines the shape of $V$.
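
    A minimal sketch of the idea, not taken from the paper: Huber's function is one standard choice of $V$ that grows linearly in the tails, and the resulting location estimate can be computed by iteratively reweighted least squares. All names and parameter values below are illustrative.

        import numpy as np

        def robust_estimate(g, delta=1.0, n_iter=50):
            """Minimize sum_i V(g_i - f) with V the Huber function
            (quadratic near zero, linear in the tails), via iteratively
            reweighted least squares."""
            f = np.median(g)  # robust starting point
            for _ in range(n_iter):
                r = np.abs(g - f)
                w = delta / np.maximum(r, delta)  # 1 if |r| <= delta, else delta/|r|
                f = np.sum(w * g) / np.sum(w)     # weighted-mean update
            return f

        rng = np.random.default_rng(0)
        g = rng.normal(5.0, 1.0, 100)
        g[:5] = 50.0                  # inject gross outliers
        print(np.mean(g))             # least-squares estimate, dragged upward
        print(robust_estimate(g))     # stays near the true value 5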

    An Equivalence Between Sparse Approximation and Support Vector Machines

    In the first part of this paper we show a similarity between the principle of Structural Risk Minimization (SRM) (Vapnik, 1982) and the idea of Sparse Approximation, as defined in Chen, Donoho and Saunders (1995) and Olshausen and Field (1996). We then focus on two specific (approximate) implementations of SRM and Sparse Approximation which have been used to solve the problem of function approximation. For SRM we consider the Support Vector Machine technique proposed by V. Vapnik and his team at AT&T Bell Labs, and for Sparse Approximation we consider a modification of the Basis Pursuit De-Noising algorithm proposed by Chen, Donoho and Saunders (1995). We show that, under certain conditions, these two techniques are equivalent: they give the same solution and they require the solution of the same quadratic programming problem.
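
    A hedged illustration of the two routes, not the paper's exact construction: sklearn's SVR stands in for the SRM side, and an L1-penalized fit over a dictionary of kernel functions centered at the data points stands in for Basis Pursuit De-Noising. Both yield sparse expansions in the same kernel atoms.

        import numpy as np
        from sklearn.svm import SVR
        from sklearn.linear_model import Lasso
        from sklearn.metrics.pairwise import rbf_kernel

        rng = np.random.default_rng(0)
        X = np.sort(rng.uniform(-3, 3, (40, 1)), axis=0)
        y = np.sin(X).ravel() + 0.1 * rng.normal(size=40)

        # SRM route: support vector regression with a Gaussian kernel
        svr = SVR(kernel="rbf", gamma=1.0, C=1.0, epsilon=0.1).fit(X, y)

        # Sparse-approximation route: L1-penalized coefficients over a
        # dictionary whose atoms are the same kernels centered at the data
        G = rbf_kernel(X, X, gamma=1.0)
        lasso = Lasso(alpha=0.01).fit(G, y)

        print("SVR support vectors:", len(svr.support_))
        print("nonzero dictionary atoms:", int(np.sum(lasso.coef_ != 0)))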

    A Theory of Networks for Approximation and Learning

    Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multi-dimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
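
    A minimal sketch of the simplest case, assuming Gaussian basis functions with centers fixed at the data points (the full GRBF scheme also adapts the centers): the network coefficients solve a regularized linear system.

        import numpy as np

        def fit_rbf(X, y, lam=1e-3, sigma=1.0):
            """Regularization-network fit: solve (G + lam*I) c = y, where
            G[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
            d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
            G = np.exp(-d2 / (2 * sigma ** 2))
            return np.linalg.solve(G + lam * np.eye(len(X)), y)

        def predict_rbf(X_train, c, X_new, sigma=1.0):
            d2 = np.sum((X_new[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
            return np.exp(-d2 / (2 * sigma ** 2)) @ c

        X = np.linspace(0, 1, 20)[:, None]
        y = np.sin(2 * np.pi * X).ravel()
        c = fit_rbf(X, y)
        print(predict_rbf(X, c, np.array([[0.25]])))  # close to sin(pi/2) = 1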

    Forecasting Global Temperature Variations by Neural Networks

    Global temperature variations between 1861 and 1984 are forecast using regularization networks, multilayer perceptrons and linear autoregression. The regularization network, optimized by stochastic gradient descent associated with colored noise, gives the best forecasts. For all the models, prediction errors noticeably increase after 1965. These results are consistent with the hypothesis that the climate dynamics is characterized by low-dimensional chaos and that it may have changed at some point after 1965, which is also consistent with the recent idea of climate change.
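
    For reference, the linear-autoregression baseline mentioned here can be fit by least squares; the sketch below is illustrative only and does not reproduce the paper's experimental setup or data.

        import numpy as np

        def fit_ar(x, p=5):
            """Least-squares fit of x[t] ~ a . (x[t-1], ..., x[t-p]) + b."""
            X = np.array([x[t - p:t][::-1] for t in range(p, len(x))])
            A = np.column_stack([X, np.ones(len(X))])
            coef, *_ = np.linalg.lstsq(A, x[p:], rcond=None)
            return coef

        def forecast_ar(x, coef, p=5):
            return x[-p:][::-1] @ coef[:-1] + coef[-1]

        t = np.arange(300)
        x = 0.002 * t + 0.3 * np.sin(0.2 * t)   # synthetic stand-in series
        coef = fit_ar(x)
        print(forecast_ar(x, coef))              # one-step-ahead prediction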

    Networks and the Best Approximation Property

    Networks can be considered as approximation schemes. Multilayer networks of the backpropagation type can approximate arbitrarily well continuous functions (Cybenko, 1989; Funahashi, 1989; Stinchcombe and White, 1989). We prove that networks derived from regularization theory, including Radial Basis Functions (Poggio and Girosi, 1989), have a similar property. From the point of view of approximation theory, however, the property of approximating continuous functions arbitrarily well is not sufficient for characterizing good approximation schemes. More critical is the property of best approximation. The main result of this paper is that multilayer networks of the type used in backpropagation do not have the best approximation property. For regularization networks (in particular Radial Basis Function networks) we prove existence and uniqueness of the best approximation.
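
    The flavor of the positive result can be seen in a linear special case: with fixed centers an RBF expansion is linear in its coefficients, so the best L2 approximation from its span exists, is unique, and is an orthogonal projection. The sketch below illustrates only this fixed-centers case, not the paper's full result.

        import numpy as np

        x = np.linspace(-1, 1, 200)
        centers = np.linspace(-1, 1, 10)
        Phi = np.exp(-(x[:, None] - centers[None, :]) ** 2)  # fixed RBF atoms
        target = np.abs(x)                                    # function to approximate
        c, *_ = np.linalg.lstsq(Phi, target, rcond=None)      # orthogonal projection
        print(np.linalg.norm(Phi @ c - target))               # residual of the unique best fit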

    Notes on PCA, Regularization, Sparsity and Support Vector Machines

    We derive a new representation for a function as a linear combination of local correlation kernels at optimal sparse locations and discuss its relation to PCA, regularization, sparsity principles and Support Vector Machines. We first review previous results for the approximation of a function from discrete data (Girosi, 1998) in the context of Vapnik's feature space and dual representation (Vapnik, 1995). We apply them to show 1) that a standard regularization functional with a stabilizer defined in terms of the correlation function induces a regression function in the span of the feature space of classical Principal Components and 2) that there exists a dual representation of the regression function in terms of a regularization network with a kernel equal to a generalized correlation function. We then describe the main observation of the paper: the dual representation in terms of the correlation function can be sparsified using the Support Vector Machines (Vapnik, 1982) technique, and this operation is equivalent to sparsifying a large dictionary of basis functions adapted to the task, using a variation of Basis Pursuit De-Noising (Chen, Donoho and Saunders, 1995; see also related work by Donahue and Geiger, 1994; Olshausen and Field, 1995; Lewicki and Sejnowski, 1998). In addition to extending the close relations between regularization, Support Vector Machines and sparsity, our work also illuminates and formalizes the LFA concept of Penev and Atick (1996). We discuss the relation between our results, which are about regression, and the different problem of pattern classification.
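
    A minimal sketch of the dual representation, with hypothetical data and omitting the sparsification step: a regularization network whose kernel is an empirical correlation function estimated from a collection of signals.

        import numpy as np

        rng = np.random.default_rng(0)
        signals = np.cumsum(rng.normal(size=(50, 100)), axis=1)  # 50 smooth example signals
        K = signals.T @ signals / len(signals)                   # empirical correlation kernel K(t, s)

        t_obs = np.arange(0, 100, 10)                  # observe a new signal at a few points
        y_obs = np.sin(np.linspace(0, 3, 100))[t_obs]

        lam = 1e-2
        c = np.linalg.solve(K[np.ix_(t_obs, t_obs)] + lam * np.eye(len(t_obs)), y_obs)
        f_hat = K[:, t_obs] @ c                        # regression function on the full grid
        print(np.abs(f_hat[t_obs] - y_obs).max())      # fit at the observed points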

    Convergence Rates of Approximation by Translates

    In this paper we consider the problem of approximating a function belonging to some function space $\Phi$ by a linear combination of $n$ translates of a given function $G$. Using a lemma by Jones (1990) and Barron (1991) we show that it is possible to define function spaces and functions $G$ for which the rate of convergence to zero of the error is $O(1/n)$ in any number of dimensions. The apparent avoidance of the "curse of dimensionality" is due to the fact that these function spaces are more and more constrained as the dimension increases. Examples include spaces of the Sobolev type, in which the number of weak derivatives is required to be larger than the number of dimensions. We give results both for approximation in the $L_2$ norm and in the $L_\infty$ norm. The interesting feature of these results is that, thanks to the constructive nature of Jones' and Barron's lemma, an iterative procedure is defined that can achieve this rate.
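
    The iterative procedure can be sketched in matching-pursuit style: at each step, add the single translate of $G$ most correlated with the current residual. This is an illustrative variant, not the exact relaxed greedy step of Jones' and Barron's lemma.

        import numpy as np

        x = np.linspace(0, 1, 400)
        G = lambda t: np.exp(-((x - t) ** 2) / 0.01)    # translates of a Gaussian bump
        candidates = np.linspace(0, 1, 100)              # candidate translation points

        f = np.sin(2 * np.pi * x)                        # function to approximate
        residual = f.copy()
        for n in range(20):
            scores = [abs(residual @ G(t)) for t in candidates]
            g = G(candidates[int(np.argmax(scores))])    # best-correlated translate
            residual -= ((residual @ g) / (g @ g)) * g   # subtract its optimal multiple
        print(np.linalg.norm(residual) / np.linalg.norm(f))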

    On the Relationship Between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions

    In this paper, we bound the generalization error of a class of Radial Basis Function networks, for certain well-defined function learning tasks, in terms of the number of parameters and the number of examples. We show that the total generalization error is partly due to the insufficient representational capacity of the network (because of its finite size) and partly due to insufficient information about the target function (because of the finite number of samples). We make several observations about generalization error which are valid irrespective of the approximation scheme. Our result also sheds light on ways to choose an appropriate network architecture for a particular problem.
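
    Schematically, and glossing over the constants and confidence terms made explicit in the paper, the bound has the two-term shape

        \[
        \mathbb{E}\!\left[(f_0 - \hat{f}_{n,l})^2\right]
        \;\lesssim\;
        \underbrace{O\!\left(\frac{1}{n}\right)}_{\text{approximation error}}
        \;+\;
        \underbrace{O\!\left(\sqrt{\frac{n d \,\ln(n l)}{l}}\right)}_{\text{estimation error}}
        \]

    where $n$ is the number of basis functions, $d$ the input dimension, $l$ the number of examples, $f_0$ the target function, and $\hat{f}_{n,l}$ the trained network.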

    Continuous Stochastic Cellular Automata that Have a Stationary Distribution and No Detailed Balance

    Marroquin and Ramirez (1990) have recently discovered a class of discrete stochastic cellular automata with Gibbsian invariant measures that have a non-reversible dynamic behavior. Practical applications include algorithms for computing MRF models that are more powerful than the Metropolis algorithm. In this paper we describe a large class of stochastic dynamical systems that has a Gibbs asymptotic distribution but does not satisfy reversibility. We characterize sufficient properties of a sub-class of stochastic differential equations, in terms of the associated Fokker-Planck equation, for the existence of an asymptotic probability distribution in the given system of coordinates. Practical implications include VLSI analog circuits to compute coupled MRF models.
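
    One standard construction with this behavior, given here as a hedged sketch rather than the paper's characterization: add an antisymmetric circulation term $J$ to a Langevin diffusion. The stationary density remains the Gibbs measure proportional to $e^{-U}$, but for $J \neq 0$ detailed balance fails.

        import numpy as np

        rng = np.random.default_rng(0)
        J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # antisymmetric, breaks reversibility
        grad_U = lambda x: x                       # U(x) = |x|^2 / 2, so the Gibbs measure is N(0, I)

        x = np.zeros(2)
        dt, samples = 1e-2, []
        # Euler-Maruyama for dX = -(I + J) grad U(X) dt + sqrt(2) dW
        for step in range(100_000):
            x = x - (np.eye(2) + J) @ grad_U(x) * dt + np.sqrt(2 * dt) * rng.normal(size=2)
            if step > 10_000:
                samples.append(x.copy())
        samples = np.array(samples)
        print(samples.mean(axis=0), samples.var(axis=0))  # approx (0, 0) and (1, 1)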

    Priors, Stabilizers and Basis Functions: From Regularization to Radial, Tensor and Additive Splines

    We had previously shown that regularization principles lead to approximation schemes, such as Radial Basis Functions, which are equivalent to networks with one layer of hidden units, called Regularization Networks. In this paper we show that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models, Breiman's hinge functions and some forms of Projection Pursuit Regression. In the probabilistic interpretation of regularization, the different classes of basis functions correspond to different classes of prior probabilities on the approximating function spaces, and therefore to different types of smoothness assumptions. In the final part of the paper, we also show a relation between activation functions of the Gaussian and sigmoidal type.
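
    The regularization functional underlying all of these schemes has the standard form

        \[
        H[f] \;=\; \sum_{i=1}^{N} \big(y_i - f(\mathbf{x}_i)\big)^2 \;+\; \lambda \,\|P f\|^2,
        \]

    where the stabilizer $\|Pf\|^2$ encodes the smoothness prior; different choices of $P$ yield the radial, tensor-product and additive basis functions discussed in the paper.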