157,243 research outputs found

A Generic Path Algorithm for Regularized Statistical Estimation

Authors: Wu Yichao, Zhou Hua
Publication date: 17/01/2012

Regularization is widely used in statistics and machine learning to prevent overfitting and to steer the solution toward prior information. In general, a regularized estimation problem minimizes the sum of a loss function and a penalty term. The penalty term is usually weighted by a tuning parameter and encourages certain constraints on the parameters to be estimated. Particular choices of constraints lead to the popular lasso, fused-lasso, and other generalized $\ell_1$ penalized regression methods. Although there has been a lot of research in this area, developing efficient optimization methods for many nonseparable penalties remains a challenge. In this article we propose an exact path solver based on ordinary differential equations (EPSODE) that works for any convex loss function and can deal with generalized $\ell_1$ penalties as well as more complicated regularization such as inequality constraints encountered in shape-restricted regressions and nonparametric density estimation. In the path following process, the solution path hits, exits, and slides along the various constraints and vividly illustrates the tradeoffs between goodness of fit and model parsimony. In practice, the EPSODE can be coupled with AIC, BIC, $C_p$, or cross-validation to select an optimal tuning parameter. Our applications to generalized $\ell_1$ regularized generalized linear models, shape-restricted regressions, Gaussian graphical models, and nonparametric density estimation showcase the potential of the EPSODE algorithm.

Comment: 28 pages, 5 figures
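As a rough illustration of the "loss plus weighted penalty" form the abstract describes, the sketch below evaluates a squared-error loss plus a generalized l_1 penalty. This is not the EPSODE solver; the function name `penalized_objective`, the choice of squared-error loss, and the symbols `D` (penalty matrix) and `rho` (tuning parameter) are assumptions made for illustration only.

```python
def penalized_objective(beta, X, y, D, rho):
    """Squared-error loss plus a generalized l_1 penalty rho * ||D beta||_1.

    All names here are hypothetical; the loss could be any convex function.
    """
    # Residuals y_i - x_i . beta for each observation.
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(xi, beta))
        for xi, yi in zip(X, y)
    ]
    loss = 0.5 * sum(r * r for r in residuals)
    # Each row of D defines one linear combination of parameters to penalize.
    penalty = sum(
        abs(sum(dij * bj for dij, bj in zip(di, beta)))
        for di in D
    )
    return loss + rho * penalty


X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
beta = [1.0, 1.0]

lasso_D = [[1.0, 0.0], [0.0, 1.0]]   # identity: penalizes each coefficient
fused_D = [[-1.0, 1.0]]              # differences: penalizes beta_2 - beta_1

lasso_value = penalized_objective(beta, X, y, lasso_D, rho=0.5)
fused_value = penalized_objective(beta, X, y, fused_D, rho=0.5)
```

Taking `D` as the identity gives the lasso penalty, while a first-difference matrix gives a fused-lasso-style penalty. A path algorithm would trace the minimizer as `rho` varies rather than evaluate the objective at a single point.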

arXiv.org e-Print Archive

CiteSeerX

Incremental Learning of Nonparametric Bayesian Mixture Models

Authors: Gomes Ryan, Perona Pietro, Welling Max
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

Clustering is a fundamental task in many vision applications. To date, most clustering algorithms work in a batch setting, and training examples must be gathered in a large group before learning can begin. Here we explore incremental clustering, in which data can arrive continuously. We present a novel incremental model-based clustering algorithm based on nonparametric Bayesian methods, which we call Memory Bounded Variational Dirichlet Process (MB-VDP). The number of clusters is determined flexibly by the data, and the approach can be used to automatically discover object categories. The computational requirements for producing model updates are bounded and do not grow with the amount of data processed. The technique is well suited to very large datasets, and we show that our approach outperforms existing online alternatives for learning nonparametric Bayesian mixture models.

CiteSeerX

Caltech Authors

Learning in Markov Random Fields with Contrastive Free Energies

Authors: Sutton Charles, Welling Max
Publication date: 01/01/2005

Learning Markov random field (MRF) models is notoriously hard due to the presence of a global normalization factor. In this paper we present a new framework for learning MRF models based on the contrastive free energy (CF) objective function. In this scheme the parameters are updated in an attempt to match the average statistics of the data distribution and a distribution which is (partially or approximately) "relaxed" to the equilibrium distribution. We show that maximum likelihood, mean field, contrastive divergence and pseudo-likelihood objectives can be understood in this paradigm. Moreover, we propose and study a new learning algorithm: the "k-step Kikuchi/Bethe approximation". This algorithm is then tested on a conditional random field model with "skip-chain" edges to model long-range interactions in text data. It is demonstrated that, with no loss in accuracy, the training time is brought down on average from 19 hours (BP-based learning) to 83 minutes, an order of magnitude improvement.

CiteSeerX

Edinburgh Research Explorer

Bayesian Optimization for Adaptive MCMC

Authors: de Freitas Nando, Hamze Firas, Mahendran Nimalan, Wang Ziyu
Publication date: 01/01/2011

This paper proposes a new randomized strategy for adaptive MCMC using Bayesian optimization. This approach applies to non-differentiable objective functions and trades off exploration and exploitation to reduce the number of potentially costly objective function evaluations. We demonstrate the strategy in the complex setting of sampling from constrained, discrete and densely connected probabilistic graphical models where, for each variation of the problem, one needs to adjust the parameters of the proposal mechanism automatically to ensure efficient mixing of the Markov chains.

Comment: This paper contains 12 pages and 6 figures. A similar version of this paper has been submitted to AISTATS 2012 and is currently under review.

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive

A Graphical Model Formulation of Collaborative Filtering Neighbourhood Methods with Fast Maximum Entropy Training

Authors: Caetano Tiberio, Defazio Aaron
Publication date: 01/01/2012

Item neighbourhood methods for collaborative filtering learn a weighted graph over the set of items, where each item is connected to those it is most similar to. The prediction of a user's rating on an item is then given by the user's ratings of neighbouring items, weighted by their similarity. This paper presents a new neighbourhood approach which we call item fields, whereby an undirected graphical model is formed over the item graph. The resulting prediction rule is a simple generalization of the classical approaches, which takes into account non-local information in the graph, allowing its best results to be obtained when using drastically fewer edges than other neighbourhood approaches. A fast approximate maximum entropy training method based on the Bethe approximation is presented, which uses a simple gradient ascent procedure. When using precomputed sufficient statistics on the Movielens datasets, our method is faster than maximum likelihood approaches by two orders of magnitude.

Comment: ICML201
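The classical neighbourhood prediction rule mentioned in the abstract (a similarity-weighted combination of the user's ratings on neighbouring items) can be sketched as follows. This illustrates only the classical baseline, not the item fields model; the function name `predict_rating`, the dictionary layout, and the normalization by the sum of absolute similarities are assumptions for illustration.

```python
def predict_rating(user_ratings, neighbours):
    """Similarity-weighted average of the user's ratings on neighbouring items.

    user_ratings: dict mapping item id -> rating given by this user.
    neighbours:   dict mapping neighbouring item id -> similarity weight.
    Returns None when the user has rated none of the neighbours.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item, similarity in neighbours.items():
        if item in user_ratings:
            weighted_sum += similarity * user_ratings[item]
            weight_total += abs(similarity)
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Hypothetical data: the user rated items "a" and "b" but not "c".
ratings = {"a": 4.0, "b": 2.0}
item_neighbours = {"a": 0.75, "b": 0.25, "c": 0.5}
prediction = predict_rating(ratings, item_neighbours)
```

Only neighbours the user has actually rated contribute, so the unrated item "c" is skipped in both the numerator and the normalizer.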

arXiv.org e-Print Archive

CiteSeerX