9,838 research outputs found

    A Graphical Model Formulation of Collaborative Filtering Neighbourhood Methods with Fast Maximum Entropy Training

    Full text link
    Item neighbourhood methods for collaborative filtering learn a weighted graph over the set of items, where each item is connected to those it is most similar to. The prediction of a user's rating on an item is then given by that rating of neighbouring items, weighted by their similarity. This paper presents a new neighbourhood approach which we call item fields, whereby an undirected graphical model is formed over the item graph. The resulting prediction rule is a simple generalization of the classical approaches, which takes into account non-local information in the graph, allowing its best results to be obtained when using drastically fewer edges than other neighbourhood approaches. A fast approximate maximum entropy training method based on the Bethe approximation is presented, which uses a simple gradient ascent procedure. When using precomputed sufficient statistics on the Movielens datasets, our method is faster than maximum likelihood approaches by two orders of magnitude.Comment: ICML201

    Learning in Markov Random Fields with Contrastive Free Energies

    Get PDF
    Learning Markov random field (MRF) models is notoriously hard due to the presence of a global normalization factor. In this paper we present a new framework for learning MRF models based on the contrastive free energy (CF) objective function. In this scheme the parameters are updated in an attempt to match the average statistics of the data distribution and a distribution which is (partially or approximately) "relaxed" to the equilibrium distribution. We show that maximum likelihood, mean field, contrastive divergence and pseudo-likelihood objectives can be understood in this paradigm. Moreover, we propose and study a new learning algorithm: the "kstep Kikuchi/Bethe approximation". This algorithm is then tested on a conditional random field model with "skip-chain" edges to model long range interactions in text data. It is demonstrated that with no loss in accuracy, the training time is brought down on average from 19 hours (BP based learning) to 83 minutes, an order of magnitude improvement

    Blending Learning and Inference in Structured Prediction

    Full text link
    In this paper we derive an efficient algorithm to learn the parameters of structured predictors in general graphical models. This algorithm blends the learning and inference tasks, which results in a significant speedup over traditional approaches, such as conditional random fields and structured support vector machines. For this purpose we utilize the structures of the predictors to describe a low dimensional structured prediction task which encourages local consistencies within the different structures while learning the parameters of the model. Convexity of the learning task provides the means to enforce the consistencies between the different parts. The inference-learning blending algorithm that we propose is guaranteed to converge to the optimum of the low dimensional primal and dual programs. Unlike many of the existing approaches, the inference-learning blending allows us to learn efficiently high-order graphical models, over regions of any size, and very large number of parameters. We demonstrate the effectiveness of our approach, while presenting state-of-the-art results in stereo estimation, semantic segmentation, shape reconstruction, and indoor scene understanding

    Minimum Conditional Description Length Estimation for Markov Random Fields

    Full text link
    In this paper we discuss a method, which we call Minimum Conditional Description Length (MCDL), for estimating the parameters of a subset of sites within a Markov random field. We assume that the edges are known for the entire graph G=(V,E)G=(V,E). Then, for a subset UāŠ‚VU\subset V, we estimate the parameters for nodes and edges in UU as well as for edges incident to a node in UU, by finding the exponential parameter for that subset that yields the best compression conditioned on the values on the boundary āˆ‚U\partial U. Our estimate is derived from a temporally stationary sequence of observations on the set UU. We discuss how this method can also be applied to estimate a spatially invariant parameter from a single configuration, and in so doing, derive the Maximum Pseudo-Likelihood (MPL) estimate.Comment: Information Theory and Applications (ITA) workshop, February 201
    • ā€¦
    corecore