Search CORE

7 research outputs found

Multi-Modal Mean-Fields via Cardinality-Based Clamping

Author: Baqué Pierre
Fleuret François
Fua Pascal
Publication venue
Publication date: 23/11/2016
Field of study

Mean Field inference is central to statistical physics. It has attracted much interest in the Computer Vision community to efficiently solve problems expressible in terms of large Conditional Random Fields. However, since it models the posterior probability distribution as a product of marginal probabilities, it may fail to properly account for important dependencies between variables. We therefore replace the fully factorized distribution of Mean Field by a weighted mixture of such distributions, that similarly minimizes the KL-Divergence to the true posterior. By introducing two new ideas, namely, conditioning on groups of variables instead of single ones and using a parameter of the conditional random field potentials, that we identify to the temperature in the sense of statistical physics to select such groups, we can perform this minimization efficiently. Our extension of the clamping method proposed in previous works allows us to both produce a more descriptive approximation of the true posterior and, inspired by the diverse MAP paradigms, fit a mixture of Mean Field approximations. We demonstrate that this positively impacts real-world algorithms that initially relied on mean fields.Comment: Submitted for review to CVPR 201

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Recommended from our members

Converting to Optimization in Machine Learning: Perturb-and-MAP, Differential Privacy, and Program Synthesis

Author: Balog Matej
Publication venue: University of Cambridge
Publication date: 01/02/2020
Field of study

On a mathematical level, most computational problems encountered in machine learning are instances of one of four abstract, fundamental problems: sampling, integration, optimization, and search. Thanks to the rich history of the respective mathematical fields, disparate methods with different properties have been developed for these four problem classes. As a result it can be beneficial to convert a problem from one abstract class into a problem of a different class, because the latter might come with insights, techniques, and algorithms well suited to the particular problem at hand. In particular, this thesis contributes four new methods and generalizations of existing methods for converting specific non-optimization machine learning tasks into optimization problems with more appealing properties. The first example is partition function estimation (an integration problem), where an existing algorithm -- the Gumbel trick -- for converting to the MAP optimization problem is generalized into a more general family of algorithms, such that other instances of this family have better statistical properties. Second, this family of algorithms is further generalized to another integration problem, the problem of estimating Rényi entropies. The third example shows how an intractable sampling problem arising when wishing to publicly release a database containing sensitive data in a safe ("differentially private") manner can be converted into an optimization problem using the theory of Reproducing Kernel Hilbert Spaces. Finally, the fourth case study casts the challenging discrete search problem of program synthesis from input-output examples as a supervised learning task that can be efficiently tackled using gradient-based optimization. In all four instances, the conversions result in novel algorithms with desirable properties. In the first instance, new generalizations of the Gumbel trick can be used to construct statistical estimators of the partition function that achieve the same estimation error while using up to 40% fewer samples. The second instance shows that unbiased estimators of the Rényi entropy can be constructed in the Perturb-and-MAP framework. The main contribution of the third instance is theoretical: the conversion shows that it is possible to construct an algorithm for releasing synthetic databases that approximate databases containing sensitive data in a mathematically precise sense, and to prove results about their approximation errors. Finally, the fourth conversion yields an algorithm for synthesising program source code from input-output examples that is able to solve test problems 1-3 orders of magnitude faster than a wide range of baselines

Apollo (Cambridge)

MPG.PuRe

Methods for Inference in Graphical Models

Author: Weller Adrian
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2014
Field of study

Graphical models provide a flexible, powerful and compact way to model relationships between random variables, and have been applied with great success in many domains. Combining prior beliefs with observed evidence to form a prediction is called inference. Problems of great interest include finding a configuration with highest probability (MAP inference) or solving for the distribution over a subset of variables (marginal inference). Further, these methods are often critical subroutines for learning the relationships. However, inference is computationally intractable in general. Hence, much effort has focused on two themes: finding subdomains where exact inference is solvable efficiently, or identifying approximate methods that work well. We explore both these themes, restricting attention to undirected graphical models with discrete variables. First we address exact MAP inference by advancing the recent method of reducing the problem to finding a maximum weight stable set (MWSS) on a derived graph, which, if perfect, admits polynomial time inference. We derive new results for this approach, including a general decomposition theorem for models of any order and number of labels, extensions of results for binary pairwise models with submodular cost functions to higher order, and a characterization of which binary pairwise models can be efficiently solved with this method. This clarifies the power of the approach on this class of models, improves our toolbox and provides insight into the range of tractable models. Next we consider methods of approximate inference, with particular emphasis on the Bethe approximation, which is in widespread use and has proved remarkably effective, yet is still far from being completely understood. We derive new formulations and properties of the derivatives of the Bethe free energy, then use these to establish an algorithm to compute log of the optimum Bethe partition function to arbitrary epsilon-accuracy. Further, if the model is attractive, we demonstrate a fully polynomial-time approximation scheme (FPTAS), which is an important theoretical result, and demonstrate its practical applications. Next we explore ways to tease apart the two aspects of the Bethe approximation, i.e. the polytope relaxation and the entropy approximation. We derive analytic results, show how optimization may be explored over various polytopes in practice, even for large models, and remark on the observed performance compared to the true distribution and the tree-reweighted (TRW) approximation. This reveals important novel observations and helps guide inference in practice. Finally, we present results related to clamping a selection of variables in a model. We derive novel lower bounds on an array of approximate partition functions based only on the model's topology. Further, we show that in an attractive binary pairwise model, clamping any variable and summing over the approximate sub-partition functions can only increase (hence improve) the Bethe approximation, then use this to provide a new, short proof that the Bethe partition function lower bounds the true value for this class of models. The bulk of this work focuses on the class of binary, pairwise models, but several results apply more generally

CiteSeerX

Columbia University Academic Commons

Mean-Field methods for Structured Deep-Learning in Computer Vision

Author: Baqué Pierre Bruno
Publication venue: Lausanne, EPFL
Publication date: 10/07/2018
Field of study

In recent years, Machine Learning based Computer Vision techniques made impressive progress. These algorithms proved particularly efficient for image classification or detection of isolated objects. From a probabilistic perspective, these methods can predict marginals, over single or multiple variables, independently, with high accuracy. However, in many tasks of practical interest, we need to predict jointly several correlated variables. Practical applications include people detection in crowded scenes, image segmentation, surface reconstruction, 3D pose estimation and others. A large part of the research effort in today's computer-vision community aims at finding task-specific solutions to these problems, while leveraging the power of Deep-Learning based classifiers. In this thesis, we present our journey towards a generic and practical solution based on mean-field (MF) inference. Mean-field is a Statistical Physics-inspired method which has long been used in Computer-Vision as a variational approximation to posterior distributions over complex Conditional Random Fields. Standard mean-field optimization is based on coordinate descent and in many situations can be impractical. We therefore propose a novel proximal gradient-based approach to optimizing the variational objective. It is naturally parallelizable and easy to implement. We prove its convergence, and then demonstrate that, in practice, it yields faster convergence and often finds better optima than more traditional mean-field optimization techniques. Then, we show that we can replace the fully factorized distribution of mean-field by a weighted mixture of such distributions, that similarly minimizes the KL-Divergence to the true posterior. Our extension of the clamping method proposed in previous works allows us to both produce a more descriptive approximation of the true posterior and, inspired by the diverse MAP paradigms, fit a mixture of mean-field approximations. We demonstrate that this positively impacts real-world algorithms that initially relied on mean-fields. One of the important properties of the mean-field inference algorithms is that the closed-form updates are fully differentiable operations. This naturally allows to do parameter learning by simply unrolling multiple iterations of the updates, the so-called back-mean-field algorithm. We derive a novel and efficient structured learning method for multi-modal posterior distribution based on the Multi-Modal Mean-Field approximation, which can be seamlessly combined to modern gradient-based learning methods such as CNNs. Finally, we explore in more details the specific problem of structured learning and prediction for multiple-people detection in crowded scenes. We then present a mean-field based structured deep-learning detection algorithm that provides state of the art results on this dataset

Infoscience - École polytechnique fédérale de Lausanne

Approximating the Bethe partition function

Author: Adrian Weller
Tony Jebara
Publication venue
Publication date: 01/01/2013
Field of study

When belief propagation (BP) converges, it does so to a stationary point of the Bethe free energy F, and is often strikingly accurate. However, it may converge only to a local optimum or may not converge at all. An algorithm was recently introduced by Weller and Jebara for attractive binary pairwise MRFs which is guaranteed to return an ɛ-approximation to the global minimum of F in polynomial time provided the maximum degree ∆ = O(log n), where n is the number of variables. Here we extend their approach and derive a new method based on analyzing first derivatives of F, which leads to much better performance and, for attractive models, yields a fully polynomial-time approximation scheme (FPTAS) without any degree restriction. Further, our methods apply to general (nonattractive) models, though with no polynomial time guarantee in this case, demonstrating that approximating log of the Bethe partition function, log ZB = − min F, for a general model to additive ɛ-accuracy may be reduced to a discrete MAP inference problem. This allows the merits of the global Bethe optimum to be tested

arXiv.org e-Print Archive

CiteSeerX

Columbia University Academic Commons

Approximating the Bethe partition function

Author: Jebara T
Weller A
Publication venue
Publication date
Field of study

When belief propagation (BP) converges, it does so to a stationary point of the Bethe free energy

F

, and is often strikingly accurate. However, it may converge only to a local optimum or may not converge at all. An algorithm was recently introduced for attractive binary pairwise MRFs which is guaranteed to return an

\epsilon

-approximation to the global minimum of

F

in polynomial time provided the maximum degree

\Delta=O(\log n)

, where

n

is the number of variables. Here we significantly improve this algorithm and derive several results including a new approach based on analyzing first derivatives of

F

, which leads to performance that is typically far superior and yields a fully polynomial-time approximation scheme (FPTAS) for attractive models without any degree restriction. Further, the method applies to general (non-attractive) models, though with no polynomial time guarantee in this case, leading to the important result that approximating

\log

of the Bethe partition function,

\log Z_B=-\min F

, for a general model to additive

\epsilon

-accuracy may be reduced to a discrete MAP inference problem. We explore an application to predicting equipment failure on an urban power network and demonstrate that the Bethe approximation can perform well even when BP fails to converge

CUED - Cambridge University Engineering Department