
    Maximum Likelihood Learning With Arbitrary Treewidth via Fast-Mixing Parameter Sets

    Inference is typically intractable in high-treewidth undirected graphical models, making maximum likelihood learning a challenge. One way to overcome this is to restrict parameters to a tractable set, most typically the set of tree-structured parameters. This paper explores an alternative notion of a tractable set, namely a set of "fast-mixing parameters" where Markov chain Monte Carlo (MCMC) inference can be guaranteed to converge quickly to the stationary distribution. While it is common in practice to approximate the likelihood gradient using samples obtained from MCMC, such procedures lack theoretical guarantees. This paper proves that for any exponential family with bounded sufficient statistics (not just graphical models), when parameters are constrained to a fast-mixing set, gradient descent with gradients approximated by sampling will approximate the maximum likelihood solution inside the set with high probability. When unregularized, finding a solution epsilon-accurate in log-likelihood requires total effort cubic in 1/epsilon, disregarding logarithmic factors. When ridge-regularized, strong convexity allows a solution epsilon-accurate in parameter distance with effort quadratic in 1/epsilon. Both results yield a fully polynomial-time randomized approximation scheme. Comment: Advances in Neural Information Processing Systems 201
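
The learning procedure analysed above — gradient ascent on the log-likelihood with the model expectation estimated by MCMC, projected onto a constrained parameter set — can be sketched on a toy two-node Ising model. All names and values here are illustrative (a simple box constraint stands in for a fast-mixing set; the paper's actual conditions and rates are not reproduced):

```python
import math
import random

def gibbs_edge_mean(theta, n_samples=2000, burn=200, rng=None):
    """Estimate E[x1*x2] under P(x) proportional to exp(theta*x1*x2),
    x_i in {-1, +1}, by univariate Gibbs sampling."""
    rng = rng or random.Random(0)
    x = [1, 1]
    total = 0.0
    for t in range(burn + n_samples):
        for i in (0, 1):
            other = x[1 - i]
            # P(x_i = +1 | other) = sigmoid(2 * theta * other)
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * theta * other))
            x[i] = 1 if rng.random() < p_plus else -1
        if t >= burn:
            total += x[0] * x[1]
    return total / n_samples

def projected_mle(empirical_mean, box=0.5, lr=0.2, steps=60):
    """Gradient ascent on the log-likelihood: the gradient is the empirical
    sufficient statistic minus the (sampled) model expectation, and each
    step is projected back onto [-box, box], a stand-in for a
    fast-mixing parameter set."""
    theta = 0.0
    rng = random.Random(42)
    for _ in range(steps):
        grad = empirical_mean - gibbs_edge_mean(theta, rng=rng)
        theta = max(-box, min(box, theta + lr * grad))
    return theta
```

With an empirical edge mean of 0.6, the unconstrained MLE is atanh(0.6) ≈ 0.69, which lies outside the box, so the projected solution settles at the boundary 0.5.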

    Hardness of parameter estimation in graphical models

    We consider the problem of learning the canonical parameters specifying an undirected graphical model (Markov random field) from the mean parameters. For graphical models representing a minimal exponential family, the canonical parameters are uniquely determined by the mean parameters, so the problem is feasible in principle. The goal of this paper is to investigate the computational feasibility of this statistical task. Our main result shows that parameter estimation is in general intractable: no algorithm can learn the canonical parameters of a generic pair-wise binary graphical model from the mean parameters in time bounded by a polynomial in the number of variables (unless RP = NP). Indeed, such a result has been believed to be true (see the monograph by Wainwright and Jordan (2008)) but no proof was known. Our proof gives a polynomial time reduction from approximating the partition function of the hard-core model, known to be hard, to learning approximate parameters. Our reduction entails showing that the marginal polytope boundary has an inherent repulsive property, which validates an optimization procedure over the polytope that does not use any knowledge of its structure (as required by the ellipsoid method and others). Comment: 15 pages. To appear in NIPS 201

    Factor Graphs for Computer Vision and Image Processing

    Factor graphs have been used extensively in the decoding of error correcting codes such as turbo codes, and in signal processing. However, while computer vision and pattern recognition are awash with graphical model usage, it is somewhat surprising that factor graphs remain under-researched in these communities, given that factor graphs naturally generalise both Markov random fields and Bayesian networks. Moreover, they are useful in modelling relationships between variables that are not necessarily probabilistic and allow for efficient marginalisation via a sum-product of probabilities. In this thesis, we present and illustrate the utility of factor graphs in the vision community through some of the field’s popular problems. The thesis does so with a particular focus on maximum a posteriori (MAP) inference in graphical structures with layers. To this end, we are able to break down complex problems into factored representations and more computationally realisable constructions. Firstly, we present a sum-product framework that uses the explicit factorisation in local subgraphs from the partitioned factor graph of a layered structure to perform inference. This provides an efficient method since exact inference is attainable in the resulting local subtrees. Secondly, we extend this framework to the entire graphical structure without partitioning, and discuss preliminary ways to combine outputs from a multilevel construction. Lastly, we further our endeavour to combine evidence from different methods through a simplicial spanning tree reparameterisation of the factor graph in a way that ensures consistency, to produce an ensembled and improved result. Throughout the thesis, the underlying feature we make use of is to enforce adjacency constraints using Delaunay triangulations computed by adding points dynamically, or using a convex hull algorithm.
The adjacency relationships from Delaunay triangulations help the factor graph approaches in this thesis remain both efficient and competitive for computer vision tasks. This is because of the low treewidth they provide in local subgraphs, as well as the reparameterised interpretation of the graph they form through the spanning tree of simplexes. While exact inference is known to be intractable for junction trees obtained from the loopy graphs in computer vision, in this thesis we are able to perform exact inference on our spanning tree of simplexes. More importantly, the approaches presented here are not restricted to the computer vision and image processing fields, but extend to more general applications that involve distributed computations.
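
The sum-product computations the thesis builds on are exact on trees. As a concrete illustration, here is a minimal forward-backward (sum-product) recursion for exact marginals on a binary chain — a generic sketch of why low-treewidth local subgraphs permit exact inference, not the thesis's partitioned-subgraph framework:

```python
def chain_marginals(unaries, pairwise):
    """Exact marginals on a binary chain MRF x0 - x1 - ... - x_{n-1}
    via the sum-product (forward-backward) recursion.
    unaries[i][v]: potential of x_i = v; pairwise[i][u][v]: potential
    of the edge (x_i = u, x_{i+1} = v)."""
    n = len(unaries)
    # forward messages: fwd[i][v] sums over x_0..x_{i-1}, including unary i
    fwd = [unaries[0][:]]
    for i in range(1, n):
        prev = fwd[-1]
        fwd.append([unaries[i][v]
                    * sum(prev[u] * pairwise[i - 1][u][v] for u in range(2))
                    for v in range(2)])
    # backward messages: bwd[i][v] sums over x_{i+1}..x_{n-1}, excluding unary i
    bwd = [[1.0, 1.0]]
    for i in range(n - 2, -1, -1):
        nxt = bwd[0]
        bwd.insert(0, [sum(pairwise[i][v][u] * unaries[i + 1][u] * nxt[u]
                           for u in range(2))
                       for v in range(2)])
    marginals = []
    for i in range(n):
        un = [fwd[i][v] * bwd[i][v] for v in range(2)]
        z = sum(un)
        marginals.append([p / z for p in un])
    return marginals
```

Each marginal is the normalised product of the forward and backward messages at that node; on a chain this costs O(n) instead of the O(2^n) of brute-force enumeration.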

    Projecting Markov Random Field Parameters for Fast Mixing

    Markov chain Monte Carlo (MCMC) algorithms are simple and extremely powerful techniques to sample from almost arbitrary distributions. Their flaw in practice is that it can take a large and/or unknown amount of time to converge to the stationary distribution. This paper gives sufficient conditions to guarantee that univariate Gibbs sampling on Markov Random Fields (MRFs) will be fast mixing, in a precise sense. Further, an algorithm is given to project onto this set of fast-mixing parameters in the Euclidean norm. Following recent work, we give an example use of this to project in various divergence measures, comparing univariate marginals obtained by sampling after projection to common variational methods and Gibbs sampling on the original parameters. Comment: Neural Information Processing Systems 201
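
The projection step can be illustrated in a simplified form. Fast-mixing conditions of this kind constrain a matrix norm of the coupling strengths, and the Euclidean (Frobenius-nearest) projection onto a spectral-norm ball is obtained by clipping eigenvalues. The symmetric coupling matrix and the bound c below are illustrative stand-ins, not the paper's exact condition (which involves the matrix of absolute coupling strengths):

```python
import numpy as np

def project_spectral(J, c=0.99):
    """Frobenius-nearest point to a symmetric matrix J in the set
    {A : spectral norm of A <= c}: clip the eigenvalues to [-c, c]
    and reassemble."""
    w, V = np.linalg.eigh(J)
    return (V * np.clip(w, -c, c)) @ V.T
```

A matrix already inside the ball is left unchanged; one outside it is moved to the nearest point on the boundary.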

    Exact rotamer optimization for computational protein design

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (leaves 235-244). The search for the global minimum energy conformation (GMEC) of protein side chains is an important computational challenge in protein structure prediction and design. Using rotamer models, the problem is formulated as an NP-hard optimization problem. Dead-end elimination (DEE) methods combined with systematic A* search (DEE/A*) have proven useful, but may not be strong enough as we attempt to solve protein design problems where a large number of similar rotamers are eligible and the network of interactions between residues is dense. In this thesis, we present an exact solution method, named BroMAP (branch-and-bound rotamer optimization using MAP estimation), for such protein design problems. The design goal of BroMAP is to expand smaller search trees than conventional branch-and-bound methods while performing only a moderate amount of computation in each node, thereby reducing the total running time. To achieve that, BroMAP attempts reduction of the problem size within each node through DEE and elimination by energy lower bounds from approximate maximum-a-posteriori (MAP) estimation. The lower bounds are also exploited in branching and subproblem selection for fast discovery of strong upper bounds. Our computational results show that BroMAP tends to be faster than DEE/A* for large protein design cases. BroMAP also solved cases that were not solvable by DEE/A* within the maximum allowed time, and did not incur significant disadvantage for cases where DEE/A* performed well. In the second part of the thesis, we explore several ways of improving the energy lower bounds by using Lagrangian relaxation.
Through computational experiments, solving the dual problem derived from cyclic subgraphs, such as triplets, is shown to produce stronger lower bounds than using the tree-reweighted max-product algorithm. In the second approach, the Lagrangian relaxation is tightened through addition of violated valid inequalities. Finally, we suggest a way of computing individual lower bounds using the dual method. The preliminary results from evaluating BroMAP employing the dual bounds suggest that the use of the strengthened bounds does not in general improve the running time of BroMAP, due to the longer running time of the dual method. by Eun-Jong Hong. Ph.D.
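
The bound-and-prune idea behind BroMAP — expand fewer search-tree nodes by ordering and discarding subproblems via admissible lower bounds — can be sketched generically. This best-first search over a tiny binary chain model is an illustration of that general principle only; the model, bound, and names are invented here and are not BroMAP's DEE- or MAP-based machinery:

```python
import heapq

def min_energy_chain(unaries, pairwise):
    """Best-first branch-and-bound for the minimum-energy (MAP) assignment
    of a binary chain model. Each search node is a partial assignment of a
    prefix of variables; the lower bound adds the exact energy of the
    assigned prefix to per-term minima of all unassigned terms."""
    n = len(unaries)

    def bound(assign):
        k = len(assign)
        # exact energy of the assigned prefix
        e = sum(unaries[i][assign[i]] for i in range(k))
        e += sum(pairwise[i][assign[i]][assign[i + 1]] for i in range(k - 1))
        if k < n:
            if k > 0:  # edge leaving the last assigned variable
                e += min(pairwise[k - 1][assign[k - 1]][v] for v in range(2))
            e += sum(min(unaries[i]) for i in range(k, n))
            e += sum(min(min(row) for row in pairwise[i])
                     for i in range(k, n - 1))
        return e

    heap = [(bound(()), ())]
    while heap:
        b, assign = heapq.heappop(heap)
        if len(assign) == n:
            # the bound is exact for complete assignments, and admissible
            # everywhere, so the first complete node popped is optimal
            return b, assign
        for v in range(2):
            child = assign + (v,)
            heapq.heappush(heap, (bound(child), child))
```

Because the bound never overestimates the best completion of a partial assignment, subtrees whose bound exceeds the best complete solution are simply never expanded.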

    Understanding the Scalability of Bayesian Network Inference Using Clique Tree Growth Curves

    One of the main approaches to performing computation in Bayesian networks (BNs) is clique tree clustering and propagation. The clique tree approach consists of propagation in a clique tree compiled from a Bayesian network, and while it was introduced in the 1980s, there is still a lack of understanding of how clique tree computation time depends on variations in BN size and structure. In this article, we improve this understanding by developing an approach to characterizing clique tree growth as a function of parameters that can be computed in polynomial time from BNs, specifically: (i) the ratio of the number of a BN's non-root nodes to the number of root nodes, and (ii) the expected number of moral edges in their moral graphs. Analytically, we partition the set of cliques in a clique tree into different sets, and introduce a growth curve for the total size of each set. For the special case of bipartite BNs, there are two sets and two growth curves, a mixed clique growth curve and a root clique growth curve. In experiments on random bipartite BNs generated using the BPART algorithm, we systematically increase the out-degree of the root nodes by increasing the number of leaf nodes. Surprisingly, root clique growth is well-approximated by Gompertz growth curves, an S-shaped family of curves that has previously been used to describe growth processes in biology, medicine, and neuroscience. We believe that this research improves the understanding of the scaling behavior of clique tree clustering for a certain class of Bayesian networks; presents an aid for trade-off studies of clique tree clustering using growth curves; and ultimately provides a foundation for benchmarking and developing improved BN inference and machine learning algorithms.
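
For intuition, the Gompertz family mentioned above is a * exp(-b * exp(-c * x)), an S-shaped curve with asymptote a, displacement b, and growth rate c. When a is fixed, the curve linearises under a double logarithm, so b and c can be recovered by ordinary least squares. This is a self-contained sketch on synthetic data (the parameter values are illustrative, not the article's BPART measurements):

```python
import math

def gompertz(x, a, b, c):
    """Gompertz growth curve: asymptote a, displacement b, growth rate c."""
    return a * math.exp(-b * math.exp(-c * x))

def fit_gompertz_given_a(xs, ys, a):
    """Recover b and c by least squares on the linearised form
    log(-log(y / a)) = log(b) - c * x."""
    zs = [math.log(-math.log(y / a)) for y in ys]
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    slope = (sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
             / sum((x - mx) ** 2 for x in xs))
    intercept = mz - slope * mx
    return math.exp(intercept), -slope  # (b, c)
```

On noiseless data the linearised fit recovers the generating parameters exactly; in practice the asymptote a would itself have to be estimated, e.g. by an outer one-dimensional search.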

    Combinatorial Algorithms for Learning Graphical Models

    Graphical models are a framework for representing joint distributions over random variables. By capturing the structure of conditional independencies between the variables, a graphical model can express the distribution in a concise factored form that is often efficient to store and reason about. As constructing graphical models by hand is often infeasible, a lot of work has been devoted to learning them automatically from observational data. Of particular interest is the so-called structure learning problem: finding a graph that encodes the structure of probabilistic dependencies. Once the learner has decided what constitutes a good fit to the data, the task of finding optimal structures typically involves solving an NP-hard problem of combinatorial optimization. While the first algorithms for structure learning thus resorted to local search, there has been growing interest in solving the problem to a global optimum. Indeed, during the past decade multiple exact algorithms have been proposed that are guaranteed to find optimal structures for the family of Bayesian networks, while first steps have been taken for the family of decomposable graphical models. This thesis presents combinatorial algorithms and analytical results with applications in the structure learning problem. For decomposable models, we present exact algorithms for the so-called full Bayesian approach, which involves not only finding individual structures of good fit but also computing posterior expectations of graph features, either by exact computation or via Monte Carlo methods. For Bayesian networks, we study the empirical hardness of the structure learning problem, with the aim of predicting the running time of various structure learning algorithms on a given problem instance. As a result, we obtain a hybrid algorithm that effectively combines the best-case performance of multiple existing techniques.
    Lastly, we study two combinatorial problems of wider interest with relevance in structure learning. First, we present algorithms for counting linear extensions of partially ordered sets, which is required to correct bias in MCMC methods for sampling Bayesian network structures. Second, we give results in the extremal combinatorics of connected vertex sets, whose number bounds the running time of certain algorithms for structure learning and various other problems; upper and lower bounds on this number are given for graphs of bounded vertex degree.
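
The linear-extension counting problem mentioned above is compact to state: count the total orderings of a poset's elements that respect every precedence constraint. A brute-force sketch follows, feasible only for very small posets (the thesis develops far faster algorithms; this merely pins down the quantity being computed):

```python
from itertools import permutations

def count_linear_extensions(n, precedences):
    """Count linear extensions of a poset on elements 0..n-1 by checking
    every permutation; a pair (a, b) in precedences means a must come
    before b in the ordering."""
    count = 0
    for perm in permutations(range(n)):
        pos = {v: i for i, v in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in precedences):
            count += 1
    return count
```

A total order has exactly one linear extension, an antichain on n elements has n!, and the "2 + 2" poset (two disjoint 2-chains) has C(4, 2) = 6, one per interleaving of the chains.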

    Convex relaxation methods for graphical models: Lagrangian and maximum entropy approaches

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 241-257). Graphical models provide compact representations of complex probability distributions of many random variables through a collection of potential functions defined on small subsets of these variables. This representation is defined with respect to a graph in which nodes represent random variables and edges represent the interactions among those random variables. Graphical models provide a powerful and flexible approach to many problems in science and engineering, but also present serious challenges owing to the intractability of optimal inference and estimation over general graphs. In this thesis, we consider convex optimization methods to address two central problems that commonly arise for graphical models. First, we consider the problem of determining the most probable configuration, also known as the maximum a posteriori (MAP) estimate, of all variables in a graphical model, conditioned on (possibly noisy) measurements of some variables. This general problem is intractable, so we consider a Lagrangian relaxation (LR) approach to obtain a tractable dual problem. This involves using the Lagrangian decomposition technique to break up an intractable graph into tractable subgraphs, such as small "blocks" of nodes, embedded trees or thin subgraphs. We develop a distributed, iterative algorithm that minimizes the Lagrangian dual function by block coordinate descent. This results in an iterative marginal-matching procedure that enforces consistency among the subgraphs using an adaptation of the well-known iterative scaling algorithm. This approach is developed both for discrete-variable and Gaussian graphical models.
In discrete models, we also introduce a deterministic annealing procedure, which introduces a temperature parameter to define a smoothed dual function and then gradually reduces the temperature to recover the (non-differentiable) Lagrangian dual. When strong duality holds, we recover the optimal MAP estimate. We show that this occurs for a broad class of "convex decomposable" Gaussian graphical models, which generalizes the "pairwise normalizable" condition known to be important for iterative estimation in Gaussian models. In certain "frustrated" discrete models a duality gap can occur using simple versions of our approach. We consider methods that adaptively enhance the dual formulation, by including more complex subgraphs, so as to reduce the duality gap. In many cases we are able to eliminate the duality gap and obtain the optimal MAP estimate in a tractable manner. We also propose a heuristic method to obtain approximate solutions in cases where there is a duality gap. Second, we consider the problem of learning a graphical model (both the graph and its potential functions) from sample data. We propose the maximum entropy relaxation (MER) method, which is the convex optimization problem of selecting the least informative (maximum entropy) model over an exponential family of graphical models subject to constraints that small subsets of variables should have marginal distributions that are close to the distribution of sample data. We use relative entropy to measure the divergence between marginal probability distributions. We find that MER leads naturally to selection of sparse graphical models. To identify this sparse graph efficiently, we use a "bootstrap" method that constructs the MER solution by solving a sequence of tractable subproblems defined over thin graphs, including new edges at each step to correct for large marginal divergences that violate the MER constraint.
The MER problem on each of these subgraphs is efficiently solved using the primal-dual interior-point method (implemented so as to take advantage of efficient inference methods for thin graphical models). We also consider a dual formulation of MER that minimizes a convex function of the potentials of the graphical model. This MER dual problem can be interpreted as a robust version of maximum-likelihood parameter estimation, where the MER constraints specify the uncertainty in the sufficient statistics of the model. This also corresponds to a regularized maximum-likelihood approach, in which an information-geometric regularization term favors selection of sparse potential representations. We develop a relaxed version of the iterative scaling method to solve this MER dual problem. by Jason K. Johnson. Ph.D.
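
The iterative scaling idea that recurs in this abstract can be seen in its simplest form as iterative proportional fitting: alternately rescale a nonnegative table so each marginal matches its target. The two-variable example below, with illustrative numbers, is only a minimal instance of that marginal-matching principle, not the thesis's relaxed algorithm for the MER dual:

```python
def ipf(table, row_targets, col_targets, iters=50):
    """Iterative proportional fitting on a 2-D nonnegative table:
    alternately rescale each row to hit its target row sum, then each
    column to hit its target column sum, until the marginals match."""
    t = [row[:] for row in table]
    for _ in range(iters):
        for i, r in enumerate(row_targets):
            s = sum(t[i])
            t[i] = [v * r / s for v in t[i]]
        for j, c in enumerate(col_targets):
            s = sum(t[i][j] for i in range(len(t)))
            for i in range(len(t)):
                t[i][j] *= c / s
    return t
```

Each rescaling pass enforces one family of marginal constraints while disturbing the other as little as possible; when the targets are consistent (both sum to the same total), the iterates converge to the unique table with those marginals closest to the start in relative entropy.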