Block Belief Propagation for Parameter Learning in Markov Random Fields
Traditional learning methods for training Markov random fields require
inference over all variables to compute the likelihood gradient. The iteration
complexity of these methods therefore scales with the size of the graphical
models. In this paper, we propose \emph{block belief propagation learning}
(BBPL), which uses block-coordinate updates of approximate marginals to compute
approximate gradients, removing the need to compute inference on the entire
graphical model. Thus, the iteration complexity of BBPL does not scale with the
size of the graphs. We prove that the method converges to the same solution as
that obtained by using full inference per iteration, despite these
approximations, and we empirically demonstrate its scalability improvements
over standard training methods.
Comment: Accepted to AAAI 201
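A minimal conceptual sketch of the block-coordinate idea, not the authors' implementation: the toy chain model, the mean-field-style local update standing in for belief propagation, and all names here (refresh_block, data_means, the round-robin schedule) are illustrative assumptions. Each iteration refreshes approximate marginals on one block only and reuses cached beliefs elsewhere when forming the approximate gradient.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pairwise binary MRF on a chain, x_i in {-1,+1}:
#   p(x) proportional to exp(sum_i theta_i x_i + w * sum_i x_i x_{i+1}).
# The likelihood gradient w.r.t. theta_i is
#   (empirical marginal of x_i) - (model marginal of x_i).
n, w = 200, 0.3
data_means = rng.uniform(0.2, 0.8, size=n)   # stand-in empirical marginals P(x_i = 1)

def refresh_block(theta, beliefs, block):
    """Recompute approximate marginals on one block, holding cached neighbor
    beliefs fixed (a mean-field-style update standing in for block BP)."""
    for i in block:
        left = 2 * beliefs[i - 1] - 1 if i > 0 else 0.0
        right = 2 * beliefs[i + 1] - 1 if i < n - 1 else 0.0
        field = theta[i] + w * (left + right)
        beliefs[i] = 1.0 / (1.0 + np.exp(-2 * field))   # P(x_i = 1)
    return beliefs

theta = np.zeros(n)
beliefs = np.full(n, 0.5)                    # cached approximate marginals
blocks = np.array_split(np.arange(n), 10)    # fixed partition into 10 blocks
for t in range(2000):
    block = blocks[t % len(blocks)]                   # round-robin schedule
    beliefs = refresh_block(theta, beliefs, block)    # inference on one block only
    theta += 0.5 * (data_means - beliefs)             # approximate gradient step
```
Because stale beliefs on the untouched blocks still enter the gradient, each update direction is only approximate; the paper's convergence result is that the fixed point nevertheless matches full-inference training.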
Maximum Likelihood Learning With Arbitrary Treewidth via Fast-Mixing Parameter Sets
Inference is typically intractable in high-treewidth undirected graphical
models, making maximum likelihood learning a challenge. One way to overcome
this is to restrict parameters to a tractable set, most typically the set of
tree-structured parameters. This paper explores an alternative notion of a
tractable set, namely a set of "fast-mixing parameters" where Markov chain
Monte Carlo (MCMC) inference can be guaranteed to quickly converge to the
stationary distribution. While it is common in practice to approximate the
likelihood gradient using samples obtained from MCMC, such procedures lack
theoretical guarantees. This paper proves that for any exponential family with
bounded sufficient statistics (not just graphical models), when parameters are
constrained to a fast-mixing set, gradient descent with gradients approximated
by sampling will approximate the maximum likelihood solution inside the set
with high probability. In the unregularized case, finding a solution
epsilon-accurate in log-likelihood requires total effort cubic in 1/epsilon,
disregarding logarithmic factors. With ridge regularization, strong convexity
yields a solution epsilon-accurate in parameter distance with effort quadratic
in 1/epsilon. Both results give a fully polynomial-time randomized
approximation scheme.
Comment: Advances in Neural Information Processing Systems 201
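A sketch of the overall scheme on a toy Ising model: projected gradient descent where the model expectation in the gradient is estimated by Gibbs sampling. The row-rescaling projection below is a crude Dobrushin-style surrogate for the paper's fast-mixing parameter set, and every name (gibbs_stats, project, the constant c) is an illustrative assumption, not the paper's construction.
```python
import numpy as np

rng = np.random.default_rng(1)
n = 10

# Toy Ising model p(x) proportional to exp(x' W x / 2), x in {-1,+1}^n.
# The log-likelihood gradient w.r.t. W is E_data[x x'] - E_W[x x']; the model
# term is estimated by Gibbs sampling, which is only guaranteed to mix fast
# while W stays inside the constraint set.
data = rng.choice([-1, 1], size=(100, n)).astype(float)
emp_stats = data.T @ data / len(data)

def gibbs_stats(W, sweeps=50):
    """Estimate E_W[x x'] with a short Gibbs chain (fast mixing keeps it short)."""
    x = rng.choice([-1, 1], size=n).astype(float)
    acc = np.zeros((n, n))
    for _ in range(sweeps):
        for i in range(n):
            p1 = 1.0 / (1.0 + np.exp(-2.0 * W[i] @ x))  # P(x_i = +1 | rest)
            x[i] = 1.0 if rng.random() < p1 else -1.0
        acc += np.outer(x, x)
    return acc / sweeps

def project(W, c=0.9):
    """Rescale rows so that sum_j |W_ij| <= c < 1, a crude Dobrushin-style
    surrogate fast-mixing condition, then re-symmetrize the couplings."""
    row = np.abs(W).sum(axis=1)
    W = W * np.minimum(1.0, c / np.maximum(row, 1e-12))[:, None]
    return (W + W.T) / 2.0

W = np.zeros((n, n))
for _ in range(200):
    grad = emp_stats - gibbs_stats(W)    # sampled gradient approximation
    np.fill_diagonal(grad, 0.0)          # no self-couplings
    W = project(W + 0.05 * grad)
```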
Barrier Frank-Wolfe for Marginal Inference
We introduce a globally-convergent algorithm for optimizing the
tree-reweighted (TRW) variational objective over the marginal polytope. The
algorithm is based on the conditional gradient method (Frank-Wolfe) and moves
pseudomarginals within the marginal polytope through repeated maximum a
posteriori (MAP) calls. This modular structure enables us to leverage black-box
MAP solvers (both exact and approximate) for variational inference, and to obtain
more accurate results than tree-reweighted algorithms that optimize over the
local consistency relaxation. Theoretically, we bound the sub-optimality for
the proposed algorithm despite the TRW objective having unbounded gradients at
the boundary of the marginal polytope. Empirically, we demonstrate the
increased quality of results found by tightening the relaxation over the
marginal polytope as well as the spanning tree polytope on synthetic and
real-world instances.
Comment: 25 pages, 12 figures. To appear in Neural Information Processing
Systems (NIPS) 2015. Corrected reference and cleaned up bibliography.
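A sketch of the conditional-gradient loop on a deliberately easy instance: with independent binary variables the marginal polytope is the cube [0,1]^n and the objective reduces to the exact free energy, so the "MAP oracle" is a per-coordinate sign test. The functions grad_F and map_oracle and the closed-form check are illustrative assumptions, not the paper's TRW machinery; note the entropy gradient blowing up at the boundary, which is the difficulty the paper's analysis handles.
```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
theta = rng.normal(size=n)

# Free energy F(mu) = -theta.mu - sum_i H(mu_i) over the cube [0,1]^n,
# with H(p) = -p log p - (1-p) log(1-p); grad is unbounded at the boundary.
def grad_F(mu):
    return -theta + np.log(mu) - np.log(1 - mu)

def map_oracle(g):
    """Linear minimization over the polytope via a 'MAP' call: on the cube it
    is a per-coordinate sign test; in general, a black-box MAP solver."""
    return (g < 0).astype(float)

mu = np.full(n, 0.5)                      # start at an interior point
for t in range(1, 2000):
    v = map_oracle(grad_F(mu))            # Frank-Wolfe vertex
    gamma = 2.0 / (t + 2)                 # standard step size (keeps mu interior)
    mu = (1 - gamma) * mu + gamma * v

# For this toy model the optimum is mu* = sigmoid(theta).
print("max gap to closed form:", np.abs(mu - 1 / (1 + np.exp(-theta))).max())
```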
Hardness of parameter estimation in graphical models
We consider the problem of learning the canonical parameters specifying an
undirected graphical model (Markov random field) from the mean parameters. For
graphical models representing a minimal exponential family, the canonical
parameters are uniquely determined by the mean parameters, so the problem is
feasible in principle. The goal of this paper is to investigate the
computational feasibility of this statistical task. Our main result shows that
parameter estimation is in general intractable: no algorithm can learn the
canonical parameters of a generic pairwise binary graphical model from the
mean parameters in time bounded by a polynomial in the number of variables
(unless RP = NP). Indeed, such a result had been believed to be true (see the
monograph by Wainwright and Jordan (2008)), but no proof was known.
Our proof gives a polynomial-time reduction from approximating the partition
function of the hard-core model, known to be hard, to learning approximate
parameters. Our reduction entails showing that the marginal polytope boundary
has an inherent repulsive property, which validates an optimization procedure
over the polytope that does not use any knowledge of its structure (as required
by the ellipsoid method and others).
Comment: 15 pages. To appear in NIPS 201
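For reference, the map the paper asks to invert is the standard mean parameterization of an exponential family (Wainwright and Jordan, 2008):
```latex
p_\theta(x) = \exp\bigl(\langle\theta,\phi(x)\rangle - A(\theta)\bigr),
\qquad
\mu(\theta) = \mathbb{E}_\theta[\phi(X)] = \nabla A(\theta).
```
For a minimal family the gradient map of A is injective, so the mean parameters determine theta uniquely; the result above says that inverting this map is nevertheless computationally intractable in general.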