11,520 research outputs found
Hardness of parameter estimation in graphical models
We consider the problem of learning the canonical parameters specifying an
undirected graphical model (Markov random field) from the mean parameters. For
graphical models representing a minimal exponential family, the canonical
parameters are uniquely determined by the mean parameters, so the problem is
feasible in principle. The goal of this paper is to investigate the
computational feasibility of this statistical task. Our main result shows that
parameter estimation is in general intractable: no algorithm can learn the
canonical parameters of a generic pair-wise binary graphical model from the
mean parameters in time bounded by a polynomial in the number of variables
(unless RP = NP). Indeed, such a result has been believed to be true (see the
monograph by Wainwright and Jordan (2008)) but no proof was known.
Our proof gives a polynomial time reduction from approximating the partition
function of the hard-core model, known to be hard, to learning approximate
parameters. Our reduction entails showing that the marginal polytope boundary
has an inherent repulsive property, which validates an optimization procedure
over the polytope that does not use any knowledge of its structure (as required
by the ellipsoid method and others).Comment: 15 pages. To appear in NIPS 201
Complexity of Discrete Energy Minimization Problems
Discrete energy minimization is widely-used in computer vision and machine
learning for problems such as MAP inference in graphical models. The problem,
in general, is notoriously intractable, and finding the global optimal solution
is known to be NP-hard. However, is it possible to approximate this problem
with a reasonable ratio bound on the solution quality in polynomial time? We
show in this paper that the answer is no. Specifically, we show that general
energy minimization, even in the 2-label pairwise case, and planar energy
minimization with three or more labels are exp-APX-complete. This finding rules
out the existence of any approximation algorithm with a sub-exponential
approximation ratio in the input size for these two problems, including
constant factor approximations. Moreover, we collect and review the
computational complexity of several subclass problems and arrange them on a
complexity scale consisting of three major complexity classes -- PO, APX, and
exp-APX, corresponding to problems that are solvable, approximable, and
inapproximable in polynomial time. Problems in the first two complexity classes
can serve as alternative tractable formulations to the inapproximable ones.
This paper can help vision researchers to select an appropriate model for an
application or guide them in designing new algorithms.Comment: ECCV'16 accepte
Complexity of Inference in Graphical Models
Graphical models provide a convenient representation for a broad class of probability distributions.
Due to their powerful and sophisticated modeling capabilities, such models have
found numerous applications in machine learning and other areas. In this paper we consider the
complexity of commonly encountered tasks involving graphical models such as the computation
of the mode of a posterior probability distribution (i.e., MAP estimation), and the computation
of marginal probabilities or the partition function. It is well-known that such inference problems
are hard in the worst case, but are tractable for models with bounded treewidth. We ask
whether treewidth is the only structural criterion of the underlying graph that enables tractable
inference. In other words, is there some class of structures with unbounded treewidth in which
inference is tractable? Subject to a combinatorial hypothesis due to Robertson, Seymour, and
Thomas (1994), we show that low treewidth is indeed the only structural restriction that can
ensure tractability. More precisely we show that for every growing family of graphs indexed
by tree-width, there exists a choice of potential functions such that the corresponding inference
problem is intractable. Thus even for the "best case" graph structures of high treewidth, there is
no polynomial-time inference algorithm. Our analysis employs various concepts from complexity theory and graph theory, with graph minors playing a prominent role
Forest Density Estimation
We study graph estimation and density estimation in high dimensions, using a
family of density estimators based on forest structured undirected graphical
models. For density estimation, we do not assume the true distribution
corresponds to a forest; rather, we form kernel density estimates of the
bivariate and univariate marginals, and apply Kruskal's algorithm to estimate
the optimal forest on held out data. We prove an oracle inequality on the
excess risk of the resulting estimator relative to the risk of the best forest.
For graph estimation, we consider the problem of estimating forests with
restricted tree sizes. We prove that finding a maximum weight spanning forest
with restricted tree size is NP-hard, and develop an approximation algorithm
for this problem. Viewing the tree size as a complexity parameter, we then
select a forest using data splitting, and prove bounds on excess risk and
structure selection consistency of the procedure. Experiments with simulated
data and microarray data indicate that the methods are a practical alternative
to Gaussian graphical models.Comment: Extended version of earlier paper titled "Tree density estimation
Learning Graphical Models Using Multiplicative Weights
We give a simple, multiplicative-weight update algorithm for learning
undirected graphical models or Markov random fields (MRFs). The approach is
new, and for the well-studied case of Ising models or Boltzmann machines, we
obtain an algorithm that uses a nearly optimal number of samples and has
quadratic running time (up to logarithmic factors), subsuming and improving on
all prior work. Additionally, we give the first efficient algorithm for
learning Ising models over general alphabets.
Our main application is an algorithm for learning the structure of t-wise
MRFs with nearly-optimal sample complexity (up to polynomial losses in
necessary terms that depend on the weights) and running time that is
. In addition, given samples, we can also learn the
parameters of the model and generate a hypothesis that is close in statistical
distance to the true MRF. All prior work runs in time for
graphs of bounded degree d and does not generate a hypothesis close in
statistical distance even for t=3. We observe that our runtime has the correct
dependence on n and t assuming the hardness of learning sparse parities with
noise.
Our algorithm--the Sparsitron-- is easy to implement (has only one parameter)
and holds in the on-line setting. Its analysis applies a regret bound from
Freund and Schapire's classic Hedge algorithm. It also gives the first solution
to the problem of learning sparse Generalized Linear Models (GLMs)
- …