Learning loopy graphical models with latent variables: Efficient methods and guarantees
The problem of structure estimation in graphical models with latent variables
is considered. We characterize conditions for tractable graph estimation and
develop efficient methods with provable guarantees. We consider models where
the underlying Markov graph is locally tree-like, and the model is in the
regime of correlation decay. For the special case of the Ising model, the
number of samples required for structural consistency of our method scales
as $\Omega(\theta_{\min}^{-\delta\eta(\eta+1)-2}\log p)$, where $p$ is the
number of variables, $\theta_{\min}$ is the minimum edge potential, $\delta$ is
the depth (i.e., distance from a hidden node to the nearest observed nodes),
and $\eta$ is a parameter which depends on the bounds on node and edge
potentials in the Ising model. Necessary conditions for structural consistency
under any algorithm are derived and our method nearly matches the lower bound
on sample requirements. Further, the proposed method is practical to implement
and provides flexibility to control the number of latent variables and the
cycle lengths in the output graph.
Comment: Published at http://dx.doi.org/10.1214/12-AOS1070 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Parallel Algorithms for Geometric Graph Problems
We give algorithms for geometric graph problems in the modern parallel models
inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem
over a set of points in the two-dimensional space, our algorithm computes a
$(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of
rounds of communication, while using total space and communication proportional
to the size of the data (linear space and near linear time algorithms). In
contrast, for general graphs, achieving the same result for MST (or even
connectivity) remains a challenging open problem, despite drawing significant
attention in recent years.
We develop a general algorithmic framework that, besides MST, also applies to
Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic
framework has implications beyond the MapReduce model. For example, it yields a
new algorithm for computing EMD cost in the plane in near-linear time,
$n^{1+o_\epsilon(1)}$. We note that while recently Sharathkumar and Agarwal
developed a near-linear time algorithm for $(1+\epsilon)$-approximating EMD,
our algorithm is fundamentally different, and, for example, also solves the
transportation (cost) problem, raised as an open question in their work.
Furthermore, our algorithm immediately gives a $(1+\epsilon)$-approximation
algorithm with $n^{\delta}$ space in the streaming-with-sorting model with
$1/\delta^{O(1)}$ passes. As such, it is tempting to conjecture that the
parallel models may also constitute a concrete playground in the quest for
efficient algorithms for EMD (and other similar problems) in the vanilla
streaming model, a well-known open problem.
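To make the target quantity concrete, here is a minimal sequential sketch of the exact Euclidean MST cost that an approximate MST is measured against. It uses Prim's algorithm in quadratic time and is not the parallel algorithm from the abstract; the function name `euclidean_mst_cost` and the sample points are hypothetical.

```python
import math


def euclidean_mst_cost(points):
    """Exact MST cost of 2D points via Prim's algorithm, O(n^2) time.

    Baseline only: this is NOT the constant-round parallel algorithm
    described in the abstract.
    """
    n = len(points)
    if n == 0:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each point
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        # Relax the attachment cost of every remaining point.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


# Hypothetical input: the corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
mst_cost = euclidean_mst_cost(square)
```

An approximate algorithm with approximation factor c would be allowed to return a spanning tree of cost at most c times `mst_cost`.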
Network Design with Coverage Costs
We study network design with a cost structure motivated by redundancy in data
traffic. We are given a graph, g groups of terminals, and a universe of data
packets. Each group of terminals desires a subset of the packets from its
respective source. The cost of routing traffic on any edge in the network is
proportional to the total size of the distinct packets that the edge carries.
Our goal is to find a minimum cost routing. We focus on two settings. In the
first, the collection of packet sets desired by source-sink pairs is laminar.
For this setting, we present a primal-dual based 2-approximation, improving
upon a logarithmic approximation due to Barman and Chawla (2012). In the second
setting, packet sets can have non-trivial intersection. We focus on the case
where each packet is desired by either a single terminal group or by all of the
groups, and the graph is unweighted. For this setting we present an O(log
g)-approximation.
Our approximation for the second setting is based on a novel spanner-type
construction in unweighted graphs that, given a collection of g vertex subsets,
finds a subgraph of cost only a constant factor more than the minimum spanning
tree of the graph, such that every subset in the collection has a Steiner tree
in the subgraph of cost at most O(log g) that of its minimum Steiner tree in
the original graph. We call such a subgraph a group spanner.
Comment: Updated version with additional result
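The cost structure described in this abstract can be illustrated with a small sketch. It assumes unit proportionality on every edge and undirected edges; the function name `coverage_cost`, the route format, and the sample data are hypothetical and only evaluate the cost of a given routing, without optimizing it.

```python
from collections import defaultdict


def coverage_cost(routes, packet_size):
    """Total cost of a routing under the coverage cost model.

    routes: iterable of (edges, packets) pairs, where edges is a list of
            (u, v) tuples forming a path and packets is the set of packet
            ids sent along it.
    packet_size: dict mapping packet id to its size.
    Each edge pays once for every distinct packet it carries.
    """
    carried = defaultdict(set)
    for edges, packets in routes:
        for u, v in edges:
            # frozenset makes (u, v) and (v, u) the same undirected edge.
            carried[frozenset((u, v))].update(packets)
    return sum(packet_size[p] for pkts in carried.values() for p in pkts)


# Hypothetical instance: two terminals share the edge s-a.
routes = [
    ([("s", "a"), ("a", "t1")], {"p1", "p2"}),
    ([("s", "a"), ("a", "t2")], {"p2", "p3"}),
]
sizes = {"p1": 1, "p2": 2, "p3": 4}
total = coverage_cost(routes, sizes)
```

In this instance the shared edge s-a carries p2 for both terminals but pays for it once, which is the redundancy the cost model is meant to capture.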
Algorithmic embeddings
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 233-242).
We present several computationally efficient algorithms and complexity results on low-distortion mappings between metric spaces. An embedding between two metric spaces is a mapping between them, and the distortion of the embedding is the factor by which the distances change. We have pioneered theoretical work on the relative (or approximation) version of this problem. In this setting, the question is the following: for a class of metrics C and a host metric M', what is the smallest approximation factor a > 1 of an efficient algorithm minimizing the distortion of an embedding of a given input metric M ∈ C into M'? This formulation enables the algorithm to adapt to a given input metric. In particular, if the host metric is "expressive enough" to accurately model the input distances, the minimum achievable distortion is low, and the algorithm will produce an embedding with low distortion as well. This problem has been a subject of extensive applied research during the last few decades. However, almost all known algorithms for this problem are heuristic. As such, they can get stuck in local minima and do not provide any global guarantees on solution quality. We investigate several variants of the above problem, varying the host and target metrics and the definition of distortion. We present results for different types of distortion (multiplicative versus additive, worst-case versus average-case) and several types of target metrics, such as the line, the plane, d-dimensional Euclidean space, ultrametrics, and trees. We also present algorithms for ordinal embeddings and embedding with extra information.
by Mihai Bădoiu. Ph.D.
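As a concrete reading of "the factor by which the distances change", the sketch below computes the multiplicative worst-case distortion of a given embedding of a finite metric: the largest expansion times the largest contraction over all pairs. The function name `distortion` and the example (a 4-cycle mapped onto the line) are hypothetical, and the code assumes distinct points and an injective map; it evaluates a given embedding and does not search for the best one.

```python
from itertools import combinations


def distortion(points, d_src, d_dst, f):
    """Multiplicative worst-case distortion of the embedding f.

    Assumes the points are pairwise distinct and f is injective, so
    no distance in either metric is zero.
    """
    expansion = 0.0    # largest factor by which a distance grows
    contraction = 0.0  # largest factor by which a distance shrinks
    for x, y in combinations(points, 2):
        a = d_src(x, y)
        b = d_dst(f(x), f(y))
        expansion = max(expansion, b / a)
        contraction = max(contraction, a / b)
    return expansion * contraction


# Hypothetical example: shortest-path metric of a 4-cycle, embedded
# into the real line by sending vertex i to the number i.
def cycle_dist(i, j):
    return min(abs(i - j), 4 - abs(i - j))


def line_dist(s, t):
    return abs(s - t)


cycle_to_line = distortion(range(4), cycle_dist, line_dist, lambda i: i)
```

Here the pair (0, 3) is at distance 1 on the cycle but 3 on the line, while no pair shrinks, so the embedding has distortion 3.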
Labeled Nearest Neighbor Search and Metric Spanners via Locality Sensitive Orderings
Chan, Har-Peled, and Jones [SICOMP 2020] developed locality-sensitive
orderings (LSO) for Euclidean space. A $(\tau,\rho)$-LSO is a collection $\Sigma$
of orderings such that for every $x,y\in\mathbb{R}^d$ there is an
ordering $\sigma\in\Sigma$, where all the points between $x$ and $y$ w.r.t.
$\sigma$ are in the $\rho$-neighborhood of either $x$ or $y$. In essence, LSO
allow one to reduce problems to the $1$-dimensional line. Later, Filtser and Le
[STOC 2022] developed LSO's for doubling metrics, general metric spaces, and
minor free graphs.
For Euclidean and doubling spaces, the number of orderings in the LSO is
exponential in the dimension, which made them mainly useful for the low
dimensional regime. In this paper, we develop new LSO's for Euclidean,
$\ell_p$, and doubling spaces that allow us to trade larger stretch for a much
smaller number of orderings. We then use our new LSO's (as well as the previous
ones) to construct path reporting low hop spanners, fault tolerant spanners,
reliable spanners, and light spanners for different metric spaces.
While many nearest neighbor search (NNS) data structures were constructed for
metric spaces with implicit distance representations (where the distance
between two metric points can be computed using their names, e.g. Euclidean
space), for other spaces almost nothing is known. In this paper we initiate the
study of the labeled NNS problem, where one is allowed to artificially assign
labels (short names) to metric points. We use LSO's to construct efficient
labeled NNS data structures in this model.
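The ordering property at the start of this abstract can be checked by brute force on a finite point set. The sketch below assumes that, for a pair x, y, the neighborhood of x with parameter rho is the ball of radius rho * d(x, y) around x; that reading, the names `pair_is_covered` and `is_lso`, and the sample points are assumptions for illustration, not the constructions from the paper.

```python
from itertools import combinations


def pair_is_covered(order, x, y, dist, rho):
    """True if every point strictly between x and y in `order` lies
    within rho * dist(x, y) of x or of y."""
    i, j = sorted((order.index(x), order.index(y)))
    radius = rho * dist(x, y)
    return all(min(dist(z, x), dist(z, y)) <= radius for z in order[i + 1:j])


def is_lso(orderings, points, dist, rho):
    """Brute-force check: every pair must be covered by some ordering."""
    return all(
        any(pair_is_covered(order, x, y, dist, rho) for order in orderings)
        for x, y in combinations(points, 2)
    )


def line_dist(s, t):
    return abs(s - t)


# Hypothetical example on the real line.
pts = [0, 1, 3, 7]
sorted_ok = is_lso([sorted(pts)], pts, line_dist, rho=0.5)
scrambled_ok = is_lso([[0, 7, 1, 3]], pts, line_dist, rho=0.5)
```

On the line, the single sorted ordering passes with rho = 0.5, since any point between x and y is within half their distance of the nearer endpoint, while the scrambled ordering fails for the pair (0, 1).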