101 research outputs found

    Learning loopy graphical models with latent variables: Efficient methods and guarantees

    The problem of structure estimation in graphical models with latent variables is considered. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider models where the underlying Markov graph is locally tree-like, and the model is in the regime of correlation decay. For the special case of the Ising model, the number of samples $n$ required for structural consistency of our method scales as $n=\Omega(\theta_{\min}^{-\delta\eta(\eta+1)-2}\log p)$, where $p$ is the number of variables, $\theta_{\min}$ is the minimum edge potential, $\delta$ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and $\eta$ is a parameter which depends on the bounds on node and edge potentials in the Ising model. Necessary conditions for structural consistency under any algorithm are derived, and our method nearly matches the lower bound on sample requirements. Further, the proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. Comment: Published at http://dx.doi.org/10.1214/12-AOS1070 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Parallel Algorithms for Geometric Graph Problems

    We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example, it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_\epsilon(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+\epsilon)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+\epsilon)$-approximation algorithm with $n^{\delta}$ space in the streaming-with-sorting model with $1/\delta^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.

    Network Design with Coverage Costs

    We study network design with a cost structure motivated by redundancy in data traffic. We are given a graph, g groups of terminals, and a universe of data packets. Each group of terminals desires a subset of the packets from its respective source. The cost of routing traffic on any edge in the network is proportional to the total size of the distinct packets that the edge carries. Our goal is to find a minimum cost routing. We focus on two settings. In the first, the collection of packet sets desired by source-sink pairs is laminar. For this setting, we present a primal-dual based 2-approximation, improving upon a logarithmic approximation due to Barman and Chawla (2012). In the second setting, packet sets can have non-trivial intersection. We focus on the case where each packet is desired by either a single terminal group or by all of the groups, and the graph is unweighted. For this setting we present an O(log g)-approximation. Our approximation for the second setting is based on a novel spanner-type construction in unweighted graphs that, given a collection of g vertex subsets, finds a subgraph of cost only a constant factor more than the minimum spanning tree of the graph, such that every subset in the collection has a Steiner tree in the subgraph of cost at most O(log g) that of its minimum Steiner tree in the original graph. We call such a subgraph a group spanner. Comment: Updated version with additional result
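    The coverage cost described in this abstract can be illustrated with a minimal sketch. It assumes unit edge weights and a proportionality constant of 1, and the names `coverage_cost`, `routes`, `demands`, and `packet_size` are hypothetical, chosen only for this example. It evaluates the cost of a given routing; it does not implement the paper's approximation algorithms.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def coverage_cost(
    routes: Dict[str, Iterable[Edge]],
    demands: Dict[str, Set[str]],
    packet_size: Dict[str, float],
) -> float:
    """Sum, over all edges, the total size of the DISTINCT packets each edge carries.

    routes:      group name -> edges used to route that group's traffic
    demands:     group name -> set of packets the group wants
    packet_size: packet name -> size
    Assumption: every edge has unit weight (proportionality constant 1).
    """
    carried: Dict[Edge, Set[str]] = {}
    for group, edges in routes.items():
        for u, v in edges:
            key = (u, v) if u <= v else (v, u)  # treat edges as undirected
            carried.setdefault(key, set()).update(demands[group])
    return sum(packet_size[p] for packets in carried.values() for p in packets)


# Two groups share the edge (s, a); the shared packet "y" is paid for once there.
routes = {"g1": [("s", "a"), ("a", "t1")], "g2": [("s", "a"), ("a", "t2")]}
demands = {"g1": {"x", "y"}, "g2": {"y", "z"}}
packet_size = {"x": 1, "y": 2, "z": 3}
total = coverage_cost(routes, demands, packet_size)
```

    In this toy instance the shared edge carries packets x, y, and z once each (cost 6), while the two private edges cost 3 and 5, for a total of 14 rather than the 16 that separate per-group accounting would give.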

    Algorithmic embeddings

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 233-242). We present several computationally efficient algorithms and complexity results on low-distortion mappings between metric spaces. An embedding between two metric spaces is a mapping between the two metric spaces, and the distortion of the embedding is the factor by which the distances change. We have pioneered theoretical work on the relative (or approximation) version of this problem. In this setting, the question is the following: for the class of metrics $C$ and a host metric $M'$, what is the smallest approximation factor $a > 1$ of an efficient algorithm minimizing the distortion of an embedding of a given input metric $M \in C$ into $M'$? This formulation enables the algorithm to adapt to a given input metric. In particular, if the host metric is "expressive enough" to accurately model the input distances, the minimum achievable distortion is low, and the algorithm will produce an embedding with low distortion as well. This problem has been a subject of extensive applied research during the last few decades. However, almost all known algorithms for this problem are heuristic. As such, they can get stuck in local minima, and do not provide any global guarantees on solution quality. We investigate several variants of the above problem, varying different host and target metrics, and definitions of distortion. We present results for different types of distortion: multiplicative versus additive, worst-case versus average-case, and several types of target metrics, such as the line, the plane, d-dimensional Euclidean space, ultrametrics, and trees. We also present algorithms for ordinal embeddings and embedding with extra information. By Mihai Bădoiu. Ph.D.
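    As a small illustration of the notion of distortion in this abstract, the sketch below computes the worst-case multiplicative distortion of an embedding on a finite point set, taken here as the largest expansion times the largest contraction over all pairs. This is one common definition and is an assumption of the example; the thesis considers several variants. The function name `distortion` and its parameters are hypothetical, and the embedding is assumed injective so that no distance is zero.

```python
from itertools import combinations
from typing import Callable, Hashable, Sequence, TypeVar

P = TypeVar("P", bound=Hashable)
Q = TypeVar("Q")


def distortion(
    points: Sequence[P],
    d_src: Callable[[P, P], float],
    f: Callable[[P], Q],
    d_dst: Callable[[Q, Q], float],
) -> float:
    """Worst-case multiplicative distortion of the embedding f.

    Returns (max expansion) * (max contraction) over all pairs of distinct points.
    Assumption: all pairwise distances, before and after embedding, are positive.
    """
    expansion = 0.0
    contraction = 0.0
    for x, y in combinations(points, 2):
        before = d_src(x, y)
        after = d_dst(f(x), f(y))
        expansion = max(expansion, after / before)
        contraction = max(contraction, before / after)
    return expansion * contraction


# Example: the 4-cycle with its shortest-path metric, laid out on the line at 0, 1, 2, 3.
cycle = [0, 1, 2, 3]
cycle_dist = lambda i, j: min(abs(i - j), 4 - abs(i - j))
line_dist = lambda a, b: abs(a - b)
result = distortion(cycle, cycle_dist, float, line_dist)
```

    Here the only stretched pair is (0, 3): adjacent on the cycle at distance 1, but 3 apart on the line, so the computed distortion is 3.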

    Labeled Nearest Neighbor Search and Metric Spanners via Locality Sensitive Orderings

    Chan, Har-Peled, and Jones [SICOMP 2020] developed locality-sensitive orderings (LSO) for Euclidean space. A $(\tau,\rho)$-LSO is a collection $\Sigma$ of orderings such that for every $x,y\in\mathbb{R}^d$ there is an ordering $\sigma\in\Sigma$, where all the points between $x$ and $y$ w.r.t. $\sigma$ are in the $\rho$-neighborhood of either $x$ or $y$. In essence, LSO allow one to reduce problems to the $1$-dimensional line. Later, Filtser and Le [STOC 2022] developed LSO's for doubling metrics, general metric spaces, and minor free graphs. For Euclidean and doubling spaces, the number of orderings in the LSO is exponential in the dimension, which made them mainly useful for the low dimensional regime. In this paper, we develop new LSO's for Euclidean, $\ell_p$, and doubling spaces that allow us to trade larger stretch for a much smaller number of orderings. We then use our new LSO's (as well as the previous ones) to construct path reporting low hop spanners, fault tolerant spanners, reliable spanners, and light spanners for different metric spaces. While many nearest neighbor search (NNS) data structures were constructed for metric spaces with implicit distance representations (where the distance between two metric points can be computed using their names, e.g. Euclidean space), for other spaces almost nothing is known. In this paper we initiate the study of the labeled NNS problem, where one is allowed to artificially assign labels (short names) to metric points. We use LSO's to construct efficient labeled NNS data structures in this model.