
    Efficient Construction of Probabilistic Tree Embeddings

    In this paper we describe an algorithm that embeds a graph metric $(V, d_G)$ on an undirected weighted graph $G=(V,E)$ into a distribution of tree metrics $(T, d_T)$ such that for every pair $u,v \in V$, $d_G(u,v) \leq d_T(u,v)$ and $\mathbf{E}_T[d_T(u,v)] \leq O(\log n) \cdot d_G(u,v)$. Such embeddings have proved highly useful in designing fast approximation algorithms, as many hard problems on graphs are easy to solve on tree instances. For a graph with $n$ vertices and $m$ edges, our algorithm runs in $O(m \log n)$ time with high probability, which improves the previous upper bound of $O(m \log^3 n)$ shown by Mendel et al. in 2009. The key component of our algorithm is a new approximate single-source shortest-path algorithm, which implements the priority queue with a new data structure, the "bucket-tree structure". The algorithm has three properties: it requires only linear time in the number of edges of the input graph; the computed distances have a distance-preserving property; and when computing the shortest paths to the $k$ nearest vertices from the source, it only needs to visit these vertices and their edge lists. These properties are essential to guarantee the correctness and the stated time bound. Using this shortest-path algorithm, we show how to generate an intermediate structure, the approximate dominance sequences of the input graph, in $O(m \log n)$ time, and further propose a simple yet efficient algorithm to convert this sequence into a tree embedding in $O(n \log n)$ time, both with high probability. Combining the three subroutines gives the stated time bound of the algorithm. We then show that this efficient construction can facilitate some applications. We prove that FRT trees (the generated tree embeddings) are Ramsey partitions with an asymptotically tight bound, so the construction of a series of distance oracles can be accelerated.
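
    The abstract's key primitive is a priority queue implemented with buckets. As background intuition only, the sketch below shows the classical bucket-queue shortest-path algorithm (Dial's algorithm), in which an array of buckets indexed by tentative distance replaces a comparison-based heap; the paper's "bucket-tree structure" is a more refined variant that additionally supports the approximate, distance-preserving, and $k$-nearest-visit properties described above. The graph encoding and the positive-integer-weight assumption here are ours, not the paper's.

```python
# Minimal sketch of a bucket-based priority queue for single-source
# shortest paths (Dial's algorithm). Assumes positive integer edge
# weights bounded by max_weight; `graph` maps each vertex to a list
# of (neighbor, weight) pairs. Illustrative only -- the paper's
# bucket-tree structure refines this idea.

def bucket_sssp(graph, source, max_weight):
    INF = float("inf")
    dist = {v: INF for v in graph}
    dist[source] = 0
    # One bucket per possible distance value; a shortest path has at
    # most n - 1 edges, so (n - 1) * max_weight + 1 buckets suffice.
    n = len(graph)
    buckets = [[] for _ in range((n - 1) * max_weight + 1)]
    buckets[0].append(source)
    for d in range(len(buckets)):          # scan buckets in distance order
        for u in buckets[d]:
            if d != dist[u]:               # stale entry; u settled earlier
                continue
            for v, w in graph[u]:
                if d + w < dist[v]:        # relax edge (u, v)
                    dist[v] = d + w
                    buckets[d + w].append(v)
    return dist

# Example: shortest paths from 'a' in a small weighted graph.
g = {
    "a": [("b", 2), ("c", 5)],
    "b": [("a", 2), ("c", 1)],
    "c": [("a", 5), ("b", 1)],
}
print(bucket_sssp(g, "a", max_weight=5))   # {'a': 0, 'b': 2, 'c': 3}
```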

    Solving weighted and counting variants of connectivity problems parameterized by treewidth deterministically in single exponential time

    It is well known that many local graph problems, like Vertex Cover and Dominating Set, can be solved in 2^{O(tw)}|V|^{O(1)} time for graphs G=(V,E) with a given tree decomposition of width tw. However, for nonlocal problems, like the fundamental class of connectivity problems, for a long time we did not know how to do this faster than tw^{O(tw)}|V|^{O(1)}. Recently, Cygan et al. (FOCS 2011) presented Monte Carlo algorithms for a wide range of connectivity problems running in time c^{tw}|V|^{O(1)} for a small constant c, e.g., for Hamiltonian Cycle and Steiner Tree. Naturally, this raises the question of whether randomization is necessary to achieve this runtime; furthermore, it is desirable to also solve counting and weighted versions (the latter without incurring a pseudo-polynomial cost in terms of the weights). We present two new approaches rooted in linear algebra, based on matrix rank and determinants, which provide deterministic c^{tw}|V|^{O(1)} time algorithms, also for weighted and counting versions. For example, in this time we can solve the traveling salesman problem or count the number of Hamiltonian cycles. The rank-based ideas provide a rather general approach for speeding up even straightforward dynamic programming formulations by identifying "small" sets of representative partial solutions; we focus on the case of expressing connectivity via sets of partitions, but the essential ideas should have further applications. The determinant-based approach uses the matrix tree theorem to derive closed formulas for counting versions of connectivity problems; we show how to evaluate those formulas via dynamic programming.
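
    The determinant-based approach rests on the matrix tree theorem: the number of spanning trees of a connected graph equals any cofactor of its Laplacian. The sketch below demonstrates only this classical ingredient, not the paper's treewidth dynamic program for evaluating such formulas; the function name and edge-list encoding are ours.

```python
# Counting spanning trees via the matrix tree theorem: build the graph
# Laplacian, delete one row and column, and take the determinant of
# the remaining minor.

import numpy as np

def count_spanning_trees(n, edges):
    """Vertices are 0..n-1; `edges` is a list of (u, v) pairs."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1        # degree terms on the diagonal
        L[v, v] += 1
        L[u, v] -= 1        # adjacency terms off the diagonal
        L[v, u] -= 1
    minor = L[1:, 1:]       # delete row 0 and column 0
    return round(np.linalg.det(minor))

# Example: the 4-cycle has exactly 4 spanning trees.
print(count_spanning_trees(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # 4
```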

    Efficient regularized isotonic regression with application to gene-gene interaction search

    Isotonic regression is a nonparametric approach for fitting monotonic models to data that has been widely studied from both theoretical and practical perspectives. However, this approach encounters computational and statistical overfitting issues in higher dimensions. To address both concerns, we present an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic regression based on recursively partitioning the covariate space through solution of progressively smaller "best cut" subproblems. This creates a regularized sequence of isotonic models of increasing model complexity that converges to the global isotonic regression solution. The models along the sequence are often more accurate than the unregularized isotonic regression model because of the complexity control they offer. We quantify this complexity control through estimation of degrees of freedom along the path. The success of the regularized models in prediction and IRP's favorable computational properties are demonstrated through a series of simulated and real-data experiments. We discuss the application of IRP to the problem of searching for gene-gene interactions and epistasis, and demonstrate it on data from genome-wide association studies of three common diseases. (Published in the Annals of Applied Statistics, http://dx.doi.org/10.1214/11-AOAS504, by the Institute of Mathematical Statistics, http://www.imstat.org.)
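
    As background for readers new to isotonic regression, the sketch below solves the classical one-dimensional case with the pool-adjacent-violators algorithm (PAVA). IRP targets the much harder multivariate setting, so this is intuition for the underlying monotone-fitting problem, not the paper's method.

```python
# One-dimensional isotonic regression by pool-adjacent-violators:
# maintain blocks of pooled values and merge adjacent blocks while
# their means violate monotonicity.

def pava(y):
    """Least-squares nondecreasing fit to the sequence y."""
    blocks = []                     # each block is [sum, count]
    for value in y:
        blocks.append([value, 1])
        # Merge while the previous block's mean exceeds this one's
        # (compared by cross-multiplication to avoid division).
        while len(blocks) > 1 and \
                blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)     # each pooled block gets its mean
    return fit

print(pava([1, 3, 2, 4]))           # [1.0, 2.5, 2.5, 4.0]
```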

    GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics

    Large-scale graph processing is a major research area for Big Data exploration. Vertex-centric programming models like Pregel are gaining traction due to a simple abstraction that naturally allows for scalable execution on distributed systems. However, this approach has limitations that cause vertex-centric algorithms to under-perform, due to a poor compute-to-communication ratio and slow convergence over iterative supersteps. In this paper we introduce GoFFish, a scalable sub-graph centric framework co-designed with a distributed persistent graph storage for large-scale graph analytics on commodity clusters. We introduce a sub-graph centric programming abstraction that combines the scalability of a vertex-centric approach with the flexibility of shared-memory sub-graph computation. We map Connected Components, SSSP, and PageRank algorithms to this model to illustrate its flexibility. Further, we empirically analyze GoFFish on several real-world graphs and demonstrate a significant performance improvement, orders of magnitude in some cases, compared to Apache Giraph, the leading open-source vertex-centric implementation.
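
    To make the contrast with vertex-centric models concrete, here is a hedged single-machine sketch of connected components in a sub-graph centric style: each partition is solved locally in one shared-memory step, and only the much smaller quotient graph of component labels crosses partition boundaries (in a distributed run, that reconciliation would span a few supersteps). All names and the API shape here are hypothetical, not GoFFish's actual interface.

```python
# Sub-graph centric connected components, simulated on one machine.

def local_components(vertices, edges):
    """Union-find over one partition's subgraph: vertex -> label."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    return {v: find(v) for v in vertices}

def connected_components(partitions, cut_edges):
    """`partitions`: list of (vertices, edges); `cut_edges` cross them."""
    # Step 1: each partition solves its subgraph locally in one shot.
    label = {}
    for vertices, edges in partitions:
        label.update(local_components(vertices, edges))
    # Step 2: reconcile across partitions on the quotient graph whose
    # vertices are local component labels -- far smaller than G itself.
    quotient_vertices = set(label.values())
    quotient_edges = [(label[u], label[v]) for u, v in cut_edges]
    canon = local_components(quotient_vertices, quotient_edges)
    return {v: canon[label[v]] for v in label}

# Two partitions joined by one cut edge form a single component.
p1 = ({"a", "b"}, [("a", "b")])
p2 = ({"c", "d"}, [("c", "d")])
print(connected_components([p1, p2], [("b", "c")]))
```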

    On Strong Diameter Padded Decompositions

    Given a weighted graph G=(V,E,w), a partition of V is Delta-bounded if the diameter of each cluster is bounded by Delta. A distribution over Delta-bounded partitions is a beta-padded decomposition if every ball of radius gamma * Delta is contained in a single cluster with probability at least e^{-beta * gamma}. The weak diameter of a cluster C is measured w.r.t. distances in G, while the strong diameter is measured w.r.t. distances in the induced graph G[C]. The decomposition is weak/strong according to the diameter guarantee. Previously, it was proven that K_r-free graphs admit weak decompositions with padding parameter O(r), while for strong decompositions only an O(r^2) padding parameter was known. Furthermore, for a graph G whose induced shortest-path metric d_G has doubling dimension ddim, a weak O(ddim)-padded decomposition was constructed, which is also known to be tight. For the case of strong diameter, nothing was known. We construct strong O(r)-padded decompositions for K_r-free graphs, matching the state of the art for weak decompositions. Similarly, for graphs with doubling dimension ddim we construct a strong O(ddim)-padded decomposition, which is also tight. We use this decomposition to construct an (O(ddim), O~(ddim))-sparse cover scheme for such graphs. Our new decompositions and covers have implications for approximating unique games, for the construction of light and sparse spanners, and for path-reporting distance oracles.
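
    For intuition about where guarantees of the e^{-beta * gamma} form come from, the sketch below performs classical randomized ball carving with exponentially distributed radii (in the spirit of CKR-style constructions): a ball is cut only if a carving radius lands inside it, which an exponential radius does with exponentially small probability. The paper's strong-diameter decompositions for K_r-free and doubling metrics are more involved; the parameter choices here are illustrative only.

```python
# Randomized low-diameter decomposition by ball carving with
# exponentially distributed radii, truncated so every cluster has
# radius at most delta / 2 (hence diameter at most delta, measured
# in the input metric, i.e., a weak-diameter guarantee).

import random

def ball_carving(vertices, dist, delta, beta):
    """`dist` is a metric given as a dict of dicts; returns clusters."""
    remaining = set(vertices)
    clusters = []
    while remaining:
        center = random.choice(sorted(remaining))
        # Exponential radius with mean delta / (2 * beta), capped.
        r = min(random.expovariate(2 * beta / delta), delta / 2)
        ball = {v for v in remaining if dist[center][v] <= r}
        clusters.append(ball)
        remaining -= ball
    return clusters

# Toy metric: four points on a line.
pts = [0, 1, 2, 10]
d = {a: {b: abs(a - b) for b in pts} for a in pts}
print(ball_carving(pts, d, delta=6, beta=2.0))
```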