35,089 research outputs found

    Local Access to Huge Random Objects Through Partial Sampling

    Get PDF
    © Amartya Shankha Biswas, Ronitt Rubinfeld, and Anak Yodpinyanee. Consider an algorithm performing a computation on a huge random object (for example a random graph or a “long” random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally “on-the-fly” (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sub-linear and thus, sampling the entire object up front would ruin its efficiency. Our first set of results focus on undirected graphs with independent edge probabilities, i.e. each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. Then, we use this to obtain the first efficient local implementations for the Erdös-RĂ©nyi G(n, p) model for all values of p, and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. In addition, we introduce a new Random-Neighbor query. Next, we give the first local-access implementation for All-Neighbors queries in the (sparse and directed) Kleinberg’s Small-World model. Our implementations require no pre-processing time, and answer each query using O(poly(log n)) time, random bits, and additional space. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (balanced random walks on the integer line that are always non-negative). Here, we support Height queries to find the location of the walk, and First-Return queries to find the time when the walk returns to a specified location. This in turn can be used to implement Next-Neighbor queries on random rooted ordered trees, and Matching-Bracket queries on random well bracketed expressions (the Dyck language). Finally, we introduce two features to define a new model that: (1) allows multiple independent (and even simultaneous) instantiations of the same implementation, to be consistent with each other without the need for communication, (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with maximum degree ∆. This is in contrast to prior work in the area, where the relevant random objects are defined as a distribution with O(1) parameters (for example, n and p in the G(n, p) model). The distribution over valid colorings is instead specified via a “huge” input (the underlying graph G), that is far too large to be read by a sub-linear time algorithm. Instead, our implementation accesses G through local neighborhood probes, and is able to answer queries to the color of any given vertex in sub-linear time for q ≄ 9∆, in a manner that is consistent with a specific random valid coloring of G. Furthermore, the implementation is memory-less, and can maintain consistency with non-communicating copies of itself

    Weighted Min-Cut: Sequential, Cut-Query and Streaming Algorithms

    Get PDF
    Consider the following 2-respecting min-cut problem. Given a weighted graph GG and its spanning tree TT, find the minimum cut among the cuts that contain at most two edges in TT. This problem is an important subroutine in Karger's celebrated randomized near-linear-time min-cut algorithm [STOC'96]. We present a new approach for this problem which can be easily implemented in many settings, leading to the following randomized min-cut algorithms for weighted graphs. * An O(mlog⁥2nlog⁥log⁥n+nlog⁥6n)O(m\frac{\log^2 n}{\log\log n} + n\log^6 n)-time sequential algorithm: This improves Karger's O(mlog⁥3n)O(m \log^3 n) and O(m(log⁥2n)log⁥(n2/m)log⁥log⁥n+nlog⁥6n)O(m\frac{(\log^2 n)\log (n^2/m)}{\log\log n} + n\log^6 n) bounds when the input graph is not extremely sparse or dense. Improvements over Karger's bounds were previously known only under a rather strong assumption that the input graph is simple [Henzinger et al. SODA'17; Ghaffari et al. SODA'20]. For unweighted graphs with parallel edges, our bound can be improved to O(mlog⁥1.5nlog⁥log⁥n+nlog⁥6n)O(m\frac{\log^{1.5} n}{\log\log n} + n\log^6 n). * An algorithm requiring O~(n)\tilde O(n) cut queries to compute the min-cut of a weighted graph: This answers an open problem by Rubinstein et al. ITCS'18, who obtained a similar bound for simple graphs. * A streaming algorithm that requires O~(n)\tilde O(n) space and O(log⁥n)O(\log n) passes to compute the min-cut: The only previous non-trivial exact min-cut algorithm in this setting is the 2-pass O~(n)\tilde O(n)-space algorithm on simple graphs [Rubinstein et al., ITCS'18] (observed by Assadi et al. STOC'19). In contrast to Karger's 2-respecting min-cut algorithm which deploys sophisticated dynamic programming techniques, our approach exploits some cute structural properties so that it only needs to compute the values of O~(n)\tilde O(n) cuts corresponding to removing O~(n)\tilde O(n) pairs of tree edges, an operation that can be done quickly in many settings.Comment: Updates on this version: (1) Minor corrections in Section 5.1, 5.2; (2) Reference to newer results by GMW SOSA21 (arXiv:2008.02060v2), DEMN STOC21 (arXiv:2004.09129v2) and LMN 21 (arXiv:2102.06565v1

    Inference of Ancestral Recombination Graphs through Topological Data Analysis

    Get PDF
    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Gal\'apagos Islands.Comment: 33 pages, 12 figures. The accompanying software, instructions and example files used in the manuscript can be obtained from https://github.com/RabadanLab/TARGe

    Unit Grid Intersection Graphs: Recognition and Properties

    Full text link
    It has been known since 1991 that the problem of recognizing grid intersection graphs is NP-complete. Here we use a modified argument of the above result to show that even if we restrict to the class of unit grid intersection graphs (UGIGs), the recognition remains hard, as well as for all graph classes contained inbetween. The result holds even when considering only graphs with arbitrarily large girth. Furthermore, we ask the question of representing UGIGs on grids of minimal size. We show that the UGIGs that can be represented in a square of side length 1+epsilon, for a positive epsilon no greater than 1, are exactly the orthogonal ray graphs, and that there exist families of trees that need an arbitrarily large grid

    Theoretical Foundations of Autoregressive Models for Time Series on Acyclic Directed Graphs

    Get PDF
    Three classes of models for time series on acyclic directed graphs are considered. At first a review of tree-structured models constructed from a nested partitioning of the observation interval is given. This nested partitioning leads to several resolution scales. The concept of mass balance allowing to interpret the average over an interval as the sum of averages over the sub-intervals implies linear restrictions in the tree-structured model. Under a white noise assumption for transition and observation noise there is an change-of-resolution Kalman filter for linear least squares prediction of interval averages \shortcite{chou:1991}. This class of models is generalized by modeling transition noise on the same scale in linear state space form. The third class deals with models on a more general class of directed acyclic graphs where nodes are allowed to have two parents. We show that these models have a linear state space representation with white system and coloured observation noise

    Universal Communication, Universal Graphs, and Graph Labeling

    Get PDF
    We introduce a communication model called universal SMP, in which Alice and Bob receive a function f belonging to a family ?, and inputs x and y. Alice and Bob use shared randomness to send a message to a third party who cannot see f, x, y, or the shared randomness, and must decide f(x,y). Our main application of universal SMP is to relate communication complexity to graph labeling, where the goal is to give a short label to each vertex in a graph, so that adjacency or other functions of two vertices x and y can be determined from the labels ?(x), ?(y). We give a universal SMP protocol using O(k^2) bits of communication for deciding whether two vertices have distance at most k in distributive lattices (generalizing the k-Hamming Distance problem in communication complexity), and explain how this implies a O(k^2 log n) labeling scheme for deciding dist(x,y) ? k on distributive lattices with size n; in contrast, we show that a universal SMP protocol for determining dist(x,y) ? 2 in modular lattices (a superset of distributive lattices) has super-constant ?(n^{1/4}) communication cost. On the other hand, we demonstrate that many graph families known to have efficient adjacency labeling schemes, such as trees, low-arboricity graphs, and planar graphs, admit constant-cost communication protocols for adjacency. Trees also have an O(k) protocol for deciding dist(x,y) ? k and planar graphs have an O(1) protocol for dist(x,y) ? 2, which implies a new O(log n) labeling scheme for the same problem on planar graphs
    • 

    corecore