35,089 research outputs found
Local Access to Huge Random Objects Through Partial Sampling
© Amartya Shankha Biswas, Ronitt Rubinfeld, and Anak Yodpinyanee. Consider an algorithm performing a computation on a huge random object (for example a random graph or a "long" random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally "on-the-fly" (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sub-linear, so that sampling the entire object up front would ruin its efficiency. Our first set of results focuses on undirected graphs with independent edge probabilities, i.e. each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. Then, we use this to obtain the first efficient local implementations for the Erdős-Rényi G(n, p) model for all values of p, and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. In addition, we introduce a new Random-Neighbor query. Next, we give the first local-access implementation for All-Neighbors queries in the (sparse and directed) Kleinberg's Small-World model. Our implementations require no pre-processing time, and answer each query using O(poly(log n)) time, random bits, and additional space. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (balanced random walks on the integer line that are always non-negative). Here, we support Height queries to find the location of the walk, and First-Return queries to find the time when the walk returns to a specified location. 
This in turn can be used to implement Next-Neighbor queries on random rooted ordered trees, and Matching-Bracket queries on random well-bracketed expressions (the Dyck language). Finally, we introduce two features to define a new model that: (1) allows multiple independent (and even simultaneous) instantiations of the same implementation to be consistent with each other without the need for communication, (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with maximum degree Δ. This is in contrast to prior work in the area, where the relevant random objects are defined as a distribution with O(1) parameters (for example, n and p in the G(n, p) model). The distribution over valid colorings is instead specified via a "huge" input (the underlying graph G), that is far too large to be read by a sub-linear time algorithm. Instead, our implementation accesses G through local neighborhood probes, and is able to answer queries to the color of any given vertex in sub-linear time for q ≥ 9Δ, in a manner that is consistent with a specific random valid coloring of G. Furthermore, the implementation is memory-less, and can maintain consistency with non-communicating copies of itself.
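To make the idea of a memory-less Vertex-Pair query concrete, here is a minimal sketch in Python. It is not the implementation from the paper: it swaps true randomness for a SHA-256 hash of a shared seed, which is an assumption made purely for illustration, and the names `LazyGnp` and `vertex_pair` are hypothetical. It shows only how two non-communicating copies can agree on every edge of a G(n, p)-like graph without generating the graph up front.

```python
import hashlib


class LazyGnp:
    """Illustrative, stateless Vertex-Pair oracle for a G(n, p)-like graph.

    Assumption: pseudorandomness from SHA-256 stands in for true random bits.
    """

    def __init__(self, n: int, p: float, seed: int) -> None:
        self.n = n
        self.p = p
        self.seed = seed
        # Integer threshold so that p = 0.0 and p = 1.0 behave exactly.
        self._threshold = int(p * 2**64)

    def vertex_pair(self, u: int, v: int) -> bool:
        """Return whether edge {u, v} is present; no state is stored."""
        if u == v:
            return False  # no self-loops
        a, b = (u, v) if u < v else (v, u)  # undirected: canonical order
        digest = hashlib.sha256(f"{self.seed}:{a}:{b}".encode()).digest()
        return int.from_bytes(digest[:8], "big") < self._threshold


# Two independent copies with the same seed answer identically.
g1 = LazyGnp(10**9, 0.5, seed=42)
g2 = LazyGnp(10**9, 0.5, seed=42)
same_answer = g1.vertex_pair(3, 7) == g2.vertex_pair(7, 3)
```

A per-edge hash like this answers Vertex-Pair queries in constant time, but a Next-Neighbor or Random-Neighbor query would have to scan candidate vertices one by one, which is slow for small p; avoiding that scan is the kind of difficulty the abstract's polylogarithmic bounds address.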
Weighted Min-Cut: Sequential, Cut-Query and Streaming Algorithms
Consider the following 2-respecting min-cut problem. Given a weighted graph
and a spanning tree of it, find the minimum cut among the cuts that contain
at most two edges of the tree. This problem is an important subroutine in Karger's
celebrated randomized near-linear-time min-cut algorithm [STOC'96]. We present
a new approach for this problem which can be easily implemented in many
settings, leading to the following randomized min-cut algorithms for weighted
graphs.
* An -time sequential algorithm:
This improves Karger's and bounds when the input graph is not extremely
sparse or dense. Improvements over Karger's bounds were previously known only
under a rather strong assumption that the input graph is simple [Henzinger et
al. SODA'17; Ghaffari et al. SODA'20]. For unweighted graphs with parallel
edges, our bound can be improved to .
* An algorithm requiring cut queries to compute the min-cut of
a weighted graph: This answers an open problem by Rubinstein et al. ITCS'18,
who obtained a similar bound for simple graphs.
* A streaming algorithm that requires space and
passes to compute the min-cut: The only previous non-trivial exact min-cut
algorithm in this setting is the 2-pass -space algorithm on simple
graphs [Rubinstein et al., ITCS'18] (observed by Assadi et al. STOC'19).
In contrast to Karger's 2-respecting min-cut algorithm which deploys
sophisticated dynamic programming techniques, our approach exploits some cute
structural properties so that it only needs to compute the values of cuts corresponding to removing pairs of tree edges, an
operation that can be done quickly in many settings. Comment: Updates on this version: (1) Minor corrections in Section 5.1, 5.2;
(2) Reference to newer results by GMW SOSA21 (arXiv:2008.02060v2), DEMN
STOC21 (arXiv:2004.09129v2) and LMN 21 (arXiv:2102.06565v1
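The 2-respecting min-cut problem stated in this abstract can be illustrated with a brute-force sketch in Python. This is not the algorithm from the paper: it simply tries every set of one or two tree edges and evaluates the corresponding cut, which takes time quadratic in the number of tree edges times the number of graph edges. The function name `two_respecting_min_cut` and the input encoding are assumptions made for illustration.

```python
from itertools import combinations


def two_respecting_min_cut(n, edges, tree_edges):
    """Brute-force the minimum cut crossing at most two tree edges.

    n          -- number of vertices, labelled 0..n-1 (n >= 2)
    edges      -- list of (u, v, weight) for the graph
    tree_edges -- list of (u, v) forming a spanning tree
    Returns (cut value, sorted indices of the crossed tree edges).
    """
    adj = {v: [] for v in range(n)}
    for i, (u, v) in enumerate(tree_edges):
        adj[u].append((v, i))
        adj[v].append((u, i))

    def sides(removed):
        # Walk the tree from vertex 0, switching sides each time a removed
        # tree edge is crossed. Exactly the removed edges cross the cut.
        side = {0: 0}
        stack = [0]
        while stack:
            u = stack.pop()
            for v, i in adj[u]:
                if v not in side:
                    side[v] = side[u] ^ (i in removed)
                    stack.append(v)
        return side

    best = None
    for k in (1, 2):
        for removed in combinations(range(len(tree_edges)), k):
            side = sides(set(removed))
            value = sum(w for u, v, w in edges if side[u] != side[v])
            if best is None or value < best[0]:
                best = (value, sorted(removed))
    return best


# 4-cycle with the path 0-1-2-3 as spanning tree.
cycle = [(0, 1, 3), (1, 2, 1), (2, 3, 3), (3, 0, 1)]
path = [(0, 1), (1, 2), (2, 3)]
result = two_respecting_min_cut(4, cycle, path)
```

In the example, the best cut separates {0, 1} from {2, 3} and crosses only the tree edge with index 1.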
Inference of Ancestral Recombination Graphs through Topological Data Analysis
The recent explosion of genomic data has underscored the need for
interpretable and comprehensive analyses that can capture complex phylogenetic
relationships within and across species. Recombination, reassortment and
horizontal gene transfer constitute examples of pervasive biological phenomena
that cannot be captured by tree-like representations. Starting from hundreds of
genomes, we are interested in the reconstruction of potential evolutionary
histories leading to the observed data. Ancestral recombination graphs
represent potential histories that explicitly accommodate recombination and
mutation events across orthologous genomes. However, they are computationally
costly to reconstruct, usually being infeasible for more than a few tens of
genomes. Recently, Topological Data Analysis (TDA) methods have been proposed
as robust and scalable methods that can capture the genetic scale and frequency
of recombination. We build upon previous TDA developments for detecting and
quantifying recombination, and present a novel framework that can be applied to
hundreds of genomes and can be interpreted in terms of minimal histories of
mutation and recombination events, quantifying the scales and identifying the
genomic locations of recombinations. We implement this framework in a software
package, called TARGet, and apply it to several examples, including small
migration between different populations, human recombination, and horizontal
evolution in finches inhabiting the Galápagos Islands. Comment: 33 pages, 12 figures. The accompanying software, instructions and
example files used in the manuscript can be obtained from
https://github.com/RabadanLab/TARGe
Unit Grid Intersection Graphs: Recognition and Properties
It has been known since 1991 that the problem of recognizing grid
intersection graphs is NP-complete. Here we use a modified argument of the
above result to show that even if we restrict to the class of unit grid
intersection graphs (UGIGs), the recognition remains hard, as well as for all
graph classes contained in between. The result holds even when considering only
graphs with arbitrarily large girth. Furthermore, we ask the question of
representing UGIGs on grids of minimal size. We show that the UGIGs that can be
represented in a square of side length 1+epsilon, for a positive epsilon no
greater than 1, are exactly the orthogonal ray graphs, and that there exist
families of trees that need an arbitrarily large grid
Theoretical Foundations of Autoregressive Models for Time Series on Acyclic Directed Graphs
Three classes of models for time series on acyclic directed graphs are considered. First, a review is given of tree-structured models constructed from a nested partitioning of the observation interval. This nested partitioning leads to several resolution scales. The concept of mass balance, which allows the average over an interval to be interpreted as the sum of averages over the sub-intervals, implies linear restrictions in the tree-structured model. Under a white noise assumption for transition and observation noise there is a change-of-resolution Kalman filter for linear least squares prediction of interval averages (Chou, 1991). This class of models is generalized by modeling transition noise on the same scale in linear state space form. The third class deals with models on a more general class of directed acyclic graphs where nodes are allowed to have two parents. We show that these models have a linear state space representation with white system and coloured observation noise.
Universal Communication, Universal Graphs, and Graph Labeling
We introduce a communication model called universal SMP, in which Alice and Bob receive a function f belonging to a family ℱ, and inputs x and y. Alice and Bob use shared randomness to send a message to a third party who cannot see f, x, y, or the shared randomness, and must decide f(x,y). Our main application of universal SMP is to relate communication complexity to graph labeling, where the goal is to give a short label to each vertex in a graph, so that adjacency or other functions of two vertices x and y can be determined from the labels ℓ(x), ℓ(y). We give a universal SMP protocol using O(k^2) bits of communication for deciding whether two vertices have distance at most k in distributive lattices (generalizing the k-Hamming Distance problem in communication complexity), and explain how this implies an O(k^2 log n) labeling scheme for deciding dist(x,y) ≤ k on distributive lattices with size n; in contrast, we show that a universal SMP protocol for determining dist(x,y) ≤ 2 in modular lattices (a superset of distributive lattices) has super-constant Ω(n^{1/4}) communication cost. On the other hand, we demonstrate that many graph families known to have efficient adjacency labeling schemes, such as trees, low-arboricity graphs, and planar graphs, admit constant-cost communication protocols for adjacency. Trees also have an O(k) protocol for deciding dist(x,y) ≤ k and planar graphs have an O(1) protocol for dist(x,y) ≤ 2, which implies a new O(log n) labeling scheme for the same problem on planar graphs.
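For readers unfamiliar with adjacency labeling, the following Python sketch shows the textbook scheme for rooted trees, in which each vertex stores its own identifier and its parent's. It is not the universal SMP protocol from the abstract; the function names `tree_labels` and `adjacent` and the parent-map encoding are assumptions made for illustration. Each label uses about 2 log n bits, and the decoder sees only the two labels.

```python
def tree_labels(parent):
    """Label each vertex with (own id, parent id); the root's parent is None."""
    return {v: (v, p) for v, p in parent.items()}


def adjacent(label_x, label_y):
    """Decide adjacency from the two labels alone, without seeing the tree."""
    x, parent_x = label_x
    y, parent_y = label_y
    # Two tree vertices are adjacent exactly when one is the other's parent.
    return x != y and (parent_x == y or parent_y == x)


# Rooted tree: 0 is the root, 1 and 2 are its children, 3 is a child of 1.
labels = tree_labels({0: None, 1: 0, 2: 0, 3: 1})
edge_present = adjacent(labels[3], labels[1])
edge_absent = adjacent(labels[1], labels[2])
```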