4,660 research outputs found

    Edit Distance: Sketching, Streaming and Document Exchange

    Full text link
    We show that in the document exchange problem, where Alice holds x{0,1}nx \in \{0,1\}^n and Bob holds y{0,1}ny \in \{0,1\}^n, Alice can send Bob a message of size O(K(log2K+logn))O(K(\log^2 K+\log n)) bits such that Bob can recover xx using the message and his input yy if the edit distance between xx and yy is no more than KK, and output "error" otherwise. Both the encoding and decoding can be done in time O~(n+poly(K))\tilde{O}(n+\mathsf{poly}(K)). This result significantly improves the previous communication bounds under polynomial encoding/decoding time. We also show that in the referee model, where Alice and Bob hold xx and yy respectively, they can compute sketches of xx and yy of sizes poly(Klogn)\mathsf{poly}(K \log n) bits (the encoding), and send to the referee, who can then compute the edit distance between xx and yy together with all the edit operations if the edit distance is no more than KK, and output "error" otherwise (the decoding). To the best of our knowledge, this is the first result for sketching edit distance using poly(Klogn)\mathsf{poly}(K \log n) bits. Moreover, the encoding phase of our sketching algorithm can be performed by scanning the input string in one pass. Thus our sketching algorithm also implies the first streaming algorithm for computing edit distance and all the edits exactly using poly(Klogn)\mathsf{poly}(K \log n) bits of space.Comment: Full version of an article to be presented at the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016

    Bidimensionality of Geometric Intersection Graphs

    Full text link
    Let B be a finite collection of geometric (not necessarily convex) bodies in the plane. Clearly, this class of geometric objects naturally generalizes the class of disks, lines, ellipsoids, and even convex polygons. We consider geometric intersection graphs GB where each body of the collection B is represented by a vertex, and two vertices of GB are adjacent if the intersection of the corresponding bodies is non-empty. For such graph classes and under natural restrictions on their maximum degree or subgraph exclusion, we prove that the relation between their treewidth and the maximum size of a grid minor is linear. These combinatorial results vastly extend the applicability of all the meta-algorithmic results of the bidimensionality theory to geometrically defined graph classes

    Metrics for Graph Comparison: A Practitioner's Guide

    Full text link
    Comparison of graph structure is a ubiquitous task in data analysis and machine learning, with diverse applications in fields such as neuroscience, cyber security, social network analysis, and bioinformatics, among others. Discovery and comparison of structures such as modular communities, rich clubs, hubs, and trees in data in these fields yields insight into the generative mechanisms and functional properties of the graph. Often, two graphs are compared via a pairwise distance measure, with a small distance indicating structural similarity and vice versa. Common choices include spectral distances (also known as λ\lambda distances) and distances based on node affinities. However, there has of yet been no comparative study of the efficacy of these distance measures in discerning between common graph topologies and different structural scales. In this work, we compare commonly used graph metrics and distance measures, and demonstrate their ability to discern between common topological features found in both random graph models and empirical datasets. We put forward a multi-scale picture of graph structure, in which the effect of global and local structure upon the distance measures is considered. We make recommendations on the applicability of different distance measures to empirical graph data problem based on this multi-scale view. Finally, we introduce the Python library NetComp which implements the graph distances used in this work

    Searching for Realizations of Finite Metric Spaces in Tight Spans

    Full text link
    An important problem that commonly arises in areas such as internet traffic-flow analysis, phylogenetics and electrical circuit design, is to find a representation of any given metric DD on a finite set by an edge-weighted graph, such that the total edge length of the graph is minimum over all such graphs. Such a graph is called an optimal realization and finding such realizations is known to be NP-hard. Recently Varone presented a heuristic greedy algorithm for computing optimal realizations. Here we present an alternative heuristic that exploits the relationship between realizations of the metric DD and its so-called tight span TDT_D. The tight span TDT_D is a canonical polytopal complex that can be associated to DD, and our approach explores parts of TDT_D for realizations in a way that is similar to the classical simplex algorithm. We also provide computational results illustrating the performance of our approach for different types of metrics, including l1l_1-distances and two-decomposable metrics for which it is provably possible to find optimal realizations in their tight spans.Comment: 20 pages, 3 figure

    Improved ESP-index: a practical self-index for highly repetitive texts

    Full text link
    While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real world repetitive texts remains a challenge. ESP-index is a grammar-based self-index on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds of parsing discrepancies between different appearances of the same subtexts in a text. Although ESP-index performs efficient top-down searches of query texts, it has a serious issue on binary searches for finding appearances of variables for a query text, which resulted in slowing down the query searches. We present an improved ESP-index (ESP-index-I) by leveraging the idea behind succinct data structures for large alphabets. While ESP-index-I keeps the same types of efficiencies as ESP-index about the top-down searches, it avoid the binary searches using fast rank/select operations. We experimentally test ESP-index-I on the ability to search query texts and extract subtexts from real world repetitive texts on a large-scale, and we show that ESP-index-I performs better that other possible approaches.Comment: This is the full version of a proceeding accepted to the 11th International Symposium on Experimental Algorithms (SEA2014
    corecore