Search CORE

4,660 research outputs found

Edit Distance: Sketching, Streaming and Document Exchange

Author: Belazzougui Djamal
Zhang Qin
Publication venue
Publication date: 14/07/2016
Field of study

We show that in the document exchange problem, where Alice holds

x \in \{0,1\}^n

and Bob holds

y \in \{0,1\}^n

, Alice can send Bob a message of size

O(K(\log^2 K+\log n))

bits such that Bob can recover

x

using the message and his input

y

if the edit distance between

x

and

y

is no more than

K

, and output "error" otherwise. Both the encoding and decoding can be done in time

\tilde{O}(n+\mathsf{poly}(K))

. This result significantly improves the previous communication bounds under polynomial encoding/decoding time. We also show that in the referee model, where Alice and Bob hold

x

and

y

respectively, they can compute sketches of

x

and

y

of sizes

\mathsf{poly}(K \log n)

bits (the encoding), and send to the referee, who can then compute the edit distance between

x

and

y

together with all the edit operations if the edit distance is no more than

K

, and output "error" otherwise (the decoding). To the best of our knowledge, this is the first result for sketching edit distance using

\mathsf{poly}(K \log n)

bits. Moreover, the encoding phase of our sketching algorithm can be performed by scanning the input string in one pass. Thus our sketching algorithm also implies the first streaming algorithm for computing edit distance and all the edits exactly using

\mathsf{poly}(K \log n)

bits of space.Comment: Full version of an article to be presented at the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016

arXiv.org e-Print Archive

Crossref

Bidimensionality of Geometric Intersection Graphs

Author: E. Demaine
E.D. Demaine
E.D. Demaine
F.V. Fomin
F.V. Fomin
H.L. Bodlaender
L. Cai
Q.-P. Gu
Publication venue
Publication date: 28/08/2013
Field of study

Let B be a finite collection of geometric (not necessarily convex) bodies in the plane. Clearly, this class of geometric objects naturally generalizes the class of disks, lines, ellipsoids, and even convex polygons. We consider geometric intersection graphs GB where each body of the collection B is represented by a vertex, and two vertices of GB are adjacent if the intersection of the corresponding bodies is non-empty. For such graph classes and under natural restrictions on their maximum degree or subgraph exclusion, we prove that the relation between their treewidth and the maximum size of a grid minor is linear. These combinatorial results vastly extend the applicability of all the meta-algorithmic results of the bidimensionality theory to geometrically defined graph classes

arXiv.org e-Print Archive

Maastricht University Research Portal

Crossref

HAL Descartes

Metrics for Graph Comparison: A Practitioner's Guide

Author: Meyer Francois G.
Wills Peter
Publication venue
Publication date: 16/12/2019
Field of study

Comparison of graph structure is a ubiquitous task in data analysis and machine learning, with diverse applications in fields such as neuroscience, cyber security, social network analysis, and bioinformatics, among others. Discovery and comparison of structures such as modular communities, rich clubs, hubs, and trees in data in these fields yields insight into the generative mechanisms and functional properties of the graph. Often, two graphs are compared via a pairwise distance measure, with a small distance indicating structural similarity and vice versa. Common choices include spectral distances (also known as

\lambda

distances) and distances based on node affinities. However, there has of yet been no comparative study of the efficacy of these distance measures in discerning between common graph topologies and different structural scales. In this work, we compare commonly used graph metrics and distance measures, and demonstrate their ability to discern between common topological features found in both random graph models and empirical datasets. We put forward a multi-scale picture of graph structure, in which the effect of global and local structure upon the distance measures is considered. We make recommendations on the applicability of different distance measures to empirical graph data problem based on this multi-scale view. Finally, we introduce the Python library NetComp which implements the graph distances used in this work

arXiv.org e-Print Archive

Searching for Realizations of Finite Metric Spaces in Tight Spans

Author: Althöfer
Andreas Spillner
Bandelt
Benkert
Buneman
Catusse
Chepoi
Chin
Chrobak
Chung
Cui
Dantzig
Dress
Dress
Dress
Eppstein
Gawrilow
Guo
Hakimi
Herrmann
Hertz
Hertz
Imrich
Isbell
Klee
Knauer
Kuratowski
Sven Herrmann
Varone
Vincent Moulton
Winkler
Publication venue: 'Elsevier BV'
Publication date: 18/09/2013
Field of study

An important problem that commonly arises in areas such as internet traffic-flow analysis, phylogenetics and electrical circuit design, is to find a representation of any given metric

D

on a finite set by an edge-weighted graph, such that the total edge length of the graph is minimum over all such graphs. Such a graph is called an optimal realization and finding such realizations is known to be NP-hard. Recently Varone presented a heuristic greedy algorithm for computing optimal realizations. Here we present an alternative heuristic that exploits the relationship between realizations of the metric

D

and its so-called tight span

T_D

. The tight span

T_D

is a canonical polytopal complex that can be associated to

D

, and our approach explores parts of

T_D

for realizations in a way that is similar to the classical simplex algorithm. We also provide computational results illustrating the performance of our approach for different types of metrics, including

l_1

-distances and two-decomposable metrics for which it is provably possible to find optimal realizations in their tight spans.Comment: 20 pages, 3 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of East Anglia digital repository

Improved ESP-index: a practical self-index for highly repetitive texts

Author: F. Claude
F. Claude
G. Navarro
J. Barbay
J.I. Munro
K. Goto
O. Delpratt
S. Maruyama
T. Gagie
T. Gagie
T. Yamamoto
Publication venue
Publication date: 01/01/2014
Field of study

While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real world repetitive texts remains a challenge. ESP-index is a grammar-based self-index on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds of parsing discrepancies between different appearances of the same subtexts in a text. Although ESP-index performs efficient top-down searches of query texts, it has a serious issue on binary searches for finding appearances of variables for a query text, which resulted in slowing down the query searches. We present an improved ESP-index (ESP-index-I) by leveraging the idea behind succinct data structures for large alphabets. While ESP-index-I keeps the same types of efficiencies as ESP-index about the top-down searches, it avoid the binary searches using fast rank/select operations. We experimentally test ESP-index-I on the ability to search query texts and extract subtexts from real world repetitive texts on a large-scale, and we show that ESP-index-I performs better that other possible approaches.Comment: This is the full version of a proceeding accepted to the 11th International Symposium on Experimental Algorithms (SEA2014

arXiv.org e-Print Archive

Crossref