Edit Distance: Sketching, Streaming and Document Exchange
We show that in the document exchange problem, where Alice holds $x \in \{0,1\}^n$ and Bob holds $y \in \{0,1\}^n$, Alice can send Bob a message of
size $O(K(\log^2 K + \log n))$ bits such that Bob can recover $x$ using the
message and his input $y$ if the edit distance between $x$ and $y$ is no more
than $K$, and output "error" otherwise. Both the encoding and decoding can be
done in time $\tilde{O}(n + \mathrm{poly}(K))$. This result significantly
improves the previous communication bounds under polynomial encoding/decoding
time. We also show that in the referee model, where Alice and Bob hold $x$ and
$y$ respectively, they can compute sketches of $x$ and $y$ of sizes
$\mathrm{poly}(K \log n)$ bits (the encoding) and send them to the referee, who can
then compute the edit distance between $x$ and $y$ together with all the edit
operations if the edit distance is no more than $K$, and output "error"
otherwise (the decoding). To the best of our knowledge, this is the first
result for sketching edit distance using $\mathrm{poly}(K \log n)$ bits.
Moreover, the encoding phase of our sketching algorithm can be performed by
scanning the input string in one pass. Thus our sketching algorithm also
implies the first streaming algorithm for computing edit distance and all the
edits exactly using $\mathrm{poly}(K \log n)$ bits of space.
Comment: Full version of an article to be presented at the 57th Annual IEEE
Symposium on Foundations of Computer Science (FOCS 2016).
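The output that Bob or the referee must produce (the edit distance and all edits when the distance is at most $K$, otherwise "error") can be made concrete with the classical banded dynamic program, sketched below in Python. This is not the paper's sketching or document-exchange scheme: it is the textbook $O(nK)$-time computation that needs both strings in memory, shown only to pin down the required output. The names `bounded_edit_distance` and `apply_edits` and the tuple format of the edit operations are hypothetical.

```python
def bounded_edit_distance(x, y, k):
    """Return (distance, ops) if the edit distance of x and y is <= k, else None.

    Banded DP (hypothetical helper): an alignment of cost <= k only visits
    cells (i, j) with |i - j| <= k, so cells outside that band are skipped.
    Runs in O(len(x) * k) time. ops are ("substitute", i, c), ("delete", i)
    or ("insert", i, c), where i indexes the original x, listed left to right.
    """
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return None  # the length gap alone exceeds k: report "error"
    cap = k + 1  # any value above k is stored as k + 1
    dist = {}
    for i in range(n + 1):
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0 or j == 0:
                dist[i, j] = i + j
            else:
                dist[i, j] = min(
                    dist[i - 1, j - 1] + (x[i - 1] != y[j - 1]),  # match/substitute
                    dist.get((i - 1, j), cap) + 1,                # delete x[i-1]
                    dist.get((i, j - 1), cap) + 1,                # insert y[j-1]
                    cap,
                )
    if dist[n, m] > k:
        return None  # edit distance exceeds k: report "error"
    # Trace back one optimal alignment from (n, m) to (0, 0).
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i, j] == dist[i - 1, j - 1] + (x[i - 1] != y[j - 1]):
            if x[i - 1] != y[j - 1]:
                ops.append(("substitute", i - 1, y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i, j] == dist.get((i - 1, j), cap) + 1:
            ops.append(("delete", i - 1))
            i -= 1
        else:
            ops.append(("insert", i, y[j - 1]))
            j -= 1
    ops.reverse()
    return dist[n, m], ops


def apply_edits(x, ops):
    """Apply ops to x; going right to left keeps earlier indices valid."""
    s = list(x)
    for op in reversed(ops):
        if op[0] == "substitute":
            s[op[1]] = op[2]
        elif op[0] == "delete":
            del s[op[1]]
        else:
            s.insert(op[1], op[2])
    return "".join(s)
```

For example, `bounded_edit_distance("kitten", "sitting", 3)` returns distance 3 with three edits that `apply_edits` turns back into `"sitting"`, while the same call with `k=2` returns `None`.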
Bidimensionality of Geometric Intersection Graphs
Let B be a finite collection of geometric (not necessarily convex) bodies in
the plane. Clearly, this class of geometric objects naturally generalizes the
class of disks, lines, ellipsoids, and even convex polygons. We consider
geometric intersection graphs $G_B$ where each body of the collection $B$ is
represented by a vertex, and two vertices of $G_B$ are adjacent if the
intersection of the corresponding bodies is non-empty. For such graph classes
and under natural restrictions on their maximum degree or subgraph exclusion,
we prove that the relation between their treewidth and the maximum size of a
grid minor is linear. These combinatorial results vastly extend the
applicability of all the meta-algorithmic results of bidimensionality
theory to geometrically defined graph classes.
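To make the definition of $G_B$ concrete, the Python sketch below builds the intersection graph in the special case where every body is a closed disk, so that two bodies intersect exactly when the distance between their centers is at most the sum of their radii. Restricting to disks is an assumption for illustration only; the results above cover arbitrary (not necessarily convex) bodies, whose intersection test would depend on their geometry. The helper names are hypothetical.

```python
import math
from itertools import combinations


def disk_intersection_graph(disks):
    """Adjacency sets of G_B for closed disks given as (cx, cy, r) tuples.

    Vertex i stands for disks[i]; i and j are adjacent iff the disks share a
    point, i.e. the distance between their centers is at most r_i + r_j
    (this also covers one disk lying inside the other).
    """
    adj = {i: set() for i in range(len(disks))}
    for i, j in combinations(range(len(disks)), 2):
        (x1, y1, r1), (x2, y2, r2) = disks[i], disks[j]
        if math.hypot(x1 - x2, y1 - y2) <= r1 + r2:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def max_degree(adj):
    """Maximum vertex degree, one of the restrictions considered above."""
    return max((len(nbrs) for nbrs in adj.values()), default=0)
```

For the disks `[(0, 0, 1), (1.5, 0, 1), (5, 0, 1), (0, 0, 0.2)]`, vertex 0 is adjacent to 1 (overlap) and 3 (containment), vertex 2 is isolated, and the maximum degree is 2.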
Metrics for Graph Comparison: A Practitioner's Guide
Comparison of graph structure is a ubiquitous task in data analysis and
machine learning, with diverse applications in fields such as neuroscience,
cyber security, social network analysis, and bioinformatics, among others.
Discovery and comparison of structures such as modular communities, rich clubs,
hubs, and trees in data in these fields yields insight into the generative
mechanisms and functional properties of the graph.
Often, two graphs are compared via a pairwise distance measure, with a small
distance indicating structural similarity and vice versa. Common choices
include spectral distances (also known as $\lambda$ distances) and distances
based on node affinities. However, there has as yet been no comparative study
of the efficacy of these distance measures in discerning between common graph
topologies and different structural scales.
In this work, we compare commonly used graph metrics and distance measures,
and demonstrate their ability to discern between common topological features
found in both random graph models and empirical datasets. We put forward a
multi-scale picture of graph structure, in which the effect of global and local
structure upon the distance measures is considered. We make recommendations on
the applicability of different distance measures to empirical graph data
problems based on this multi-scale view. Finally, we introduce the Python
library NetComp, which implements the graph distances used in this work.
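As an illustration of a spectral ($\lambda$) distance, the Python sketch below compares two graphs on the same number of vertices by the Euclidean distance between their $k$ largest Laplacian eigenvalues. Using the Laplacian rather than the adjacency matrix, the $\ell_2$ norm, and a hand-written cyclic Jacobi eigenvalue routine (to stay within the standard library) are all assumptions for illustration; this does not use or reproduce NetComp's API.

```python
import math


def symmetric_eigenvalues(m, tol=1e-22, max_sweeps=50):
    """Eigenvalues (ascending) of a real symmetric matrix, cyclic Jacobi method."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def laplacian(adj):
    """L = D - A for a 0/1 adjacency matrix without self-loops."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)] for i in range(n)]


def lambda_distance(adj1, adj2, k=None):
    """l2 distance between the k largest Laplacian eigenvalues (all if k is None)."""
    ev1 = sorted(symmetric_eigenvalues(laplacian(adj1)), reverse=True)
    ev2 = sorted(symmetric_eigenvalues(laplacian(adj2)), reverse=True)
    k = len(ev1) if k is None else k
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(ev1[:k], ev2[:k])))
```

For the 3-vertex path (Laplacian spectrum 0, 1, 3) and the triangle (0, 3, 3), the distance over all eigenvalues is 2, but with `k=1` it is 0 because both largest eigenvalues equal 3. The choice of $k$ therefore decides which structural scale the distance sees.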
Searching for Realizations of Finite Metric Spaces in Tight Spans
An important problem that commonly arises in areas such as internet
traffic-flow analysis, phylogenetics and electrical circuit design is to find
a representation of any given metric $D$ on a finite set by an edge-weighted
graph such that the total edge length of the graph is minimum over all such
graphs. Such a graph is called an optimal realization, and finding such
realizations is known to be NP-hard. Recently Varone presented a heuristic
greedy algorithm for computing optimal realizations. Here we present an
alternative heuristic that exploits the relationship between realizations of
the metric $D$ and its so-called tight span $T_D$. The tight span $T_D$ is a
canonical polytopal complex that can be associated to $D$, and our approach
explores parts of $T_D$ for realizations in a way that is similar to the
classical simplex algorithm. We also provide computational results illustrating
the performance of our approach for different types of metrics, including
$l_1$-distances and two-decomposable metrics for which it is provably possible
to find optimal realizations in their tight spans.
Comment: 20 pages, 3 figures
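Whether a weighted graph realizes a metric $D$ on a finite set $X$ can be checked mechanically: the graph must contain the points of $X$ as vertices, and its shortest-path distances between those points must equal $D$. The Python sketch below performs this check with Floyd-Warshall and reports the total edge length that an optimal realization minimizes. The dictionary encoding of $D$ and all helper names are assumptions for illustration; the sketch neither constructs the tight span nor implements the heuristic described above.

```python
import math


def shortest_path_metric(nodes, edges):
    """All-pairs shortest paths (Floyd-Warshall) in an undirected weighted graph."""
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    for u, v, w in edges:  # edge weights are assumed positive
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def is_realization(metric, nodes, edges, tol=1e-9):
    """True if every pair (a, b) in metric has graph distance metric[(a, b)]."""
    dist = shortest_path_metric(nodes, edges)
    return all(abs(dist[a][b] - d) <= tol for (a, b), d in metric.items())


def total_length(edges):
    """Sum of edge weights, the quantity an optimal realization minimizes."""
    return sum(w for _, _, w in edges)
```

For three points at pairwise distance 2, both a star with an extra vertex and unit-length edges (total length 3) and a triangle with edges of length 2 (total length 6) are realizations, and the star is the shorter of the two.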
Improved ESP-index: a practical self-index for highly repetitive texts
While several self-indexes for highly repetitive texts exist, developing a
practical self-index applicable to real-world repetitive texts remains a
challenge. ESP-index is a grammar-based self-index built on the notion of
edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees
upper bounds of parsing discrepancies between different appearances of the same
subtexts in a text. Although ESP-index performs efficient top-down searches of
query texts, it suffers from a serious issue with the binary searches used to find
appearances of variables for a query text, which slows down query
searches. We present an improved ESP-index (ESP-index-I) that leverages the idea
behind succinct data structures for large alphabets. While ESP-index-I keeps
the same efficiency as ESP-index for top-down searches, it
avoids the binary searches by using fast rank/select operations. We experimentally
test ESP-index-I on its ability to search query texts and extract subtexts from
real-world repetitive texts at large scale, and we show that ESP-index-I
performs better than other possible approaches.
Comment: This is the full version of a proceeding accepted to the 11th
International Symposium on Experimental Algorithms (SEA 2014).
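The key change described above, replacing binary searches for the appearances of a variable with rank/select queries, can be illustrated with a toy structure in Python. This is a minimal sketch under stated assumptions: one bitvector per symbol, `rank1` answered from per-block counts plus a short scan inside one block, and `select1` answered from an explicit table of 1-positions. Unlike the succinct large-alphabet structures that ESP-index-I builds on, it is not space-efficient, and the class and function names are hypothetical.

```python
class RankSelect:
    """Rank/select over a 0/1 list (hypothetical helper, not ESP-index-I's code)."""

    BLOCK = 64

    def __init__(self, bits):
        self.bits = list(bits)
        # block_counts[b] = number of 1s strictly before block b.
        self.block_counts = [0]
        for start in range(0, len(self.bits), self.BLOCK):
            self.block_counts.append(self.block_counts[-1] + sum(self.bits[start:start + self.BLOCK]))
        # Explicit positions of the 1s: O(1) select, but NOT succinct.
        self.ones = [i for i, b in enumerate(self.bits) if b]

    def rank1(self, i):
        """Number of 1s in bits[0:i]: block count plus a scan of < 64 bits."""
        block = i // self.BLOCK
        return self.block_counts[block] + sum(self.bits[block * self.BLOCK:i])

    def select1(self, k):
        """Position of the k-th 1 (1-based), without any binary search."""
        return self.ones[k - 1]


def build_symbol_index(seq):
    """One RankSelect bitvector per distinct symbol (variable) of seq."""
    return {c: RankSelect([1 if s == c else 0 for s in seq]) for c in set(seq)}
```

With `idx = build_symbol_index(["X", "Y", "X", "Z", "X"])`, the second appearance of `"X"` is `idx["X"].select1(2) == 2`, and `idx["X"].rank1(5) == 3` counts all its appearances.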