Fast and Compact Regular Expression Matching
We study four problems in string matching, namely, regular expression matching,
approximate regular expression matching, string edit distance, and subsequence
indexing, on a standard word RAM model of computation that allows
logarithmic-sized words to be manipulated in constant time. We show how to
improve the space and/or remove a dependency on the alphabet size for each
problem, either by using an improved tabulation technique for an existing
algorithm or by combining known algorithms in a new way.
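For concreteness, here is the classical dynamic program for one of the four problems, string edit distance, written in Python. It is the textbook baseline that word-RAM tabulation ("Four Russians"-style) techniques speed up, not this paper's algorithm; the function name is illustrative.

def edit_distance(a, b):
    """Classical O(|a| * |b|) dynamic program for Levenshtein edit distance,
    kept to a single rolling row of the DP table."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))              # row i = 0: distance(a[:0], b[:j]) = j
    for i in range(1, m + 1):
        prev_diag, dp[0] = dp[0], i      # dp[0] becomes distance(a[:i], b[:0]) = i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                           # delete a[i-1]
                        dp[j - 1] + 1,                       # insert b[j-1]
                        prev_diag + (a[i - 1] != b[j - 1]))  # substitute or match
            prev_diag = cur
    return dp[n]

print(edit_distance("kitten", "sitting"))   # prints 3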
Dictionaries Revisited
Dictionaries are probably the most well-studied class of data structures. A dictionary supports insertions, deletions, membership queries, and usually successor, predecessor, and extract-min. Given their centrality to both the theory and practice of data structures, surprisingly basic questions about them remain unsolved and sometimes even unposed. This talk focuses on questions that arise from the disparity between the way large-scale dictionaries are analyzed and the way they are used in practice.
Insertion Sort is O(n log n)
Traditional Insertion Sort runs in O(n^2) time because each insertion takes
O(n) time. When people run Insertion Sort in the physical world, they leave
gaps between items to accelerate insertions. Gaps help in computers as well.
This paper shows that Gapped Insertion Sort has insertion times of O(log n)
with high probability, yielding a total running time of O(n log n) with high
probability.
Comment: 6 pages, LaTeX. In Proceedings of the Third International Conference on Fun With Algorithms, FUN 200
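A minimal Python sketch of the gapped idea (an illustration of Library Sort, not the paper's analysis or code; the re-spacing schedule and names are assumptions): keep the sorted keys interleaved with empty slots so that an insertion shifts elements only up to the nearest gap, and re-space the array each time the number of keys doubles.

def gapped_insertion_sort(keys):
    """Sketch of Gapped (Library) Insertion Sort: keep empty slots between
    stored keys so each insertion shifts only up to the nearest gap."""
    slots = [None]                        # sorted keys interleaved with None gaps

    def respace(values):
        # Lay the keys out again with one gap after each, so future
        # insertions find a nearby gap quickly.
        out = []
        for v in values:
            out.extend([v, None])
        return out

    def insert(x):
        # Find the first stored key >= x (a real implementation would
        # binary search the gapped array instead of scanning).
        target = len(slots)
        for pos, v in enumerate(slots):
            if v is not None and v >= x:
                target = pos
                break
        # Shift keys right from `target`; the first gap absorbs the shift.
        carry, pos = x, target
        while carry is not None:
            if pos == len(slots):
                slots.append(None)
            slots[pos], carry = carry, slots[pos]
            pos += 1

    stored, next_respace = 0, 1
    for x in keys:
        insert(x)
        stored += 1
        if stored >= next_respace:        # re-space after each doubling
            slots[:] = respace([v for v in slots if v is not None])
            next_respace = 2 * stored
    return [v for v in slots if v is not None]

print(gapped_insertion_sort([5, 3, 8, 1, 9, 2, 7]))   # [1, 2, 3, 5, 7, 8, 9]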
Tight Bounds for Monotone Minimal Perfect Hashing
The monotone minimal perfect hash function (MMPHF) problem is the following
indexing problem. Given a set $S$ of $n$ distinct keys from
a universe of size $u$, create a data structure $D$ that answers the
following query: for any $q \in S$, return the rank of $q$ in $S$ (the answer may be arbitrary when $q \notin S$).
Solutions to the MMPHF problem are in widespread use in both theory and
practice.
The best upper bound known for the problem encodes $D$ in $O(n \log\log\log u)$ bits and answers queries in time polylogarithmic in $u$. It has been an open problem
to either improve the space upper bound or to show that this somewhat odd-looking bound is tight.
In this paper, we show the latter: specifically, that any data structure
(deterministic or randomized) for monotone minimal perfect hashing of any
collection of $n$ elements from a universe of size $u$ requires $\Omega(n \log\log\log u)$ expected bits to answer every query correctly.
We achieve our lower bound by defining a graph $G$ whose nodes
are the possible inputs and in which two nodes are adjacent if
they cannot share the same data structure $D$. The size of $D$ is then lower bounded by the
log of the chromatic number of $G$. Finally, we show that the
fractional chromatic number (and hence the chromatic number) of $G$ is
lower bounded by $2^{\Omega(n \log\log\log u)}$.
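A compact way to restate the coloring argument above (a paraphrase with assumed symbols, not the paper's exact notation): an $s$-bit data structure induces a proper coloring of $G$ with at most $2^s$ colors, so
\[
  2^{s} \;\ge\; \chi(G) \;\ge\; \chi_f(G) \;\ge\; 2^{\Omega(n \log\log\log u)}
  \qquad\Longrightarrow\qquad
  s \;\ge\; \log_2 \chi(G) \;=\; \Omega(n \log\log\log u),
\]
where $\chi$ and $\chi_f$ denote the chromatic and fractional chromatic numbers.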
GPU LSM: A Dynamic Dictionary Data Structure for the GPU
We develop a dynamic dictionary data structure for the GPU, supporting fast
insertions and deletions, based on the Log Structured Merge tree (LSM). Our
implementation on an NVIDIA K40c GPU has an average update (insertion or
deletion) rate of 225 M elements/s, 13.5x faster than merging items into a
sorted array. The GPU LSM supports the retrieval operations lookup, count, and
range query, with average rates of 75 M, 32 M, and 23 M queries/s, respectively.
The trade-off for the dynamic updates is that the sorted array is almost twice
as fast on retrievals. We believe that our GPU LSM is the first dynamic
general-purpose dictionary data structure for the GPU.
Comment: 11 pages, accepted to appear in the Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'18)
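A CPU-side Python sketch of the LSM idea the data structure is built on (illustrative only, not the authors' GPU implementation; the class name, batch size, and level-merge policy are assumptions): updates are buffered, flushed as sorted runs, and merged into geometrically growing sorted levels, while lookups probe levels from newest to oldest so later deletions (tombstones) hide older insertions.

import bisect

class TinyLSM:
    """Illustrative log-structured merge dictionary over a set of keys,
    with deletions handled via tombstones."""

    def __init__(self, batch_size=4):
        self.batch_size = batch_size
        self.buffer = []           # pending (key, is_delete) updates, oldest first
        self.levels = []           # each level: sorted list of (key, is_delete); newest level first

    def insert(self, key):
        self._update(key, False)

    def delete(self, key):
        self._update(key, True)    # a tombstone hides older copies of `key`

    def _update(self, key, is_delete):
        self.buffer.append((key, is_delete))
        if len(self.buffer) >= self.batch_size:
            self._flush()

    def _flush(self):
        newest = {}
        for k, dead in self.buffer:            # later updates win within the batch
            newest[k] = dead
        self.buffer = []
        self.levels.insert(0, sorted(newest.items()))
        # Merge adjacent levels while a newer level is as large as an older one.
        i = 0
        while i + 1 < len(self.levels) and len(self.levels[i]) >= len(self.levels[i + 1]):
            newer, older = self.levels[i], self.levels[i + 1]
            merged = {}
            for k, dead in older + newer:      # newer entries overwrite older ones
                merged[k] = dead
            self.levels[i:i + 2] = [sorted(merged.items())]
        # (A production LSM would also drop tombstones at the last level.)

    def lookup(self, key):
        # Newest information wins: check the buffer, then each level in order.
        for k, dead in reversed(self.buffer):
            if k == key:
                return not dead
        for level in self.levels:
            j = bisect.bisect_left(level, (key, False))
            if j < len(level) and level[j][0] == key:
                return not level[j][1]
        return False

d = TinyLSM()
for x in [5, 3, 8, 1]:
    d.insert(x)
d.delete(3)
print(d.lookup(5), d.lookup(3))   # True False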
Improved Distortion and Spam Resistance for PageRank
For a directed graph $G = (V, E)$, a ranking function, such as PageRank,
provides a way of mapping elements of $V$ to non-negative real numbers so that
nodes can be ordered. Brin and Page argued that the stationary distribution,
$\pi$, of a random walk on $G$ is an effective ranking function for queries on
an idealized web graph. However, $\pi$ is not defined for all $G$, and in
particular, it is not defined for the real web graph. Thus, they introduced
PageRank to approximate $\pi$ for graphs with ergodic random walks while
being defined on all graphs.
PageRank is defined as a random walk on a graph, where with probability
$1 - \epsilon$, a random out-edge is traversed, and with \emph{reset
probability} $\epsilon$ the random walk instead restarts at a node selected
using a \emph{reset vector} $\hat{v}$. Originally, $\hat{v}$ was taken to be
uniform on the nodes, and we call this version UPR.
In this paper, we introduce graph-theoretic notions of quality for ranking
functions, specifically \emph{distortion} and \emph{spam resistance}. We show
that UPR has high distortion and low spam resistance, and we show how to select
a reset vector $\hat{v}$ that yields low distortion and high spam resistance.
Comment: 36 pages
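A short Python sketch of PageRank as described above, with the reset probability and a configurable reset vector (NumPy power iteration on a toy graph; the graph and the skewed reset vector are illustrative assumptions, not the paper's construction):

import numpy as np

def pagerank(out_edges, reset_vector, eps=0.15, iters=100):
    """Power iteration for PageRank: with probability 1-eps follow a random
    out-edge, with reset probability eps restart at a node drawn from reset_vector."""
    n = len(out_edges)
    v = np.asarray(reset_vector, dtype=float)
    v = v / v.sum()
    # Column-stochastic transition matrix; dangling nodes restart via the reset vector.
    P = np.zeros((n, n))
    for u, nbrs in enumerate(out_edges):
        if nbrs:
            for w in nbrs:
                P[w, u] += 1.0 / len(nbrs)
        else:
            P[:, u] = v
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = (1 - eps) * (P @ pi) + eps * v
    return pi

# Toy 4-node graph; a uniform reset vector gives UPR, a skewed one is the alternative.
edges = [[1, 2], [2], [0], [2]]
print(pagerank(edges, [0.25, 0.25, 0.25, 0.25]))   # UPR
print(pagerank(edges, [0.7, 0.1, 0.1, 0.1]))       # non-uniform reset vector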