Optimal Substring-Equality Queries with Applications to Sparse Text Indexing
We consider the problem of encoding a string of length n from an integer
alphabet of size σ so that access and substring equality queries (that
is, determining the equality of any two substrings) can be answered
efficiently. Any uniquely-decodable encoding supporting access must take
bits. We describe a new data
structure matching this lower bound when while supporting
both queries in optimal time. Furthermore, we show that the string can
be overwritten in-place with this structure. The redundancy of
bits and the constant query time break exponentially a lower bound that is
known to hold in the read-only model. Using our new string representation, we
obtain the first in-place subquadratic (indeed, even sublinear in some cases)
algorithms for several string-processing problems in the restore model: the
input string is rewritable and must be restored before the computation
terminates. In particular, we describe the first in-place subquadratic Monte
Carlo solutions to the sparse suffix sorting, sparse LCP array construction,
and suffix selection problems. With the sole exception of suffix selection, our
algorithms are also the first running in sublinear time for small enough sets
of input suffixes. Combining these solutions, we obtain the first
sublinear-time Monte Carlo algorithm for building the sparse suffix tree in
compact space. We also show how to derandomize our algorithms using small
space. This leads to the first Las Vegas in-place algorithm computing the full
LCP array in time and to the first Las Vegas in-place algorithms
solving the sparse suffix sorting and sparse LCP array construction problems in
time. Running times of these Las Vegas
algorithms hold in the worst case with high probability.
Comment: Refactored according to TALG's reviews. New w.h.p. bounds and Las
Vegas algorithm
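To make the notion of a substring-equality query concrete, here is a minimal Python sketch based on prefix Karp-Rabin fingerprints. It is not the in-place structure described in the abstract: it uses O(n) extra words rather than a succinct encoding, and the class and method names (`SubstringEquality`, `equal`, `lce`) are hypothetical. Like the Monte Carlo algorithms mentioned above, it can report a false "equal" with small probability.

```python
import random


class SubstringEquality:
    """Monte Carlo substring-equality queries (illustrative, not succinct)."""

    MOD = (1 << 61) - 1  # Mersenne prime used as the fingerprint modulus

    def __init__(self, text, seed=None):
        rng = random.Random(seed)
        # Random base; assumed larger than every character code in `text`.
        self.base = rng.randrange(256, self.MOD - 1)
        n = len(text)
        self.prefix = [0] * (n + 1)  # prefix[k] = fingerprint of text[:k]
        self.power = [1] * (n + 1)   # power[k] = base**k mod MOD
        for k, ch in enumerate(text):
            self.prefix[k + 1] = (self.prefix[k] * self.base + ord(ch)) % self.MOD
            self.power[k + 1] = (self.power[k] * self.base) % self.MOD

    def fingerprint(self, i, length):
        """Fingerprint of text[i : i + length], computed in O(1)."""
        return (self.prefix[i + length]
                - self.prefix[i] * self.power[length]) % self.MOD

    def equal(self, i, j, length):
        """True if text[i:i+length] == text[j:j+length] (with high probability)."""
        return self.fingerprint(i, length) == self.fingerprint(j, length)

    def lce(self, i, j):
        """Longest common extension of suffixes i and j via binary search."""
        lo, hi = 0, len(self.prefix) - 1 - max(i, j)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self.equal(i, j, mid):
                lo = mid
            else:
                hi = mid - 1
        return lo


index = SubstringEquality("abracadabra", seed=1)
same = index.equal(0, 7, 4)   # compares "abra" with "abra"
shared = index.lce(0, 3)      # "abracadabra" vs "acadabra" share only "a"
```

Comparing two suffixes then reduces to one `lce` call followed by a single character comparison, which is the primitive that sparse suffix sorting and LCP construction build on.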
Fast Scalable Construction of (Minimal Perfect Hash) Functions
Recent advances in random linear systems on finite fields have paved the way
for the construction of constant-time data structures representing static
functions and minimal perfect hash functions using less space with respect to
existing techniques. The main obstruction for any practical application of
these results is the cubic-time Gaussian elimination required to solve these
linear systems: although they can be made very small, the computation is still
too slow to be feasible.
In this paper we describe in detail a number of heuristics and programming
techniques to speed up the resolution of these systems by several orders of
magnitude, making the overall construction competitive with the standard and
widely used MWHC technique, which is based on hypergraph peeling. In
particular, we introduce broadword programming techniques for fast equation
manipulation and a lazy Gaussian elimination algorithm. We also describe a
number of technical improvements to the data structure which further reduce
space usage and improve lookup speed.
Our implementation of these techniques yields a minimal perfect hash function
data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based
ones, and a static function data structure which reduces the multiplicative
overhead from 1.23 to 1.03.
New Applications of the Nearest-Neighbor Chain Algorithm
The nearest-neighbor chain algorithm was proposed in the 1980s as a way to speed up certain hierarchical clustering algorithms. In the first part of the dissertation, we show that its application is not limited to clustering. We apply it to a variety of geometric and combinatorial problems. In each case, we show that the nearest-neighbor chain algorithm finds the same solution as a preexisting greedy algorithm, but often with an improved runtime. We obtain speedups over greedy algorithms for Euclidean TSP, Steiner TSP in planar graphs, straight skeletons, a geometric coverage problem, and three stable matching models. In the second part, we study the stable-matching Voronoi diagram, a type of plane partition which combines properties of stable matchings and Voronoi diagrams. We propose political redistricting as an application. We also show that it is impossible to compute this diagram in an algebraic model of computation, and give three algorithmic approaches to overcome this obstacle. One of them is based on the nearest-neighbor chain algorithm, linking the two parts together.
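For readers unfamiliar with the technique, the following Python sketch shows the nearest-neighbor chain idea in its classical setting, agglomerative clustering. It does not cover the dissertation's new applications. The function name `nn_chain`, the choice of complete linkage, and the naive linkage computation are assumptions made for brevity; a practical implementation would maintain cluster distances incrementally.

```python
def nn_chain(points, dist):
    """Agglomerative clustering with complete linkage via nearest-neighbor chains.

    Returns a list of merges (cluster_a, cluster_b, linkage_distance).
    Merges may come out in a different order than the greedy algorithm's,
    but for reducible linkages such as complete linkage the resulting
    hierarchy is the same.
    """
    clusters = {i: [p] for i, p in enumerate(points)}
    next_id = len(points)
    merges = []
    chain = []  # stack of cluster ids, each followed by its nearest neighbor

    def linkage(a, b):
        # Complete linkage: largest pairwise distance (recomputed naively).
        return max(dist(p, q) for p in clusters[a] for q in clusters[b])

    while len(clusters) > 1:
        if not chain:
            chain.append(next(iter(clusters)))
        while True:
            top = chain[-1]
            prev = chain[-2] if len(chain) > 1 else None
            best, best_d = None, None
            for c in clusters:
                if c == top:
                    continue
                d = linkage(top, c)
                # Ties are broken in favor of `prev` so the chain cannot cycle.
                if best_d is None or d < best_d or (d == best_d and c == prev):
                    best, best_d = c, d
            if best == prev:
                break  # top and prev are mutual nearest neighbors
            chain.append(best)
        # Merge the mutual nearest neighbors; the rest of the chain stays valid.
        chain.pop()
        chain.pop()
        clusters[next_id] = clusters.pop(top) + clusters.pop(prev)
        merges.append((top, prev, best_d))
        next_id += 1
    return merges


merges = nn_chain([0, 1, 5, 6, 20], lambda a, b: abs(a - b))
merge_distances = sorted(d for _, _, d in merges)
```

The key point is that the chain is not rebuilt after each merge: only its top two entries are removed, which is where the speedup over recomputing a global closest pair comes from.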
The resolved star-formation relation in nearby active galactic nuclei
We present an analysis of the relation between star formation rate (SFR)
surface density (Σ_SFR) and mass surface density of molecular gas
(Σ_H2), commonly referred to as the Kennicutt-Schmidt (K-S) relation, at
its intrinsic spatial scale, i.e. the size of giant molecular clouds (GMCs;
10-150 pc), in the central, high-density regions of four nearby low-luminosity
active galactic nuclei (AGN). We used interferometric IRAM CO(1-0) and CO(2-1),
and SMA CO(3-2) emission line maps to derive Σ_H2 and HST-Hα images to
estimate Σ_SFR. Each galaxy is characterized by a distinct molecular SF
relation at spatial scales between 20 and 200 pc. The K-S relations can be
sub-linear, but also super-linear, with slopes ranging from 0.5 to 1.3.
Depletion times range from 1 to 2 Gyr, compatible with results for nearby
normal galaxies. These findings are valid independently of which transition,
CO(1-0), CO(2-1), or CO(3-2), is used to derive Σ_H2. Because of
star-formation feedback, the lifetime of clouds, turbulent cascade, or magnetic
fields, the K-S relation might be expected to degrade on small spatial scales
(<100 pc). However, we find no clear evidence for this, even on scales as small
as 20 pc, and this might be because of the higher density of GMCs in galaxy
centers, which have to resist higher shear forces. The proportionality between
Σ_H2 and Σ_SFR found between 10 and 100 Msun/pc^2 is valid even at high
densities, 10^3 Msun/pc^2. However, by adopting a common CO-to-H2 conversion
factor (α_CO), the central regions of the galaxies have higher Σ_SFR for
a given gas column than those expected from the models, with a behavior that
lies between the mergers/high-redshift starburst systems and the more quiescent
star-forming galaxies, assuming that the former require a lower value of
α_CO.
Comment: 22 pages, 8 figures, Accepted for publication in Astronomy and
Astrophysics