1,123 research outputs found
Finger Search in Grammar-Compressed Strings
Grammar-based compression, where one replaces a long string by a small
context-free grammar that generates the string, is a simple and powerful
paradigm that captures many popular compression schemes. Given a grammar, the
random access problem is to compactly represent the grammar while supporting
random access, that is, given a position in the original uncompressed string
report the character at that position. In this paper we study the random access
problem with the finger search property, that is, the time for a random access
query should depend on the distance between a specified index $f$, called the
\emph{finger}, and the query index $i$. We consider both a static variant,
where we first place a finger and subsequently access indices near the finger
efficiently, and a dynamic variant that also supports moving the finger in
time that depends on the distance moved.
Let $n$ be the size of the grammar, and let $N$ be the size of the string. For
the static variant we give a linear space representation that supports placing
the finger in $O(\log N)$ time and subsequently accessing in $O(\log D)$ time,
where $D$ is the distance between the finger and the accessed index. For the
dynamic variant we give a linear space representation that supports placing the
finger in $O(\log N)$ time and accessing and moving the finger in $O(\log D + \log \log N)$ time. Compared to the best linear space solution to random
access, we improve an $O(\log N)$ query bound to $O(\log D)$ for the static
variant and to $O(\log D + \log \log N)$ for the dynamic variant, while
maintaining linear space. As an application of our results we obtain an
improved solution to the longest common extension problem in grammar compressed
strings. To obtain our results, we introduce several new techniques of
independent interest, including a novel van Emde Boas style decomposition of
grammars.
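The random access problem described in this abstract can be illustrated with a minimal sketch. It shows only the baseline approach: store the expansion length of every nonterminal and walk down from the start symbol, so a query costs time proportional to the height of the grammar. It is not the finger search structure of the paper. The toy grammar `RULES` and the function names are assumptions made for illustration.

```python
# Hypothetical toy grammar: each nonterminal maps either to a single
# terminal character or to a pair of nonterminals (left, right).
RULES = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # expands to "ab"
    "X2": ("X1", "A"),   # expands to "aba"
    "X3": ("X2", "X1"),  # expands to "abaab"
    "X4": ("X3", "X2"),  # expands to "abaababa"
}


def expansion_lengths(rules):
    """Return a dict giving the length of the string each symbol expands to."""
    lengths = {}

    def length(sym):
        if sym not in lengths:
            rhs = rules[sym]
            if isinstance(rhs, str):
                lengths[sym] = 1
            else:
                lengths[sym] = length(rhs[0]) + length(rhs[1])
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def access(rules, lengths, root, i):
    """Return character i (0-based) of the expansion of root, without decompressing."""
    if not 0 <= i < lengths[root]:
        raise IndexError(i)
    sym = root
    while not isinstance(rules[sym], str):
        left, right = rules[sym]
        if i < lengths[left]:
            sym = left            # position lies in the left child
        else:
            i -= lengths[left]    # skip the left child's expansion
            sym = right
    return rules[sym]


def expand(rules, sym):
    """Fully decompress sym (used only to check the sketch on small inputs)."""
    rhs = rules[sym]
    if isinstance(rhs, str):
        return rhs
    return expand(rules, rhs[0]) + expand(rules, rhs[1])
```

A finger search structure aims to avoid restarting this descent from the root when the next query lies close to the previous one.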
Fast Dynamic Arrays
We present a highly optimized implementation of tiered vectors, a data
structure for maintaining a sequence of $n$ elements supporting access in time
$O(1)$ and insertion and deletion in time $O(n^\epsilon)$ for $\epsilon > 0$
while using $o(n)$ extra space. We consider several different implementation
optimizations in C++ and compare their performance to that of vector and
multiset from the standard library on sequences with up to $10^8$ elements. Our
fastest implementation uses much less space than multiset while providing
speedups of $40\times$ for access operations compared to multiset and speedups
of $10{,}000\times$ compared to vector for insertion and deletion operations,
while being competitive with both data structures for all other operations.
Fast Dynamic Arrays
We present a highly optimized implementation of tiered vectors, a data structure for maintaining a sequence of n elements supporting access in time O(1) and insertion and deletion in time O(n^e) for e > 0 while using o(n) extra space. We consider several different implementation optimizations in C++ and compare their performance to that of vector and set from the standard library on sequences with up to 10^8 elements. Our fastest implementation uses much less space than set while providing speedups of 40x for access operations compared to set and speedups of 10,000x compared to vector for insertion and deletion operations, while being competitive with both data structures for all other operations.
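To make the idea of a tiered vector concrete, here is a minimal two-level sketch, written in Python for brevity rather than the C++ used in the paper. It assumes a fixed block capacity chosen by the caller and keeps every block except the last one full. Each block is a circular buffer, so an insertion shifts elements inside one block and then passes a single overflow element through each later block in constant time per block. With a capacity near the square root of the sequence length, access takes constant time and insertion takes time proportional to that square root. The class and method names are hypothetical, and none of the paper's optimizations are reproduced.

```python
class Block:
    """Fixed-capacity circular buffer."""

    def __init__(self, cap):
        self.buf = [None] * cap
        self.head = 0   # physical index of logical position 0
        self.size = 0

    def get(self, i):
        return self.buf[(self.head + i) % len(self.buf)]

    def push_front(self, x):
        # O(1): move the head one slot back and store x there.
        self.head = (self.head - 1) % len(self.buf)
        self.buf[self.head] = x
        self.size += 1

    def pop_back(self):
        # O(1): drop and return the last logical element.
        self.size -= 1
        return self.buf[(self.head + self.size) % len(self.buf)]

    def insert(self, i, x):
        # O(cap): shift logical positions i..size-1 one slot to the right.
        cap = len(self.buf)
        for j in range(self.size, i, -1):
            self.buf[(self.head + j) % cap] = self.buf[(self.head + j - 1) % cap]
        self.buf[(self.head + i) % cap] = x
        self.size += 1


class TieredVector:
    """Two-level tiered vector; all blocks except the last are kept full."""

    def __init__(self, block_cap):
        self.cap = block_cap
        self.blocks = [Block(block_cap)]
        self.n = 0

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return self.blocks[i // self.cap].get(i % self.cap)

    def insert(self, i, x):
        if not 0 <= i <= self.n:
            raise IndexError(i)
        b, off = divmod(i, self.cap)
        if b == len(self.blocks):
            self.blocks.append(Block(self.cap))
        blk = self.blocks[b]
        overflow = blk.size == self.cap
        carry = blk.pop_back() if overflow else None
        blk.insert(off, x)
        b += 1
        # Pass the overflowing element through the following blocks.
        while overflow:
            if b == len(self.blocks):
                self.blocks.append(Block(self.cap))
            blk = self.blocks[b]
            overflow = blk.size == self.cap
            nxt = blk.pop_back() if overflow else None
            blk.push_front(carry)
            carry = nxt
            b += 1
        self.n += 1
```

For example, `tv = TieredVector(4)` followed by repeated `tv.insert(pos, value)` calls behaves like `list.insert`, while `tv[i]` is a constant-time lookup.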
Compressed Indexing with Signature Grammars
The compressed indexing problem is to preprocess a string $S$ of length $n$
into a compressed representation that supports pattern matching queries. That
is, given a string $P$ of length $m$, report all occurrences of $P$ in $S$.
We present a data structure that supports pattern matching queries in $O(m + occ (\lg\lg n + \lg^\epsilon z))$ time using $O(z \lg(n / z))$ space, where
$z$ is the size of the LZ77 parse of $S$ and $\epsilon > 0$ is an arbitrarily small
constant, when the alphabet is small or $z = O(n^{1 - \delta})$ for any
constant $\delta > 0$. We also present two data structures for the general
case: one where the space is increased by $O(z\lg\lg z)$, and one where the
query time changes from worst-case to expected. These results improve the
previously best known solutions. Notably, this is the first data structure that
decides if $P$ occurs in $S$ in $O(m)$ time using $O(z\lg(n/z))$ space.
Our results are mainly obtained by a novel combination of a randomized
grammar construction algorithm with well-known techniques relating pattern
matching to 2D-range reporting.
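The space bound in this abstract is stated in terms of the size of the LZ77 parse. As a rough illustration of that quantity, the following sketch computes a greedy LZ77 factorization by brute force. It assumes one common variant of the definition: each phrase is the longest prefix of the remaining text that also occurs starting at an earlier position (overlap with the phrase itself is allowed), or a single new character if no such prefix exists. Other variants append an explicit character to each phrase or forbid overlaps, and the paper may use one of those. The function name is hypothetical and the quadratic running time is only suitable for small examples.

```python
def lz77_parse(s):
    """Return the phrases of a greedy LZ77 factorization of s (brute force)."""
    phrases = []
    i, n = 0, len(s)
    while i < n:
        best = 0
        for j in range(i):                 # candidate earlier starting position
            length = 0
            # The source may run into the phrase itself (self-overlap).
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)                # a new character forms its own phrase
        phrases.append(s[i:i + step])
        i += step
    return phrases
```

On a highly repetitive input such as `"abababab"` the parse has only three phrases, which is why indexes with space bounded in terms of the parse size are attractive for repetitive texts.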
Optimal-Time Dictionary-Compressed Indexes
We describe the first self-indexes able to count and locate pattern
occurrences in optimal time within a space bounded by the size of the most
popular dictionary compressors. To achieve this result we combine several
recent findings, including \emph{string attractors} (new combinatorial
objects encompassing most known compressibility measures for highly repetitive
texts) and grammars based on \emph{locally-consistent parsing}.
In more detail, let $\gamma$ be the size of the smallest attractor for a text $T$
of length $n$. The measure $\gamma$ is an (asymptotic) lower bound to the
size of dictionary compressors based on Lempel--Ziv, context-free grammars, and
many others. The smallest known text representations in terms of attractors use
space $O(\gamma\log(n/\gamma))$, and our lightest indexes work within the same
asymptotic space. Let $\epsilon>0$ be a suitably small constant fixed at
construction time, $m$ be the pattern length, and $occ$ be the number of its
text occurrences. Our index counts pattern occurrences in $O(m+\log^{2+\epsilon}n)$
time, and locates them in $O(m+(occ+1)\log^\epsilon n)$ time. These times already outperform those of most dictionary-compressed
indexes, while obtaining the least asymptotic space for any index searching
within $O((m+occ)\,\textrm{polylog}\,n)$ time. Further, by increasing the space
to $O(\gamma\log(n/\gamma)\log^\epsilon n)$, we reduce the locating time to the
optimal $O(m+occ)$, and within $O(\gamma\log(n/\gamma)\log n)$ space we can
also count in optimal $O(m)$ time. No dictionary-compressed index had obtained
this time before. All our indexes can be constructed in $O(n)$ space and
$O(n\log n)$ expected time.
As a byproduct of independent interest..
Active megadetachment beneath the western United States
Geodetic data, interpreted in light of seismic imaging, seismicity, xenolith studies, and the late Quaternary geologic history of the northern Great Basin, suggest that a subcontinental-scale extensional detachment is localized near the Moho. To first order, seismic yielding in the upper crust at any given latitude in this region occurs via an M7 earthquake every 100 years. Here we develop the hypothesis that since 1996, the region has undergone a cycle of strain accumulation and release similar to “slow slip events” observed on subduction megathrusts, but yielding occurred on a subhorizontal surface 5–10 times larger in the slip direction, and at temperatures >800°C. Net slip was variable, ranging from 5 to 10 mm over most of the region. Strain energy with moment magnitude equivalent to an M7 earthquake was released along this “megadetachment,” primarily between 2000.0 and 2005.5. Slip initiated in late 1998 to mid-1999 in northeastern Nevada and is best expressed in late 2003 during a magma injection event at Moho depth beneath the Sierra Nevada, accompanied by more rapid eastward relative displacement across the entire region. The event ended in the east at 2004.0 and in the remainder of the network at about 2005.5. Strain energy thus appears to have been transmitted from the Cordilleran interior toward the plate boundary, from high gravitational potential to low, via yielding on the megadetachment. The size and kinematic function of the proposed structure, in light of various proxies for lithospheric thickness, imply that the subcrustal lithosphere beneath Nevada is a strong, thin plate, even though it resides in a high heat flow tectonic regime. A strong lowermost crust and upper mantle is consistent with patterns of postseismic relaxation in the southern Great Basin, deformation microstructures and low water content in dunite xenoliths in young lavas in central Nevada, and high-temperature microstructures in analog surface exposures of deformed lower crust. 
Large-scale decoupling between crust and upper mantle is consistent with the broad distribution of strain in the upper crust versus the more localized distribution in the subcrustal lithosphere, as inferred by such proxies as low P wave velocity and mafic magmatism.