Pattern avoidance in binary trees
This paper considers the enumeration of trees avoiding a contiguous pattern.
We provide an algorithm for computing the generating function that counts
n-leaf binary trees avoiding a given binary tree pattern t. Equipped with this
counting mechanism, we study the analogue of Wilf equivalence in which two tree
patterns are equivalent if the respective n-leaf trees that avoid them are
equinumerous. We investigate the equivalence classes combinatorially. Toward
establishing bijective proofs of tree pattern equivalence, we develop a general
method of restructuring trees that conjecturally succeeds in producing an
explicit bijection for each pair of equivalent tree patterns.
Comment: 19 pages, many images; published version
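A small brute-force sketch (my own illustration, not the paper's generating-function algorithm) makes the avoidance notion and the equivalence concrete: encode an n-leaf binary tree as nested tuples, test contiguous containment of a pattern, and count avoiders.

```python
# Illustrative brute force: a leaf is (), an internal node is (left, right).
def trees(n):
    """Generate all binary trees with n leaves."""
    if n == 1:
        yield ()
        return
    for k in range(1, n):
        for left in trees(k):
            for right in trees(n - k):
                yield (left, right)

def matches_at(t, p):
    """Pattern p matches at the root of t; a pattern leaf matches any subtree."""
    if p == ():
        return True
    if t == ():
        return False
    return matches_at(t[0], p[0]) and matches_at(t[1], p[1])

def contains(t, p):
    """True if p matches at some node of t (contiguous containment)."""
    if matches_at(t, p):
        return True
    return t != () and (contains(t[0], p) or contains(t[1], p))

def avoiders(n, p):
    """Number of n-leaf binary trees avoiding pattern p."""
    return sum(1 for t in trees(n) if not contains(t, p))
```

For example, the two 3-leaf patterns (the left and right combs) are mirror images, so they are trivially equivalent in the above sense: each is avoided by exactly one n-leaf tree, the opposite comb.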
The complexity of counting poset and permutation patterns
We introduce a notion of pattern occurrence that generalizes both classical
permutation patterns as well as poset containment. Many questions about pattern
statistics and avoidance generalize naturally to this setting, and we focus on
functional complexity problems -- particularly those that arise by constraining
the order dimensions of the pattern and text posets. We show that counting the
number of induced, injective occurrences among dimension 2 posets is #P-hard;
enumerating the linear extensions that occur in realizers of dimension 2 posets
can be done in polynomial time, while for unconstrained dimension it is
GI-complete; counting not necessarily induced, injective occurrences among
dimension 2 posets is #P-hard; counting injective or not necessarily injective
occurrences of an arbitrary pattern in a dimension 1 text is #P-hard, although
it is in FP if the pattern poset is constrained to have bounded intrinsic
width; and counting injective occurrences of a dimension 1 pattern in an
arbitrary text is #P-hard, while it is in FP for bounded dimension texts. This
framework easily leads to a number of open questions, chief among which are (1)
is it #P-hard to count the number of occurrences of a dimension 2 pattern in a
dimension 1 text, and (2) is it #P-hard to count the number of texts which
avoid a given pattern?
Comment: 15 pages
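As a point of reference for the dimension-1 case: a dimension-1 poset is a linear order, so a dimension-1 text is in effect a permutation, and an injective occurrence of a permutation pattern is an index subset whose values are order-isomorphic to the pattern. The brute force below (my own illustration; exponential in the pattern length, consistent with the #P-hardness results above) fixes that semantics.

```python
from itertools import combinations

def count_occurrences(pattern, text):
    """Count injective occurrences of permutation `pattern` in permutation
    `text`: index subsets whose values are order-isomorphic to the pattern."""
    k = len(pattern)
    # rank[i] is the index of the i-th smallest pattern value
    rank = sorted(range(k), key=lambda i: pattern[i])
    def matches(idxs):
        vals = [text[i] for i in idxs]
        return sorted(range(k), key=lambda i: vals[i]) == rank
    return sum(1 for idxs in combinations(range(len(text)), k) if matches(idxs))
```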
Internal Pattern Matching Queries in a Text and Applications
We consider several types of internal queries: questions about subwords of a
text. As the main tool we develop an optimal data structure for the problem
called here internal pattern matching. This data structure provides
constant-time answers to queries about occurrences of one subword x in
another subword y of a given text, assuming that |y| = O(|x|),
which allows for a constant-space representation of all occurrences. This
problem can be viewed as a natural extension of the well-studied pattern
matching problem. The data structure has linear size and admits a linear-time
construction algorithm.
Using the solution to the internal pattern matching problem, we obtain very
efficient data structures answering queries about: primitivity of subwords,
periods of subwords, general substring compression, and cyclic equivalence of
two subwords. All these results improve upon the best previously known
counterparts. The linear construction time of our data structure also allows
us to improve the algorithm for finding δ-subrepetitions in a text (a more
general version of maximal repetitions, also called runs). For any fixed
δ > 0 we obtain the first linear-time algorithm, which matches the linear
time complexity of the algorithm computing runs. Our data structure has already
been used as a part of the efficient solutions for subword suffix rank &
selection, as well as substring compression using Burrows-Wheeler transform
composed with run-length encoding.
Comment: 31 pages, 9 figures; accepted to SODA 201
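A naive reference implementation (my own, for illustration only) fixes the semantics of an internal pattern matching query: report, in text coordinates, where the subword text[i1:j1] occurs inside the subword text[i2:j2]. The paper's data structure answers such queries in constant time after linear-time preprocessing; the key structural fact is that when |y| < 2|x|, the occurrences of x in y form a single arithmetic progression (a consequence of the periodicity lemma), which is the constant-space representation the abstract refers to.

```python
def internal_occurrences(text, i1, j1, i2, j2):
    """Naive internal pattern matching: starting positions (in text
    coordinates) of text[i1:j1] inside text[i2:j2]. O(|y|*|x|) time."""
    x = text[i1:j1]
    y = text[i2:j2]
    m = len(x)
    return [i2 + s for s in range(len(y) - m + 1) if y[s:s + m] == x]
```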
Reverse-Safe Data Structures for Text Indexing
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
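A toy model (my own simplification, not the paper's construction) conveys the z-reverse-safe idea: treat the "answers" as the occurrence counts of every pattern of length at most d, and count how many texts over the alphabet are indistinguishable from the original under those answers. That count is the z for which this answer set is z-reverse-safe; note how it shrinks as d grows, which is the utility/privacy trade-off the abstract balances.

```python
from itertools import product
from collections import Counter

def answer_profile(text, d):
    """Occurrence-count answers for all patterns of length <= d, modelled
    as one substring-count multiset per pattern length."""
    return tuple(
        frozenset(Counter(text[i:i + k] for i in range(len(text) - k + 1)).items())
        for k in range(1, d + 1)
    )

def reversal_safety(text, d, alphabet="ab"):
    """Number z of texts (including `text` itself) indistinguishable from
    `text` by occurrence-count queries of length <= d. Exponential; toy sizes only."""
    target = answer_profile(text, d)
    return sum(1 for t in product(alphabet, repeat=len(text))
               if answer_profile("".join(t), d) == target)
```

For instance, "abab" is 6-reverse-safe under length-1 answers (any string with two a's and two b's matches) but only 1-reverse-safe once length-2 answers are exposed.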
Universal Compressed Text Indexing
The rise of repetitive datasets has lately generated a lot of interest in
compressed self-indexes based on dictionary compression, a rich and
heterogeneous family that exploits text repetitions in different ways. For each
such compression scheme, several different indexing solutions have been
proposed in the last two decades. To date, the fastest indexes for repetitive
texts are based on the run-length compressed Burrows-Wheeler transform and on
the Compact Directed Acyclic Word Graph. The most space-efficient indexes, on
the other hand, are based on the Lempel-Ziv parsing and on grammar compression.
Indexes for more universal schemes such as collage systems and macro schemes
have not yet been proposed. Very recently, Kempa and Prezza [STOC 2018] showed
that all dictionary compressors can be interpreted as approximation algorithms
for the smallest string attractor, that is, a set of text positions capturing
all distinct substrings. Starting from this observation, in this paper we
develop the first universal compressed self-index, that is, the first indexing
data structure based on string attractors, which can therefore be built on top
of any dictionary-compressed text representation. Let γ be the size of a
string attractor for a text of length n. Our index takes O(γ log(n/γ))
words of space and supports locating the occ occurrences of any pattern of
length m in O(m log n + occ log^ε n) time, for any constant
ε > 0. This is, in particular, the first index
for general macro schemes and collage systems. Our result shows that the
relation between indexing and compression is much deeper than what was
previously thought: the simple property standing at the core of all dictionary
compressors is sufficient to support fast indexed queries.
Comment: Fixed with reviewer's comment
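The string-attractor property can be checked directly from the definition given above: every distinct substring must have at least one occurrence that crosses an attractor position. The cubic-time checker below is my own illustration for toy inputs, not part of the index.

```python
def is_attractor(text, positions):
    """True if `positions` is a string attractor of `text`: every distinct
    substring has an occurrence [i, j) containing some attractor position."""
    pos = set(positions)
    n = len(text)
    covered = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            hit = any(i <= p < j for p in pos)
            covered[text[i:j]] = covered.get(text[i:j], False) or hit
    return all(covered.values())
```

For example, {0, 1} is a string attractor of "abab" (so γ = 2 suffices there), while {0} alone is not, since no occurrence of "b" crosses position 0.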