Cartesian Tree Matching and Indexing
We introduce a new metric of match, called Cartesian tree matching, under which two strings match if they have the same Cartesian tree. Based on Cartesian tree matching, we define single pattern matching for a text of length n and a pattern of length m, and multiple pattern matching for a text of length n and k patterns of total length m. We present an O(n+m) time algorithm for single pattern matching, and an O((n+m) log k) deterministic time or O(n+m) randomized time algorithm for multiple pattern matching. We also define an index data structure called the Cartesian suffix tree, and present an O(n) randomized time algorithm to build it. Our efficient algorithms for Cartesian tree matching use a representation of the Cartesian tree called the parent-distance representation.
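To make the parent-distance representation concrete, here is a minimal Python sketch. It assumes the definition as we recall it from the paper, written 0-indexed here: entry i is i - j for the largest j < i with s[j] <= s[i], or 0 if no such j exists, and two equal-length strings have the same Cartesian tree exactly when these representations agree. The function names are hypothetical, and the matcher is a naive O(nm) baseline that recomputes the representation for every text window; it is not the paper's O(n+m) algorithm.

```python
from typing import List, Sequence


def parent_distance(s: Sequence[int]) -> List[int]:
    """Parent-distance representation (0-indexed, assumed definition):
    pd[i] = i - j for the largest j < i with s[j] <= s[i], else 0."""
    pd: List[int] = []
    stack: List[int] = []  # indices whose values are non-decreasing bottom to top
    for i, x in enumerate(s):
        # Discard candidates strictly larger than x; they cannot be its parent.
        while stack and s[stack[-1]] > x:
            stack.pop()
        # The top (if any) is the nearest earlier index with value <= x.
        pd.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return pd


def cartesian_match_naive(text: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Start positions of text windows whose Cartesian tree equals the pattern's,
    found by comparing parent-distance representations window by window (O(nm))."""
    m = len(pattern)
    target = parent_distance(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if parent_distance(text[i:i + m]) == target
    ]
```

For example, `cartesian_match_naive([2, 5, 3, 4, 1, 6, 5], [1, 3, 2])` reports windows starting at 0 and 4, since `[2, 5, 3]` and `[1, 6, 5]` both have parent-distance representation `[0, 1, 2]`. The monotonic stack keeps each `parent_distance` call linear; the paper's linear-time matcher avoids recomputing it per window.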
Sufficient Conditions for Efficient Indexing Under Different Matchings
The most important task arising from the massive accumulation of digital data in the world is efficient access to this data, hence the importance of indexing. In the last decade, many different types of matching relations have been defined, each requiring an efficient indexing scheme. Cole and Hariharan, in a groundbreaking paper [Cole and Hariharan, SIAM J. Comput., 33(1):26-42, 2003], formulated sufficient conditions for building an efficient index for quasi-suffix collections, i.e., collections that behave like suffixes. It was shown that known matchings, including parameterized, 2-D array and order-preserving matchings, fit their indexing setting. In this paper, we formulate more basic sufficient conditions based on the order relation derived from the matching relation itself; our conditions are more general than the previously known ones.
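The abstract names parameterized and order-preserving matching among the relations that fit Cole and Hariharan's framework. As a hedged illustration of what such matching relations look like (not of the paper's sufficient conditions themselves), the sketch below tests both on equal-length inputs. Parameterized matching uses Baker's prev-encoding, in which each parameter symbol is replaced by the distance to its previous occurrence (0 if none) while constant symbols are kept; the function names and passing the parameter alphabet as a set are our own assumptions.

```python
from typing import Dict, List, Sequence, Set, Union


def prev_encoding(s: str, params: Set[str]) -> List[Union[str, int]]:
    """Baker-style prev-encoding: parameter symbols become the distance to
    their previous occurrence (0 on first occurrence); constants stay as-is."""
    last: Dict[str, int] = {}
    out: List[Union[str, int]] = []
    for i, c in enumerate(s):
        if c in params:
            out.append(i - last[c] if c in last else 0)
            last[c] = i
        else:
            out.append(c)
    return out


def p_match(x: str, y: str, params: Set[str]) -> bool:
    """x and y parameterized-match iff their prev-encodings are equal."""
    return len(x) == len(y) and prev_encoding(x, params) == prev_encoding(y, params)


def op_match(x: Sequence[int], y: Sequence[int]) -> bool:
    """Order-preserving match: every pairwise comparison agrees.
    Quadratic for clarity; equal values are preserved too, since
    x[i] == x[j] means neither x[i] < x[j] nor x[j] < x[i] holds."""
    n = len(x)
    return n == len(y) and all(
        (x[i] < x[j]) == (y[i] < y[j]) for i in range(n) for j in range(n)
    )
```

For instance, `"aXbX"` and `"aYbY"` p-match when `X` and `Y` are parameters (both encode to `['a', 0, 'b', 2]`), and `[1, 5, 3]` op-matches `[10, 30, 20]` but not `[10, 20, 30]`.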
Computing Covers under Substring Consistent Equivalence Relations
Covers are a kind of quasiperiodicity in strings. A string $C$ is a cover of another string $T$ if every position of $T$ lies inside some occurrence of $C$ in $T$. The shortest and longest cover arrays of $T$ store the lengths of the shortest and longest covers of each prefix of $T$, respectively. The literature has proposed linear-time algorithms that compute the longest and shortest cover arrays taking border arrays as input. An equivalence relation $\approx$ over strings is called a substring consistent equivalence relation (SCER) iff $X \approx Y$ implies (1) $|X| = |Y|$ and (2) $X[i..j] \approx Y[i..j]$ for all $1 \le i \le j \le |X|$. In this paper, we generalize the notion of covers to SCERs and prove that the existing algorithms for computing the shortest and longest cover arrays of a string $T$ under the identity relation also work for any SCER when given the correspondingly generalized border arrays.
Comment: 16 pages
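The pipeline the abstract describes (border array in, cover array out) can be sketched for the identity relation as follows. This is a minimal Python illustration assuming the classic linear-time shortest-cover-array approach usually credited to Breslauer, with the KMP failure function as the border array; it does not construct the generalized border arrays for SCERs, and we have not checked this exact formulation against the paper. Function names are hypothetical.

```python
from typing import List


def border_array(t: str) -> List[int]:
    """border[i] = length of the longest proper border of t[:i+1]
    (the KMP failure function), identity relation only."""
    n = len(t)
    border = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and t[i] != t[k]:
            k = border[k - 1]
        if t[i] == t[k]:
            k += 1
        border[i] = k
    return border


def shortest_cover_array(t: str) -> List[int]:
    """Length of the shortest cover of each prefix t[:L], L = 1..n,
    computed from the border array in linear time."""
    n = len(t)
    border = border_array(t)
    cover = [0] * (n + 1)  # cover[L]: shortest cover length of t[:L]
    reach = [0] * (n + 1)  # reach[c]: longest prefix length covered so far by t[:c]
    for L in range(1, n + 1):
        b = border[L - 1]
        c = cover[b] if b > 0 else 0
        # Reuse the border's shortest cover if its occurrence ending at L
        # overlaps or abuts the region that cover already reaches.
        if c > 0 and reach[c] >= L - c:
            cover[L] = c
        else:
            cover[L] = L
        reach[cover[L]] = L
    return cover[1:]
```

For `"abaaba"` this yields `[1, 2, 3, 4, 5, 3]`: the whole string is covered by `"aba"`, while `"abaab"` has no proper cover. Per the abstract, extending this to an SCER would amount to feeding in the generalized border array instead of `border_array`.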