String Indexing for Patterns with Wildcards
We consider the problem of indexing a string t of length n to report the
occurrences of a query pattern p containing m characters and j wildcards.
Let occ be the number of occurrences of p in t, and σ the size of
the alphabet. We obtain the following results.
- A linear space index with query time O(m + σ^j log log n + occ).
This significantly improves the previously best known linear space index by Lam
et al. [ISAAC 2007], which requires query time Θ(jn) in the worst case.
- An index with query time O(m + j + occ) using space O(σ^(k^2) n log^k log n), where k is the maximum number of wildcards allowed in the pattern.
This is the first non-trivial bound with this query time.
- A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].
We also show that these indexes can be generalized to allow variable length
gaps in the pattern. Our results are obtained using a novel combination of
well-known and new techniques, which could be of independent interest.
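As a reference point for the query semantics described above, here is a minimal brute-force sketch in Python. It is not one of the indexes from the paper. It simply tries every alignment of the pattern against the text in O(nm) time. The wildcard symbol `*` and the function name `wildcard_occurrences` are assumptions chosen for illustration, and each wildcard matches exactly one character (variable length gaps are not handled).

```python
from typing import List

WILDCARD = "*"  # hypothetical wildcard symbol, chosen only for this sketch


def wildcard_occurrences(text: str, pattern: str, wildcard: str = WILDCARD) -> List[int]:
    """Return every start position i such that pattern matches text[i:i+len(pattern)],
    where each wildcard in pattern matches exactly one arbitrary character.

    Brute force (not an index): O(n * m) time for n = len(text), m = len(pattern).
    """
    n, m = len(text), len(pattern)
    result = []
    for i in range(n - m + 1):
        # all() stops at the first non-wildcard pattern character that mismatches
        if all(pc == wildcard or pc == text[i + k] for k, pc in enumerate(pattern)):
            result.append(i)
    return result
```

For example, `wildcard_occurrences("abracadabra", "a*a")` returns `[3, 5]`. A scan like this is mainly useful as a correctness oracle when testing a faster index.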
Dotted suffix trees: a structure for approximate text indexing
Abstract. In this work, we address text indexing for approximate matching. Given a text T that undergoes some preprocessing to generate an index, we can later query this index to identify the places where a string occurs with up to a certain number of errors k (edit distance). The indexing structure occupies O(n log^k n) space in the average case, independent of the alphabet size. This structure can be used to report the existence of a match with k errors in O(3^k m^(k+1)) time and to report the occurrences in O(3^k m^(k+1) + ed) time, where m is the length of the pattern and ed is the number of matching edit scripts. The construction of the structure takes O(kN|Σ|) time, where N is the number of nodes in the index and |Σ| is the alphabet size.
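For comparison with the indexed bounds above, the sketch below uses the classic online dynamic-programming approach (Sellers' algorithm), not the dotted suffix tree. It scans the whole text in O(mn) time and reports the positions where some substring ending there is within edit distance k of the pattern. The function name `approximate_end_positions` is hypothetical, and unit costs for insertion, deletion, and substitution are assumed.

```python
from typing import List


def approximate_end_positions(text: str, pattern: str, k: int) -> List[int]:
    """Return the 0-based indices j such that some substring of text ending at j
    has unit-cost edit distance at most k from pattern.

    Sellers-style DP: column entry D[i] holds the minimum edit distance between
    pattern[:i] and any substring of text ending at the current position.
    """
    m = len(pattern)
    # Before any text is read, matching pattern[:i] costs i edits.
    prev = list(range(m + 1))
    ends = []
    for j, tc in enumerate(text):
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra text character
                         cur[i - 1] + 1)      # pattern character left unmatched
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends
```

For example, `approximate_end_positions("abcdef", "bcd", 1)` returns `[2, 3, 4]`. An index such as the one described above aims to avoid this full scan of the text at query time.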