3 research outputs found

    String Indexing for Patterns with Wildcards

    Get PDF
    We consider the problem of indexing a string tt of length nn to report the occurrences of a query pattern pp containing mm characters and jj wildcards. Let occocc be the number of occurrences of pp in tt, and σ\sigma the size of the alphabet. We obtain the following results. - A linear space index with query time O(m+σjloglogn+occ)O(m+\sigma^j \log \log n + occ). This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time Θ(jn)\Theta(jn) in the worst case. - An index with query time O(m+j+occ)O(m+j+occ) using space O(σk2nlogklogn)O(\sigma^{k^2} n \log^k \log n), where kk is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time. - A time-space trade-off, generalizing the index by Cole et al. [STOC 2004]. We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest

    A.: Dotted suffix trees: a structure for approximate text indexing

    No full text
    Abstract. In this work, we address is text indexing for approximate matching. Given a text T which undergoes some preprocessing to generate an index, we can later query this index to identify the places where a string occurs up to a certain number of errors k (edition distance). The indexing structure occupies space O(n log k n) in the average case, independent of alphabet size. This structure can be used to report the existence of a match with k errors in O(3 k m k+1) and to report the occurrences in O(3 k m k+1 + ed) time, where m is the length of the pattern and where ed the number of matching edit scripts. The construction of the structure has time bound by O(kN|Σ|), where N is the number of nodes in the index and |Σ | the alphabet size
    corecore