1,209 research outputs found

    Compressed Subsequence Matching and Packed Tree Coloring

    Get PDF
    We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size nn compressing a string of size NN and a pattern string of size mm over an alphabet of size σ\sigma, our algorithm uses O(n+nσw)O(n+\frac{n\sigma}{w}) space and O(n+nσw+mlogNlogwocc)O(n+\frac{n\sigma}{w}+m\log N\log w\cdot occ) or O(n+nσwlogw+mlogNocc)O(n+\frac{n\sigma}{w}\log w+m\log N\cdot occ) time. Here ww is the word size and occocc is the number of occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for occ=o(nlogN)occ=o(\frac{n}{\log N}) occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings.Comment: To appear at CPM '1

    Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package

    Get PDF
    Dynamic time warping is a popular technique for comparing time series, providing both a distance measure that is insensitive to local compression and stretches and the warping which optimally deforms one of the two input series onto the other. A variety of algorithms and constraints have been discussed in the literature. The dtw package provides an unification of them; it allows R users to compute time series alignments mixing freely a variety of continuity constraints, restriction windows, endpoints, local distance definitions, and so on. The package also provides functions for visualizing alignments and constraints using several classic diagram types.

    Multiple serial episode matching

    Get PDF
    12In a previous paper we generalized the Knuth-Morris-Pratt (KMP) pattern matching algorithm and defined a non-conventional kind of RAM, the MP--RAMs (RAMS equipped with extra operations), and designed an O(n)O(n) on-line algorithm for solving the serial episode matching problem on MP--RAMs when there is only one single episode. We here give two extensions of this algorithm to the case when we search for several patterns simultaneously and compare them. More preciseley, given q+1q+1 strings (a text tt of length nn and qq patterns m1,,mqm_1,\ldots,m_q) and a natural number ww, the {\em multiple serial episode matching problem} consists in finding the number of size ww windows of text tt which contain patterns m1,,mqm_1,\ldots,m_q as subsequences, i.e. for each mim_i, if mi=p1,,pkm_i=p_1,\ldots ,p_k, the letters p1,,pkp_1,\ldots ,p_k occur in the window, in the same order as in mim_i, but not necessarily consecutively (they may be interleaved with other letters).} The main contribution is an algorithm solving this problem on-line in time O(nq)O(nq)
    corecore