
    Detecting One-variable Patterns

    Given a pattern $p = s_1x_1s_2x_2\cdots s_{r-1}x_{r-1}s_r$ such that $x_1,x_2,\ldots,x_{r-1}\in\{x,\overset{{}_{\leftarrow}}{x}\}$, where $x$ is a variable and $\overset{{}_{\leftarrow}}{x}$ its reversal, and $s_1,s_2,\ldots,s_r$ are strings that contain no variables, we describe an algorithm that constructs in $O(rn)$ time a compact representation of all $P$ instances of $p$ in an input string of length $n$ over a polynomially bounded integer alphabet, so that one can report those instances in $O(P)$ time. Comment: 16 pages (+13 pages of Appendix), 4 figures, accepted to SPIRE 201
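    To make the notion of a pattern instance concrete, the following is a brute-force sketch only; it is not the $O(rn)$ algorithm from the abstract. The token markers `X` and `X_REV`, the function name `find_instances`, and the restriction to non-empty substitutions are assumptions made for illustration.

```python
# Hypothetical markers for the variable x and its reversal (not from the paper).
X, X_REV = object(), object()

def find_instances(text, pattern):
    """Brute-force search for instances of a one-variable pattern.

    `pattern` is a list whose items are constant strings, X, or X_REV.
    Returns (start, w) pairs: the instance begins at `start` when the
    variable is replaced by the non-empty string w (assumed convention).
    """
    n_vars = sum(1 for tok in pattern if not isinstance(tok, str))
    if n_vars == 0:
        raise ValueError("pattern must contain at least one variable")
    const_len = sum(len(tok) for tok in pattern if isinstance(tok, str))
    found = []
    for start in range(len(text)):
        for wlen in range(1, len(text) + 1):
            # Stop once the instance would run past the end of the text.
            if start + const_len + n_vars * wlen > len(text):
                break
            w, pos, ok = None, start, True
            for tok in pattern:
                if isinstance(tok, str):
                    if not text.startswith(tok, pos):
                        ok = False
                        break
                    pos += len(tok)
                else:
                    piece = text[pos:pos + wlen]
                    candidate = piece if tok is X else piece[::-1]
                    if w is None:
                        w = candidate      # first occurrence fixes w
                    elif candidate != w:
                        ok = False
                        break
                    pos += wlen
            if ok:
                found.append((start, w))
    return found
```

    For example, `find_instances("abcba", [X, "c", X_REV])` looks for substrings of the form $w\,c\,\overset{{}_{\leftarrow}}{w}$. This enumeration tries every start position and every substitution length, so it is far slower than the bound stated above.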

    Composite repetition-aware data structures

    In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend on only one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT also enables a new representation of the suffix tree, whose size again depends on the number of extensions of maximal repeats, and that is powerful enough to support matching statistics and constant-space traversal. Comment: (the name of the third co-author was inadvertently omitted from previous version
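    As a small illustration of why the number of BWT runs is a natural space measure, the sketch below builds the BWT naively by sorting rotations and then run-length encodes it. The function names and the `$` sentinel are assumptions for illustration; practical RLBWT indexes use far more efficient constructions.

```python
from itertools import groupby

def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform via sorted rotations.

    Assumes `sentinel` does not occur in `text` and sorts before
    every character of `text`.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def run_length_encode(s):
    """Collapse each maximal run of equal characters into (char, length)."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]

runs = run_length_encode(bwt("banana"))
# The number of pairs in `runs` is the number of BWT runs.
```

    On repetitive inputs the BWT tends to group equal characters together, so the list of runs is often much shorter than the text itself.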

    On the Parikh-de-Bruijn grid

    We introduce the Parikh-de-Bruijn grid, a graph whose vertices are fixed-order Parikh vectors, and whose edges are given by a simple shift operation. This graph gives structural insight into the nature of sets of Parikh vectors as well as that of the Parikh set of a given string. We show its utility by proving some results on Parikh-de-Bruijn strings, the abelian analog of de-Bruijn sequences. Comment: 18 pages, 3 figures, 1 tabl
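    The sketch below computes the Parikh vectors of all length-$k$ substrings of a string by sliding a window: each step drops one character and adds another. Reading this update as the "shift operation" of the abstract is an assumption, as are the function names; the paper's formal definitions may differ.

```python
from collections import Counter

def parikh_vector(s, alphabet):
    """Count of each alphabet character in s, in alphabet order."""
    counts = Counter(s)
    return tuple(counts[c] for c in alphabet)

def window_parikh_vectors(s, k, alphabet):
    """Parikh vectors of all length-k substrings of s, left to right.

    Each vector is derived from the previous one by decrementing the
    count of the character leaving the window and incrementing the
    count of the character entering it.
    """
    if k <= 0 or k > len(s):
        return []
    index = {c: i for i, c in enumerate(alphabet)}
    current = list(parikh_vector(s[:k], alphabet))
    result = [tuple(current)]
    for i in range(k, len(s)):
        current[index[s[i - k]]] -= 1   # character leaving the window
        current[index[s[i]]] += 1       # character entering the window
        result.append(tuple(current))
    return result
```

    The set of distinct vectors returned for a given $k$ corresponds to the order-$k$ Parikh set of the string in the sense used above.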

    Optimal Staged Self-Assembly of General Shapes

    We analyze the number of tile types $t$, bins $b$, and stages necessary to assemble $n \times n$ squares and scaled shapes in the staged tile assembly model. For $n \times n$ squares, we prove $\mathcal{O}(\frac{\log{n} - tb - t\log t}{b^2} + \frac{\log \log b}{\log t})$ stages suffice and $\Omega(\frac{\log{n} - tb - t\log t}{b^2})$ are necessary for almost all $n$. For shapes $S$ with Kolmogorov complexity $K(S)$, we prove $\mathcal{O}(\frac{K(S) - tb - t\log t}{b^2} + \frac{\log \log b}{\log t})$ stages suffice and $\Omega(\frac{K(S) - tb - t\log t}{b^2})$ are necessary to assemble a scaled version of $S$, for almost all $S$. We obtain similarly tight bounds when the more powerful flexible glues are permitted. Comment: Abstract version appeared in ESA 201