    Internal Pattern Matching Queries in a Text and Applications

    We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword $x$ in another subword $y$ of a given text, assuming that $|y|=\mathcal{O}(|x|)$, which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows us to improve the algorithm for finding $\delta$-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed $\delta$ we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as part of efficient solutions for subword suffix rank & selection, as well as substring compression using the Burrows-Wheeler transform composed with run-length encoding. Comment: 31 pages, 9 figures; accepted to SODA 201
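To make the query concrete, here is a minimal brute-force sketch in Python. It is not the constant-time data structure from the abstract; it only illustrates what an internal pattern matching query returns and why the answer fits in constant space. The function names `internal_occurrences` and `as_progression` are hypothetical. The sketch relies on the standard fact that when $|y| \le 2|x|$ the occurrences of $x$ in $y$ form a single arithmetic progression, so they can be stored as a triple (first position, difference, count).

```python
def internal_occurrences(text, x_start, x_end, y_start, y_end):
    """Naive query: start positions (in text) of occurrences of
    x = text[x_start:x_end] that lie fully inside y = text[y_start:y_end]."""
    x = text[x_start:x_end]
    m = len(x)
    return [p for p in range(y_start, y_end - m + 1) if text[p:p + m] == x]


def as_progression(positions):
    """Compress sorted positions into (first, difference, count).
    Returns None if the positions are not an arithmetic progression."""
    if not positions:
        return (None, 0, 0)
    if len(positions) == 1:
        return (positions[0], 0, 1)
    d = positions[1] - positions[0]
    if any(b - a != d for a, b in zip(positions, positions[1:])):
        return None
    return (positions[0], d, len(positions))


text = "abababab"
# x = text[0:4] = "abab", y = text[0:8] = "abababab", so |y| = 2|x|.
occ = internal_occurrences(text, 0, 4, 0, 8)
print(occ)                  # [0, 2, 4]
print(as_progression(occ))  # (0, 2, 3)
```

The naive scan takes time proportional to $|x| \cdot |y|$ per query; the point of the data structure described above is to return the same triple in constant time after linear-time preprocessing.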

    Toric geometry of path signature varieties

    In stochastic analysis, a standard method to study a path is to work with its signature. This is a sequence of tensors of different order that encode information about the path in a compact form. When the path varies, such signatures parametrize an algebraic variety in the tensor space. The study of these signature varieties builds a bridge between algebraic geometry and stochastics, and allows a fruitful exchange of techniques, ideas, conjectures and solutions. In this paper we study the signature varieties of two very different classes of paths. The class of rough paths is a natural extension of the class of piecewise smooth paths. It plays a central role in stochastics, and its signature variety is toric. The class of axis-parallel paths has a peculiar combinatorial flavour, and we prove that its signature variety is toric in many cases. Comment: Code for the computations is available at https://sites.google.com/view/l-colmenarejo/publications/cod

    Eulerian quasisymmetric functions

    We introduce a family of quasisymmetric functions called Eulerian quasisymmetric functions, which specialize to enumerators for the joint distribution of two permutation statistics, major index and excedance number, on permutations of fixed cycle type. This family is analogous to a family of quasisymmetric functions that Gessel and Reutenauer used to study the joint distribution of major index and descent number on permutations of fixed cycle type. Our central result is a formula for the generating function for the Eulerian quasisymmetric functions, which specializes to a new and surprising $q$-analog of a classical formula of Euler for the exponential generating function of the Eulerian polynomials. This $q$-analog computes the joint distribution of excedance number and major index, the only one of the four important Euler-Mahonian distributions that had not yet been computed. Our study of the Eulerian quasisymmetric functions also yields results that include the descent statistic and refine results of Gessel and Reutenauer. We also obtain $q$-analogs, $(q,p)$-analogs and quasisymmetric function analogs of classical results on the symmetry and unimodality of the Eulerian polynomials. Our Eulerian quasisymmetric functions refine symmetric functions that have occurred in various representation theoretic and enumerative contexts including MacMahon's study of multiset derangements, work of Procesi and Stanley on toric varieties of Coxeter complexes, Stanley's work on chromatic symmetric functions, and the work of the authors on the homology of a certain poset introduced by Björner and Welker. Comment: Final version; to appear in Advances in Mathematics; 52 pages; this paper was originally part of the longer paper arXiv:0805.2416v1, which has been split into three papers
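As a small illustration of the statistics involved, the following Python sketch tabulates the joint distribution of excedance number and major index by brute force over all permutations of $\{1,\ldots,n\}$. It does not implement the generating-function formula from the abstract; the function names are hypothetical, and the sketch uses the usual definitions: an excedance of $\sigma$ is an index $i$ with $\sigma(i) > i$, and the major index is the sum of the descent positions $i$ with $\sigma(i) > \sigma(i+1)$.

```python
from collections import Counter
from itertools import permutations


def excedances(perm):
    """Number of positions i (1-based) with perm(i) > i."""
    return sum(1 for i, v in enumerate(perm, start=1) if v > i)


def major_index(perm):
    """Sum of the 1-based descent positions i with perm(i) > perm(i+1)."""
    return sum(i for i in range(1, len(perm)) if perm[i - 1] > perm[i])


def joint_distribution(n):
    """Map (excedance number, major index) -> number of permutations of 1..n."""
    return Counter(
        (excedances(p), major_index(p)) for p in permutations(range(1, n + 1))
    )


def excedance_marginal(n):
    """Counts of permutations of 1..n by excedance number (Eulerian numbers)."""
    counts = Counter()
    for (exc, _maj), c in joint_distribution(n).items():
        counts[exc] += c
    return [counts[k] for k in range(n)]


print(dict(joint_distribution(3)))
print(excedance_marginal(4))  # [1, 11, 11, 1]
```

Summing out the major index recovers the Eulerian numbers, which is a quick sanity check for small $n$; the enumeration is factorial in $n$, so it is only useful for experimentation.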

    Detecting One-variable Patterns

    Given a pattern $p = s_1x_1s_2x_2\cdots s_{r-1}x_{r-1}s_r$ such that $x_1,x_2,\ldots,x_{r-1}\in\{x,\overset{{}_{\leftarrow}}{x}\}$, where $x$ is a variable and $\overset{{}_{\leftarrow}}{x}$ its reversal, and $s_1,s_2,\ldots,s_r$ are strings that contain no variables, we describe an algorithm that constructs in $O(rn)$ time a compact representation of all $P$ instances of $p$ in an input string of length $n$ over a polynomially bounded integer alphabet, so that one can report those instances in $O(P)$ time. Comment: 16 pages (+13 pages of Appendix), 4 figures, accepted to SPIRE 201
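To show what an instance of such a pattern looks like, here is a brute-force Python sketch. It is not the $O(rn)$ algorithm from the abstract and makes no attempt at a compact representation. The markers `VAR` and `REV` and the function `instances` are hypothetical, and the sketch assumes the variable must be replaced by a non-empty string.

```python
VAR = object()  # stands for the variable x
REV = object()  # stands for the reversal of x


def instances(pattern, text):
    """Return sorted (position, w) pairs such that substituting w for VAR
    and reversed w for REV in pattern gives a string occurring at position.
    Assumption: w is non-empty. Runs in polynomial but far from linear time."""
    n = len(text)
    substrings = {text[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    # w itself or its reversal must occur in text, so try both.
    candidates = substrings | {s[::-1] for s in substrings}
    found = set()
    for w in candidates:
        inst = "".join(
            w if t is VAR else w[::-1] if t is REV else t for t in pattern
        )
        pos = text.find(inst)
        while pos != -1:
            found.add((pos, w))
            pos = text.find(inst, pos + 1)
    return sorted(found)


# Pattern x # reversed(x): instances are palindromes centred on '#'.
pattern = [VAR, "#", REV]
print(instances(pattern, "ab#ba#ab"))
# [(0, 'ab'), (1, 'b'), (3, 'ba'), (4, 'a')]
```

Each reported pair gives the start of an instance and the substitution used; the algorithm described above reports the same information in time proportional to the number of instances after $O(rn)$ preprocessing.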