
    The Gapped-Factor Tree

    We present a data structure to index a specific kind of factors (that is, substrings) called gapped-factors. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text with a fixed gap size, and only those. It is constructed online in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.
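To make the notion of a gapped-factor concrete, here is a minimal sketch of a naive index. It is not the suffix-tree-based structure from the abstract and does not achieve its online linear-time construction. The function name `gapped_factors` and the parameters `left_len`, `gap`, and `right_len` are hypothetical, and fixing the lengths of the two parts around the gap is an assumption made only for illustration.

```python
from collections import defaultdict


def gapped_factors(text, left_len, gap, right_len):
    """Naively index gapped-factors of `text` (illustrative assumption only).

    A gapped-factor is modelled here as a pair (left, right) where `left`
    has length `left_len`, `right` has length `right_len`, and exactly
    `gap` ignored characters separate them in the text.
    Returns a dict mapping each pair to the list of its start positions.
    """
    index = defaultdict(list)
    span = left_len + gap + right_len  # total window covered by one gapped-factor
    for i in range(len(text) - span + 1):
        left = text[i:i + left_len]
        right = text[i + left_len + gap:i + span]  # characters in the gap are skipped
        index[(left, right)].append(i)
    return dict(index)


if __name__ == "__main__":
    # Every pair of single characters separated by one ignored character.
    print(gapped_factors("abcabc", left_len=1, gap=1, right_len=1))
```

This brute-force version stores one entry per text position, so it serves only to show what is being indexed, not how the tree indexes it efficiently.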

    Anyon exclusions statistics on surfaces with gapped boundaries

    An anyon exclusion statistics, which generalizes the Bose-Einstein and Fermi-Dirac statistics of bosons and fermions, was proposed by Haldane [1]. Past studies considered only anyon systems without any physical boundary, but boundaries often appear in real-life materials. When fusion of anyons is involved, certain `pseudo-species' anyons appear in the exotic statistical weights of non-Abelian anyon systems; however, the meaning and significance of pseudo-species remain an open problem. In this paper, we propose an extended anyon exclusion statistics on surfaces with gapped boundaries, introducing mutual exclusion statistics between anyons as well as the boundary components. Motivated by Refs. [2, 3], we present a formula for the statistical weight of many-anyon states obeying the proposed statistics. We develop a systematic basis construction for non-Abelian anyons on any Riemann surface with gapped boundaries. From the basis construction, we have a standard way to read off a canonical set of statistics parameters and hence write down the extended statistical weight of the anyon system being studied. The basis construction reveals the meaning of pseudo-species. A pseudo-species has different `excitation' modes, each corresponding to an anyon species. The `excitation' modes of pseudo-species correspond to good quantum numbers of subsystems of a non-Abelian anyon system. This is important because often (e.g., in topological quantum computing) we may be concerned only with the entanglement between such subsystems. Comment: 36 pages, 14 figures

    Data Structure Lower Bounds for Document Indexing Problems

    We study data structure problems related to document indexing and pattern matching queries. Our main contribution is to show that the pointer machine model of computation can be extremely useful in proving high and unconditional lower bounds that cannot be obtained in any other known model of computation with current techniques. Often our lower bounds match the known space-query time trade-off curve, and in fact, for all the problems considered, there is a close match between our lower bounds and the known upper bounds, at least for some choice of input parameters. The problems that we consider are set intersection queries (both the reporting variant and the semi-group counting variant), indexing a set of documents for two-pattern queries, forbidden-pattern queries, or queries with wild-cards, and indexing an input set of gapped-patterns (or two-patterns) to find those matching a document given at query time. Comment: Full version of the conference version that appeared at ICALP 2016, 25 pages

    Twisted trees and inconsistency of tree estimation when gaps are treated as missing data -- the impact of model mis-specification in distance corrections

    Statistically consistent estimation of phylogenetic trees or gene trees is possible if pairwise sequence dissimilarities can be converted to a set of distances that are proportional to the true evolutionary distances. Susko et al. (2004) reported some strikingly broad results about the forms of inconsistency in tree estimation that can arise if corrected distances are not proportional to the true distances. They showed that if the corrected distance is a concave function of the true distance, then inconsistency due to long branch attraction will occur. If these functions are convex, then two "long branch repulsion" trees will be preferred over the true tree, though these two incorrect trees are expected to be tied as the preferred tree. Here we extend their results and demonstrate the existence of a tree shape (which we refer to as a "twisted Farris-zone" tree) for which a single incorrect tree topology is guaranteed to be preferred if the corrected distance function is convex. We also report that the standard practice of treating gaps in sequence alignments as missing data is sufficient to produce non-linear corrected distance functions if the substitution process is not independent of the insertion/deletion process. Taken together, these results imply inconsistent tree inference under mild conditions. For example, if some positions in a sequence are constrained to be free of substitutions and insertion/deletion events while the remaining sites evolve with independent substitutions and insertion/deletion events, then the distances obtained by treating gaps as missing data can support an incorrect tree topology even given an unlimited amount of data. Comment: 29 pages, 3 figures