The Gapped-Factor Tree
We present a data structure to index a specific kind of factors, that is, of substrings, called gapped-factors. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text with a fixed gap size, and only those. It is constructed online in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.
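To make the notion of a gapped-factor concrete, here is a minimal Python sketch that indexes every gapped-factor of a text for one fixed gap size. It uses a plain dictionary rather than the suffix-tree-based structure the abstract describes, so it does not reproduce the online linear-time construction. The function name and the parameters `left_len`, `gap`, and `right_len` are assumptions made for illustration.

```python
from collections import defaultdict

def index_gapped_factors(text, left_len, gap, right_len):
    """Map each (left, right) pair to the start positions where it occurs.

    A gapped-factor here is a window of left_len + gap + right_len characters
    whose middle `gap` characters are ignored. This is a naive dictionary
    index, not the suffix-tree construction from the paper.
    """
    index = defaultdict(list)
    span = left_len + gap + right_len
    for start in range(len(text) - span + 1):
        left = text[start:start + left_len]
        right = text[start + left_len + gap:start + span]
        index[(left, right)].append(start)
    return dict(index)

# Hypothetical example text: "ab?ab" occurs at positions 0 and 3,
# with different characters ("c" and "d") filling the gap.
idx = index_gapped_factors("abcabdab", left_len=2, gap=1, right_len=2)
```

Looking up `idx[("ab", "ab")]` returns the positions whose windows start and end with `ab`, whatever character fills the gap.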
Anyon exclusions statistics on surfaces with gapped boundaries
An anyon exclusion statistics, which generalizes the Bose-Einstein and
Fermi-Dirac statistics of bosons and fermions, was proposed by Haldane[1]. The
relevant past studies had considered only anyon systems without any physical
boundary but boundaries often appear in real-life materials. When fusion of
anyons is involved, certain `pseudo-species' anyons appear in the exotic
statistical weights of non-Abelian anyon systems; however, the meaning and
significance of pseudo-species remains an open problem. In this paper, we
propose an extended anyon exclusion statistics on surfaces with gapped
boundaries, introducing mutual exclusion statistics between anyons as well as
the boundary components. Motivated by Refs. [2, 3], we present a formula for
the statistical weight of many-anyon states obeying the proposed statistics. We
develop a systematic basis construction for non-Abelian anyons on any Riemann
surface with gapped boundaries. From the basis construction, we have a
standard way to read off a canonical set of statistics parameters and hence
write down the extended statistical weight of the anyon system being studied.
The basis construction reveals the meaning of pseudo-species. A pseudo-species
has different `excitation' modes, each corresponding to an anyon species. The
`excitation' modes of pseudo-species correspond to good quantum numbers of
subsystems of a non-Abelian anyon system. This is important because often
(e.g., in topological quantum computing) we may be concerned about only the
entanglement between such subsystems. Comment: 36 pages, 14 figures
Data Structure Lower Bounds for Document Indexing Problems
We study data structure problems related to document indexing and pattern
matching queries and our main contribution is to show that the pointer machine
model of computation can be extremely useful in proving high and unconditional
lower bounds that cannot be obtained in any other known model of computation
with the current techniques. Often our lower bounds match the known space-query
time trade-off curve and in fact for all the problems considered, there is a
very good and reasonable match between our lower bounds and the known upper
bounds, at least for some choice of input parameters. The problems that we
consider are set intersection queries (both the reporting variant and the
semi-group counting variant), indexing a set of documents for two-pattern
queries, or forbidden-pattern queries, or queries with wild-cards, and
indexing an input set of gapped-patterns (or two-patterns) to find those
matching a document given at the query time. Comment: Full version of the conference version that appeared at ICALP 2016,
25 pages
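As a concrete reading of two of the query types named above, the following Python sketch answers two-pattern and forbidden-pattern queries by brute-force scanning. It is only a baseline that fixes the semantics of the queries; it says nothing about the space and query-time trade-offs the paper studies. The function names and the sample documents are hypothetical.

```python
def two_pattern_query(docs, p1, p2):
    """Report indices of documents that contain both patterns."""
    return [i for i, doc in enumerate(docs) if p1 in doc and p2 in doc]

def forbidden_pattern_query(docs, wanted, forbidden):
    """Report indices of documents containing `wanted` but not `forbidden`."""
    return [i for i, doc in enumerate(docs)
            if wanted in doc and forbidden not in doc]

# Hypothetical document collection for illustration only.
docs = ["gapped factor tree", "suffix tree", "gapped pattern index"]
both = two_pattern_query(docs, "gapped", "tree")
only = forbidden_pattern_query(docs, "tree", "gapped")
```

Each query scans every document, so the cost grows with the total collection size; an index would aim to answer in time closer to the output size.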
Twisted trees and inconsistency of tree estimation when gaps are treated as missing data -- the impact of model mis-specification in distance corrections
Statistically consistent estimation of phylogenetic trees or gene trees is
possible if pairwise sequence dissimilarities can be converted to a set of
distances that are proportional to the true evolutionary distances. Susko et
al. (2004) reported some strikingly broad results about the forms of
inconsistency in tree estimation that can arise if corrected distances are not
proportional to the true distances. They showed that if the corrected distance
is a concave function of the true distance, then inconsistency due to long
branch attraction will occur. If these functions are convex, then two "long
branch repulsion" trees will be preferred over the true tree -- though these
two incorrect trees are expected to be tied as the preferred tree. Here we
extend their results, and demonstrate the existence of a tree shape (which we
refer to as a "twisted Farris-zone" tree) for which a single incorrect tree
topology will be guaranteed to be preferred if the corrected distance function
is convex. We also report that the standard practice of treating gaps in
sequence alignments as missing data is sufficient to produce non-linear
corrected distance functions if the substitution process is not independent of
the insertion/deletion process. Taken together, these results imply
inconsistent tree inference under mild conditions. For example, if some
positions in a sequence are constrained to be free of substitutions and
insertion/deletion events while the remaining sites evolve with independent
substitutions and insertion/deletion events, then the distances obtained by
treating gaps as missing data can support an incorrect tree topology even given
an unlimited amount of data. Comment: 29 pages, 3 figures
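The role of the distance correction can be illustrated with a small Python sketch. It assumes the Jukes-Cantor (JC69) model, which the abstract does not name and is chosen here only because its formulas are simple. Under JC69 the expected proportion of differing sites is a concave function of the true distance, so leaving the dissimilarity uncorrected gives one example of the concave case described above. The function names are hypothetical.

```python
import math

def jc69_expected_p(d):
    """Expected proportion of differing sites at true distance d (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_correct(p):
    """Convert an observed proportion of differences into a JC69 distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 correction is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# With the correct model, the correction recovers the true distance.
recovered = jc69_correct(jc69_expected_p(0.5))

# Midpoint check of concavity for the uncorrected dissimilarity:
# the value at 0.5 exceeds the average of the values at 0.2 and 0.8.
is_concave_here = jc69_expected_p(0.5) > 0.5 * (
    jc69_expected_p(0.2) + jc69_expected_p(0.8)
)
```

When the model behind the correction matches the process that generated the data, the corrected distance is proportional to the true distance; the inconsistency results in the abstract concern the case where it does not.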