On Bijective Variants of the Burrows-Wheeler Transform
The sort transform (ST) is a modification of the Burrows-Wheeler transform
(BWT). Both transformations map an arbitrary word of length n to a pair
consisting of a word of length n and an index between 1 and n. The BWT sorts
all rotation conjugates of the input word, whereas the ST of order k only uses
the first k letters for sorting all such conjugates. If two conjugates start
with the same prefix of length k, then the indices of the rotations are used
for tie-breaking. Both transforms output the sequence of the last letters of
the sorted list and the index of the input within the sorted list. In this
paper, we discuss a bijective variant of the BWT (due to Scott), proving its
correctness and relations to other results due to Gessel and Reutenauer (1993)
and Crochemore, Desarmenien, and Perrin (2005). Further, we present a novel
bijective variant of the ST.
Comment: 15 pages, presented at the Prague Stringology Conference 2009 (PSC 2009)
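As a rough illustration of the two transforms described in the abstract above, the following Python sketch sorts the rotations of a word either in full (BWT) or by their first k letters, with ties broken by rotation index (ST of order k). The function names `bwt` and `sort_transform` are hypothetical and not taken from the paper, the naive sorting is chosen for readability rather than efficiency, and the bijective variants discussed in the paper are not implemented here.

```python
def bwt(word):
    """Return (last letters of the sorted rotations, 1-based index of the input)."""
    n = len(word)
    # Sort rotation start positions by the full rotation. Python's sort is
    # stable, so equal rotations (periodic words) keep their index order.
    order = sorted(range(n), key=lambda i: word[i:] + word[:i])
    # The rotation starting at position i ends with the letter word[i - 1].
    last = "".join(word[i - 1] for i in order)
    return last, order.index(0) + 1


def sort_transform(word, k):
    """Sort transform of order k: compare only the first k letters of each
    rotation and break ties by the rotation index."""
    n = len(word)
    order = sorted(range(n), key=lambda i: ((word[i:] + word[:i])[:k], i))
    last = "".join(word[i - 1] for i in order)
    return last, order.index(0) + 1


if __name__ == "__main__":
    print(bwt("banana"))
    print(sort_transform("banana", 1))
    print(sort_transform("banana", 2))
```

For the word banana, `bwt` returns `('nnbaaa', 4)`, while `sort_transform` returns `('bnnaaa', 4)` for k = 1 and `('nbnaaa', 4)` for k = 2.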
Permutation patterns in genome rearrangement problems
In the context of the genome rearrangement problem, we analyze two well-known
models, namely the block transposition and the prefix block transposition
models, by exploiting the connection with the notion of permutation pattern.
More specifically, for any k, we provide a characterization of the set of
permutations having distance at most k from the identity (which is known to be a
permutation class) in terms of what we call generating permutations, and we
describe some properties of its basis, which allow us to compute such a basis for
small values of k.
Comment: 8 pages. In: L. Ferrari, M. Vamvakari (eds.): Proceedings of the
GASCom 2018 Workshop, Athens, Greece, 18--20 June 2018, published at
http://ceur-ws.or
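To make the block transposition model and its link to permutation patterns more concrete, here is a small Python sketch. It computes block transposition distances from the identity by brute-force breadth-first search and tests classical pattern containment. All function names are hypothetical, the approach is only feasible for very short permutations, and it does not reproduce the paper's characterization via generating permutations.

```python
from collections import deque
from itertools import combinations


def block_transpositions(p):
    """Yield every permutation obtained from p by swapping two adjacent blocks."""
    n = len(p)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n + 1):
                yield p[:i] + p[j:k] + p[i:j] + p[k:]


def distances(n):
    """Breadth-first search from the identity of length n.

    The inverse of a block transposition is again a block transposition,
    so the distance from the identity equals the sorting distance.
    """
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in block_transpositions(p):
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


def standardize(seq):
    """Replace the entries of seq by their relative ranks 1, 2, ..."""
    rank = {v: r + 1 for r, v in enumerate(sorted(seq))}
    return tuple(rank[v] for v in seq)


def contains(perm, pattern):
    """True if perm contains pattern as a classical permutation pattern."""
    return any(
        standardize([perm[i] for i in idx]) == tuple(pattern)
        for idx in combinations(range(len(perm)), len(pattern))
    )


if __name__ == "__main__":
    d3 = distances(3)
    print(d3[(3, 2, 1)])                 # distance of the reverse of length 3
    print(contains((1, 3, 2, 4), (2, 1)))
```

With these helpers one can check, for small lengths, that deleting an entry from a permutation at distance at most 1 never yields a permutation at larger distance, which is the closure property behind the phrase "permutation class".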
Algebraic aspects of increasing subsequences
We present a number of results relating partial Cauchy-Littlewood sums,
integrals over the compact classical groups, and increasing subsequences of
permutations. These include: integral formulae for the distribution of the
longest increasing subsequence of a random involution with constrained number
of fixed points; new formulae for partial Cauchy-Littlewood sums, as well as
new proofs of old formulae; relations of these expressions to orthogonal
polynomials on the unit circle; and explicit bases for invariant spaces of the
classical groups, together with appropriate generalizations of the
straightening algorithm.
Comment: LaTeX+amsmath+eepic; 52 pages. Expanded introduction, new references,
other minor changes
Bruhat Order in the Full Symmetric Toda Lattice on Partial Flag Space
In our previous paper [Comm. Math. Phys. 330 (2014), 367-399] we described
the asymptotic behaviour of trajectories of the full symmetric
Toda lattice in the case of distinct eigenvalues of the Lax
matrix. It turned out that it is completely determined by the Bruhat order on
the permutation group. In the present paper we extend this result to the case
when some eigenvalues of the Lax matrix coincide. In that case the trajectories
are described in terms of the projection to a partial flag space where the
induced dynamical system satisfies the same properties as before: we show that,
as t → ±∞, the trajectories of the induced dynamical system converge
to a finite set of points in the partial flag space indexed by the Schubert
cells so that any two points of this set are connected by a trajectory if and
only if the corresponding cells are adjacent. This relation can be explained in
terms of the Bruhat order on multiset permutations.
Set-to-Sequence Methods in Machine Learning: A Review
Machine learning on sets towards sequential output is an important and
ubiquitous task, with applications ranging from language modelling and
meta-learning to multi-agent strategy games and power grid optimization.
Combining elements of representation learning and structured prediction, its
two primary challenges include obtaining a meaningful, permutation invariant
set representation and subsequently utilizing this representation to output a
complex target permutation. This paper provides a comprehensive introduction to
the field as well as an overview of important machine learning methods tackling
both of these key challenges, with a detailed qualitative comparison of
selected model architectures.
Comment: 46 pages of text, with 10 pages of references. Contains 2 tables and
4 figures
A Lower Bound Technique for Communication in BSP
Communication is a major factor determining the performance of algorithms on
current computing systems; it is therefore valuable to provide tight lower
bounds on the communication complexity of computations. This paper presents a
lower bound technique for the communication complexity in the bulk-synchronous
parallel (BSP) model of a given class of DAG computations. The derived bound is
expressed in terms of the switching potential of a DAG, that is, the number of
permutations that the DAG can realize when viewed as a switching network. The
proposed technique yields tight lower bounds for the fast Fourier transform
(FFT), and for any sorting and permutation network. A stronger bound is also
derived for the periodic balanced sorting network, by applying this technique
to suitable subnetworks. Finally, we demonstrate that the switching potential
captures communication requirements even in computational models different from
BSP, such as the I/O model and the LPRAM.
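The idea of counting the permutations a network can realize can be sketched in Python under a strong simplifying assumption: the network is modelled as a sequence of two-input switches, each of which either passes or crosses its two wires. The function name `realizable_permutations` and this switch model are assumptions made for illustration, and the paper's definition of switching potential for general DAGs may differ.

```python
from itertools import product


def realizable_permutations(n_wires, switches):
    """Enumerate every setting of the switches and collect the permutations
    of the wire contents that result.

    `switches` is a list of (a, b) wire pairs, applied in order; each switch
    is either straight (False) or crossed (True).
    """
    perms = set()
    for setting in product((False, True), repeat=len(switches)):
        wires = list(range(n_wires))
        for (a, b), crossed in zip(switches, setting):
            if crossed:
                wires[a], wires[b] = wires[b], wires[a]
        perms.add(tuple(wires))
    return perms


if __name__ == "__main__":
    # Three switches on three wires, arranged like a bubble-sort network.
    print(len(realizable_permutations(3, [(0, 1), (1, 2), (0, 1)])))
    # Four-input butterfly, the communication pattern of a 4-point FFT.
    print(len(realizable_permutations(4, [(0, 2), (1, 3), (0, 1), (2, 3)])))
```

In this toy model the three-wire network realizes all 6 permutations, while the four-input butterfly realizes 16 of the 24 possible ones.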
Discovery of Unconventional Patterns for Sequence Analysis: Theory and Algorithms
The biology community is collecting a large amount of raw data, such as the genome sequences of organisms, microarray data, interaction data such as gene-protein interactions, protein-protein interactions, etc. This amount is rapidly increasing and the process of understanding the data is lagging behind the process of acquiring it. An inevitable first step towards making sense of the data is to study their regularities focusing on the non-random structures appearing surprisingly often in the input sequences: patterns.
In this thesis we discuss three incarnations of the pattern discovery task, exploring three types of patterns that can model different regularities of the input dataset.
While mask patterns have been designed to model short repeated biological sequences, showing a high conservation of their content at some specific positions, permutation patterns have been designed to detect repeated patterns whose parts maintain their physical adjacency but not their ordering in all the pattern occurrences.
Transposons, instead, model mobile sequences in the input dataset, which can be discovered by comparing different copies of the same input string, detecting large insertions and deletions in their alignment.
Pattern avoidance in forests of binary shrubs
We investigate pattern avoidance in permutations satisfying some additional restrictions. These are naturally considered in terms of avoiding patterns in linear extensions of certain forest-like partially ordered sets, which we call binary shrub forests. In this context, we enumerate forests avoiding patterns of length three. In four of the five non-equivalent cases, we present explicit enumerations by exhibiting bijections with certain lattice paths bounded above by the line y = lx, for some l in Q+, one of these being the celebrated Duchon’s club paths with l = 2/3. In the remaining case, we use the machinery of analytic combinatorics to determine the minimal polynomial of its generating function, and deduce its growth rate.
r-indexing the eBWT
The extended Burrows-Wheeler Transform (eBWT) [Mantaci et al. TCS 2007] is a variant of the BWT, introduced for collections of strings. In this paper, we present the extended r-index, an analogous data structure to the r-index [Gagie et al. JACM 2020]. It occupies O(r) words, with r the number of runs of the eBWT, and offers the same functionalities as the r-index. We also show how to efficiently support finding maximal exact matches (MEMs). We implemented the extended r-index and tested it on circular bacterial genomes and plasmids, comparing it to five state-of-the-art compressed text indexes. While our data structure maintains similar time and memory requirements for answering pattern matching queries as the original r-index, it is the only index in the literature that can naturally be used for both circular and linear input collections. This is an extended version of [Boucher et al., r-indexing the eBWT, SPIRE 2021].
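For orientation, the quantity r mentioned above can be illustrated with a naive Python sketch of the eBWT of a small collection. It assumes the usual definition in which all rotations of all strings are sorted by comparing their infinite repetitions (the omega-order), and then counts the runs of equal letters in the output. The function names are hypothetical, the indices of the original strings that the full transform also records are omitted, and nothing here reflects how the extended r-index itself is built.

```python
from functools import cmp_to_key
from itertools import groupby


def omega_cmp(u, v):
    """Compare the infinite words u^omega and v^omega.

    By the Fine and Wilf theorem, comparing prefixes of length
    len(u) + len(v) is enough to decide the order (or equality).
    """
    length = len(u) + len(v)
    pu = (u * (length // len(u) + 1))[:length]
    pv = (v * (length // len(v) + 1))[:length]
    return (pu > pv) - (pu < pv)


def ebwt(collection):
    """Last letters of all rotations of all (non-empty) strings in the
    collection, sorted in omega-order."""
    rotations = [w[i:] + w[:i] for w in collection for i in range(len(w))]
    rotations.sort(key=cmp_to_key(omega_cmp))
    return "".join(r[-1] for r in rotations)


def run_count(s):
    """Number of maximal runs of equal letters in s."""
    return sum(1 for _ in groupby(s))


if __name__ == "__main__":
    out = ebwt(["ab", "aab"])
    print(out, run_count(out))
```

For the collection {ab, aab}, this sketch outputs `babaa`, which has 4 runs.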