247 research outputs found

    On Bijective Variants of the Burrows-Wheeler Transform

    Full text link
    The sort transform (ST) is a modification of the Burrows-Wheeler transform (BWT). Both transformations map an arbitrary word of length n to a pair consisting of a word of length n and an index between 1 and n. The BWT sorts all rotation conjugates of the input word, whereas the ST of order k only uses the first k letters for sorting all such conjugates. If two conjugates start with the same prefix of length k, then the indices of the rotations are used for tie-breaking. Both transforms output the sequence of the last letters of the sorted list and the index of the input within the sorted list. In this paper, we discuss a bijective variant of the BWT (due to Scott), proving its correctness and relations to other results due to Gessel and Reutenauer (1993) and Crochemore, Desarmenien, and Perrin (2005). Further, we present a novel bijective variant of the ST.Comment: 15 pages, presented at the Prague Stringology Conference 2009 (PSC 2009

    Permutation patterns in genome rearrangement problems

    Get PDF
    In the context of the genome rearrangement problem, we analyze two well known models, namely the block transposition and the prefix block transposition models, by exploiting the connection with the notion of permutation pattern. More specifically, for any kk, we provide a characterization of the set of permutations having distance k\leq k from the identity (which is known to be a permutation class) in terms of what we call generating permutations and we describe some properties of its basis, which allow to compute such a basis for small values of kk.Comment: 8 pages. In: L. Ferrari, M. Vamvakari (eds.): Proceedings of the GASCom 2018 Workshop, Athens, Greece, 18--20 June 2018, published at http://ceur-ws.or

    Algebraic aspects of increasing subsequences

    Get PDF
    We present a number of results relating partial Cauchy-Littlewood sums, integrals over the compact classical groups, and increasing subsequences of permutations. These include: integral formulae for the distribution of the longest increasing subsequence of a random involution with constrained number of fixed points; new formulae for partial Cauchy-Littlewood sums, as well as new proofs of old formulae; relations of these expressions to orthogonal polynomials on the unit circle; and explicit bases for invariant spaces of the classical groups, together with appropriate generalizations of the straightening algorithm.Comment: LaTeX+amsmath+eepic; 52 pages. Expanded introduction, new references, other minor change

    Bruhat Order in the Full Symmetric sln\mathfrak{sl}_n Toda Lattice on Partial Flag Space

    Full text link
    In our previous paper [Comm. Math. Phys. 330 (2014), 367-399] we described the asymptotic behaviour of trajectories of the full symmetric sln\mathfrak{sl}_n Toda lattice in the case of distinct eigenvalues of the Lax matrix. It turned out that it is completely determined by the Bruhat order on the permutation group. In the present paper we extend this result to the case when some eigenvalues of the Lax matrix coincide. In that case the trajectories are described in terms of the projection to a partial flag space where the induced dynamical system verifies the same properties as before: we show that when t±t\to\pm\infty the trajectories of the induced dynamical system converge to a finite set of points in the partial flag space indexed by the Schubert cells so that any two points of this set are connected by a trajectory if and only if the corresponding cells are adjacent. This relation can be explained in terms of the Bruhat order on multiset permutations

    Set-to-Sequence Methods in Machine Learning: A Review

    Get PDF
    Machine learning on sets towards sequential output is an important and ubiquitous task, with applications ranging from language modelling and meta-learning to multi-agent strategy games and power grid optimization. Combining elements of representation learning and structured prediction, its two primary challenges include obtaining a meaningful, permutation invariant set representation and subsequently utilizing this representation to output a complex target permutation. This paper provides a comprehensive introduction to the field as well as an overview of important machine learning methods tackling both of these key challenges, with a detailed qualitative comparison of selected model architectures.Comment: 46 pages of text, with 10 pages of references. Contains 2 tables and 4 figure

    A Lower Bound Technique for Communication in BSP

    Get PDF
    Communication is a major factor determining the performance of algorithms on current computing systems; it is therefore valuable to provide tight lower bounds on the communication complexity of computations. This paper presents a lower bound technique for the communication complexity in the bulk-synchronous parallel (BSP) model of a given class of DAG computations. The derived bound is expressed in terms of the switching potential of a DAG, that is, the number of permutations that the DAG can realize when viewed as a switching network. The proposed technique yields tight lower bounds for the fast Fourier transform (FFT), and for any sorting and permutation network. A stronger bound is also derived for the periodic balanced sorting network, by applying this technique to suitable subnetworks. Finally, we demonstrate that the switching potential captures communication requirements even in computational models different from BSP, such as the I/O model and the LPRAM

    Discovery of Unconventional Patterns for Sequence Analysis: Theory and Algorithms

    Get PDF
    The biology community is collecting a large amount of raw data, such as the genome sequences of organisms, microarray data, interaction data such as gene-protein interactions, protein-protein interactions, etc. This amount is rapidly increasing and the process of understanding the data is lagging behind the process of acquiring it. An inevitable first step towards making sense of the data is to study their regularities focusing on the non-random structures appearing surprisingly often in the input sequences: patterns. In this thesis we discuss three incarnations of the pattern discovery task, exploring three types of patterns that can model different regularities of the input dataset. While mask patterns have been designed to model short repeated biological sequences, showing a high conservation of their content at some specific positions, permutation patterns have been designed to detect repeated patterns whose parts maintain their physical adjacency but not their ordering in all the pattern occurrences. Transposons, instead, model mobile sequences in the input dataset, which can be discovered by comparing different copies of the same input string, detecting large insertions and deletions in their alignment

    Pattern avoidance in forests of binary shrubs

    Get PDF
    We investigate pattern avoidance in permutations satisfying some additional restrictions. These are naturally considered in terms of avoiding patterns in linear extensions of certain forest-like partially ordered sets, which we call binary shrub forests. In this context, we enumerate forests avoiding patterns of length three. In four of the five non-equivalent cases, we present explicit enumerations by exhibiting bijections with certain lattice paths bounded above by the line y = lx, for some l in Q+, one of these being the celebrated Duchon’s club paths with l = 2/3. In the remaining case, we use the machinery of analytic combinatorics to determine the minimal polynomial of its generating function, and deduce its growth rate

    r-indexing the eBWT

    Get PDF
    The extended Burrows-Wheeler Transform (eBWT) [Mantaci et al. TCS 2007] is a variant of the BWT, introduced for collections of strings. In this paper, we present the extended r-index, an analogous data structure to the r-index [Gagie et al. JACM 2020]. It occupies O(r) words, with r the number of runs of the eBWT, and offers the same functionalities as the r-index. We also show how to efficiently support finding maximal exact matches (MEMs). We implemented the extended r-index and tested it on circular bacterial genomes and plasmids, comparing it to five state-of-the-art compressed text indexes. While our data structure maintains similar time and memory requirements for answering pattern matching queries as the original r-index, it is the only index in the literature that can naturally be used for both circular and linear input collections. This is an extended version of [Boucher et al., r-indexing the eBWT, SPIRE 2021]
    corecore