Clustering words

Authors: Ferenczi Sébastien; Zamboni Luca Q.
Publication date: 06/04/2012

We characterize words which cluster under the Burrows-Wheeler transform as those words $w$ such that $ww$ occurs in a trajectory of an interval exchange transformation, and build examples of clustering words.
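As a rough illustration of what "clustering" means here, the sketch below computes the Burrows-Wheeler transform by sorting all rotations of a word and then checks whether each letter occupies a single run in the output. The function names `bwt` and `is_clustering` are hypothetical, and the "one run per letter" test is an assumed reading of the abstract, not the paper's formal definition. The interval-exchange characterization is not implemented.

```python
from itertools import groupby


def bwt(word: str) -> str:
    """Return the last column of the sorted list of all rotations (no sentinel)."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def is_clustering(word: str) -> bool:
    """Assumed definition: every letter forms exactly one run in bwt(word)."""
    run_letters = [letter for letter, _ in groupby(bwt(word))]
    return len(run_letters) == len(set(run_letters))


print(bwt("banana"))                 # nnbaaa
print(is_clustering("banana"))       # True
print(is_clustering("abracadabra"))  # False: its BWT is rdarcaaaabb
```

Sorting full rotations takes quadratic space, so this is suitable only for short words.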

arXiv.org e-Print Archive

HAL-UJM

Hal-Diderot

Universal lossless source coding with the Burrows Wheeler transform

Authors: Effros Michelle; Kulkarni Sanjeev R.; Verdú Sergio; Visweswariah Karthik
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2002

The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.

CiteSeerX

Caltech Authors

Prospects and limitations of full-text index structures in genome analysis

Authors: Dawyndt Peter; De Baets Bernard; Fack Veerle; Vyverman Michaël
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2012

The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.

Ghent University Academic Bibliography

PubMed Central

Lyndon Array Construction during Burrows-Wheeler Inversion

Authors: Louza Felipe A.; Manzini Giovanni; Smyth W. F.; Telles Guilherme P.
Publication venue: Elsevier BV
Publication date: 27/10/2017

In this paper we present an algorithm to compute the Lyndon array of a string $T$ of length $n$ as a byproduct of the inversion of the Burrows-Wheeler transform of $T$. Our algorithm runs in linear time using only a stack in addition to the data structures used for Burrows-Wheeler inversion. We compare our algorithm with two other linear-time algorithms for Lyndon array construction and show that computing the Burrows-Wheeler transform and then constructing the Lyndon array is competitive compared to the known approaches. We also propose a new balanced parenthesis representation for the Lyndon array that uses $2n+o(n)$ bits of space and supports constant time access. This representation can be built in linear time using $O(n)$ words of space, or in $O(n\log n/\log\log n)$ time using asymptotically the same space as $T$.
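For readers unfamiliar with the Lyndon array, the following sketch may help. It is not the linear-time, inversion-based algorithm of the paper. It is a naive quadratic computation that relies on the known fact that the longest Lyndon word starting at position i ends just before the next lexicographically smaller suffix. The function name `lyndon_array` and the 0-based indexing are assumptions made for illustration.

```python
def lyndon_array(text: str) -> list[int]:
    """Naive Lyndon array: entry i is the length of the longest Lyndon
    word starting at position i (0-based) of `text`."""
    n = len(text)
    result = []
    for i in range(n):
        # Find the first later suffix that is lexicographically smaller;
        # if none exists, the Lyndon word extends to the end of the text.
        next_smaller = n
        for j in range(i + 1, n):
            if text[j:] < text[i:]:
                next_smaller = j
                break
        result.append(next_smaller - i)
    return result


print(lyndon_array("banana"))  # [1, 2, 1, 2, 1, 1]
```

For "banana", the entries equal to 2 correspond to the Lyndon word "an" starting at positions 1 and 3.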

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Research Repository

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

On the combinatorics of suffix arrays

Authors: Kucherov Gregory; Tóthmérész Lilla; Vialette Stéphane
Publication date: 18/06/2012

We prove several combinatorial properties of suffix arrays, including a characterization of suffix arrays through a bijection with a certain well-defined class of permutations. Our approach is based on the characterization of Burrows-Wheeler arrays given in [1], that we apply by reducing suffix sorting to cyclic shift sorting through the use of an additional sentinel symbol. We show that the characterization of suffix arrays for a special case of binary alphabet given in [2] easily follows from our characterization. Based on our results, we also provide simple proofs for the enumeration results for suffix arrays, obtained in [3]. Our approach to characterizing suffix arrays is the first that exploits their relationship with Burrows-Wheeler permutations.
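The reduction from suffix sorting to cyclic shift sorting mentioned in the abstract can be illustrated with a small sketch. Appending a sentinel that is smaller than every letter makes the order of the rotations agree with the order of the suffixes. The choice of "$" as sentinel and the function names are assumptions for illustration only, and neither function reflects the paper's combinatorial characterization.

```python
def suffix_array_direct(text: str) -> list[int]:
    """Suffix array by sorting the suffixes themselves (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_array_via_rotations(text: str, sentinel: str = "$") -> list[int]:
    """Suffix array obtained by sorting the cyclic shifts of text + sentinel.

    Assumes `sentinel` does not occur in `text` and compares smaller
    than every character of `text`.
    """
    padded = text + sentinel
    n = len(padded)
    order = sorted(range(n), key=lambda i: padded[i:] + padded[:i])
    # Drop the rotation that starts at the sentinel itself.
    return [i for i in order if i != n - 1]


print(suffix_array_direct("banana"))         # [5, 3, 1, 0, 4, 2]
print(suffix_array_via_rotations("banana"))  # [5, 3, 1, 0, 4, 2]
```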

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

On Bijective Variants of the Burrows-Wheeler Transform

Author: Kufleitner Manfred
Publication date: 01/01/2009

The sort transform (ST) is a modification of the Burrows-Wheeler transform (BWT). Both transformations map an arbitrary word of length n to a pair consisting of a word of length n and an index between 1 and n. The BWT sorts all rotation conjugates of the input word, whereas the ST of order k only uses the first k letters for sorting all such conjugates. If two conjugates start with the same prefix of length k, then the indices of the rotations are used for tie-breaking. Both transforms output the sequence of the last letters of the sorted list and the index of the input within the sorted list. In this paper, we discuss a bijective variant of the BWT (due to Scott), proving its correctness and relations to other results due to Gessel and Reutenauer (1993) and Crochemore, Desarmenien, and Perrin (2005). Further, we present a novel bijective variant of the ST.
Comment: 15 pages, presented at the Prague Stringology Conference 2009 (PSC 2009)
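The description of the sort transform of order k above can be sketched directly. The code sorts rotation indices by their first k letters, breaks ties by rotation index, and returns the last letters together with the 1-based position of the input word in the sorted list. The name `sort_transform` is hypothetical, and this covers only the plain ST as described in the abstract, not the bijective variants the paper discusses.

```python
def sort_transform(word: str, k: int) -> tuple[str, int]:
    """Sort transform of order k, following the abstract's description."""
    n = len(word)
    rotations = [word[i:] + word[:i] for i in range(n)]
    # Sort by the length-k prefix; equal prefixes are ordered by rotation index.
    order = sorted(range(n), key=lambda i: (rotations[i][:k], i))
    last_letters = "".join(rotations[i][-1] for i in order)
    index_of_input = order.index(0) + 1  # 1-based, as in the abstract
    return last_letters, index_of_input


print(sort_transform("banana", 1))  # ('bnnaaa', 4)
print(sort_transform("banana", 6))  # ('nnbaaa', 4)
```

With k equal to the word length, the letter sequence matches the rotation-based BWT of "banana".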

arXiv.org e-Print Archive

CiteSeerX

String attractors and combinatorics on words

Authors: Mantaci S.; Restivo A.; Romana G.; Rosone G.; Sciortino M.
Publication venue: CEUR-WS
Publication date: 01/01/2019

The notion of string attractor has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word w = w[1]w[2] · · · w[n] is a subset Γ of the positions 1, ..., n, such that all distinct factors of w have an occurrence crossing at least one of the elements of Γ. While finding the smallest string attractor for a word is an NP-complete problem, it has been proved in [Kempa and Prezza, 2018] that dictionary compressors can be interpreted as algorithms approximating the smallest string attractor for a given word. In this paper we explore the notion of string attractor from a combinatorial point of view, by focusing on several families of finite words. The results presented in the paper suggest that the notion of string attractor can be used to define new tools to investigate combinatorial properties of the words.
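The definition of a string attractor can be checked by brute force on short words. The sketch below enumerates every distinct factor and tests whether at least one of its occurrences covers a position in Γ, using 1-based positions as in the abstract. The function name `is_string_attractor` is hypothetical. The check only verifies a given set and does not search for a smallest attractor, which the abstract notes is NP-complete.

```python
def is_string_attractor(word: str, positions: set[int]) -> bool:
    """Return True if `positions` (1-based) is a string attractor of `word`."""
    n = len(word)
    for length in range(1, n + 1):
        all_factors = set()
        covered_factors = set()
        for start in range(n - length + 1):
            factor = word[start:start + length]
            all_factors.add(factor)
            # This occurrence spans 1-based positions start+1 .. start+length.
            if any(start + 1 <= p <= start + length for p in positions):
                covered_factors.add(factor)
        if all_factors != covered_factors:
            return False
    return True


print(is_string_attractor("abab", {1, 2}))  # True
print(is_string_attractor("abab", {1}))     # False: no occurrence of "b" covers position 1
```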

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università di Palermo

Aspherical supernova explosions and formation of compact black hole low-mass X-ray binaries

Authors: Casares J.; Eggleton P. P.; Gonzalez Hernandez J. I.; Justham S.; Leonard D. C.; Podsiadlowski Ph.; Sutantyo W.; Xiang-Dong Li
Publication venue: Wiley
Publication date: 30/10/2007

It has been suggested that black-hole low-mass X-ray binaries (BHLMXBs) with short orbital periods may have evolved from BH binaries with an intermediate-mass secondary, but the donor star seems to always have higher effective temperatures than measured in BHLMXBs (Justham, Rappaport & Podsiadlowski 2006). Here we suggest that the secondary star is originally an intermediate-mass ($\sim 2\text{--}5\,M_{\odot}$) star, which loses a large fraction of its mass due to the ejecta impact during the aspherical SN explosion that produced the BH. The resulting secondary star could be of low mass ($\lesssim 1\,M_{\odot}$). Magnetic braking would shrink the binary orbit, drive mass transfer between the donor and the BH, producing a compact BHLMXB.
Comment: 4 pages, accepted for publication in MNRAS Letters

arXiv.org e-Print Archive

Crossref