Clustering words

Author: Ferenczi Sébastien
Zamboni Luca Q.
Publication date: 06/04/2012

We characterize words which cluster under the Burrows-Wheeler transform as those words w such that ww occurs in a trajectory of an interval exchange transformation, and build examples of clustering words.
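The notion of a clustering word can be illustrated with a small brute-force sketch. It assumes the usual rotation-based definition of the Burrows-Wheeler transform (no sentinel) and takes "clustering" to mean that all occurrences of each letter are contiguous in the transformed word; the function names `bwt` and `is_clustering` are hypothetical and do not come from the paper.

```python
def bwt(word: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic, for illustration only)."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def is_clustering(word: str) -> bool:
    """Assumed reading of 'clustering': each letter forms a single run in bwt(word)."""
    transformed = bwt(word)
    # Count the runs of equal letters in the transformed word.
    runs = sum(
        1
        for i, letter in enumerate(transformed)
        if i == 0 or letter != transformed[i - 1]
    )
    return runs == len(set(transformed))


if __name__ == "__main__":
    for example in ("abaab", "banana", "aabb"):
        print(example, bwt(example), is_clustering(example))
```

The sketch only tests the clustering property directly; it does not implement the interval exchange characterization described in the abstract.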

arXiv.org e-Print Archive

HAL-UJM

Hal-Diderot

String attractors and combinatorics on words

Author: Mantaci S.
Restivo A.
Romana G.
Rosone G.
Sciortino M.
Publication venue: CEUR-WS
Publication date: 01/01/2019

The notion of string attractor has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word w = w[1]w[2] ··· w[n] is a subset Γ of the positions 1, ..., n, such that all distinct factors of w have an occurrence crossing at least one of the elements of Γ. While finding the smallest string attractor for a word is an NP-complete problem, it has been proved in [Kempa and Prezza, 2018] that dictionary compressors can be interpreted as algorithms approximating the smallest string attractor for a given word. In this paper we explore the notion of string attractor from a combinatorial point of view, by focusing on several families of finite words. The results presented in the paper suggest that the notion of string attractor can be used to define new tools to investigate combinatorial properties of words.
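The definition above can be checked directly on short words. The following is a minimal brute-force sketch, not an algorithm from the paper: the names `is_string_attractor` and `smallest_attractor` are hypothetical, positions are 1-based as in the abstract, and the exhaustive search is exponential, which is consistent with the hardness of the problem.

```python
from itertools import combinations


def is_string_attractor(word: str, positions) -> bool:
    """Check that every distinct factor has an occurrence crossing a position in `positions`."""
    n = len(word)
    gamma = set(positions)
    if any(p < 1 or p > n for p in gamma):
        raise ValueError("positions must lie in 1..len(word)")
    all_factors = set()
    covered = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            factor = word[i:j]
            all_factors.add(factor)
            # This occurrence spans the 1-based positions i+1 .. j.
            if any(i + 1 <= p <= j for p in gamma):
                covered.add(factor)
    return covered == all_factors


def smallest_attractor(word: str):
    """Exhaustive search over position subsets, smallest first (tiny inputs only)."""
    n = len(word)
    for size in range(n + 1):
        for candidate in combinations(range(1, n + 1), size):
            if is_string_attractor(word, candidate):
                return candidate
    return None


if __name__ == "__main__":
    print(is_string_attractor("abab", {2}))
    print(smallest_attractor("abab"))
```

Every letter is itself a factor, so an attractor needs at least as many positions as the word has distinct letters; the search on "abab" therefore cannot succeed with a single position.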

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università di Palermo

Sorting suffixes of a text via its Lyndon Factorization

Author: Mantaci Sabrina
Restivo Antonio
Rosone Giovanna
Sciortino Marinella
Publication date: 01/01/2013

The process of sorting the suffixes of a text plays a fundamental role in Text Algorithms. Sorted suffixes are used, for instance, in the construction of the Burrows-Wheeler transform and the suffix array, both widely used in several fields of Computer Science. For this reason, much recent research has been devoted to finding new strategies for sorting suffixes effectively. In this paper we introduce a new methodology in which an important role is played by the Lyndon factorization, so that the local suffixes inside factors detected by this factorization keep their mutual order when extended to the suffixes of the whole word. This property suggests a versatile technique that can easily be adapted to different implementation scenarios. Comment: Submitted to the Prague Stringology Conference 2013 (PSC 2013)
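The order-preservation property mentioned in the abstract can be observed on small inputs. The sketch below computes the Lyndon factorization with Duval's algorithm (a standard technique, swapped in here for illustration) and then compares, inside each factor, the order of the local suffixes with the order of the corresponding suffixes of the whole word. The helper name `local_order_is_preserved` is hypothetical, and the sketch is not the sorting method proposed in the paper.

```python
def lyndon_factorization(text: str) -> list:
    """Duval's algorithm: factor text into a non-increasing sequence of Lyndon words."""
    factors = []
    n = len(text)
    i = 0
    while i < n:
        j, k = i + 1, i
        while j < n and text[k] <= text[j]:
            # Restart the comparison on a strictly larger letter, else keep matching.
            k = i if text[k] < text[j] else k + 1
            j += 1
        # Emit every full repetition of the Lyndon word of length j - k.
        while i <= k:
            factors.append(text[i:i + j - k])
            i += j - k
    return factors


def local_order_is_preserved(text: str) -> bool:
    """Compare local suffix order inside each factor with the global suffix order."""
    start = 0
    for factor in lyndon_factorization(text):
        end = start + len(factor)
        positions = range(start, end)
        local_order = sorted(positions, key=lambda p: text[p:end])
        global_order = sorted(positions, key=lambda p: text[p:])
        if local_order != global_order:
            return False
        start = end
    return True


if __name__ == "__main__":
    print(lyndon_factorization("banana"))
    print(local_order_is_preserved("banana"))
```

For "banana" the factors are b, an, an, a; within each factor the local suffixes are ranked the same way as the suffixes of the full word that start at the same positions.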

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Palermo

Burrows-Wheeler transform of words defined by morphisms

Author: A Restivo
B Tan
D Adjeroh
E Barcucci
G Manzini
GA Hedlund
H Kaplan
I Gessel
J Simpson
M Crochemore
M Lothaire
P Ferragina
S Mantaci
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Crossref

Florence Research

When a Dollar Makes a BWT

Author: Giuliani Sara
Liptak Zsuzsanna
Rizzi Romeo
Publication date: 01/01/2019

The Burrows-Wheeler Transform (BWT) is a reversible string transformation which plays a central role in text compression and is fundamental in many modern bioinformatics applications. The BWT is a permutation of the characters, which is in general better compressible and allows several different query types to be answered more efficiently than on the original string. It is easy to see that not every string is a BWT image, and exact characterizations of BWT images are known. We investigate a related combinatorial question. In many applications, a sentinel character $ is added to mark the end of the string, and thus the BWT of a string ending with $ contains exactly one $-character. We ask, given a string w, in which positions, if any, can the $-character be inserted to turn w into the BWT image of a word ending with the sentinel character. We show that this depends only on the standard permutation of w and give a combinatorial characterization of such positions via this permutation. We then develop an O(n log n)-time algorithm for identifying all such positions, improving on the naive quadratic time algorithm.
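A naive check of the question posed in the abstract can be sketched as follows. It relies on an assumption not spelled out in the listing: a string containing exactly one sentinel is taken to be the BWT of a word ending with that sentinel exactly when its standard permutation (the stable sort of its positions by character) is a single cycle. The function names are hypothetical, the sentinel is assumed to be smaller than every other character, and this is neither the paper's O(n log n) algorithm nor tuned to the quadratic bound, since it re-sorts for every candidate position.

```python
def standard_permutation(text: str) -> list:
    """Map each position to its rank in a stable sort of the characters."""
    order = sorted(range(len(text)), key=lambda i: text[i])  # sorted() is stable
    permutation = [0] * len(text)
    for rank, position in enumerate(order):
        permutation[position] = rank
    return permutation


def is_single_cycle(permutation: list) -> bool:
    """True when following the permutation from index 0 visits every index."""
    if not permutation:
        return False
    steps, current = 0, 0
    while True:
        current = permutation[current]
        steps += 1
        if current == 0:
            break
    return steps == len(permutation)


def dollar_positions(word: str, sentinel: str = "$") -> list:
    """Insertion indices (0-based) at which the sentinel yields a candidate BWT image."""
    if sentinel in word or any(c <= sentinel for c in word):
        raise ValueError("sentinel must be absent from word and smaller than its characters")
    return [
        i
        for i in range(len(word) + 1)
        if is_single_cycle(standard_permutation(word[:i] + sentinel + word[i:]))
    ]


if __name__ == "__main__":
    # "annb$aa" is the BWT of "banana$", so index 4 is expected among the results.
    print(dollar_positions("annbaa"))
```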

Catalogo dei prodotti della ricerca

The Alternating BWT: An algorithmic perspective

Author: Giancarlo R.
Manzini G.
Restivo A.
Rosone G.
Sciortino M.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas in science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation recently introduced in Gessel et al. (2012) [21] and studied in the field of Combinatorics on Words. It is analogous to the BWT, except that it uses an alternating lexicographical order instead of the usual one. Building on results in Giancarlo et al. (2018) [23], where we have shown that BWT and ABWT are part of a larger class of reversible transformations, here we provide a combinatorial and algorithmic study of the novel transform ABWT. We establish a deep analogy between BWT and ABWT by proving they are the only ones in the above mentioned class to be rank-invertible, a novel notion guaranteeing efficient invertibility. In addition, we show that the backward-search procedure can be efficiently generalized to the ABWT; this result implies that the ABWT can also be used as a basis for efficient compressed full text indices. Finally, we prove that the ABWT can be efficiently computed by using a combination of the Difference Cover suffix sorting algorithm (Kärkkäinen et al., 2006 [28]) with a linear time algorithm for finding the minimal cyclic rotation of a word with respect to the alternating lexicographical order.
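The difference between the two transforms can be seen in a small quadratic sketch. It assumes the alternating lexicographical order compares the first differing letter in the usual order when it sits at an odd position (1-based) and in the reversed order when it sits at an even position; this reading of the definition and the function names `alternating_key` and `abwt` are assumptions, and the sketch does not use the Difference Cover construction mentioned in the abstract.

```python
def alternating_key(word: str) -> tuple:
    """Sort key for equal-length words under the assumed alternating order."""
    # Negating code points at even 1-based positions reverses the comparison there.
    return tuple(
        ord(letter) if index % 2 == 0 else -ord(letter)
        for index, letter in enumerate(word)
    )


def abwt(word: str) -> str:
    """Alternating BWT: last letters of the rotations sorted by the alternating order."""
    rotations = [word[i:] + word[:i] for i in range(len(word))]
    rotations.sort(key=alternating_key)
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    print(abwt("banana"))
```

Because all rotations have the same length, tuple comparison stops at the first differing position, which is exactly where the alternating rule applies.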

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Metagenomic analysis through the extended Burrows-Wheeler transform

Author: Guerrini V.
Louza F. A.
Rosone G.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Background: The development of Next Generation Sequencing (NGS) has had a major impact on the study of genetic sequences. Among the problems that researchers in the field have to face, one of the most challenging is the taxonomic classification of metagenomic reads, i.e., identifying the microorganisms that are present in a sample collected directly from the environment. The analysis of environmental samples (metagenomes) is particularly important for determining the microbial composition of different ecosystems, and it is used in a wide variety of fields: for instance, metagenomic studies in agriculture can help in understanding the interactions between plants and microbes, and in ecology they can provide valuable insights into the functions of environmental communities. Results: In this paper, we describe a new lightweight alignment-free and assembly-free framework for metagenomic classification that compares each unknown sequence in the sample to a collection of known genomes. We take advantage of the combinatorial properties of an extension of the Burrows-Wheeler transform, and we sequentially scan the required data structures, so that we can analyze unknown sequences of large collections using little internal memory. The tool LiME (Lightweight Metagenomics via eBWT) is available at https://github.com/veronicaguerrini/LiME. Conclusions: In order to assess the reliability of our approach, we run several experiments on NGS data from two simulated metagenomes among those provided in benchmarking analysis and on a real metagenome from the Human Microbiome Project. The experiment results on the simulated data show that LiME is competitive with the widely used taxonomic classifiers. It achieves high levels of precision and specificity - e.g. 99.9% of the positive control reads are correctly assigned and the percentage of classified reads of the negative control is less than 0.01% - while keeping a high sensitivity.
On the real metagenome, we show that LiME is able to deliver classification results comparable to those of MagicBlast. Overall, the experiments confirm the effectiveness of our method and its high accuracy even in negative control samples.

Archivio della Ricerca - Università di Pisa