Search CORE

66 research outputs found

Fully-Functional Bidirectional Burrows-Wheeler Indexes and Infinite-Order De Bruijn Graphs

Author: Belazzougui Djamal
Cunial Fabio
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)
Publication date: 01/01/2019
Field of study

Given a string T on an alphabet of size sigma, we describe a bidirectional Burrows-Wheeler index that takes O(|T| log sigma) bits of space, and that supports the addition and removal of one character, on the left or right side of any substring of T, in constant time. Previously known data structures that used the same space allowed constant-time addition to any substring of T, but they could support removal only from specific substrings of T. We also describe an index that supports bidirectional addition and removal in O(log log |T|) time, and that takes a number of words proportional to the number of left and right extensions of the maximal repeats of T. We use such fully-functional indexes to implement bidirectional, frequency-aware, variable-order de Bruijn graphs with no upper bound on their order, and supporting natural criteria for increasing and decreasing the order during traversal

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Wheeler graphs: A framework for BWT-based data structures

Author: Gagie Travis
Manzini Giovanni
Sir\ue9n Jouni
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017
Field of study

The famous Burrows\u2013Wheeler Transform (BWT) was originally defined for a single string but variations have been developed for sets of strings, labeled trees, de Bruijn graphs, etc. In this paper we propose a framework that includes many of these variations and that we hope will simplify the search for more. We first define Wheeler graphs and show they have a property we call path coherence. We show that if the state diagram of a finite-state automaton is a Wheeler graph then, by its path coherence, we can order the nodes such that, for any string, the nodes reachable from the initial state or states by processing that string are consecutive. This means that even if the automaton is non-deterministic, we can still store it compactly and process strings with it quickly. We then rederive several variations of the BWT by designing straightforward finite-state automata for the relevant problems and showing that their state diagrams are Wheeler graphs

Archivio della Ricerca - Università di Pisa

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Tilatehokas metagenomisten DNA-fragmenttien ryhmittely

Author: Alanko Jarno
Publication venue
Publication date: 18/01/2016
Field of study

The collection of all genomes in an environment is called the metagenome of the environment. In the past 15 years, high-throughput sequencing has made it feasible to sequence entire environments at once for the first time in history, which has resulted in a variety of interesting new algorithmic problems. This thesis focuses on the basic problem of clustering the reads from an environment according to which species, or more generally, taxonomic unit they originate from. In this work, we identify and formalize two fundamental string processing tasks useful in clustering metagenomic read sets. We solve the two problems with space efficiency in mind using the recently developed bidirectional Burrows-Wheeler index. The algorithms were implemented in a way which makes parallel processing possible. Our tool is experimentally shown to give good results for simple simulated datasets, and to use less than 10 times less space and time compared to two recently published metagenome clustering tools.Kaikkien ympäristössä esiintyvien genomien joukkoa kutsutaan kyseisen ympäristön \emph{metagenomiksi}. Viimeisen 15 vuoden aikana kehitetyt korkean läpisyötön sekvenssoriteknologiat ovat mahdollistaneet ensimmäistä kertaa historiassa kokonaisen ympäristön metagenomin kartoittamisen. Tämä kehityssuunta on johtanut uusiin mielenkiintoisiin algoritmisiin ongelmiin. Tämä työ käsittelee ympäristöistä näytteistettyjen DNA-fragmenttejen ryhmittelyä lajien, tai yleisemmin taksonomisten yksiköiden mukaan. Työssä tunnistetaan ja formalisoidaan kaksi merkkijono-ongelmaa, jotka ilmentyvät metagenomisten DNA-fragmentteja ryhmittelyssä. Ongelmiin esitetään tilatehokkaat ratkaisut käyttäen hiljattain kehitettyä kaksisuuntaista Burrows-Wheeler indeksiä. Algoritmit toteutettiin pitäen silmällä rinnakkaista laskentaa. Työssä osoitetaan, että uusi toteutus antaa hyviä tuloksia yksinkertaisille simuloiduille näytteille, ja että työkalu on kymmenen kertaa nopeampi ja tilatehokkaampi, kuin kaksi hiljattain julkaistua metagenomisten näytteiden ryhmittelyyn tarkoitettua työkalua

Aaltodoc Publication Archive

Prospects and limitations of full-text index structures in genome analysis

Author: Dawyndt Peter
De Baets Bernard
Fack Veerle
Vyverman Michaël
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012
Field of study

The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared

Ghent University Academic Bibliography

PubMed Central

Bidirectional Variable-Order de Bruijn Graphs

Author: Belazzougui Djamal
Gagie Travis
Mäkinen Veli
Previtali Marco
Puglisi Simon J.
Publication venue
Publication date: 01/12/2018
Field of study

Peer reviewe

Helsingin yliopiston digitaalinen arkisto