Search CORE

88 research outputs found

A simple algorithm for computing the Lempel-Ziv factorization

Author: Crochemore M.
Ilie L.
Smyth William
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

We give a space-efficient simple algorithm for computing the Lempel?Ziv factorization ofa string. For a string of length n over an integer alphabet, it runs in O(n) time independentlyof alphabet size and uses o(n) additional space

Crossref

Research Repository

King's Research Portal

espace@Curtin

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

XBWT Tricks

Author: Manzini Giovanni
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

The eXtended Burrows-Wheeler Transform (XBWT) is a data transformation introduced in [Ferragina et al., FOCS 2005] to com- pactly represent a labeled tree and simultaneously support navigation and path-search operations over its label structure. A natural application of the XBWT is to store a dictionary of strings. A recent extensive experimental study [Martı́nez-Prieto et al., Informa- tion Systems, 2016] shows that, among the available string dictionary implementations, the XBWT is attractive because of its good tradeoff between small space usage, speed, and support for substring searches. In this paper we further investigate the use of the XBWT for storing a string dictionary. Our first contribution is to show how to add suffix links (aka failure links) to a XBWT string dictionary. For a XBWT dictionary with n internal nodes our suffix links can be traversed in constant time and only take 2n + o(n) bits of space. Our second contribution are practical construction algorithms for the XBWT, including the additional data structure supporting the traver- sal of suffix links. Our algorithms build on the many well engineered algorithms for Suffix Array and BWT construction and offer different tradeoffs between running time and working space

Archivio della Ricerca - Università di Pisa

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Computing regularities in strings

Author: Smyth William
Yusufu M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Regularities in strings model many phenomena and thus form the subject of extensive mathematical studies . Perhaps the most conspicuous regularities in strings are those that manifest themselves in the form of repeated subpatterns. In this paper, we study several forms of regularities of strings, that is, repeats, multirepeats, repetitions and runs. We present their similarities and differences by discussing their forms and properties and we explore the existing computation algorithms. We also discuss several data structures useful for computing regularities

Crossref

Research Repository

espace@Curtin

External memory BWT and LCP computation for sequence collections with applications

Author: Egidi Lavinia
Louza Felipe A.
Manzini Giovanni
Telles Guilherme P.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)
Publication date: 01/01/2018
Field of study

We propose an external memory algorithm for the computation of the BWT and LCP array for a collection of sequences. Our algorithm takes the amount of available memory as an input parameter, and tries to make the best use of it by splitting the input collection into subcollections sufficiently small that it can compute their BWT in RAM using an optimal linear time algorithm. Next, it merges the partial BWTs in external memory and in the process it also computes the LCP values. We show that our algorithm performs O(n maxlcp) sequential I/Os, where n is the total length of the collection and maxlcp is the maximum LCP value. The experimental results show that our algorithm outperforms the current best algorithm for collections of sequences with different lengths and when the average LCP of the collection is relatively small compared to the length of the sequences. In the second part of the paper, we show that our algorithm can be modified to output two additional arrays that, combined with the BWT and LCP arrays, provide simple, scan based, external memory algorithms for three well known problems in bioinformatics: the computation of the all pairs suffix-prefix overlaps, the computation of maximal repeats, and the construction of succinct de Bruijn graphs

arXiv.org e-Print Archive

Directory of Open Access Journals

Archivio della Ricerca - Università di Pisa

Dagstuhl Research Online Publication Server

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Faster External Memory LCP Array Construction

Author: Kempa Dominik
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th Annual European Symposium on Algorithms (ESA 2016)
Publication date: 01/01/2016
Field of study

The suffix array, perhaps the most important data structure in modern string processing, needs to be augmented with the longest-common-prefix (LCP) array in many applications. Their construction is often a major bottleneck especially when the data is too big for internal memory. We describe two new algorithms for computing the LCP array from the suffix array in external memory. Experiments demonstrate that the new algorithms are about a factor of two faster than the fastest previous algorithm

Dagstuhl Research Online Publication Server

Lightweight BWT and LCP merging via the gap algorithm

Author: AJ Cox
FA Louza
FA Louza
G Manzini
J Holt
J Kärkkäinen
J Sirén
P Ferragina
S Burkhardt
S Mantaci
V Geffert
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Recently, Holt and McMillan [Bioinformatics 2014, ACM-BCB 2014] have proposed a simple and elegant algorithm to merge the Burrows-Wheeler transforms of a collection of strings. In this paper we show that their algorithm can be improved so that, in addition to the BWTs, it also merges the Longest Common Prefix (LCP) arrays. Because of its small memory footprint this new algorithm can be used for the final merge of BWT and LCP arrays computed by a faster but memory intensive construction algorithm

Crossref

Archivio della Ricerca - Università di Pisa

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

mkESA: enhanced suffix array construction tool

Author: Fleer David
Giegerich Robert
Homann Robert
Rehmsmeier Marc
Publication venue: Oxford University Press
Publication date: 01/01/2009
Field of study

Summary: We introduce the tool mkESA, an open source program for constructing enhanced suffix arrays (ESAs), striving for low memory consumption, yet high practical speed. mkESA is a user-friendly program written in portable C99, based on a parallelized version of the Deep-Shallow suffix array construction algorithm, which is known for its high speed and small memory usage. The tool handles large FASTA files with multiple sequences, and computes suffix arrays and various additional tables, such as the LCP table (longest common prefix) or the inverse suffix array, from given sequence data

PubMed Central

Publications at Bielefeld University