9 research outputs found

Lightweight BWT and LCP merging via the gap algorithm

Author: AJ Cox
FA Louza
G Manzini
J Holt
J Kärkkäinen
J Sirén
P Ferragina
S Burkhardt
S Mantaci
V Geffert
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Recently, Holt and McMillan [Bioinformatics 2014, ACM-BCB 2014] proposed a simple and elegant algorithm to merge the Burrows-Wheeler transforms (BWTs) of a collection of strings. In this paper we show that their algorithm can be improved so that, in addition to the BWTs, it also merges the Longest Common Prefix (LCP) arrays. Because of its small memory footprint, this new algorithm can be used for the final merge of BWT and LCP arrays computed by a faster but memory-intensive construction algorithm.
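
The Holt-McMillan merging step that this abstract builds on can be illustrated with a minimal Python sketch. It rests on simplifying assumptions made only for this example: each input is the BWT of a single string ending in a unique '$', '$' sorts before every other symbol, and the '$' of the first string sorts before that of the second. The sketch merges BWTs only. The LCP merging that is the paper's contribution is not shown, and the helper names bwt and merge_bwts are hypothetical, not the authors' code.

```python
from collections import Counter


def bwt(s: str) -> str:
    """Naive BWT of one string ending in a unique '$' (illustration only)."""
    assert s.endswith("$") and s.count("$") == 1
    # With a unique, smallest '$', sorting suffixes equals sorting rotations.
    order = sorted(range(len(s)), key=lambda i: s[i:])
    return "".join(s[i - 1] for i in order)  # s[-1] is '$' when i == 0


def merge_bwts(bwt0: str, bwt1: str) -> str:
    """Holt-McMillan style merge of two single-string BWTs.

    Assumptions (this sketch only): each BWT contains exactly one '$',
    '$' sorts before every other symbol, and the '$' of string 0 sorts
    before the '$' of string 1.
    """
    assert bwt0.count("$") == 1 and bwt1.count("$") == 1
    bwts = (bwt0, bwt1)
    n = len(bwt0) + len(bwt1)
    counts = Counter(bwt0) + Counter(bwt1)
    # Bucket start of each non-'$' symbol; positions 0 and 1 hold the '$' suffixes.
    starts, pos = {}, 2
    for c in sorted(counts):
        if c != "$":
            starts[c] = pos
            pos += counts[c]
    # z[k] tells which input the k-th suffix of the merged order comes from.
    z = [0] * len(bwt0) + [1] * len(bwt1)
    while True:
        nxt, seen, new_z = dict(starts), [0, 0], [0] * n
        for b in z:
            c = bwts[b][seen[b]]  # symbol preceding this suffix in input b
            seen[b] += 1
            if c == "$":
                j = b  # the extended suffix is the lone '$' suffix of string b
            else:
                j = nxt[c]
                nxt[c] += 1
            new_z[j] = b
        if new_z == z:  # fixpoint reached: z is the final interleaving
            break
        z = new_z
    # Interleave the two BWTs according to z.
    seen, out = [0, 0], []
    for b in z:
        out.append(bwts[b][seen[b]])
        seen[b] += 1
    return "".join(out)
```

For example, merging the BWTs of ab$ and b$ gives bb$a$. Each pass only scans z and the two BWTs from left to right, which is why this style of merging suits low-memory, sequential-access settings.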

Crossref

Archivio della Ricerca - Università di Pisa

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Space-efficient merging of succinct de Bruijn graphs

Author: A Bowe
B Alipanahi
D Belazzougui
FA Louza
J Holt
L Egidi
MD Muggli
PA Pevzner
S Marcus
Z Iqbal
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

We propose a new algorithm for merging succinct representations of de Bruijn graphs introduced in [Bowe et al. WABI 2012]. Our algorithm is based on the lightweight BWT merging approach by Holt and McMillan [Bioinformatics 2014, ACM-BCB 2014]. It has the same asymptotic cost as the state-of-the-art tool for the same problem presented by Muggli et al. [bioRxiv 2017, Bioinformatics 2019], but it uses less than half of that tool's working space. A novel and important feature of our algorithm, not found in any of the existing tools, is that it can compute the Variable Order succinct representation of the union graph within the same asymptotic time/space bounds. Comment: Accepted to SPIRE'1

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Efficiently Detecting Web Spambots in a Temporally Annotated Sequence

Author: D Wang
FA Louza
J Kärkkäinen
M Nicolae
M Yamamoto
MI Abouelhoda
P Heymann
SJ Harvey
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/09/2020

Crossref

King's Research Portal

Lightweight Metagenomic Classification via eBWT

Author: A Cox
A Restivo
ABR McIntyre
B Langmead
D Kim
DE Wood
F Louza
FA Louza
KH Ng
L Egidi
L Janin
L Yang
M Bauer
M Pedersen
MI Abouelhoda
P Bonizzoni
P Menzel
R Ounit
S Mantaci
S Vinga
W-K Hon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

The development of Next Generation Sequencing has had a major impact on the study of genetic sequences and, in particular, on the advancement of metagenomics, whose aim is to identify the microorganisms present in a sample collected directly from the environment. In this paper, we describe a new lightweight alignment-free and assembly-free framework for metagenomic classification that compares each unknown sequence in the sample to a collection of known genomes. We take advantage of the combinatorial properties of an extension of the Burrows-Wheeler transform, and we sequentially scan the required data structures, so that we can analyze unknown sequences of large collections using little internal memory. To the best of our knowledge, this is the first approach that is assembly- and alignment-free and is not based on k-mers. Our experiments confirm the effectiveness of our approach and its high accuracy, even in negative control samples: only 1 short read out of 5,726,358 randomly shuffled reads is classified. Finally, the results are comparable with those achieved by read-mapping classifiers and by k-mer based classifiers.
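
The extended Burrows-Wheeler transform (eBWT) mentioned above can be sketched in Python under one common definition: take every rotation of every string in the collection, sort the rotations by the omega-order (u before v when the infinite repetition uuu... is lexicographically smaller than vvv...), and output the last character of each. This is only an illustration of the transform, assuming primitive, non-empty input strings and a naive in-memory sort; it does not implement the sequential-scan classification framework described above, the paper may use a different eBWT variant, and ebwt and omega_cmp are hypothetical names.

```python
from functools import cmp_to_key


def omega_cmp(u: str, v: str) -> int:
    """Compare u^omega with v^omega (u, v assumed non-empty).

    Comparing the first len(u) + len(v) characters of the two infinite
    repetitions is enough to decide their order.
    """
    n = len(u) + len(v)
    a = (u * (n // len(u) + 1))[:n]
    b = (v * (n // len(v) + 1))[:n]
    return (a > b) - (a < b)


def ebwt(strings: list[str]) -> str:
    """Naive eBWT: last characters of all rotations sorted by omega-order."""
    rotations = [s[i:] + s[:i] for s in strings for i in range(len(s))]
    rotations.sort(key=cmp_to_key(omega_cmp))
    return "".join(r[-1] for r in rotations)
```

For a single string, every rotation has the same length, so the omega-order reduces to ordinary lexicographic order and ebwt(["banana"]) is the usual BWT of banana, nnbaaa.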

Crossref

Archivio della Ricerca - Università di Pisa

We show how to parallelize the optimal algorithm proposed by Tustumi et al. [19] to solve the all-pairs suffix-prefix matching problem for general alphabets. We compared our parallel algorithm with SOF [17], a practical solution for DNA sequences that exhibits good time and space performance in multithreading environments. The experimental results show that our parallel algorithm achieves a consistent speedup over the sequential algorithm and is competitive with SOF when the minimum overlap length is small.
23rd International Symposium on String Processing and Information Retrieval (SPIRE), Oct 18-20, 2016, Beppu, Japan; vol. 9954, pp. 122-132.
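
The problem being parallelized can be pinned down with a brute-force Python sketch: for every ordered pair of distinct strings, find the longest suffix of the first string that equals a prefix of the second, reporting 0 when the best overlap is shorter than a minimum length. This only illustrates the problem definition. It is neither the optimal algorithm of Tustumi et al. nor its parallel version, and all_pairs_suffix_prefix and min_len are hypothetical names.

```python
def all_pairs_suffix_prefix(strings: list[str], min_len: int = 1) -> dict:
    """Brute-force all-pairs suffix-prefix matching (problem definition only).

    Maps each ordered pair (i, j), i != j, to the length of the longest
    suffix of strings[i] that equals a prefix of strings[j]; overlaps
    shorter than min_len (assumed >= 1) are reported as 0.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    overlaps = {}
    for i, s in enumerate(strings):
        for j, t in enumerate(strings):
            if i == j:
                continue
            best = 0
            # Try candidate overlap lengths from longest to shortest.
            for k in range(min(len(s), len(t)), min_len - 1, -1):
                if s.endswith(t[:k]):
                    best = k
                    break
            overlaps[(i, j)] = best
    return overlaps
```

For example, with the strings abcde, cdefg and efgab, the overlap of abcde onto cdefg is 3 (cde) and that of efgab onto abcde is 2 (ab). Each row i is independent here, but how the paper parallelizes the optimal algorithm is not shown by this sketch.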

Crossref

Repositorio da Producao Cientifica e Intelectual da Unicamp

Highlights of the special scientific sessions of the 45th Annual Scientific Meeting of the International Skeletal Society (ISS) 2018, Berlin, Germany

Author: A Chhabra
B Wang
BA Shannon
C Rehwald
D Albano
D Mckean
F Grande Del
F Lapègue
FA Huber
ICF Louza
J Lim
JB Guimaraes
K Tornow
M Khanna
Miriam A. Bredella
N Pattamapaspong
R Kijowski
RD Boutin
S Lala
S Mutasa
US Zishan
V Bousson
WR Walter
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Parallel External Memory Suffix Sorting

Author: A Andersson
A Apostolico
A Crauser
B Langmead
F Kulla
FA Louza
G Nong
GH Gonnet
H Li
HE Williams
J Kärkkäinen
J Ziv
JT Simpson
M Farach-Colton
MI Abouelhoda
P Ferragina
R Dementiev
R Grossi
R Nakamura
SJ Puglisi
U Manber
V Osipov
Publication venue: 'Springer Science and Business Media LLC'

Crossref