Lightweight BWT and LCP merging via the gap algorithm
Recently, Holt and McMillan [Bioinformatics 2014, ACM-BCB 2014] have proposed a simple and elegant algorithm to merge the Burrows-Wheeler transforms of a collection of strings. In this paper we show that their algorithm can be improved so that, in addition to the BWTs, it also merges the Longest Common Prefix (LCP) arrays. Because of its small memory footprint, this new algorithm can be used for the final merge of BWT and LCP arrays computed by a faster but memory-intensive construction algorithm.
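The merging idea summarized in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it merges only the BWTs (no LCP arrays), handles exactly two strings, and assumes each string ends with its own terminator character (`\x00` and `\x01` here, an illustrative choice). The function names `bwt`, `merge_bwts`, and `naive_collection_bwt` are hypothetical. The sketch refines an interleave vector `z` with repeated stable counting passes until it stops changing, which is the general scheme attributed to Holt and McMillan.

```python
from collections import Counter


def bwt(text):
    """Naive BWT of `text`; assumes `text` ends with a unique terminator."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    # text[i - 1] wraps around to the terminator when i == 0.
    return "".join(text[i - 1] for i in order)


def merge_bwts(bwt0, bwt1):
    """Merge two BWTs by iteratively refining an interleave vector z.

    z[k] == b means that the k-th entry of the merged BWT comes from bwt_b.
    Assumes the two inputs use distinct terminator characters.
    """
    sources = (bwt0, bwt1)
    n = len(bwt0) + len(bwt1)
    counts = Counter(bwt0) + Counter(bwt1)
    # start[c] = number of characters smaller than c in both BWTs together.
    start, total = {}, 0
    for c in sorted(counts):
        start[c] = total
        total += counts[c]

    z = [0] * len(bwt0) + [1] * len(bwt1)  # initial guess: all of bwt0 first
    while True:
        nxt = dict(start)   # next free slot in each character bucket
        pos = [0, 0]        # read positions in bwt0 and bwt1
        z_new = [0] * n
        for b in z:
            c = sources[b][pos[b]]
            pos[b] += 1
            z_new[nxt[c]] = b
            nxt[c] += 1
        if z_new == z:      # fixed point: the interleaving is final
            break
        z = z_new

    # Read the merged BWT off the final interleave vector.
    pos, out = [0, 0], []
    for b in z:
        out.append(sources[b][pos[b]])
        pos[b] += 1
    return "".join(out), z


def naive_collection_bwt(t0, t1):
    """Reference result: sort all suffixes of both strings explicitly."""
    suffixes = [(t[i:], t[i - 1]) for t in (t0, t1) for i in range(len(t))]
    return "".join(ch for _, ch in sorted(suffixes))


t0, t1 = "banana\x00", "bandana\x01"
merged, z = merge_bwts(bwt(t0), bwt(t1))
```

Besides the two input BWTs, the working data consists of the two interleave vectors `z` and `z_new`, each holding one bit of information per symbol, which gives a sense of why this style of merge is described as lightweight.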
Space-efficient merging of succinct de Bruijn graphs
We propose a new algorithm for merging succinct representations of de Bruijn
graphs introduced in [Bowe et al. WABI 2012]. Our algorithm is based on the
lightweight BWT merging approach by Holt and McMillan [Bioinformatics 2014,
ACM-BCB 2014]. Our algorithm has the same asymptotic cost as the state-of-the-art
tool for the same problem presented by Muggli et al. [bioRxiv 2017,
Bioinformatics 2019], but it uses less than half of its working space. A novel
important feature of our algorithm, not found in any of the existing tools, is
that it can compute the Variable Order succinct representation of the union
graph within the same asymptotic time/space bounds.
Comment: Accepted to SPIRE'1
Lightweight Metagenomic Classification via eBWT
The development of Next Generation Sequencing has had a major impact on the study of genetic sequences, and in particular on the advancement of metagenomics, whose aim is to identify the microorganisms that are present in a sample collected directly from the environment. In this paper, we describe a new lightweight alignment-free and assembly-free framework for metagenomic classification that compares each unknown sequence in the sample to a collection of known genomes. We take advantage of the combinatorial properties of an extension of the Burrows-Wheeler transform, and we sequentially scan the required data structures, so that we can analyze unknown sequences of large collections using little internal memory. To the best of our knowledge, this is the first approach that is assembly- and alignment-free and is not based on k-mers. Our experiments confirm the effectiveness of our approach and its high accuracy, even on negative control samples: we classify only 1 short read out of 5,726,358 random shuffle reads. Finally, the results are comparable with those achieved by read-mapping classifiers and by k-mer based classifiers.
Parallel Computation For The All-pairs Suffix-prefix Problem
We show how to parallelize the optimal algorithm proposed by Tustumi et al. [19] to solve the all-pairs suffix-prefix matching problem for general alphabets. We compared our parallel algorithm with SOF [17], a practical solution for DNA sequences that exhibits good time and space performance in multithreading environments. The experimental results showed that our parallel algorithm achieves a consistent speedup when compared with the sequential algorithm, and it is competitive with SOF when the minimum overlap length is small.
9954, 122, 132. 23rd International Symposium on String Processing and Information Retrieval (SPIRE), OCT 18-20, 2016, Beppu, JAPA
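For readers unfamiliar with the all-pairs suffix-prefix problem named in this abstract, the following Python sketch states it by brute force. It is not the optimal algorithm of Tustumi et al. nor the parallel version described above; the function names and the `min_len` parameter (standing in for the minimum overlap length) are illustrative assumptions. For every ordered pair of distinct strings, it reports the length of the longest suffix of the first that is also a prefix of the second.

```python
def longest_overlap(a, b, min_len=1):
    """Length of the longest suffix of `a` that is a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def all_pairs_suffix_prefix(strings, min_len=1):
    """Brute-force overlap table for all ordered pairs (i, j) with i != j."""
    return {
        (i, j): longest_overlap(a, b, min_len)
        for i, a in enumerate(strings)
        for j, b in enumerate(strings)
        if i != j
    }


reads = ["ACGT", "GTAC", "TACG"]
overlaps = all_pairs_suffix_prefix(reads)
overlaps_min2 = all_pairs_suffix_prefix(reads, min_len=2)
```

Each entry of the table is computed independently of the others, so the work for different pairs could in principle be distributed across threads; the brute-force cost per pair is what specialized algorithms avoid.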