A framework for space-efficient read clustering in metagenomic samples

Alanko, Jarno; Cunial, Fabio; Belazzougui, Djamal; Mäkinen, Veli

oai:helda.helsinki.fi:10138/182782

Publication date: 14 March 2017
Publisher: BMC
Doi

Abstract

Background: A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims to partition a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Since samples are large and steadily growing, space-efficient clustering algorithms are strongly needed.

Results: We design and implement a space-efficient algorithmic framework that solves a number of core primitives in unsupervised metagenomic clustering using just the bidirectional Burrows-Wheeler index and a union-find data structure on the set of reads. When run on a sample of total length n, with m reads of maximum length ℓ each, on an alphabet of total size σ, our algorithms take O(n(t + log σ)) time and just 2n + o(n) + O(max{ℓσ log n, K log m}) bits of space in addition to the index and to the union-find data structure, where K is a measure of the redundancy of the sample and t is the query time of the union-find data structure.

Conclusions: Our experimental results show that our algorithms are practical, that they can exploit multiple cores through a parallel traversal of the suffix-link tree, and that they are competitive with the state of the art in both space and time.

Peer reviewed
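The abstract states that the framework maintains a union-find data structure on the set of reads. The following is a minimal sketch of that merging step only, not the authors' method. It assumes a simple merging criterion (two reads are joined when they share any substring of length k) and uses a plain Python dictionary of k-mers in place of the bidirectional Burrows-Wheeler index, so it needs far more memory than the space bounds quoted above. The names UnionFind and cluster_reads_by_shared_kmers, and the parameter k, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class UnionFind:
    """Disjoint-set forest over read ids 0..n-1 (union by size + path halving)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        # Path halving: link each visited node to its grandparent while walking up.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one to keep trees shallow.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_reads_by_shared_kmers(reads: List[str], k: int) -> List[List[int]]:
    """Hypothetical helper: reads sharing any length-k substring end up in one cluster.

    Assumption: a hash map of k-mers stands in for the BWT index used in the paper.
    """
    uf = UnionFind(len(reads))
    first_owner: Dict[str, int] = {}  # k-mer -> id of the first read containing it
    for i, read in enumerate(reads):
        for j in range(len(read) - k + 1):
            owner = first_owner.setdefault(read[j:j + k], i)
            if owner != i:
                uf.union(owner, i)
    clusters: Dict[int, List[int]] = defaultdict(list)
    for i in range(len(reads)):
        clusters[uf.find(i)].append(i)  # ids are appended in increasing order
    return sorted(clusters.values())


reads = ["ACGTACGT", "TACGTTTT", "GGGGCCCC", "CCCCAAAA", "TTTTTTTT"]
print(cluster_reads_by_shared_kmers(reads, 4))  # [[0, 1, 4], [2, 3]]
```

With k = 4, reads 0 and 1 share TACG, reads 1 and 4 share TTTT, and reads 2 and 3 share CCCC, giving two clusters; raising k to 5 leaves only reads 0 and 1 together. With union by size and path halving, each find runs in amortized near-constant time, which is one possible choice for the query time t in the bounds above.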

Last updated on 03/08/2017

This copy of the paper is held in Helsingin yliopiston digitaalinen arkisto (HELDA, the University of Helsinki digital archive).
