Search CORE

6 research outputs found

LRBinner: Binning Long Reads in Metagenomics Datasets

Author: Lin Yu
Wickramarachchi Anuradha
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)
Publication date: 01/01/2021

Advancements in metagenomics sequencing allow the study of microbial communities directly from their environments. Metagenomics binning is a key step in the species characterisation of microbial communities. Next-generation sequencing reads are usually assembled into contigs for metagenomics binning, mainly due to the limited information within short reads. Third-generation sequencing provides much longer reads that have lengths similar to the contigs assembled from short reads. However, existing contig-binning tools cannot be directly applied to long reads due to the absence of coverage information and the presence of high error rates. The few existing long-read binning tools either use only composition or use composition and coverage information separately. This may ignore bins that correspond to low-abundance species or erroneously split bins that correspond to species with non-uniform coverage. Here we present a reference-free binning approach, LRBinner, that combines composition and coverage information of complete long-read datasets. LRBinner also uses a distance-histogram-based clustering algorithm to extract clusters with varying sizes. The experimental results on both simulated and real datasets show that LRBinner achieves the best binning accuracy against the baselines. Moreover, we show that binning reads using LRBinner prior to assembly reduces the computational resources required for assembly while attaining satisfactory assembly quality.
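As an illustration of what "composition" information can look like for a single read, the sketch below computes a normalised k-mer frequency vector. It is a minimal sketch under stated assumptions, not LRBinner's implementation: the function name `composition_vector` and the default `k=3` are hypothetical choices made for this example, and reverse complements are not merged.

```python
from itertools import product


def composition_vector(seq, k=3):
    """Return normalised k-mer frequencies of `seq` over the ACGT alphabet.

    Assumption for illustration only: k-mers containing other symbols
    (for example N) are skipped, and reverse complements are not merged.
    """
    # Fixed lexicographic ordering of all 4**k possible k-mers.
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]
```

Reads from the same genome tend to have similar vectors, which is why such profiles are commonly used as a binning feature.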

Dagstuhl Research Online Publication Server

Models and Algorithms for Metagenomics Analysis and Plasmid Classification

Author: Wickramarachchi Anuradha
Publication date: 01/01/2022

Metagenomics studies have provided key insights into the composition and structure of microbial communities found in different environments. Among the techniques used to analyze metagenomics data, binning is considered a crucial step to characterize the different species of microorganisms present. Metagenomics binning can be extended further towards determination of plasmids and chromosomes to study environmental adaptations. Metagenomics binning is mostly performed on contigs from genome assemblies, and metagenomics studies are mostly performed with short-read sequencing. Direct binning of short reads suffers from insufficient species-specific signal, so short reads are usually assembled into longer contigs before binning. The emergence of long-read sequencing technologies gives us the opportunity to study the binning of long reads directly, although few such studies have been carried out. Firstly, this thesis presents the challenges in binning long reads compared to contigs assembled from short reads. One key challenge in binning long reads is the absence of coverage information, which is typically obtained from assembly. Moreover, the scale of long reads compared to contigs demands more computationally efficient methods for binning. Therefore, we develop MetaBCC-LR to address these challenges and perform metagenomics binning of long reads. We introduce the concept of k-mer coverage histogram to estimate the coverage of long reads without alignments and use a sampling strategy to handle the immense number of long reads. Since MetaBCC-LR is limited by its stepwise use of coverage and composition information, we further develop LRBinner, which combines coverage and composition features and uses them simultaneously for binning.
LRBinner also implements a novel clustering algorithm that performs better on binning long-read datasets from species with varying abundances. Moreover, we propose OBLR to improve the coverage estimation of long reads via a read-overlap graph instead of k-mers. The read-overlap graph also enables OBLR to perform probabilistic sampling to better recover low-abundance species. Secondly, we investigate opportunities to improve plasmid detection, which is treated as a binary plasmid-chromosome classification problem. We introduce PlasLR, which enables adaptation of plasmid prediction tools designed for contigs to classify long and error-prone reads. We also develop GraphPlas, which uses the assembly graph to improve plasmid classification results for assembled contigs. In summary, this thesis presents the progressive development of models and algorithms for metagenomics binning and plasmid classification.
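The idea of estimating read coverage from k-mers without alignment can be sketched as follows. This is one plausible reading of a "k-mer coverage histogram", written under stated assumptions and not taken from the MetaBCC-LR source: the names `count_kmers` and `coverage_histogram` and the parameters `k`, `bin_width` and `n_bins` are hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of `seq`."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers(reads, k):
    """Count k-mer occurrences across the whole read set."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def coverage_histogram(read, counts, k, bin_width, n_bins):
    """Histogram of dataset-wide counts for the k-mers of one read.

    Each k-mer of the read is placed in the bin given by its dataset-wide
    count; counts beyond the last bin are clamped into it. The histogram
    is normalised so reads of different lengths are comparable.
    """
    hist = [0] * n_bins
    for km in kmers(read, k):
        b = min(counts[km] // bin_width, n_bins - 1)
        hist[b] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * n_bins
```

The intuition is that k-mers from a high-abundance genome recur often in the dataset, so the histogram of a read from that genome is shifted towards higher-count bins.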

The Australian National University

GraphBin2: Refined and Overlapped Binning of Metagenomic Contigs Using Assembly Graphs

Author: Lin Yu
Mallawaarachchi Vijini G.
Wickramarachchi Anuradha S.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)
Publication date: 01/01/2020

Metagenomic sequencing allows us to study structure, diversity and ecology in microbial communities without the necessity of obtaining pure cultures. In many metagenomics studies, the reads obtained from metagenomics sequencing are first assembled into longer contigs and these contigs are then binned into clusters of contigs where contigs in a cluster are expected to come from the same species. As different species may share common sequences in their genomes, one assembled contig may belong to multiple species. However, existing tools for contig binning only support non-overlapped binning, i.e., each contig is assigned to at most one bin (species). In this paper, we introduce GraphBin2, which refines the binning results obtained from existing tools and, more importantly, is able to assign contigs to multiple bins. GraphBin2 uses the connectivity and coverage information from assembly graphs to adjust existing binning results on contigs and to infer contigs shared by multiple species. Experimental results on both simulated and real datasets demonstrate that GraphBin2 not only improves binning results of existing tools but also supports assigning contigs to multiple bins.

Dagstuhl Research Online Publication Server

Binning long reads in metagenomics datasets using composition and coverage information

Author: Anuradha Wickramarachchi
Yu Lin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2022

Background: Advancements in metagenomics sequencing allow the study of microbial communities directly from their environments. Metagenomics binning is a key step in the species characterisation of microbial communities. Next-generation sequencing reads are usually assembled into contigs for metagenomics binning, mainly due to the limited information within short reads. Third-generation sequencing provides much longer reads that have lengths similar to the contigs assembled from short reads. However, existing contig-binning tools cannot be directly applied to long reads due to the absence of coverage information and the presence of high error rates. The few existing long-read binning tools either use only composition or use composition and coverage information separately. This may ignore bins that correspond to low-abundance species or erroneously split bins that correspond to species with non-uniform coverage. Here we present a reference-free binning approach, LRBinner, that combines composition and coverage information of complete long-read datasets. LRBinner also uses a distance-histogram-based clustering algorithm to extract clusters with varying sizes.

Results: The experimental results on both simulated and real datasets show that LRBinner achieves the best binning accuracy in most cases while handling the complete datasets without any sampling. Moreover, we show that binning reads using LRBinner prior to assembly reduces the computational resources required for assembly while attaining satisfactory assembly quality.

Conclusion: LRBinner shows that deep-learning techniques can be used for effective feature aggregation to support the metagenomics binning of long reads. Furthermore, accurate binning of long reads supports improvements in metagenomics assembly, especially in complex datasets. Binning also helps to reduce the resources required for assembly. Source code for LRBinner is freely available at https://github.com/anuradhawick/LRBinner

Directory of Open Access Journals

Phylogenetic Tree Construction Using K-Mer Forest-Based Distance Calculation

Author: Anuradha Wickramarachchi
Gihan Gamage
Indika Perera
Nadeeshan Gimhana
Shanaka Bandara
Thilina Pathirana
Vijini Mallawaarachchi
Publication venue: 'International Association of Online Engineering (IAOE)'
Publication date: 01/06/2020

Phylogenetics is one of the dominant data engineering research disciplines based on biological information. Here, we consider raw DNA sequences and perform comparative analysis to draw conclusions about them. The phylogenetic tree is a concise way to represent evolutionary relationships among different organisms. When constructing phylogenetic trees, the elementary step is to calculate the genetic distance among species. Alignment-based and alignment-free methods are the two main distance computation approaches used to find the genetic relatedness of different species. In this paper, we propose a novel alignment-free, pairwise distance calculation method based on k-mers and a state-of-the-art machine-learning-based phylogenetic tree construction mechanism. With the proposed approach, we can convert longer DNA sequences into compact k-mer forests, which improves the efficiency of comparison. We then construct the phylogenetic tree from the calculated distances using an algorithm built upon k-medoid clustering, which delivers significant efficiency and accuracy gains compared to traditional phylogenetic tree construction methods.
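To make the notion of an alignment-free, pairwise, k-mer-based distance concrete, the sketch below uses the Jaccard distance between k-mer sets. This is a deliberately simpler, swapped-in technique and not the k-mer forest method the paper proposes; the function names and the choice of `k` are hypothetical.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k):
    """1 - |A & B| / |A | B| over the k-mer sets of two sequences."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    if not union:
        # Both sequences are shorter than k; treat them as identical.
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def distance_matrix(seqs, k):
    """Symmetric pairwise distance matrix as a list of lists."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = jaccard_distance(seqs[i], seqs[j], k)
            dist[i][j] = dist[j][i] = d
    return dist
```

A matrix of this kind is the usual input to distance-based tree construction or to a clustering step such as k-medoids.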

Directory of Open Access Journals

Online-Journals.org (International Association of Online Engineering)

Phylogenetic Tree Construction Using K-Mer Forest-Based Distance Calculation

Author: Bandara Shanaka
Gamage Gihan
Gimhana Nadeeshan
Mallawaarachchi Vijini
Pathirana Thilina
Perera Indika
Wickramarachchi Anuradha
Publication venue: 'International Association of Online Engineering (IAOE)'
Publication date: 19/06/2020

Online-Journals.org (International Association of Online Engineering)