
PubMed Central

Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition

Author: Halgamuge Saman K.
Saeed Isaam
Tang Sen-Lin
Publication venue: Oxford University Press
Publication date: 29/11/2018

An approach to infer the unknown microbial population structure within a metagenome is to cluster nucleotide sequences based on common patterns in base composition, otherwise referred to as binning. When functional roles are assigned to the identified populations, a deeper understanding of microbial communities can be attained, more so than gene-centric approaches that explore overall functionality. In this study, we propose an unsupervised, model-based binning method with two clustering tiers, which uses a novel transformation of the oligonucleotide frequency-derived error gradient and GC content to generate coarse groups at the first tier of clustering; and tetranucleotide frequency to refine these groups at the secondary clustering tier. The proposed method has a demonstrated improvement over PhyloPythia, S-GSOM, TACOA and TaxSOM on all three benchmarks that were used for evaluation in this study. The proposed method is then applied to a pyrosequenced metagenomic library of mud volcano sediment sampled in southwestern Taiwan, with the inferred population structure validated against complementary sequencing of 16S ribosomal RNA marker genes. Finally, the proposed method was further validated against four publicly available metagenomes, including a highly complex Antarctic whale-fall bone sample, which was previously assumed to be too complex for binning prior to functional analysis
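The abstract above names GC content and tetranucleotide frequency as two of the base-composition features used for binning. The sketch below shows one plain way to compute both for a single sequence. It is an illustration only: the function names are hypothetical, the handling of ambiguous bases is an assumption, and it does not implement the oligonucleotide frequency-derived error gradient or the two-tier clustering described in the paper.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def tetranucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each of the 256 tetranucleotides.

    Overlapping windows of length 4 are counted. Windows containing
    anything other than A, C, G or T are skipped (an assumption made
    for this sketch).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    total = sum(counts[k] for k in kmers)
    if total == 0:
        return {k: 0.0 for k in kmers}
    return {k: counts[k] / total for k in kmers}


if __name__ == "__main__":
    contig = "ACGTACGTGGCCAATT"
    print(f"GC content: {gc_content(contig):.3f}")
    freqs = tetranucleotide_frequencies(contig)
    print(f"ACGT frequency: {freqs['ACGT']:.3f}")
```

Feature vectors of this kind (one GC value plus a 256-dimensional frequency vector per sequence) are what a clustering step would then consume.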

PubMed Central

Accurate reconstruction of viral quasispecies spectra through improved estimation of strain richness

Author: Chang BC
Halgamuge Saman
Jayasundara Duleepa
Saeed Isaam
Tang Sen-Lin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/11/2018

Background: Estimating the number of different species (richness) in a mixed microbial population has been a main focus in metagenomic research. Existing methods of species richness estimation rely on the assumption that the reads in each assembled contig correspond to only one of the microbial genomes in the population. This assumption and the underlying probabilistic formulations of existing methods are not useful for quasispecies populations where the strains are highly genetically related. The lack of knowledge on the number of different strains in a quasispecies population is observed to hinder the precision of existing Viral Quasispecies Spectrum Reconstruction (QSR) methods due to the uncontrolled reconstruction of a large number of in silico false positives. In this work, we formulated a novel probabilistic method for strain richness estimation specifically targeting viral quasispecies. By using this approach we improved our recently proposed spectrum reconstruction pipeline ViQuaS to achieve higher levels of precision in reconstructed quasispecies spectra without compromising the recall rates. We also discuss how one other existing popular QSR method named ShoRAH can be improved using this new approach. Results: On benchmark data sets, our estimation method provided accurate richness estimates (< 0.2 median estimation error) and improved the precision of ViQuaS by 2%-13% and F-score by 1%-9% without compromising the recall rates. We also demonstrate that our estimation method can be used to improve the precision and F-score of ShoRAH by 0%-7% and 0%-5% respectively. Conclusions: The proposed probabilistic estimation method can be used to estimate the richness of viral populations with a quasispecies behavior and to improve the accuracy of the quasispecies spectra reconstructed by the existing methods ViQuaS and ShoRAH in the presence of a moderate level of technical sequencing errors.

Assessing Species Diversity Using Metavirome Data: Methods and Challenges

Author: Ackland David
Halgamuge Saman
Herath Damayanthi
Jayasundara Duleepa
Saeed Isaam
Tang Sen-Lin
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Assessing biodiversity is an important step in the study of microbial ecology associated with a given environment. Multiple indices have been used to quantify species diversity, which is a key biodiversity measure. Measuring species diversity of viruses in different environments remains a challenge relative to measuring the diversity of other microbial communities. Metagenomics has played an important role in elucidating viral diversity by conducting metavirome studies; however, metavirome data are of high complexity requiring robust data preprocessing and analysis methods. In this review, existing bioinformatics methods for measuring species diversity using metavirome data are categorised broadly as either sequence similarity-dependent methods or sequence similarity-independent methods. The former includes a comparison of DNA fragments or assemblies generated in the experiment against reference databases for quantifying species diversity, whereas estimates from the latter are independent of the knowledge of existing sequence data. Current methods and tools are discussed in detail, including their applications and limitations. Drawbacks of the state-of-the-art method are demonstrated through results from a simulation. In addition, alternative approaches are proposed to overcome the challenges in estimating species diversity measures using metavirome data. DH is fully supported by the PhD scholarships of The University of Melbourne. This work is also supported by Australian Research Council grant LP140100670 and the industry partner YourGeneBioScience.
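The review above refers to indices that quantify species diversity without naming them in the abstract. As a hedged illustration, the sketch below computes two widely used ones, the Shannon index and the Gini-Simpson index, from a list of per-species abundance counts. The choice of these two indices and the function names are assumptions made for this example; the abstract does not state which indices the review covers, and the sketch does not address the metavirome-specific estimation problems the review discusses.

```python
import math


def _proportions(abundances):
    """Convert non-negative abundance counts to proportions, dropping zeros."""
    if any(a < 0 for a in abundances):
        raise ValueError("abundances must be non-negative")
    total = sum(abundances)
    if total == 0:
        raise ValueError("at least one abundance must be positive")
    return [a / total for a in abundances if a > 0]


def shannon_index(abundances):
    """Shannon index H = -sum(p_i * ln p_i), using the natural logarithm."""
    return -sum(p * math.log(p) for p in _proportions(abundances))


def gini_simpson_index(abundances):
    """Gini-Simpson index 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in _proportions(abundances))


if __name__ == "__main__":
    even = [25, 25, 25, 25]      # four equally abundant species
    skewed = [97, 1, 1, 1]       # one dominant species
    for name, counts in (("even", even), ("skewed", skewed)):
        print(name, round(shannon_index(counts), 3),
              round(gini_simpson_index(counts), 3))
```

Both indices are higher for the evenly distributed community than for the skewed one, which is the behaviour such measures are meant to capture.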

UNSWorks

A new peak detection algorithm for MALDI mass spectrometry data based on a modified Asymmetric Pseudo-Voigt model

Author: Boughton Berin A
Halgamuge Saman
Roessner Ute
Saeed Isaam
Wijetunge Chalini D
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/11/2018

Background: Mass Spectrometry (MS) is a ubiquitous analytical tool in biological research and is used to measure the mass-to-charge ratio of bio-molecules. Peak detection is the essential first step in MS data analysis. Precise estimation of peak parameters such as peak summit location and peak area is critical to identify underlying bio-molecules and to estimate their abundances accurately. We propose a new method to detect and quantify peaks in mass spectra. It uses dual-tree complex wavelet transformation along with Stein's unbiased risk estimator for spectra smoothing. Then, a new method, based on the modified Asymmetric Pseudo-Voigt (mAPV) model and hierarchical particle swarm optimization, is used for peak parameter estimation. Results: Using simulated data, we demonstrated the benefit of using the mAPV model over Gaussian, Lorentz and Bi-Gaussian functions for MS peak modelling. The proposed mAPV model achieved the best fitting accuracy for asymmetric peaks, with lower percentage errors in peak summit location estimation, which were 0.17% to 4.46% less than those of the other models. It also outperformed the other models in peak area estimation, delivering lower percentage errors, which were about 0.7% less than its closest competitor, the Bi-Gaussian model. In addition, using data generated from a MALDI-TOF computer model, we showed that the proposed overall algorithm outperformed the existing methods mainly in terms of sensitivity. It achieved a sensitivity of 85%, compared to 77% and 71% for the two benchmark algorithms, the continuous wavelet transformation based method and Cromwell, respectively. Conclusions: The proposed algorithm is particularly useful for peak detection and parameter estimation in MS data with overlapping peak distributions and asymmetric peaks. The algorithm is implemented using MATLAB and the source code is freely available at http://mapv.sourceforge.net
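To give a feel for the kind of peak shape discussed above, the sketch below evaluates a textbook pseudo-Voigt profile (a weighted sum of a Lorentzian and a Gaussian sharing one full width at half maximum) and makes it asymmetric by using a different width on each side of the summit. This split-width construction is an assumption chosen for illustration. It is not the paper's mAPV model, whose exact form is not given in the abstract, and the function and parameter names are hypothetical. The authors' own implementation is the MATLAB code at the URL above.

```python
import math


def asymmetric_pseudo_voigt(x, center, height, fwhm_left, fwhm_right, eta):
    """Height-normalised pseudo-Voigt with a separate width on each side.

    eta is the Lorentzian mixing fraction in [0, 1]; eta = 0 gives a pure
    (split) Gaussian and eta = 1 a pure (split) Lorentzian.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if fwhm_left <= 0 or fwhm_right <= 0:
        raise ValueError("widths must be positive")
    # Assumption for this sketch: asymmetry is modelled by switching the
    # full width at half maximum at the peak summit.
    width = fwhm_left if x < center else fwhm_right
    u = (x - center) / width
    gaussian = math.exp(-4.0 * math.log(2.0) * u * u)
    lorentzian = 1.0 / (1.0 + 4.0 * u * u)
    return height * (eta * lorentzian + (1.0 - eta) * gaussian)


if __name__ == "__main__":
    # A peak at m/z 100 that tails towards higher m/z.
    for mz in (99.5, 99.75, 100.0, 100.5, 101.0):
        y = asymmetric_pseudo_voigt(mz, 100.0, 2.0, 0.5, 1.0, 0.3)
        print(f"{mz:7.2f}  {y:.4f}")
```

Fitting such a profile to data would require an optimiser; the abstract reports hierarchical particle swarm optimization for that step, which this sketch does not attempt.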

ENVirT: inference of ecological characteristics of viruses from metagenomic data

Author: Chang Bill C.
Halgamuge Saman
Herath Damayanthi
Jayasundara Duleepa
Saeed Isaam
Senanayake Damith
Sun Yuan
Tang Sen-Lin
Yang Cheng-Yu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2019

Background: Estimating the parameters that describe the ecology of viruses, particularly those that are novel, can be made possible using metagenomic approaches. However, the best-performing existing methods require databases to first estimate an average genome length of a viral community before being able to estimate other parameters, such as viral richness. Although this approach has been widely used, it can adversely skew results since the majority of viruses are yet to be catalogued in databases. Results: In this paper, we present ENVirT, a method for estimating the richness of novel viral mixtures, and for the first time we also show that it is possible to simultaneously estimate the average genome length without a priori information. This is shown to be a significant improvement over database-dependent methods, since we can now robustly analyze samples that may include novel viral types under-represented in current databases. We demonstrate that the viral richness estimates produced by ENVirT are several orders of magnitude higher in accuracy than the estimates produced by existing methods named PHACCS and CatchAll when benchmarked against simulated data. We repeated the analysis of 20 metavirome samples using ENVirT, which produced results in close agreement with complementary in vitro analyses. Conclusions: These insights were previously not captured by existing computational methods. As such, ENVirT is shown to be an essential tool for enhancing our understanding of novel viral populations. This work was supported partially by Australian Research Council [grant numbers LP140100670 and DP150103512] and the Biodiversity Research Center, Academia Sinica, Taiwan. DJ, DH, DS and YS were funded by the MIFRS and MIRS scholarships of The University of Melbourne. Publication costs were funded by The Australian National University.

FigShare

Prokaryotic assemblages and metagenomes in pelagic zones of the South China Sea

Author: Chen Yi-Lung
Chiang Pei-Wen
Halgamuge Saman
Hsu Ting-Chang
Lai Hung-Chun
Saeed Isaam
Shiah Fuh-Kwo
Shieh Wung-Yang
Tang Sen-Lin
Tseng Ching-Hung
Tseng Chun-Mao
Wen Liang-Saw
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/11/2018

Background: Prokaryotic microbes, the most abundant organisms in the ocean, are remarkably diverse. Despite numerous studies of marine prokaryotes, the zonation of their communities in pelagic zones has been poorly delineated. By exploiting the persistent stratification of the South China Sea (SCS), we performed a 2-year, large spatial scale (10, 100, 1000, and 3000 m) survey, which included a pilot study in 2006 and comprehensive sampling in 2007, to investigate the biological zonation of bacteria and archaea using 16S rRNA tag and shotgun metagenome sequencing. Results: Alphaproteobacteria dominated the bacterial community in the surface SCS, where the abundance of Betaproteobacteria was seemingly associated with climatic activity. Gammaproteobacteria thrived in the deep SCS, where a noticeable amount of Cyanobacteria were also detected. Marine Groups II and III Euryarchaeota were predominant in the archaeal communities in the surface and deep SCS, respectively. Bacterial diversity was higher than archaeal diversity at all sampling depths in the SCS, and peaked at mid-depths, agreeing with the diversity pattern found in global water columns. Metagenomic analysis not only showed differential %GC values and genome sizes between the surface and deep SCS, but also demonstrated depth-dependent metabolic potentials, such as cobalamin biosynthesis at 10 m, osmoregulation at 100 m, signal transduction at 1000 m, and plasmid and phage replication at 3000 m. When compared with other oceans, urease at 10 m and both exonuclease and permease at 3000 m were more abundant in the SCS. Finally, enriched genes associated with nutrient assimilation in the sea surface and transposase in the deep-sea metagenomes exemplified the functional zonation in global oceans. Conclusions: Prokaryotic communities in the SCS stratified with depth, with maximal bacterial diversity at mid-depth, in accordance with global water columns. 
The SCS had functional zonation among depths and endemically enriched metabolic potentials at the study site, in contrast to other oceans

Springer - Publisher Connector

Prokaryotic assemblages and metagenomes in pelagic zones of the South China Sea

Author: Ching-Hung Tseng
Chun-Mao Tseng
Fuh-Kwo Shiah
Hung-Chun Lai
Isaam Saeed
Liang-Saw Wen
Pei-Wen Chiang
Saman Halgamuge
Sen-Lin Tang
Ting-Chang Hsu
Wung-Yang Shieh
Yi-Lung Chen
Publication venue: Springer Nature
Publication date: 01/01/2015

Comprehensive Insights Into Composition, Metabolic Potentials, and Interactions Among Archaeal, Bacterial, and Viral Assemblages in Meromictic Lake Shunet in Siberia

Author: Andrei Degermendzhi
Bayanmunkh Baatar
Cheng-Yu Yang
Ching-Hung Tseng
Denis Rogozin
Hsiu-Hui Chiu
Isaam Saeed
Pei-Wen Chiang
Saman Halgamuge
Sen-Lin Tang
Yu-Ting Wu
Publication venue: 'Frontiers Media SA'
Publication date: 01/08/2018

Microorganisms are critical to maintaining stratified biogeochemical characteristics in meromictic lakes; however, their community composition and potential roles in nutrient cycling are not thoroughly described. Both metagenomics and metaviromics were used to determine the composition and capacity of archaea, bacteria, and viruses along the water column in the landlocked meromictic Lake Shunet in Siberia. Deep sequencing of 265 Gb and high-quality assembly revealed a near-complete genome corresponding to Nonlabens sp. sh3vir. in a viral sample and 38 bacterial bins (0.2–5.3 Mb each). The mixolimnion (3.0 m) had the most diverse archaeal, bacterial, and viral communities, followed by the monimolimnion (5.5 m) and chemocline (5.0 m). The bacterial and archaeal communities were dominated by Thiocapsa and Methanococcoides, respectively, whereas the viral community was dominated by Siphoviridae. The archaeal and bacterial assemblages and the associated energy metabolism were significantly related to the various depths, in accordance with the stratification of physicochemical parameters. Reconstructed elemental nutrient cycles of the three layers were interconnected, including co-occurrence of denitrification and nitrogen fixation in each layer and involved unique processes due to specific biogeochemical properties at the respective depths. According to the gene annotation, several pre-dominant yet unknown and uncultured bacteria also play potentially important roles in nutrient cycling. Reciprocal BLAST analysis revealed that the viruses were specific to the host archaea and bacteria in the mixolimnion. This study provides insights into the bacterial, archaeal, and viral assemblages and the corresponding capacity potentials in Lake Shunet, one of the three meromictic lakes in central Asia. 
Lake Shunet was determined to harbor specific and diverse viral, bacterial, and archaeal communities that intimately interacted, revealing patterns shaped by indigenous physicochemical parameters

Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition

Author: SAEED ISAAM
Publication date: 01/01/2011

© 2011 Dr. Isaam Saeed. Tapping into the remarkable power of the uncultured majority of microbial organisms is the driving force of metagenomics. Metagenomics is the study of a microbial community’s genetic content when sampled directly from the environment. Given that microbial genomes within an environmental sample are fragmented prior to sequencing, the association of a genomic DNA fragment to its original genome is not known. As a result, the underlying population structure of the sampled microbial community is also unknown. While it is still possible to analyse the overall function of a microbial community, the functional roles of individual populations and the interactions between them cannot be examined. An approach to infer the underlying population structure of a metagenome is to group sequenced DNA fragments using common patterns in nucleotide base composition that are representative of a particular population (or a group of related populations). The primary challenges for any such method, however, are the taxonomic resolution and accuracy at which sequences are grouped. These are dependent on both the representation of patterns in DNA sequences and the method of grouping similar patterns. In this study, the oligonucleotide frequency derived error gradient (OFDEG), a novel representation of metagenomic sequences, is first proposed. In addition to grouping related metagenomic sequences, the OFDEG measure is also used to examine how patterns in base composition vary within a microbial genome. A model-based clustering framework is then developed to deal with the ambiguity and noise that affect the cluster distribution of patterns extracted from real-world metagenomic data. The concept of patterns in base composition is then extended to short metagenomic sequences (less than 1000 base-pairs in length), with the proposal of two novel representations based on dinucleotide frequency.
The methods developed in this study are evaluated on simulated benchmark data sets and are shown to perform with greater accuracy and resolution than currently available methods. Further validation against publicly available metagenomes produced results which were in accordance with reported analyses of sample diversity. Finally, the proposed methods are applied to four pyrosequenced metagenomic libraries of samples taken from a mud volcano in southwestern Taiwan. The inferred population structure and function were found to be consistent with complementary marker gene analysis as well as the local geochemistry of the sampling site.