Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition
Authors
Isaam Saeed
Publication date
1 January 2011
Abstract
© 2011 Dr. Isaam Saeed

Tapping into the remarkable power of the uncultured majority of microbial organisms is the driving force of metagenomics, the study of the genetic content of a microbial community sampled directly from its environment. Because the microbial genomes in an environmental sample are fragmented before sequencing, the genome from which each sequenced DNA fragment originated is not known. As a result, the underlying population structure of the sampled community is also unknown. While the overall function of the community can still be analysed, the functional roles of individual populations and the interactions between them cannot be examined.

One approach to inferring the underlying population structure of a metagenome is to group sequenced DNA fragments by common patterns in nucleotide base composition that are representative of a particular population (or a group of related populations). The primary challenges for any such method, however, are the taxonomic resolution and the accuracy with which sequences are grouped. Both depend on how patterns in DNA sequences are represented and on the method used to group similar patterns.

In this study, the oligonucleotide frequency derived error gradient (OFDEG), a novel representation of metagenomic sequences, is first proposed. In addition to grouping related metagenomic sequences, the OFDEG measure is used to examine how patterns in base composition vary within a microbial genome. A model-based clustering framework is then developed to deal with the ambiguity and noise that affect the cluster distribution of patterns extracted from real-world metagenomic data. The concept of patterns in base composition is then extended to short metagenomic sequences (shorter than 1,000 base pairs) through two novel representations based on dinucleotide frequency.
The methods developed in this study are evaluated on simulated benchmark data sets and are shown to group sequences with greater accuracy and resolution than currently available methods. Further validation on publicly available metagenomes produced results consistent with reported analyses of sample diversity. Finally, the proposed methods are applied to four pyrosequenced metagenomic libraries built from samples taken from a mud volcano in southwestern Taiwan. The inferred population structure and function were consistent with complementary marker gene analysis and with the local geochemistry of the sampling site.
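The abstract describes grouping metagenomic fragments by patterns in base composition but does not define OFDEG or the two dinucleotide-based representations, so the sketch below does not reproduce them. Instead it illustrates the general idea with a standard, widely used composition signature: Karlin's dinucleotide relative abundance, rho*_XY = f_XY / (f_X f_Y), pooled over a fragment and its reverse complement, together with the profile distance delta* (the mean absolute difference over the 16 dinucleotides). The function names and the example fragments are hypothetical, chosen only for illustration.

```python
from itertools import product

BASES = "ACGT"
# Translation table for complementing upper-case bases; other symbols (e.g. N) pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def relative_abundance(seq: str) -> dict[str, float]:
    """Karlin-style dinucleotide relative abundance rho*_XY = f_XY / (f_X * f_Y).

    Counts are pooled over the sequence and its reverse complement, so the
    profile does not depend on which strand a fragment was read from.
    Dinucleotides containing non-ACGT symbols are ignored.
    """
    seq = seq.upper()
    mono = dict.fromkeys(BASES, 0)
    di = dict.fromkeys(DINUCLEOTIDES, 0)
    for strand in (seq, reverse_complement(seq)):
        for base in strand:
            if base in mono:
                mono[base] += 1
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair in di:
                di[pair] += 1
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    rho = {}
    for pair in DINUCLEOTIDES:
        # Expected dinucleotide frequency if adjacent bases were independent.
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono) if n_mono else 0.0
        observed = di[pair] / n_di if n_di else 0.0
        # Assumption: define rho as 0 when a base is absent, to avoid dividing by zero.
        rho[pair] = observed / expected if expected > 0 else 0.0
    return rho


def composition_distance(seq_a: str, seq_b: str) -> float:
    """Karlin's delta*: mean absolute difference between two rho* profiles."""
    rho_a = relative_abundance(seq_a)
    rho_b = relative_abundance(seq_b)
    return sum(abs(rho_a[d] - rho_b[d]) for d in DINUCLEOTIDES) / len(DINUCLEOTIDES)


if __name__ == "__main__":
    # Hypothetical fragments: two GC-rich, one AT-rich.
    fragments = {
        "frag1": "ATGCGCGATCGCGGCTAGCGCGCTAGCG",
        "frag2": "ATATTAAATTTATAATATTTAAATTATA",
        "frag3": "GCGCGATCGCGGCCGCTAGCGCGATCGC",
    }
    names = sorted(fragments)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(a, b, round(composition_distance(fragments[a], fragments[b]), 3))
```

Pooling both strands makes a fragment and its reverse complement produce identical profiles, which matters because sequenced reads come from either strand. A pairwise distance like delta* could then feed any clustering step; the thesis's own model-based framework is not reproduced here.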
Available Versions
University of Melbourne Institutional Repository
oai:jupiter.its.unimelb.edu.au...
Last time updated on 06/01/2019