39 research outputs found

The PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences

Author: A Brady
A Valouev
AC McHardy
Alice Carolyn McHardy
C Burge
DH Huson
F Meyer
F Sanger
F Warnecke
GL Rosen
GW Tyson
H Teeling
I Tsochantaridis
J Handelsman
K Mavromatis
Kaustubh Raosaheb Patil
KR Patil
KU Foerstner
Linus Roune
M Hess
M Margulies
ML Metzker
N Adams
P Hugenholtz
PB Pope
PJ Turnbaugh
R Sandberg
R Tewhey
S Karlin
Sarah K. Highlander
SF Altschul
W Gerlach
Publication venue: Public Library of Science
Publication date: 01/01/2012

Metagenome sequencing is becoming common and there is an increasing need for easily accessible tools for data analysis. An essential step is the taxonomic classification of sequence fragments. We describe a web server for the taxonomic assignment of metagenome sequences with PhyloPythiaS. PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. Taxonomic assignments with the web server can be made with a generic model, or with sample-specific models that users can specify and create. Several interactive visualization modes and multiple download formats allow quick and convenient analysis and downstream processing of taxonomic assignments. Here, we demonstrate usage of our web server by taxonomic assignment of metagenome samples from an acidophilic biofilm community of an acid mine and of a microbial community from cow rumen

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Amino Acid Usage Is Asymmetrically Biased in AT- and GC-Rich Microbial Genomes.

Author: A Dufresne
A Garcia-Gonzalez
B Wang
CE McEwan
D Bharanidharan
David W. Ussery
EP Rocha
Eystein Skjerve
H Akaike
H Naya
H Willenbrock
J Bohlin
J Lightfield
JJ Wernegreen
Jon Bohlin
JP McCutcheon
KT Konstantinidis
KU Foerstner
M Woolfit
N Molina
NA Moran
Ola Brynildsrud
ON Reva
PA Lind
PM Sharp
PS Novichkov
R Hershberg
R Mendez
R Raghavan
S Audic
S Mann
SA Marashi
SN Wood
T Banerjee
Tamir Tuller
Tammi Vesth
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

INTRODUCTION: Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid- and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates. RESULTS: We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB. CONCLUSION: Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Online Research Database In Technology

FigShare

TACOA – Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach

Author: A Campbell
A Ruepp
AC McHardy
Alexander Goesmann
B D
C Chan
D Huson
DL Wheeler
EV Koonin
F Sanger
G Salton
GW Tyson
H Teeling
J Bohlin
J Brown
J Raes
JC Venter
JL Stein
Karsten Niehaus
KU Foerstner
L Krause
Lutz Krause
M Margulies
MZZ Zhu
Naryttza N Diaz
P Baldi
PJ Keeling
R Finn
R Overbeek
R Sandberg
RD Fleischmann
S Garcia-Vallve
S Karlin
S Podell
S Saha
SF Altschul
SH Zhang
T Abe
T Cover
T Hastie
Tim W Nattkemper
TN Tran
Z Yao
Publication venue: BioMed Central
Publication date: 01/01/2009

Diaz NN, Krause L, Goesmann A, Niehaus K, Nattkemper TW. TACOA - Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics. 2009;10(1):56.
Background: Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay a solid basis for our understanding of the entire living world. However, taxonomic classification, an essential task in the analysis of metagenomic data sets, is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning.
Results: Our novel strategy was extensively evaluated using leave-one-out cross-validation on fragments of variable length (800 bp – 50 Kbp) from 373 completely sequenced genomes. TACOA is able to classify genomic fragments of length 800 bp and 1 Kbp with high accuracy up to rank class. For longer fragments (≥ 3 Kbp), accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, classifying such fragments to their known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with the latest intrinsic classifier, PhyloPythia, using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp, the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparably high specificity up to rank class, and low false negative rates were also obtained.
Conclusion: An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast, and accurate, and the reference set can be easily updated as newly sequenced genomes become available. Moreover, the method proved to be competitive when compared to the most current classifier, PhyloPythia, and has the advantage that it can be locally installed and the reference set can be kept up-to-date.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

University of Queensland eSpace

An Environment-Sensitive Synthetic Microbial Ecosystem

Author: A Ramette
AC Morán
AR Horswill
BE Rittmann
Bo Hu
C Jernberg
D Endy
D Karig
D Sprinzak
E Andrianantoandro
EL Haseltine
F Baquero
FK Balagadde
GM Church
H Hillebrand
H Kobayashi
HT Williams
HW Paerl
J Engebrecht
J Hasty
J Sticker
Jin Du
JS Chuang
JW Chin
K Brenner
KM Pappas
KU Foerstner
L Marucci
L Passador
L Serrano
L You
Mark Isalan
MB Elowitz
MJ Dunham
MR Parsek
NC Banning
NR Pace
P Marguet
PE Purnick
PN Bertin
Rui-yang Zou
S Basu
S Schauder
SD Costanzo
SK Hansen
TM Schmidt
TS Gardner
VC Kalia
W Jia
W Weber
WY Shou
Ying-jin Yuan
Publication venue: Public Library of Science
Publication date: 01/01/2010

Microbial ecosystems have been widely used in industrial production, but the inter-relationships of the organisms within them have not been fully clarified because of the complex composition and structure of natural microbial ecosystems. It is therefore challenging for ecologists to gain deep insight into how ecosystems function and interact with their surrounding environments. Recent progress in synthetic biology, however, shows that constructing artificial ecosystems in which the relationships between species are comparatively clear could help us better understand those tiny societies. Using two quorum-sensing signal transduction circuits, this research designed, simulated and constructed a synthetic ecosystem in which various population dynamics form as environmental factors change. Coherent experimental data and mathematical simulations in our study show that different antibiotic levels and initial cell densities can result in correlated population dynamics such as extinction, obligatory mutualism, facultative mutualism and commensalism. This synthetic ecosystem provides valuable information for addressing questions in ecology and may act as a chassis for the construction of more complex microbial ecosystems.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Large-scale comparative genomic ranking of taxonomically restricted genes (TRGs) in bacterial and archaeal genomes

BACKGROUND: Lineage-specific, or taxonomically restricted genes (TRGs), especially those that are species and strain-specific, are of special interest because they are expected to play a role in defining exclusive ecological adaptations to particular niches. Despite this, they are relatively poorly studied and little understood, in large part because many are still orphans or only have homologues in very closely related isolates. This lack of homology confounds attempts to establish the likelihood that a hypothetical gene is expressed and, if so, to determine the putative function of the protein. METHODOLOGY/PRINCIPAL FINDINGS: We have developed "QIPP" ("Quality Index for Predicted Proteins"), an index that scores the "quality" of a protein based on non-homology-based criteria. QIPP can be used to assign a value between zero and one to any protein based on comparing its features to other proteins in a given genome. We have used QIPP to rank the predicted proteins in the proteomes of Bacteria and Archaea. This ranking reveals that there is a large amount of variation in QIPP scores, and identifies many high-scoring orphans as potentially "authentic" (expressed) orphans. There are significant differences in the distributions of QIPP scores between orphan and non-orphan genes for many genomes and a trend for less well-conserved genes to have lower QIPP scores. CONCLUSIONS: The implication of this work is that QIPP scores can be used to further annotate predicted proteins with information that is independent of homology. Such information can be used to prioritize candidates for further analysis. Data generated for this study can be found in the OrphanMine at http://www.genomics.ceh.ac.uk/orphan_mine

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

A statistical toolbox for metagenomics: assessing functional diversity in microbial communities

Author: A Chao
AC McHardy
AE Magurran
AP Martin
B Rodriguez-Brito
C von Mering
CS Riesenfeld
DA Rasko
DB Rusch
DH Huson
DR Singleton
E Lerat
EF DeLong
EF Delong
GW Tyson
H Garcia Martin
H Teeling
JC Venter
JC Yue
JL Stein
Jo Handelsman
JP Wang
K Mavromatis
KP Burnham
KU Foerstner
L Excoffier
M Breitbart
M Margulies
M Strous
MJ Anderson
MR Rondon
P Legendre
Patrick D Schloss
PD Schloss
PL Johnson
S Yooseph
SG Tringe
SJ Hallam
SR Gill
T Woyke
TD Read
TM Schmidt
VM Markowitz
W Ludwig
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The 99% of bacteria in the environment that are recalcitrant to culturing have spurred the development of metagenomics, a culture-independent approach to sample and characterize microbial genomes. Massive datasets of metagenomic sequences have been accumulated, but analysis of these sequences has focused primarily on the descriptive comparison of the relative abundance of proteins that belong to specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data. Results: Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequence observed, there was a core collection of operational protein families that was shared among the three environments. Conclusion: The results of comparisons between the three habitats were surprising considering the relatively low overlap of membership and the distinctively different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Secondary metabolite gene expression and interplay of bacterial functions in a tropical freshwater cyanobacterial bloom

Author: A Nishizawa
AI Saeed
C Straub
CN Shulse
D Haft
D Nonneman
D Tillett
DG Bourne
E Briand
E Dittmann
E Litchman
EA Ottesen
EA Welsh
F Haupt
FI Kappers
G Christiansen
G Pan
HW Paerl
I Stewart
I van Gremberghe
IM Ehrenreich
J Shao
J-F Humbert
Janelle R Thompson
JF Blom
Jia Wang
K Furukawa
K Gin
K Nakasugi
K Sellner
Kevin Penn
KU Foerstner
L Frangeul
L Kelly
M Di Lorenzo
M Jimbo
M Oliynyk
M Vila-Costa
M Yoshida
MA Johnson
MF Chislock
MF Watanabe
MJ Harke
MM Steffen
N Gan
N Ziemert
OV Kaluzhnaya
RA Mella-Herrera
S Meissner
S Pouria
Samodha C Fernando
SH Te
SM Gifford
T Nishizawa
T Rohrlack
T Rounge
W Fenical
WD Swingley
WW Carmichael
X Mou
Y Tanabe
Y Zilliges
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Cyanobacterial harmful algal blooms (cyanoHABs) appear to be increasing in frequency on a global scale. The Cyanobacteria in blooms can produce toxic secondary metabolites that make freshwater dangerous for drinking and recreation. To characterize microbial activities in a cyanoHAB, transcripts from a eutrophic freshwater reservoir in Singapore were sequenced for six samples collected over one day-night period. Transcripts from the Cyanobacterium Microcystis dominated all samples and were accompanied by at least 533 genera primarily from the Cyanobacteria, Proteobacteria, Bacteroidetes and Actinobacteria. Within the Microcystis population, abundant transcripts were from genes for buoyancy, photosynthesis and synthesis of the toxin microviridin, suggesting that these are necessary for competitive dominance in the reservoir. During the day, Microcystis transcripts were enriched in photosynthesis and energy metabolism while at night enriched pathways included DNA replication and repair and toxin biosynthesis. Microcystis was the dominant source of transcripts from polyketide and non-ribosomal peptide synthase (PKS and NRPS, respectively) gene clusters. Unexpectedly, expression of all PKS/NRPS gene clusters, including for the toxins microcystin and aeruginosin, occurred throughout the day-night cycle. The most highly expressed PKS/NRPS gene cluster from Microcystis is not associated with any known product. The four most abundant phyla in the reservoir were enriched in different functions, including photosynthesis (Cyanobacteria), breakdown of complex organic molecules (Proteobacteria), glycan metabolism (Bacteroidetes) and breakdown of plant carbohydrates, such as cellobiose (Actinobacteria). These results provide the first estimate of secondary metabolite gene expression, functional partitioning and functional interplay in a freshwater cyanoHAB.
Singapore. National Research Foundation (Singapore MIT Alliance for Research and Technology (SMART), Center for Environmental Sensing and Modeling (CENSAM) research program); National Science Foundation (U.S.) (Postdoctoral Research Fellowship in Biology, Grant No. DBI-1202865); National Institute of Environmental Health Sciences (NIEHS Grant P30-ES002109 to the MIT Center for Environmental Health Sciences); MIT International Science and Technology Initiatives (MISTI-Hayashi fund)

DSpace@MIT

Crossref

PubMed Central

Differential preservation of endogenous human and microbial DNA in dental calculus and dentin

Dental calculus (calcified dental plaque) is prevalent in archaeological skeletal collections and is a rich source of oral microbiome and host-derived ancient biomolecules. Recently, it has been proposed that dental calculus may provide a more robust environment for DNA preservation than other skeletal remains, but this has not been systematically tested. In this study, shotgun-sequenced data from paired dental calculus and dentin samples from 48 globally distributed individuals are compared using a metagenomic approach. Overall, we find DNA from dental calculus is consistently more abundant and less contaminated than DNA from dentin. The majority of DNA in dental calculus is microbial and originates from the oral microbiome; however, a small but consistent proportion of DNA (mean 0.08 ± 0.08%, range 0.007–0.47%) derives from the host genome. Host DNA content within dentin is variable (mean 13.70 ± 18.62%, range 0.003–70.14%), and for a subset of dentin samples (15.21%), oral bacteria contribute > 20% of total DNA. Human DNA in dental calculus is highly fragmented, and is consistently shorter than both microbial DNA in dental calculus and human DNA in paired dentin samples. Finally, we find that microbial DNA fragmentation patterns are associated with guanine-cytosine (GC) content, but not aspects of cellular structure.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

Copenhagen University Research Information System

Leiden University Scholary Publications

NSU Works

MPG.PuRe

The Natural Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify Secondary Metabolite Gene Diversity

Author: A Dereeper
A Ginolhac
A Hornung
A Starcevic
AS Eustaquio
AS Eustáquio
B Shen
BO Bachmann
BS Moore
C Hertweck
C Rausch
CN Shulse
CP Ridley
D Tillett
DD Baker
DJ Edwards
DJ Newman
DW Udwary
E Cundliffe
E Gontang
EA Gontang
Eric Allen
G Yadav
GL Challis
H Ikeda
H Jenke-Kodama
J Davies
J Piel
JA Eisen
JAV Blodgett
JB McAlpine
JD McPherson
JD Thompson
JM Winter
Jonathan H. Badger
JW Li
K Penn
KC Freel
Kevin Penn
KJ Weissman
KU Foerstner
L Du
M Margulies
M Metsa-Ketela
M Nett
MA Fischbach
MC Moffitt
MH Medema
MZ Ansari
N Roongsawang
Nadine Ziemert
Paul R. Jensen
PR Jensen
R Finking
RC Edgar
RD Finn
S Guindon
S Lautru
SA Sieber
SC Wenzel
SD Bentley
SF Altschul
SG Tringe
Sheila Podell
SJ Moss
SMD Goldberg
T Junier
T Nguyen
Valerie de Crécy-Lagard
WP Maddison
Z Chang
Publication venue: Public Library of Science
Publication date: 01/01/2012

New bioinformatic tools are needed to analyze the growing volume of DNA sequence data. This is especially true in the case of secondary metabolite biosynthesis, where the highly repetitive nature of the associated genes creates major challenges for accurate sequence assembly and analysis. Here we introduce the web tool Natural Product Domain Seeker (NaPDoS), which provides an automated method to assess the secondary metabolite biosynthetic gene diversity and novelty of strains or environments. NaPDoS analyses are based on the phylogenetic relationships of sequence tags derived from polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) genes, respectively. The sequence tags correspond to PKS-derived ketosynthase domains and NRPS-derived condensation domains and are compared to an internal database of experimentally characterized biosynthetic genes. NaPDoS provides a rapid mechanism to extract and classify ketosynthase and condensation domains from PCR products, genomes, and metagenomic datasets. Close database matches provide a mechanism to infer the generalized structures of secondary metabolites while new phylogenetic lineages provide targets for the discovery of new enzyme architectures or mechanisms of secondary metabolite assembly. Here we outline the main features of NaPDoS and test it on four draft genome sequences and two metagenomic datasets. The results provide a rapid method to assess secondary metabolite biosynthetic gene diversity and richness in organisms or environments and a mechanism to identify genes that may be associated with uncharacterized biochemistry

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

FigShare

Relative amino acid composition signatures of organisms and environments

Author: A Bernal
A Quigg
AC Redfield
Alexandra Moura
AR Chowdhury
CC Cleveland
Christos A. Ouzounis
D Jollivet
EP Rocha
F Tekaia
H Akashi
H Ogata
J Lightfield
J Pramanik
JF Curran
JI Glass
KD Pruitt
KU Foerstner
M Botzman
Michael A. Savageau
P Baudouin-Cornu
PG Taylor
PM Sharp
R Alves
RD Knight
Rui Alves
S Karlin
S Klumpp
S Ledoux
SA Sanudo-Wilhelmy
SG Andersson
T Okayasu
T Weber
TS Weber
U Sauer
WB Whitman
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

BACKGROUND: Identifying organism-environment interactions at the molecular level is crucial to understanding how organisms adapt to and change the chemical and molecular landscape of their habitats. In this work we investigated whether relative amino acid compositions could be used as a molecular signature of an environment and whether such a signature could also be observed at the level of the cellular amino acid composition of the microorganisms that inhabit that environment. METHODOLOGIES/PRINCIPAL FINDINGS: To address these questions we collected and analyzed environmental amino acid determinations from the literature, and estimated from complete genomic sequences the global relative amino acid abundances of organisms that are cognate to the different types of environment. Environmental relative amino acid abundances clustered into broad groups (ocean waters, host-associated environments, grass land environments, sandy soils and sediments, and forest soils), indicating the presence of amino acid signatures specific for each environment. These signatures correlate to those found in organisms. Nevertheless, relative amino acid abundance of organisms was more influenced by GC content than habitat or phylogeny. CONCLUSIONS: Our results suggest that relative amino acid composition can be used as a signature of an environment. In addition, we observed that the relative amino acid composition of organisms is not highly determined by environment, reinforcing previous studies that find GC content to be the major factor correlating to amino acid composition in living organisms.
AM was supported by Fundação para a Ciência e a Tecnologia, Portugal, through the postdoctoral grant SFRH/BPD/72256/2010. RA was partially supported by the Ministerio de Ciencia e Innovación (Spain) through grant BFU2010-17704, and by the Generalitat de Catalunya through a grant for research group 2009SGR809. MAS was supported in part by a grant from the US Public Health Service (RO1-GM30054). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Authors wish to thank Albert Sorribas, Enrique Herrero and Ester Vilaprinyo for critical reading of the manuscript and Ester Vilaprinyo for assistance with Wolfram Mathematica software.

CiteSeerX

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositório Institucional da Universidade de Aveiro

Directory of Open Access Journals

PubMed Central

Repositori Obert UdL

FigShare