9 research outputs found

Genomic fluidity: an integrative view of gene diversity within microbial populations

Author: AJ Lee
AK Shaw
Andrey O Kislyuk
AO Kislyuk
B Efron
BA Flusberg
Bart Haegeman
BE Stranger
C Fraser
C Schoen
D Gevers
DA Rasko
ER Mardis
ES Lander
F Wright
G D'Auria
G Kudla
H Tettelin
J Qin
J Shendure
JCD Hotopp
Joshua S Weitz
JP Gogarten
JS Hogg
KE Holt
KT Konstantinidis
L Snipen
M Achtman
M Wu
MB Sullivan
ML Reno
N Ahmed
Nicholas H Bergman
NJ Gotelli
NL Hiller
P Lapierre
PE Chen
R Redon
SD Bentley
SF Altschul
SJ Callister
VJ Denef
WF Doolittle
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: The dual concepts of pan and core genomes have been widely adopted as means to assess the distribution of gene families within microbial species and genera. The core genome is the set of genes shared by a group of organisms; the pan genome is the set of all genes seen in any of these organisms. A variety of methods have provided drastically different estimates of the sizes of pan and core genomes from sequenced representatives of the same groups of bacteria. Results: We use a combination of mathematical, statistical and computational methods to show that current predictions of pan and core genome sizes may have no correspondence to true values. Pan and core genome size estimates are problematic because they depend on the estimation of the occurrence of rare genes and genomes, respectively, which are difficult to estimate precisely because they are rare. Instead, we introduce and evaluate a robust metric - genomic fluidity - to categorize the gene-level similarity among groups of sequenced isolates. Genomic fluidity is a measure of the dissimilarity of genomes evaluated at the gene level. Conclusions: The genomic fluidity of a population can be estimated accurately given a small number of sequenced genomes. Further, the genomic fluidity of groups of organisms can be compared robustly despite variation in algorithms used to identify genes and their homologs. As such, we recommend that genomic fluidity be used in place of pan and core genome size estimates when assessing gene diversity within genomes of a species or a group of closely related organisms.
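The definitions in this abstract can be illustrated with a small sketch. It assumes each genome has already been reduced to a set of gene-family identifiers (the genome names and family labels below are hypothetical). The abstract only states that genomic fluidity measures gene-level dissimilarity; the pairwise formula used here (gene families unique to either genome of a pair, divided by the total gene families in the pair, averaged over all pairs) is an assumption about how that measure might be computed, not a statement of the authors' implementation.

```python
from itertools import combinations


def core_genome(genomes):
    """Gene families present in every genome of the group."""
    return set.intersection(*genomes.values())


def pan_genome(genomes):
    """Gene families present in at least one genome of the group."""
    return set.union(*genomes.values())


def genomic_fluidity(genomes):
    """Mean pairwise gene-level dissimilarity (assumed formula).

    For each pair of genomes, divide the number of gene families found
    in only one of the two by the sum of their gene-family counts, then
    average over all pairs.
    """
    pairs = list(combinations(genomes.values(), 2))
    if not pairs:
        raise ValueError("need at least two genomes")
    ratios = []
    for a, b in pairs:
        total = len(a) + len(b)
        if total == 0:
            raise ValueError("genomes in a pair must not both be empty")
        # a ^ b is the symmetric difference: families unique to a or to b
        ratios.append(len(a ^ b) / total)
    return sum(ratios) / len(ratios)


# Hypothetical toy data: three genomes described by gene-family labels.
genomes = {
    "isolate_A": {"g1", "g2", "g3"},
    "isolate_B": {"g1", "g2", "g4"},
    "isolate_C": {"g1", "g5"},
}

print(sorted(core_genome(genomes)))
print(len(pan_genome(genomes)))
print(round(genomic_fluidity(genomes), 3))
```

Because the fluidity value is an average over pairs, it does not depend on observing every rare gene, which is the property the abstract contrasts with pan and core genome size estimates.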

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

ProdInra

A computational genomics pipeline for prokaryotic sequencing projects

Author: Altschul
Andrew B. Conley
Andrey O. Kislyuk
Aziz
Bendtsen
Bentley
Besemer
Boeckmann
Brian H. Harcourt
Chen
Darling
Delcher
Dhwani Govil
Eid
Fleischmann
Gerlach
Holmes
Hotopp
I. King Jordan
Jay C. Humphrey
Jolley
Kathleen M. Tatti
Kislyuk
Krogh
Kroll
Kuo
Lapierre
Lee S. Katz
Leonard W. Mayer
Lowe
MacCallum
Maiden
Margulies
Maria L. Tondella
Markowitz
Matthew S. Hagen
Meyers
Miller
Mulder
Parkhill
Perrin
Pop
Pushkala Jayaraman
Quinlan
Raydel D. Mair
Rissman
Rosenstein
Schoen
Scott A. Sammons
Seshadri
Shendure
Sommer
Sonia Agrawal
Stewart
Tettelin
Uniprot Consortium
Viswateja Nelakuditi
Yang
Zerbino
Publication venue: Oxford University Press

Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data

Crossref

PubMed Central

Algorithm development for next generation sequencing-based metagenome analysis

Author: Kislyuk Andrey O.
Publication venue: Georgia Institute of Technology
Publication date: 26/08/2010

We present research on the design, development and application of algorithms for DNA sequence analysis, with a focus on environmental DNA (metagenomes). We present an overview and primer on algorithm development for bioinformatics of metagenomes; work on frameshift detection in DNA sequencing data; work on a computational pipeline for the assembly, feature prediction, annotation and analysis of bacterial genomes; work on unsupervised phylogenetic clustering of metagenomic fragments using Markov Chain Monte Carlo methods; and work on estimation of bacterial genome plasticity and diversity, potential improvements to the measures of core and pan-genomes.

PhD. Committee Chair: Weitz, Joshua; Committee Co-Chair: Jordan, I. King; Committee Member: Bader, David; Committee Member: Bergman, Nicholas; Committee Member: Chernoff, Yur

Scholarly Materials And Research @ Georgia Tech

Neisseria Base: a comparative genomics database for Neisseria meningitidis

Author: Altschul
Andrew B. Conley
Andrey O. Kislyuk
Aurrecoechea
Bendtsen
Bentley
Bieri
Bilukha
Brian H. Harcourt
Chen
Clamp
Cohn
Darling
Dehal
Drysdale
Edgar
Elsik
Flicek
Geoffroy
I. King Jordan
Jay C. Humphrey
Jolley
Joseph
Kent
Kislyuk
Krogh
Lander
Lee S. Katz
Leonard W. Mayer
Maiden
Margulies
Melissa A. Olsen-Rasmussen
Michael Frace
Mulder
Nitya V. Sharma
Parkhill
Peng
Pop
Pushkala Jayaraman
Rosenstein
Rusniok
Sayers
Schmink
Schoen
Sonia Agrawal
Stajich
Stein
Tettelin
Uniprot Consortium
Viswateja Nelakuditi
Wang
Waterhouse
Yang
Yazdankhah
Publication venue: Oxford University Press (OUP)

Crossref

Unsupervised statistical clustering of environmental shotgun sequences

Author: A Campbell
AC Mchardy
Andrey Kislyuk
BB Ward
CK Chan
CKK Chan
D Sorensen
DB Rusch
DH Huson
DJ Lane
DR Bentley
EA Dinsdale
F Not
F Warnecke
FE Angly
G Muyzer
GW Tyson
H Teeling
J Handelsman
J Shendure
Jonathan Dushoff
Joshua S Weitz
JP Noonan
K Mavromatis
M Margulies
ML Sogin
O Béjà
PJ Deschavanne
S Chatterji
S Kariin
SG Tringe
SM Huse
SR Gill
Srijak Bhatnagar
T Abe
T Woyke
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

© 2009 Kislyuk et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/10/316
DOI: 10.1186/1471-2105-10-316

Background: The development of effective environmental shotgun sequence binning methods remains an ongoing challenge in algorithmic analysis of metagenomic data. While previous methods have focused primarily on supervised learning involving extrinsic data, a first-principles statistical model combined with a self-training fitting method has not yet been developed. Results: We derive an unsupervised, maximum-likelihood formalism for clustering short sequences by their taxonomic origin on the basis of their k-mer distributions. The formalism is implemented using a Markov Chain Monte Carlo approach in a k-mer feature space. We introduce a space transformation that reduces the dimensionality of the feature space and a genomic fragment divergence measure that strongly correlates with the method's performance. Pairwise analysis of over 1000 completely sequenced genomes reveals that the vast majority of genomes have sufficient genomic fragment divergence to be amenable to binning using the present formalism. Using a high-performance implementation, the binner is able to classify fragments as short as 400 nt with accuracy over 90% in simulations of low-complexity communities of 2 to 10 species, given sufficient genomic fragment divergence. The method is available as an open source package called LikelyBin. Conclusion: An unsupervised binning method based on statistical signatures of short environmental sequences is a viable stand-alone binning method for low complexity samples. For medium and high complexity samples, we discuss the possibility of combining the current method with other methods as part of an iterative process to enhance the resolving power of sorting reads into taxonomic and/or functional bins.
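The k-mer distributions that this abstract uses as clustering features can be sketched as follows. This is a minimal illustration of turning one DNA fragment into a normalized k-mer frequency vector; the function name and the choice of k are hypothetical, and the sketch covers neither the maximum-likelihood model, the Markov Chain Monte Carlo fitting, nor the LikelyBin package itself.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=2):
    """Return the relative frequency of every A/C/G/T k-mer in a fragment.

    Windows containing other characters (for example N) are ignored.
    """
    sequence = sequence.upper()
    # Enumerate all 4**k possible k-mers so every vector has the same keys.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(
        sequence[i:i + k] for i in range(len(sequence) - k + 1)
    )
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return {m: 0.0 for m in kmers}
    return {m: counts[m] / total for m in kmers}


# Hypothetical fragment; real reads would be hundreds of nucleotides long.
freqs = kmer_frequencies("ACGTACGT", k=2)
print(freqs["AC"], freqs["TA"])
```

Fragments from the same genome tend to yield similar vectors, which is what makes such frequencies usable as a feature space for binning.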

Scholarly Materials And Research @ Georgia Tech

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central