Systematic identification of gene families for use as markers for phylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups

Author: Eisen Jonathan A.
Jospin Guillaume
Wu Dongying
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 02/07/2013

With genomic and metagenomic sequence data sets accumulating at an astonishing rate, there are many reasons to constrain the data analyses. One approach to such constrained analyses is to focus on select subsets of gene families that are particularly well suited to the tasks at hand. Such gene families have generally been referred to as marker genes. We are particularly interested in identifying and using such marker genes for phylogenetic and phylogeny-driven ecological studies of microbes and their communities. We therefore refer to these as PhyEco (for phylogenetic and phylogenetic ecology) markers. The dual use of these PhyEco markers meant that we needed to develop and apply a set of somewhat novel criteria for identifying the best candidates for such markers. The criteria we focused on were universality across the taxa of interest, the ability to produce robust phylogenetic trees that reflect as closely as possible the evolution of the species from which the genes come, and low variation in copy number across taxa. We describe here an automated protocol for identifying potential PhyEco markers from a set of complete genome sequences. The protocol combines rapid searching, clustering and phylogenetic tree building algorithms to generate protein families that meet the criteria listed above. We report here the identification of PhyEco markers for different taxonomic levels, including 40 for all bacteria and archaea, 114 for all bacteria, and many more for some of the individual phyla of bacteria. This new list of PhyEco markers should allow much more detailed automated phylogenetic and phylogenetic ecology analyses of these groups than was previously possible. Comment: 24 pages, 3 figures
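Two of the criteria named in the abstract above, universality and low copy-number variation, can be illustrated with a small filter. This is a minimal sketch and not the authors' protocol: the function name, the thresholds (`min_universality`, `max_mean_copies`) and the family and genome labels are all assumptions made for illustration, and the third criterion (robust phylogenetic trees) is not covered.

```python
from statistics import mean

def select_marker_families(copy_counts, genomes,
                           min_universality=0.9, max_mean_copies=1.2):
    """Return families that are near-universal and close to single-copy.

    copy_counts maps family -> {genome: number of copies}.
    Both thresholds are illustrative assumptions, not published values.
    """
    selected = []
    for family, counts in copy_counts.items():
        # Copy numbers in the genomes where the family is present at all.
        present = [counts.get(g, 0) for g in genomes if counts.get(g, 0) > 0]
        universality = len(present) / len(genomes)
        if universality < min_universality:
            continue  # missing from too many genomes
        if mean(present) > max_mean_copies:
            continue  # too many paralogs on average
        selected.append(family)
    return sorted(selected)

# Hypothetical toy data: famA is universal and single-copy,
# famB is universal but multi-copy, famC is rare.
example_genomes = ["g1", "g2", "g3", "g4"]
example_counts = {
    "famA": {"g1": 1, "g2": 1, "g3": 1, "g4": 1},
    "famB": {"g1": 5, "g2": 3, "g3": 1, "g4": 2},
    "famC": {"g1": 1},
}
print(select_marker_families(example_counts, example_genomes))
```

With the toy data, only `famA` passes both filters.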

arXiv.org e-Print Archive

FigShare

Automated Protein Structure Classification: A Survey

Author: Hassanzadeh Oktie
Publication date: 01/01/2008

Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or to increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we give an overview of recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions. Comment: 14 pages, Technical Report CSRG-589, University of Toronto

arXiv.org e-Print Archive

CiteSeerX

PhylOTU: a high-throughput procedure quantifies microbial community diversity and resolves novel taxa from metagenomic data.

Author: Eisen Jonathan A
Green Jessica L
Kembel Steven W
Ladau Joshua
O'Dwyer James P
Pollard Katherine S
Riesenfeld Samantha J
Sharpton Thomas J
Publication venue: eScholarship, University of California
Publication date: 01/01/2011

Microbial diversity is typically characterized by clustering small-subunit ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.
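The general idea of grouping reads into OTUs by a distance cutoff can be sketched as follows. This is an illustration only and not the PhylOTU implementation: single-linkage clustering, the 0.03 cutoff, the function name `cluster_otus` and the read labels are all assumptions, and the pairwise distances are taken as given rather than derived from a phylogeny.

```python
def cluster_otus(names, distances, threshold=0.03):
    """Single-linkage clustering of reads into OTUs using union-find.

    distances maps (read_a, read_b) -> pairwise distance.
    Reads joined by any chain of distances <= threshold share an OTU.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical reads and distances.
example_reads = ["r1", "r2", "r3", "r4"]
example_distances = {
    ("r1", "r2"): 0.01,
    ("r2", "r3"): 0.02,
    ("r1", "r3"): 0.05,
    ("r3", "r4"): 0.20,
}
print(cluster_otus(example_reads, example_distances))
```

Single linkage chains `r1`, `r2` and `r3` together even though `r1` and `r3` are farther apart than the cutoff; a different linkage rule would behave differently on the same data.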

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Assessing the Gene Content of the Megagenome: Sugar Pine (Pinus lambertiana).

Author: Delfino-Mix Annette
Famula Randi A
Gonzalez-Ibeas Daniel
Langley Charles H
Loopstra Carol A
Martinez-Garcia Pedro J
Neale David B
Stevens Kristian A
Wegrzyn Jill L
Publication venue: eScholarship, University of California
Publication date: 31/10/2016

Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third-generation long reads generated through PacBio Iso-Seq have been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here to contribute to the otherwise scarce comparisons of second- and third-generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data were also used to address questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers.

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

Author: D.B.R.K. Gupta Udatha
Gianni Panagiotou
Irene Kouskoumvekaki
Lisbeth Olsson
Publication date: 11/08/2010

One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in the biofuel, medicine and food industries due to their capability of acting on a large range of substrates for cleaving ester bonds and synthesizing high-added-value molecules through esterification and transesterification reactions. During the past two decades, extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted to sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. A total of 365 FAE-related sequences of fungal, bacterial and plant origin were collected and clustered, using Self Organizing Maps followed by k-means clustering, into distinct groups based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequence. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families for which sufficient information on known substrates existed. Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs.
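The amino acid composition descriptor mentioned in the abstract above can be sketched in a few lines, together with the assignment step used by k-means (nearest centroid by Euclidean distance). This is a simplified illustration under stated assumptions: the function names and toy sequences are hypothetical, the physico-chemical descriptors, Self Organizing Map and Support Vector Machine stages are omitted, and nothing here reproduces the study's actual feature set.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aa_composition(sequence):
    """Return the fraction of each standard amino acid, in AMINO_ACIDS order."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

def nearest_centroid(vector, centroids):
    """Index of the centroid closest to vector (the k-means assignment step)."""
    distances = [math.dist(vector, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical toy sequences standing in for real enzymes.
query = aa_composition("AACD")
centroids = [aa_composition("WWWW"), aa_composition("AAAC")]
print(nearest_centroid(query, centroids))
```

Composition vectors discard residue order, which is why such descriptors are usually combined with other features before classification.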

Crossref

Chalmers Research

Nature Precedings

Online Research Database In Technology

Chalmers Publication Library

HKU Scholars Hub

Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.

Author: Bourne Philip E
Scheeff Eric D
Publication venue: eScholarship, University of California
Publication date: 01/09/2006

Background: One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear. Results: We explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complementary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield equivalent improvements to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models. Conclusion: When attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used.
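One simple way to combine two sets of model predictions, as the abstract above recommends in general terms, is to keep the more significant hit per query. The abstract does not say how the sets were combined, so the rule below (lowest E-value wins) is an assumption, as are the function name and the query and superfamily labels.

```python
def combine_hits(sequence_only, structure_enhanced):
    """Merge two hit tables, keeping the lowest E-value hit per query.

    Each table maps query -> (predicted_superfamily, evalue).
    The lowest-E-value rule is an illustrative assumption.
    """
    combined = {}
    for hits in (sequence_only, structure_enhanced):
        for query, (label, evalue) in hits.items():
            # Keep a hit if the query is new or this hit is more significant.
            if query not in combined or evalue < combined[query][1]:
                combined[query] = (label, evalue)
    return combined

# Hypothetical hit tables.
seq_hits = {"q1": ("sfA", 1e-5), "q2": ("sfB", 0.5)}
struct_hits = {"q2": ("sfC", 1e-3), "q3": ("sfD", 1e-2)}
print(combine_hits(seq_hits, struct_hits))
```

In the toy data the structure-enhanced table supplies a stronger hit for `q2` and the only hit for `q3`, which is the kind of complementarity the abstract describes.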

PubMed Central

eScholarship - University of California