
MetaMine – A tool to detect and analyse gene patterns in their environmental context

Author: A Alexeyenko
A Bateman
A Enright
A Meyerdierks
B Jørgensen
B Snel
C von Mering
ED Harrington
Frank O Glöckner
GW Tyson
I Jonassen
I Jonassen
I Mandoiu
I Rigoutsos
J Boekhorst
JC Venter
M Hu
MA Moran
MA Moran
MA Moran
MPP Béal
N Luc
R Finn
R Overbeek
R Overbeek
Renzo Kottmann
RK Aziz
RL Tatusov
S Altschul
S Giovannoni
S Hallam
S Yooseph
SG Tringe
SJH Kim
T Lombardot
T Lombardot
Thierry Lombardot
Uta Bohnebeck
V Markowitz
V Markowitz
VM Markowitz
X He
Y Ye
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Modern sequencing technologies allow rapid sequencing and bioinformatic analysis of genomes and metagenomes. With every new sequencing project a vast number of new proteins become available, with many genes remaining functionally unclassified based on evidence from sequence similarities alone. Extending similarity searches with gene pattern approaches, where a gene pattern is defined as a set of genes sharing a distinct genomic neighbourhood, has been shown to significantly improve the number of functional assignments. Further functional evidence can be gained by correlating these gene patterns with prevailing environmental parameters. MetaMine was developed to approach the large pool of unclassified proteins by searching for recurrent gene patterns across habitats based on key genes.
Results: MetaMine is an interactive data mining tool which enables the detection of gene patterns in an environmental context. The gene pattern search starts with a user-defined, environmentally interesting key gene. With this gene a BLAST search is carried out against the Microbial Ecological Genomics DataBase (MEGDB), which contains marine genomic and metagenomic sequences. This is followed by the determination of all neighbouring genes within a given distance and a search for functionally equivalent genes. In the final step a set of common genes present in a defined number of distinct genomes is determined. The gene patterns found are associated with their individual pattern instances describing gene order and directions. They are presented together with information about the sample and the habitat. MetaMine is implemented in Java and provided as a client/server application with a user-friendly graphical user interface. The system was evaluated with environmentally relevant genes related to the methane cycle and carbon monoxide oxidation.
Conclusion: MetaMine offers a targeted, semi-automatic search for gene patterns based on expert input. The graphical user interface of MetaMine provides a user-friendly overview of the computed gene patterns for further inspection in an ecological context. Prevailing biological processes associated with a key gene can be used to infer new annotations and shape hypotheses to guide further analyses. The use cases demonstrate that meaningful gene patterns can be quickly detected using MetaMine.
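The neighbourhood and common-gene steps described in the abstract can be illustrated with a minimal Python sketch. It is not MetaMine's implementation (the tool itself is written in Java); the function name `common_gene_pattern`, the input layout, the gene family labels and all coordinates are hypothetical, and the BLAST search and functional-equivalence grouping are assumed to have been done already.

```python
from collections import defaultdict


def common_gene_pattern(hits, max_distance, min_genomes):
    """Return gene families found near the key gene in at least min_genomes genomes.

    hits: list of (genome_id, key_position, genes), where genes is a list of
          (family, position) tuples on the same contig as the key-gene hit.
          The layout is an assumption made for this sketch.
    """
    genomes_per_family = defaultdict(set)
    for genome_id, key_position, genes in hits:
        for family, position in genes:
            # Keep only neighbours inside the distance window around the key-gene hit.
            if position != key_position and abs(position - key_position) <= max_distance:
                genomes_per_family[family].add(genome_id)
    # A family belongs to the pattern if enough distinct genomes contain it.
    return {
        family
        for family, genomes in genomes_per_family.items()
        if len(genomes) >= min_genomes
    }


# Hypothetical key-gene hits in three genomes (positions in base pairs).
hits = [
    ("genome_A", 5000, [("famX", 3500), ("famY", 6200), ("famZ", 20000)]),
    ("genome_B", 800, [("famX", 100), ("famY", 1500)]),
    ("genome_C", 12000, [("famX", 11000), ("famW", 12900)]),
]
pattern = common_gene_pattern(hits, max_distance=2000, min_genomes=2)
print(sorted(pattern))  # ['famX', 'famY']
```

Raising `min_genomes` to 3 in this toy data leaves only `famX`, which shows how the threshold trades pattern size against support across genomes.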

Springer - Publisher Connector

MPG.PuRe

The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes

The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools were developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180,177 distinct proteins with 2133 distinct functional roles. These data come from 173 subsystems and 383 different organisms.

PDXScholar (Portland State University)

eScholarship - University of California

Publications at Bielefeld University

Open Repository and Bibliography - Luxembourg

University of Queensland eSpace

ComPath: comparative enzyme analysis and annotation in pathway/subsystem contexts

Author: A Andreeva
A Bateman
A Marchler-Bauer
A Osterman
AL Barabási
C Gene Ontology
C The UniProt
CJA Sigrist
CM Zmasek
DA Benson
DH Haft
HM Berman
HW Ma
J Wu
K Choi
Kwangmin Choi
L Pireddu
M Kanehisa
M Kanehisa
M Madera
N Hulo
P Stothard
PC Babbitt
PD Karp
R Caspi
R Overbeek
RA George
S Kim
S Kim
S Kim
S Kim
SCH Pegg
SF Altschul
Sun Kim
V BATAGELJL
VM Markowitz
W Thompson
WR Pearson
Y Ye
Y Zheng
YI Wolf
Publication venue: BioMed Central
Publication date: 01/03/2008

Background: Once a new genome is sequenced, one of the important questions is which biological pathways are present and which are absent. Analysis of biological pathways in a genome is a complicated task, since a number of biological entities are involved in pathways and biological pathways in different organisms are not identical. Computational pathway identification and analysis thus involves a number of computational tools and databases and is typically done in comparison with pathways in other organisms. This computational requirement is far beyond the capability of biologists, so information systems for reconstructing, annotating, and analyzing biological pathways are much needed. We introduce a new comparative pathway analysis workbench, ComPath, which integrates various resources and computational tools using an interactive spreadsheet-style web interface for reliable pathway analyses.
Results: ComPath allows users to compare biological pathways in multiple genomes using a spreadsheet-style web interface where various sequence-based analyses can be performed to compare enzymes (e.g. sequence clustering) and pathways (e.g. pathway hole identification), to search a genome for de novo prediction of enzymes, or to annotate a genome in comparison with reference genomes of choice. To fill in pathway holes or make de novo enzyme predictions, multiple computational methods such as FASTA, Whole-HMM, CSR-HMM (a method of our own introduced in this paper), and PDB-domain search are integrated in ComPath. Our experiments show that the FASTA and CSR-HMM search methods generally outperform the Whole-HMM and PDB-domain search methods in terms of sensitivity, but FASTA search performs poorly in terms of specificity, detecting more false positives as the E-value cutoff increases. Overall, the CSR-HMM search method performs best in terms of both sensitivity and specificity. Gene neighborhood and pathway neighborhood (global network) visualization tools can be used to get context information that is complementary to the conventional KEGG map representation.
Conclusion: ComPath is an interactive workbench for pathway reconstruction, annotation, and analysis where experts can perform various sequence, domain, and context analyses using an intuitive and interactive spreadsheet-style interface.
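The notion of a pathway hole mentioned in the abstract can be made concrete with a small Python sketch: a hole is taken here to be an enzyme (EC number) that a reference pathway requires but that is missing from a genome's annotation. This is an illustration under that assumption, not ComPath's code; the function name `pathway_holes`, the genome names and the annotation sets are hypothetical. The four EC numbers are glycolysis enzymes used only as example data.

```python
def pathway_holes(pathway_ecs, genome_ecs):
    """Return the EC numbers a pathway requires that the genome annotation lacks."""
    return sorted(set(pathway_ecs) - set(genome_ecs))


# Illustrative reference pathway: four glycolysis enzymes by EC number.
pathway = ["2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13"]

# Hypothetical annotated EC numbers for two genomes.
genomes = {
    "genome_1": {"2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13"},
    "genome_2": {"2.7.1.1", "4.1.2.13"},
}

# One row per genome, as in a spreadsheet-style comparison.
holes = {name: pathway_holes(pathway, ecs) for name, ecs in genomes.items()}
for name, missing in holes.items():
    print(name, missing)
```

In this toy data `genome_1` has no holes, while `genome_2` lacks 2.7.1.11 and 5.3.1.9. Those two would be the candidates for a follow-up sequence search with methods such as the ones the abstract lists.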

Springer - Publisher Connector

ModEnzA: Accurate Identification of Metabolic Enzymes Using Function Specific Profile HMMs with Optimised Discrimination Threshold and Modified Emission Probabilities

Author: Desai Dhwani K.
Lynn Andrew M.
Nandi Soumyadeep
Srivastava Prashant K.
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2011

Various enzyme identification protocols involving homology transfer by sequence-sequence or profile-sequence comparisons have been devised which utilise Swiss-Prot sequences associated with EC numbers as the training set. A profile HMM constructed for a particular EC number might select sequences which perform a different enzymatic function due to the presence of certain fold-specific residues which are conserved in enzymes sharing a common fold. We describe a protocol, ModEnzA (HMM-ModE Enzyme Annotation), which generates profile HMMs that are highly specific at the functional level, as defined by the EC numbers, by incorporating information from negative training sequences. We enrich the training dataset by mining sequences from the NCBI Non-Redundant database for increased sensitivity. We compare our method with other enzyme identification methods, both for assigning EC numbers to a genome and for identifying protein sequences associated with an enzymatic activity. We report a sensitivity of 88% and a specificity of 95% in identifying EC numbers and annotating enzymatic sequences from the E. coli genome, which is higher than any other method. With next-generation sequencing methods producing a huge amount of sequence data, the development and use of fully automated yet accurate protocols such as ModEnzA is warranted for rapid annotation of newly sequenced genomes and metagenomic sequences.

How do we compare hundreds of bacterial genomes

Author: Christopher Van Der Gast
Dawn Field
Gareth Wilson
Publication date: 23/04/2020

The genomic revolution is fully upon us in 2006 and the pace of discovery is set to accelerate with the emergence of ultra-high-throughput sequencing technologies. Our complete genome collection of bacteria and archaea continues to grow in number and diversity, as genome sequencing is applied to an array of new problems, from the characterization of the pan-genome to the detection of mutation after experimentation and the exploration of microbial communities in unprecedented detail. The benefits of large-scale comparative genomic analyses are driving the community to think about how to manage our public collections of genomes in novel ways.

CiteSeerX

Moonlighting Proteins Hal3 and Vhs3 Form a Heteromeric PPCDC with Ykl088w in Yeast CoA Biosynthesis

Author: A Albert
A Espinosa-Ruiz
A Ferrando
A Osterman
A Ruiz
A Ruiz
AC Gavin
AC Mercer
Amparo Ruiz
Asier González
B Dujon
C Costigan
C Gancedo
CJ Di Como
CR Meyer
DA Treco
E de Nadal
E de Nadal
E Strauss
E Strauss
E Strauss
E Strauss
ED Spitzer
ED Spitzer
Erick Strauss
ET Bucovaz
ET Bucovaz
F Posas
G Giaever
I Munoz
I Munoz
Ivan Muñoz
J Albert Abrie
J Ariño
J Clotet
J Olzhausen
JH Choi
Joaquín Ariño
KS Lee
L Yenush
M Daugherty
MA Garcia-Gimeno
MT Brown
NJ Krogan
P Hernandez-Acosta
Raquel Serrano
S Merchan
S Steinbacher
SR Collins
T Kupke
T Kupke
T Kupke
T Kupke
TP Begley
TR Hazbun
Y Ye
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Research Excellence Award, 2010.
Unlike in most other organisms, the essential five-step Coenzyme A biosynthetic pathway has not been fully resolved in yeast. Specifically, the gene(s) encoding the phosphopantothenoylcysteine decarboxylase (PPCDC) activity remain unidentified. Sequence homology analyses suggest three candidates, namely Ykl088w, Hal3 and Vhs3, as putative PPCDC enzymes in Saccharomyces cerevisiae. Interestingly, Hal3 and Vhs3 have been characterized as negative regulatory subunits of the Ppz1 protein phosphatase. Here we show that YKL088w does not encode a third Ppz1 regulatory subunit, and that the essential roles of Ykl088w and the Hal3/Vhs3 pair are complementary, cannot be interchanged and can be attributed to PPCDC-related functions. We demonstrate that while known eukaryotic PPCDCs are homotrimers, the active yeast enzyme is a heterotrimer which consists of Ykl088w and Hal3/Vhs3 monomers that separately provide two essential catalytic residues. Our results unveil Hal3/Vhs3 as moonlighting proteins, involved in both CoA biosynthesis and protein phosphatase regulation.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Diposit Digital de Documents de la UAB

Machine learning methods for metabolic pathway prediction

Author: Dale Joseph M
Karp Peter D
Popescu Liviu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism.
Results: To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways.
Conclusions: ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.
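For readers unfamiliar with the two metrics reported above, the following Python sketch shows how accuracy and the F-measure (here the balanced F1 score, the harmonic mean of precision and recall) are computed from confusion-matrix counts. The abstract does not say which F-measure variant was used, so F1 is an assumption. The counts are invented for illustration and are not taken from the paper's gold standard dataset.

```python
def accuracy_and_f_measure(tp, fp, tn, fn):
    """Compute accuracy and F1 from true/false positive and negative counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, f_measure


# Hypothetical counts: pathways predicted present or absent versus a curated answer.
accuracy, f_measure = accuracy_and_f_measure(tp=80, fp=20, tn=880, fn=20)
print(f"accuracy={accuracy:.3f} F-measure={f_measure:.3f}")
```

With these made-up counts the accuracy (0.96) is much higher than the F-measure (0.80), because absent pathways dominate the total. The same imbalance is one plausible reason for reporting both metrics side by side.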

Springer - Publisher Connector

A novel immunity system for bacterial nucleic acid degrading toxins and its recruitment in various eukaryotic and DNA viral systems

Author: Alouf
Altschul
Anantharaman
Andreeva
Aoki
Aoki
Aravind
Aravind
Bai
Basmaji
Beckmann
Budt
Burglin
Burroughs
Carr
Cascales
Cassady
Cenciarelli
Child
Child
Child
Conticello
Cuff
Dagkessamanskaia
Dagkessamanskaia
Dalal
Dapeng Zhang
Das
de Souza
Delattre
Dhananjaya
Dugatkin
Edgar
Endo
Endo
Engelberg-Kulka
Finn
Ghosh
Goodstadt
Hakki
Hakki
Hall
Hamill
Hayes
Holm
Hong
Humphrey
Iyer
Iyer
Iyer
Iyer
Jackson
Jacob-Dubuisson
Janke
Jensen
Jones
Kall
Kawano
Kobayashi
Krishna
Krogh
L. Aravind
Lakshminarayan M. Iyer
Lassmann
Makhov
Marshall
Matsuo
McCormick
Menard
Moorman
Mulec
Osbourn
Pallen
Pallen
Pei
Pei
Perler
Peterson
Price
Renzi
Ricagno
Riley
Roberts
Rosenberg
Sali
Shima
Shimoike
Shlyapnikov
Soding
Sokolowska
Stirpe
Tan
Terhune
Tukachinsky
Valchanova
Velikovsky
Wang
Wang
Wootton
Xuan
Yamamoto
Ye
Publication venue: Oxford University Press

The use of nucleases as toxins for defense, offense or addiction of selfish elements is widely encountered across all life forms. Using sensitive sequence profile analysis methods, we characterize a novel superfamily (the SUKH superfamily) that unites a diverse group of proteins including Smi1/Knr4, PGs2, FBXO3, SKIP16, Syd, herpesviral US22, IRS1 and TRS1, and their bacterial homologs. Using contextual analysis we present evidence that the bacterial members of this superfamily are potential immunity proteins for a variety of toxin systems that also include the recently characterized contact-dependent inhibition (CDI) systems of proteobacteria. By analyzing the toxin proteins encoded in the neighborhood of the SUKH superfamily we predict that they possess domains belonging to diverse nuclease and nucleic acid deaminase families. These include at least eight distinct types of DNases belonging to the HNH/EndoVII and restriction endonuclease folds, and RNases of the EndoU-like and colicin E3-like cytotoxic RNase folds. The N-terminal domains of these toxins indicate that they are extruded by several distinct secretory mechanisms such as the two-partner system (shared with the CDI systems) in proteobacteria, ESAT-6/WXG-like ATP-dependent secretory systems in Gram-positive bacteria and the conventional Sec-dependent system in several bacterial lineages. The hedgehog-intein domain might also release a subset of toxic nuclease domains through auto-proteolytic action. Unlike classical colicin-like nuclease toxins, the overwhelming majority of toxin systems with the SUKH superfamily is chromosomally encoded and appears to have diversified through a recombination process combining different C-terminal nuclease domains with N-terminal secretion-related domains. Across the bacterial superkingdom these systems might participate in discriminating 'self' or kin from 'non-self' or non-kin strains.
Using structural analysis we demonstrate that the SUKH domain possesses a versatile scaffold that can be used to bind a wide range of protein partners. In eukaryotes it appears to have been recruited as an adaptor to regulate modification of proteins by ubiquitination or polyglutamylation. Similarly, another widespread immunity protein from these toxin systems, namely the suppressor of fused (SuFu) superfamily, has been recruited for comparable roles in eukaryotes. In animal DNA viruses, such as herpesviruses, poxviruses, iridoviruses and adenoviruses, the ability of the SUKH domain to bind diverse targets has been deployed to counter diverse anti-viral responses by interacting with specific host proteins.