3,641 research outputs found

Expansion of the BioCyc collection of pathway/genome databases to 160 genomes

Author: Ahrén Dag
Darzentas Nikos
Goldovsky Leon
Kaipa Pallavi
Karp Peter D.
Kunin Victor
López-Bigas Núria
Moore-Kochlacs Caroline
Ouzounis Christos A.
Tsoka Sophia
Publication venue: Oxford University Press
Publication date: 01/01/2005
The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. Each PGDB in the BioCyc collection describes the genome and predicted metabolic network of a single organism, inferred from the MetaCyc database, which is a reference source on metabolic pathways from multiple organisms. In addition, each bacterial PGDB includes predicted operons for the corresponding species. The BioCyc collection provides a unique resource for computational systems biology, namely global and comparative analyses of genomes and metabolic networks, and a supplement to the BioCyc resource of curated PGDBs. The Omics viewer available through the BioCyc website allows scientists to visualize combinations of gene expression, proteomics and metabolomics data on the metabolic maps of these organisms. This paper discusses the computational methodology by which the BioCyc collection has been expanded, and presents an aggregate analysis of the collection that includes the range of the number of pathways present in these organisms and the most frequently observed pathways. We seek scientists to adopt and curate individual PGDBs within the BioCyc collection. Only by harnessing the expertise of many scientists can we hope to produce biological databases that accurately reflect the depth and breadth of knowledge that the biomedical research community is producing.
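
The kind of aggregate analysis this abstract mentions (the range of pathway counts per organism and the most frequently observed pathways) can be sketched in a few lines. This is only an illustration: the function name `summarize_collection`, the input layout, and the toy organism and pathway names are assumptions, not data or code from the paper.

```python
from collections import Counter


def summarize_collection(pathways_by_organism, top_n=3):
    """Summarize a mapping of organism name -> list of pathway names.

    Hypothetical helper: returns the smallest and largest number of
    distinct pathways per organism and the top_n most frequent pathways.
    """
    # Count distinct pathways per organism.
    counts = {org: len(set(paths)) for org, paths in pathways_by_organism.items()}
    # Count in how many organisms each pathway occurs (sorted for determinism).
    frequency = Counter(
        pathway
        for paths in pathways_by_organism.values()
        for pathway in sorted(set(paths))
    )
    return {
        "min_pathways": min(counts.values()),
        "max_pathways": max(counts.values()),
        "most_common": frequency.most_common(top_n),
    }


# Toy input, invented for illustration only.
toy = {
    "org_a": ["glycolysis", "TCA cycle"],
    "org_b": ["glycolysis"],
    "org_c": ["glycolysis", "TCA cycle", "urea cycle"],
}
summary = summarize_collection(toy, top_n=2)
```

With real PGDB exports the same summary would be computed per database; how pathway names are obtained from a PGDB is outside this sketch.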

CiteSeerX

Lund University Publications

PubMed Central

King's Research Portal

Reconstruction of metabolic pathways by combining probabilistic graphical model-based and knowledge-based methods

Author: Jianlin Cheng
Jilong Li
Qi Qi
Publication venue: Springer Nature
Publication date: 01/01/2014
Automatic reconstruction of metabolic pathways for an organism from genomics and transcriptomics data has been a challenging and important problem in bioinformatics. Traditionally, known reference pathways are mapped onto organism-specific ones based on the organism's genome annotation and protein homology. However, this simple knowledge-based mapping method might produce incomplete pathways and generally cannot predict unknown new relations and reactions. In contrast, ab initio metabolic network construction methods can predict novel reactions and interactions, but their accuracy tends to be low, leading to many false positives. Here we combine existing pathway knowledge and a new ab initio Bayesian probabilistic graphical model in a novel fashion to improve automatic reconstruction of metabolic networks. Specifically, we built a knowledge database containing known, individual gene/protein interactions and metabolic reactions extracted from existing reference pathways. Known reactions and interactions were then used as constraints for Bayesian network learning methods to predict metabolic pathways. Using individual reactions and interactions extracted from different pathways of many organisms to guide pathway construction is new and improves both the coverage and accuracy of metabolic pathway construction. We applied this probabilistic knowledge-based approach to construct metabolic networks from yeast gene expression data and compared the results with 62 known metabolic networks in the KEGG database. The experiment showed that the method improved the coverage of metabolic network construction over the traditional reference pathway mapping method and was more accurate than pure ab initio methods.

Crossref

Springer - Publisher Connector

PubMed Central

Updates in metabolomics tools and resources: 2014-2015

Author: Misra Biswapriya B.
van der Hooft Justin
Publication venue: Wiley
Publication date: 01/01/2016
Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources (in the form of tools, software, and databases) is currently lacking. Thus, here we provide an overview of freely available, open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of the recent developments in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.

Enlighten

MetaCyc: a multiorganism database of metabolic pathways and enzymes

Author: Caspi Ron
Foerster Hartmut
Fulcher Carol A.
Hopkinson Rebecca
Ingraham John
Kaipa Pallavi
Karp Peter D.
Krummenacker Markus
Paley Suzanne
Pick John
Rhee Seung Y.
Tissier Christophe
Zhang Peifen
Publication venue: Oxford University Press
Publication date: 28/12/2005
MetaCyc is a database of metabolic pathways and enzymes located at . Its goal is to serve as a metabolic encyclopedia, containing a collection of non-redundant pathways central to small molecule metabolism, which have been reported in the experimental literature. Most of the pathways in MetaCyc occur in microorganisms and plants, although animal pathways are also represented. MetaCyc contains metabolic pathways, enzymatic reactions, enzymes, chemical compounds, genes and review-level comments. Enzyme information includes substrate specificity, kinetic properties, activators, inhibitors, cofactor requirements and links to sequence and structure databases. Data are curated from the primary literature by curators with expertise in biochemistry and molecular biology. MetaCyc serves as a readily accessible comprehensive resource on microbial and plant pathways for genome analysis, basic research, education, metabolic engineering and systems biology. Querying, visualization and curation of the database are supported by SRI's Pathway Tools software. The PathoLogic component of Pathway Tools is used in conjunction with MetaCyc to predict the metabolic network of an organism from its annotated genome. SRI and the European Bioinformatics Institute employed this tool to create pathway/genome databases (PGDBs) for 165 organisms, available at the website. These PGDBs also include predicted operons and pathway hole fillers.

CiteSeerX

Crossref

PubMed Central

Deep learning-based k(cat) prediction enables improved enzyme-constrained model reconstruction

Author: Chen Yu
Engqvist Martin
Kerkhoven Eduard
Li Feiran
Li Gang
Lu Hongzhong
Nielsen Jens B
Yuan Le
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022
Enzyme turnover numbers (k(cat)) are key to understanding cellular metabolism, proteome allocation and physiological diversity, but experimentally measured k(cat) data are sparse and noisy. Here we provide a deep learning approach (DLKcat) for high-throughput k(cat) prediction for metabolic enzymes from any organism merely from substrate structures and protein sequences. DLKcat can capture k(cat) changes for mutated enzymes and identify amino acid residues with a strong impact on k(cat) values. We applied this approach to predict genome-scale k(cat) values for more than 300 yeast species. Additionally, we designed a Bayesian pipeline to parameterize enzyme-constrained genome-scale metabolic models from predicted k(cat) values. The resulting models outperformed the corresponding original enzyme-constrained genome-scale metabolic models from previous pipelines in predicting phenotypes and proteomes, and enabled us to explain phenotypic differences. DLKcat and the enzyme-constrained genome-scale metabolic model construction pipeline are valuable tools to uncover global trends of enzyme kinetics and physiological diversity, and to further elucidate cellular metabolism on a large scale

Chalmers Research

Mapping and Filling Metabolic Pathway Holes

Author: Kaur Dipendra
Publication venue: ScholarWorks @ Georgia State University
Publication date: 21/04/2008
The network-mapping tool integrated with protein database search can be used for filling pathway holes. A metabolic pathway under consideration (the pattern) is mapped onto a known metabolic pathway (the text) to find pathway holes. An enzyme that does not show up in the pattern may be a hole in the pattern pathway or an indication of an alternative pattern pathway. We present a data-mining framework for filling holes in the pattern metabolic pathway based on protein function, PROSITE scan and protein sequence homology. Using this framework we suggest several fillings found with the same EC notation, with group neighbors (enzymes with the same EC number in the first three positions but a different fourth position), and instances where the function of an enzyme has been taken up by the left or right neighboring enzyme in the pathway. The percentile scores are better when closely related organisms are mapped than when distantly related organisms are mapped.
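
The "group neighbor" relation described above (same first three EC positions, different fourth) is simple enough to state as code. The sketch below is a minimal illustration only; the function names and the three labels are assumptions made here and are not taken from the thesis.

```python
def parse_ec(ec: str) -> tuple:
    """Split an EC number such as '2.7.1.1' into its four fields."""
    parts = tuple(ec.strip().split("."))
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return parts


def is_group_neighbor(ec_a: str, ec_b: str) -> bool:
    """True when the first three EC fields match and the fourth differs."""
    a, b = parse_ec(ec_a), parse_ec(ec_b)
    return a[:3] == b[:3] and a[3] != b[3]


def classify_filling(hole_ec: str, candidate_ec: str) -> str:
    """Label a candidate as 'same_ec', 'group_neighbor', or 'unrelated'.

    The labels are hypothetical names for the cases listed in the abstract.
    """
    if parse_ec(hole_ec) == parse_ec(candidate_ec):
        return "same_ec"
    if is_group_neighbor(hole_ec, candidate_ec):
        return "group_neighbor"
    return "unrelated"
```

For example, a candidate annotated 2.7.1.2 would be a group neighbor of a hole annotated 2.7.1.1, while 1.1.1.1 would be unrelated to it.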

ScholarWorks @ Georgia State University

The CanOE Strategy: Integrating Genomic and Metabolic Contexts across Multiple Prokaryote Genomes to Find Candidate Genes for Orphan Enzymes

Author: A Aghaie
A Kreimeyer
A Osterman
Adam Alexander Thil Smith
Alain Viari
Christos A. Ouzounis
Claudine Medigue
D Che
D Petrey
D Szklarczyk
D Vallenet
David Vallenet
E Cusa
EM Marcotte
Eugeni Belda
F Boyer
H Ogata
H Tigier
IM Keseler
JA Gerlt
JD Orth
K Postle
L Chen
L Ferrer
L Li
M Ashburner
M Green
M Kanehisa
M Magrane
M Pellegrini
ML Green
N Fonknechten
NJ Mulder
O Lespinet
P Kharchenko
PA Srere
PD Karp
R Alcántara
R Bojanowski
R Caspi
R Overbeek
RJ Roberts
S Gama-Castro
VM Markowitz
WC Lathe 3rd
Y Chen
Y Li
Y Pouliot
Y Yamanishi
Y-P Denielou
Publication venue: Public Library of Science
Publication date: 01/05/2012
Of all biochemically characterized metabolic reactions formalized by the IUBMB, over one out of four have yet to be associated with a nucleic or protein sequence, i.e. are sequence-orphan enzymatic activities. Few bioinformatics annotation tools are able to propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible and propose result integration across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short). The first step locates “genomic metabolons”, i.e. groups of co-localized genes coding for proteins that catalyze reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful for aiding bioanalysts to visualize relevant metabolic data. In the second step, they are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations by several scores. In the final step, these scores are used to rank members of gene families which are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.
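
The notion of a "genomic metabolon" in the first step can be illustrated with a toy grouping routine. This is a deliberately simplified sketch, not the CanOE algorithm: it assumes each gene carries a single position index and one set of metabolites, treats two genes as linked when they lie within a hypothetical `max_gap` of each other and share at least one metabolite, and reports connected groups of two or more genes.

```python
from itertools import combinations


def find_metabolons(genes, max_gap=2):
    """Group co-localized genes whose reactions share metabolites.

    genes: list of (gene_id, position_index, set_of_metabolites) tuples.
    max_gap: assumed maximum distance in gene order for co-localization.
    Returns a sorted list of sorted gene-id lists (groups of size >= 2).
    """
    parent = {gene_id: gene_id for gene_id, _, _ in genes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (ga, pos_a, mets_a), (gb, pos_b, mets_b) in combinations(genes, 2):
        # Link genes that are close on the genome and share a metabolite.
        if abs(pos_a - pos_b) <= max_gap and mets_a & mets_b:
            parent[find(ga)] = find(gb)

    groups = {}
    for gene_id, _, _ in genes:
        groups.setdefault(find(gene_id), set()).add(gene_id)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Toy genome, invented for illustration only.
toy_genes = [
    ("g1", 0, {"A", "B"}),
    ("g2", 1, {"B", "C"}),
    ("g3", 2, {"C", "D"}),
    ("g4", 10, {"D", "E"}),  # shares D with g3 but is too far away
    ("g5", 11, {"X"}),       # close to g4 but shares no metabolite
]
metabolons = find_metabolons(toy_genes)
```

In this toy genome only g1, g2 and g3 form a group; the later steps of the strategy (candidate associations, gene-family integration, ranking) are not covered here.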

HAL Evry

Crossref

INRIA a CCSD electronic archive server

Directory of Open Access Journals

PubMed Central

HAL-CEA

University of Melbourne Institutional Repository

Hal-Diderot

FigShare

Methods for the refinement of genome-scale metabolic networks

Author: Liberal Fernandes Rodrigo
Publication venue: Division of Molecular Biosciences, Imperial College London
Publication date: 01/06/2013
More accurate metabolic networks of pathogens and parasites are required to support the identification of important enzymes or transporters that could be potential targets for new drugs. The overall aim of this thesis is to contribute towards a new level of quality for metabolic network reconstruction, through the application of several different approaches. After building a draft metabolic network using an automated method, a large amount of manual curation effort is still necessary before an accurate model can be reached. PathwayBooster, a standalone software package, which I developed in Python, supports the first steps of model curation, providing easy access to enzymatic function information and a visual pathway display to enable the rapid identification of inaccuracies in the model. A major current problem in model refinement is the identification of genes encoding enzymes which are believed to be present but cannot be found using standard methods. Current searches for enzymes are mainly based on strong sequence similarity to proteins of known function, although in some cases it may be appropriate to consider more distant relatives as candidates for filling these pathway holes. With this objective in mind, a protocol was devised to search a proteome for superfamily relatives of a given enzymatic function, returning candidate enzymes to perform this function. Another, related approach tackles the problem of misannotation errors in public gene databases and their influence on metabolic models through the propagation of erroneous annotations. I show that the topological properties of metabolic networks contain useful information about annotation quality and can therefore play a role in methods for gene function assignment. An evolutionary perspective into functional changes within homologous domains opens up the possibility of integrating information from multiple genomes to support the reconstruction of metabolic models. I have therefore developed a methodology to predict functional change within a gene superfamily phylogeny.

Spiral - Imperial College Digital Repository

Machine learning methods for metabolic pathway prediction

Author: Dale Joseph M
Karp Peter D
Popescu Liviu
Publication venue: BioMed Central
Publication date: 01/01/2010
Background: A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. Results: To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. Conclusions: ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.
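
The two evaluation measures quoted in this abstract, accuracy and F-measure, and the probability-based filtering it mentions can be sketched as follows. The definitions are the standard ones (F-measure taken as the balanced F1 score, which is an assumption about the paper's usage); the function names, the threshold default, and the counts used below are invented for illustration and are not the paper's data.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of pathway instances classified correctly."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no pathway instances to score")
    return (tp + tn) / total


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 0.0 if denominator == 0 else 2 * tp / denominator


def filter_by_probability(predictions: dict, threshold: float = 0.5) -> list:
    """Keep pathways whose predicted probability meets the threshold.

    predictions: mapping of pathway name -> predicted probability.
    """
    return [name for name, prob in predictions.items() if prob >= threshold]


# Hypothetical confusion counts: 8 true positives, 2 false positives,
# 85 true negatives, 5 false negatives.
toy_accuracy = accuracy(tp=8, fp=2, tn=85, fn=5)
toy_f1 = f_measure(tp=8, fp=2, fn=5)
kept = filter_by_probability({"pwy_a": 0.9, "pwy_b": 0.2, "pwy_c": 0.5})
```

Because absent pathways usually far outnumber present ones, accuracy can be high while the F-measure stays noticeably lower, which is consistent with the two measures being reported side by side.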

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central