49 research outputs found
Predicting genes for orphan metabolic activities using phylogenetic profiles
Homology-based methods fail to assign genes to many metabolic activities present in sequenced organisms. To suggest genes for these orphan activities we developed a novel method that efficiently combines the local structure of a metabolic network with phylogenetic profiles. We validated our method using known metabolic genes in Saccharomyces cerevisiae and Escherichia coli. We show that our method should be easily transferable to other organisms, and that it is robust to errors in incomplete metabolic networks.
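As a rough illustration of the idea in this abstract, the sketch below ranks candidate genes for an orphan activity by how closely their phylogenetic profiles (presence/absence across genomes) match those of genes already assigned to neighboring reactions. The abstract does not give the scoring function, so the mean Pearson correlation used here is an assumption, and the gene names and profiles are hypothetical toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_candidates(profiles, neighbor_genes, candidates):
    """Rank candidates by mean profile correlation with the network neighbors.

    Assumption: averaging correlations is a stand-in for whatever scoring
    the published method actually uses.
    """
    scores = {}
    for gene in candidates:
        sims = [pearson(profiles[gene], profiles[nb]) for nb in neighbor_genes]
        scores[gene] = sum(sims) / len(sims)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical presence (1) / absence (0) profiles across six genomes.
profiles = {
    "nbrA": [1, 1, 0, 1, 0, 0],   # gene of a reaction adjacent to the orphan activity
    "nbrB": [1, 1, 0, 1, 1, 0],   # another adjacent gene
    "cand1": [1, 1, 0, 1, 0, 0],  # co-occurs with the neighbors
    "cand2": [0, 0, 1, 0, 1, 1],  # mostly anti-correlated with the neighbors
}

ranked = rank_candidates(profiles, ["nbrA", "nbrB"], ["cand1", "cand2"])
print(ranked)
```

In this toy setup the candidate that co-occurs with the neighboring genes ranks first; a real application would need many more genomes and a null model for the scores.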
Influence of metabolic network structure and function on enzyme evolution
BACKGROUND: Most studies of molecular evolution are focused on individual genes and proteins. However, understanding the design principles and evolutionary properties of molecular networks requires a system-wide perspective. In the present work we connect molecular evolution on the gene level with system properties of a cellular metabolic network. In contrast to protein interaction networks, where several previous studies investigated the molecular evolution of proteins, metabolic networks have a relatively well-defined global function. The ability to consider fluxes in a metabolic network allows us to relate the functional role of each enzyme in a network to its rate of evolution. RESULTS: Our results, based on the yeast metabolic network, demonstrate that important evolutionary processes, such as the fixation of single nucleotide mutations, gene duplications, and gene deletions, are influenced by the structure and function of the network. Specifically, central and highly connected enzymes evolve more slowly than less connected enzymes. Also, enzymes carrying high metabolic fluxes under natural biological conditions experience higher evolutionary constraints. Genes encoding enzymes with high connectivity and high metabolic flux have higher chances to retain duplicates in evolution. In contrast to protein interaction networks, highly connected enzymes are no more likely to be essential compared to less connected enzymes. CONCLUSION: The presented analysis of evolutionary constraints, gene duplication, and essentiality demonstrates that the structure and function of a metabolic network shapes the evolution of its enzymes. Our results underscore the need for systems-based approaches in studies of molecular evolution
The rate of the molecular clock and the cost of gratuitous protein synthesis
The nature of the protein molecular clock, the protein-specific rate of amino acid substitutions, is among the central questions of molecular evolution. Protein expression level is the dominant determinant of the clock rate in a number of organisms. It has been suggested that highly expressed proteins evolve slowly in all species mainly to maintain robustness to translation errors that generate toxic misfolded proteins. Here we investigate this hypothesis experimentally by comparing the growth rate of Escherichia coli expressing wild type and misfolding-prone variants of the LacZ protein. We show that the cost of toxic protein misfolding is small compared to other costs associated with protein synthesis. Complementary computational analyses demonstrate that there is also a relatively weaker, but statistically significant, selection for increasing solubility and polarity in highly expressed E. coli proteins. Although we cannot rule out the possibility that selection against misfolding toxicity significantly affects the protein clock in species other than E. coli, our results suggest that it is unlikely to be the dominant and universal factor determining the clock rate in all organisms. We find that in this bacterium other costs associated with protein synthesis are likely to play an important role. Interestingly, our experiments also suggest significant costs associated with volume effects, such as jamming of the cellular environment with unnecessary proteins
The Amino-Acid Mutational Spectrum of Human Genetic Disease
BACKGROUND: Nonsynonymous mutations in the coding regions of human genes are responsible for phenotypic differences between humans and for susceptibility to genetic disease. Computational methods were recently used to predict deleterious effects of nonsynonymous human mutations and polymorphisms. Here we focus on understanding the amino-acid mutation spectrum of human genetic disease. We compare the disease spectrum to the spectra of mutual amino-acid mutation frequencies, non-disease polymorphisms in human genes, and substitutions fixed between species. RESULTS: We find that the disease spectrum correlates well with the amino-acid mutation frequencies based on the genetic code. Normalized by the mutation frequencies, the spectrum can be rationalized in terms of chemical similarities between amino acids. The disease spectrum is almost identical for membrane and non-membrane proteins. Mutations at arginine and glycine residues are together responsible for about 30% of genetic diseases, whereas random mutations at tryptophan and cysteine have the highest probability of causing disease. CONCLUSIONS: The overall disease spectrum mainly reflects the mutability of the genetic code. We corroborate earlier results that the probability of a nonsynonymous mutation causing a genetic disease increases monotonically with an increase in the degree of evolutionary conservation of the mutation site and a decrease in the solvent-accessibility of the site; opposite trends are observed for non-disease polymorphisms. We estimate that the rate of nonsynonymous mutations with a negative impact on human health is less than one per diploid genome per generation.
Computational prediction and experimental verification of the gene encoding the NAD+/NADP+-dependent succinate semialdehyde dehydrogenase in Escherichia coli
Although NAD+-dependent succinate semialdehyde dehydrogenase activity was first described in Escherichia coli more than 25 years ago, the responsible gene has remained elusive so far. As an experimental proof of concept for a gap-filling algorithm for metabolic networks developed earlier, we demonstrate here that the E. coli gene yneI is responsible for this activity. Our biochemical results demonstrate that the yneI-encoded succinate semialdehyde dehydrogenase can use either NAD+ or NADP+ to oxidize succinate semialdehyde to succinate. The gene is induced by succinate semialdehyde, and expression data indicate that yneI plays a unique physiological role in the general nitrogen metabolism of E. coli. In particular, we demonstrate using mutant growth experiments that the yneI gene has an important, but not essential, role during growth on arginine and probably has an essential function during growth on putrescine as the nitrogen source. The NADP+-dependent succinate semialdehyde dehydrogenase activity encoded by the functional homolog gabD appears to be important for nitrogen metabolism under N limitation conditions. The yneI-encoded activity, in contrast, functions primarily as a valve to prevent toxic accumulation of succinate semialdehyde. Analysis of available genome sequences demonstrated that orthologs of both yneI and gabD are broadly distributed across phylogenetic space. In spite of extensive biochemical research, metabolic network
Identifying metabolic enzymes with multiple types of association evidence
BACKGROUND: Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which cannot be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. RESULTS: We present a novel method for identifying genes encoding a specific metabolic function based on the local structure of the metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate the predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained based on the combination of all data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as the top candidate in 43% of the cases. CONCLUSION: We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities.
Expression dynamics of a cellular metabolic network
Toward the goal of understanding system properties of biological networks, we investigate the global and local regulation of gene expression in the Saccharomyces cerevisiae metabolic network. Our results demonstrate predominance of local gene regulation in metabolism. Metabolic genes display significant coexpression on distances smaller than the average network distance, a behavior supported by the distribution of transcription factor binding sites in the metabolic network and genome context associations. Positive gene coexpression decreases monotonically with distance in the network, while negative coexpression is strongest at intermediate network distances. We show that basic topological motifs of the metabolic network exhibit statistically significant differences in coexpression behavior
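The relationship between coexpression and network distance described in this abstract can be sketched as follows: compute shortest-path distances between genes in a gene-level graph, then average the pairwise expression correlation at each distance. The graph construction, the use of Pearson correlation, and the toy gene names and expression values are assumptions for illustration only; the abstract does not specify them.

```python
from collections import defaultdict, deque
from itertools import combinations
from math import sqrt


def bfs_distances(graph, start):
    """Shortest path length (in edges) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def coexpression_by_distance(graph, expression):
    """Mean pairwise expression correlation, grouped by network distance."""
    buckets = defaultdict(list)
    for a, b in combinations(sorted(expression), 2):
        d = bfs_distances(graph, a).get(b)
        if d is None:  # genes in different connected components
            continue
        buckets[d].append(pearson(expression[a], expression[b]))
    return {d: sum(v) / len(v) for d, v in sorted(buckets.items())}


# Hypothetical linear pathway g1 - g2 - g3 (undirected adjacency lists).
graph = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2"]}
# Hypothetical expression levels across four conditions.
expression = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [1, 3, 2, 4],
}

result = coexpression_by_distance(graph, expression)
print(result)
```

Running a breadth-first search per gene pair is wasteful for large networks; caching one search per gene would be the obvious refinement.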
Properties of cell death models calibrated and compared using Bayesian approaches
Using models to simulate and analyze biological networks requires principled approaches to parameter estimation and model discrimination. We use Bayesian and Monte Carlo methods to recover the full probability distributions of free parameters (initial protein concentrations and rate constants) for mass-action models of receptor-mediated cell death. The width of the individual parameter distributions is largely determined by non-identifiability but covariation among parameters, even those that are poorly determined, encodes essential information. Knowledge of joint parameter distributions makes it possible to compute the uncertainty of model-based predictions whereas ignoring it (e.g., by treating parameters as a simple list of values and variances) yields nonsensical predictions. Computing the Bayes factor from joint distributions yields the odds ratio (~20-fold) for competing ‘direct’ and ‘indirect’ apoptosis models having different numbers of parameters. Our results illustrate how Bayesian approaches to model calibration and discrimination combined with single-cell data represent a generally useful and rigorous approach to discriminate between competing hypotheses in the face of parametric and topological uncertainty
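To make the Bayes factor mentioned in this abstract concrete, here is a toy sketch that compares two models with different numbers of free parameters by their marginal likelihoods. It is not the mass-action cell death model from the paper: the coin-flip data, the uniform prior, and the plain Monte Carlo average over prior samples are all assumptions chosen to keep the example self-contained.

```python
import random
from math import comb


def binomial_likelihood(p, heads, flips):
    """Probability of observing `heads` successes in `flips` trials at rate p."""
    return comb(flips, heads) * p ** heads * (1 - p) ** (flips - heads)


def evidence_fixed(heads, flips, p=0.5):
    """Marginal likelihood of a model with no free parameters (p fixed)."""
    return binomial_likelihood(p, heads, flips)


def evidence_monte_carlo(heads, flips, samples=20000, seed=0):
    """Marginal likelihood of a one-parameter model with p ~ Uniform(0, 1).

    Estimated by averaging the likelihood over draws from the prior.
    """
    rng = random.Random(seed)
    total = sum(
        binomial_likelihood(rng.random(), heads, flips) for _ in range(samples)
    )
    return total / samples


# Hypothetical data: 15 heads in 20 flips.
heads, flips = 15, 20
z_free = evidence_monte_carlo(heads, flips)   # model with one free parameter
z_fixed = evidence_fixed(heads, flips)        # model with zero free parameters
bayes_factor = z_free / z_fixed
print(f"Bayes factor (free vs fixed): {bayes_factor:.2f}")
```

Because the evidence integrates over the prior, the extra parameter is penalized automatically. For this toy model the integral is known analytically, 1 / (flips + 1), which gives a check on the Monte Carlo estimate.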
Reconstruction and flux-balance analysis of the Plasmodium falciparum metabolic network
In this paper we present a metabolic reconstruction and flux-balance analysis (FBA) of Plasmodium falciparum, the primary agent of malaria. The compartmentalized metabolic network of the parasite accounts for 1001 reactions and 616 metabolites. Enzyme–gene associations were established for 366 genes and 75% of all enzymatic reactions. The model was able to reproduce phenotypes of experimental gene knockout and drug inhibition assays with up to 90% accuracy. The model also can be used to efficiently integrate mRNA-expression data to improve the accuracy of metabolic predictions. Using FBA of the reconstructed metabolic network, we identified 40 enzymatic drug targets (i.e. in silico essential genes) with no or very low sequence identity to human proteins. We experimentally tested one of the identified drug targets, nicotinate mononucleotide adenylyltransferase, using a recently discovered small-molecule inhibitor.
Role of Duplicate Genes in Robustness against Deleterious Human Mutations
It is now widely recognized that robustness is an inherent property of biological systems [1],[2],[3]. The contribution of close sequence homologs to genetic robustness against null mutations has been previously demonstrated in simple organisms [4],[5]. In this paper we investigate in detail the contribution of gene duplicates to back-up against deleterious human mutations. Our analysis demonstrates that the functional compensation by close homologs may play an important role in human genetic disease. Genes with a 90% sequence identity homolog are about 3 times less likely to harbor known disease mutations compared to genes with remote homologs. Moreover, close duplicates affect the phenotypic consequences of deleterious mutations by making a decrease in life expectancy significantly less likely. We also demonstrate that similarity of expression profiles across tissues significantly increases the likelihood of functional compensation by homologs