
    Predicting protein function by machine learning on amino acid sequences – a critical evaluation

    Copyright © 2007 Al-Shahib et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear whether this ability would transfer to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins. Results: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species differ to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that, in the case of highly specialized proteomes, classifiers from a different but more conventional species may in fact outperform the endogenous species-specific classifier. Conclusion: We conclude that there is a very good prospect of successfully predicting the function of as yet uncharacterized proteins using machine learning classifiers trained on proteins of known function

    Metabolic modeling and analysis of the metabolic switch in Streptomyces coelicolor

    Background: The transition from exponential to stationary phase in Streptomyces coelicolor is accompanied by a major metabolic switch and results in a strong activation of secondary metabolism. Here we have explored the underlying reorganization of the metabolome by combining computational predictions based on constraint-based modeling and detailed transcriptomics time course observations. Results: We reconstructed the stoichiometric matrix of S. coelicolor, including the major antibiotic biosynthesis pathways, and performed flux balance analysis to predict flux changes that occur when the cell switches from biomass to antibiotic production. We defined the model input based on observed fermenter culture data and used a dynamically varying objective function to represent the metabolic switch. The predicted fluxes of many genes show highly significant correlation to the time series of the corresponding gene expression data. Individual mispredictions identify novel links between antibiotic production and primary metabolism. Conclusion: Our results show the usefulness of constraint-based modeling for providing a detailed interpretation of time course gene expression data

    The silicon trypanosome

    African trypanosomes have emerged as promising unicellular model organisms for the next generation of systems biology. They offer unique advantages, due to their relative simplicity, the availability of all standard genomics techniques and a long history of quantitative research. Reproducible cultivation methods exist for morphologically and physiologically distinct life-cycle stages. The genome has been sequenced, and microarrays, RNA-interference and high-accuracy metabolomics are available. Furthermore, the availability of extensive kinetic data on all glycolytic enzymes has led to the early development of a complete, experiment-based dynamic model of an important biochemical pathway. Here we describe the achievements of trypanosome systems biology so far and outline the necessary steps towards the ambitious aim of creating a "Silicon Trypanosome", a comprehensive, experiment-based, multi-scale mathematical model of trypanosome physiology. We expect that, in the long run, the quantitative modelling enabled by the Silicon Trypanosome will play a key role in selecting the most suitable targets for developing new anti-parasite drugs

    Metabolomics to unveil and understand phenotypic diversity between pathogen populations

    Visceral leishmaniasis is caused by a parasite called Leishmania donovani, which every year infects about half a million people and claims several thousand lives. Existing treatments are now becoming less effective due to the emergence of drug resistance. Improving our understanding of the mechanisms used by the parasite to adapt to drugs and achieve resistance is crucial for developing future treatment strategies. Unfortunately, the biological mechanism whereby Leishmania acquires drug resistance is poorly understood. Recent years have brought new technologies with the potential to increase greatly our understanding of drug resistance mechanisms. The latest mass spectrometry techniques allow the metabolome of parasites to be studied rapidly and in great detail. We have applied this approach to determine the metabolome of drug-sensitive and drug-resistant parasites isolated from patients with leishmaniasis. The data show that there are wholesale differences between the isolates and that the membrane composition has been drastically modified in drug-resistant parasites compared with drug-sensitive parasites. Our findings demonstrate that untargeted metabolomics has great potential to identify major metabolic differences between closely related parasite strains and thus should find many applications in distinguishing parasite phenotypes of clinical relevance

    Stability and aggregation of ranked gene lists

    Ranked gene lists are highly unstable in the sense that similar measures of differential gene expression may yield very different rankings, and that a small change in the data set usually affects the resulting gene list considerably. Stability issues have long been under-considered in the literature, but they have become a hot topic in the last few years, perhaps as a consequence of increasing skepticism about the reproducibility and clinical applicability of molecular research findings. In this article, we review existing approaches for assessing the stability of ranked gene lists and the related problem of aggregation, give some practical recommendations, and warn against potential misuse of these methods. This overview is illustrated through an application to a recent leukemia data set using the freely available Bioconductor package GeneSelector
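
The two ideas named in this abstract, stability assessment and aggregation, can be illustrated with a minimal Python sketch. It is not the GeneSelector API (an R package) and not necessarily any method reviewed in the article: the top-k overlap score and the mean-rank (Borda-style) aggregation are common simple choices picked here as assumptions, and the gene names and rankings are hypothetical.

```python
def topk_overlap(ranking_a, ranking_b, k):
    """Fraction of genes shared by the top-k of two rankings (1.0 = same set)."""
    return len(set(ranking_a[:k]) & set(ranking_b[:k])) / k


def aggregate_by_mean_rank(rankings):
    """Aggregate rankings of the same genes by mean position (Borda-style).

    Ties in mean position are broken alphabetically by gene name.
    """
    genes = rankings[0]
    mean_rank = {
        g: sum(r.index(g) for r in rankings) / len(rankings) for g in genes
    }
    return sorted(genes, key=lambda g: (mean_rank[g], g))


# Hypothetical rankings of five genes under two similar
# differential-expression statistics (illustrative data only).
by_t_stat = ["g1", "g2", "g3", "g4", "g5"]
by_fold_change = ["g2", "g1", "g5", "g3", "g4"]

overlap = topk_overlap(by_t_stat, by_fold_change, k=3)
consensus = aggregate_by_mean_rank([by_t_stat, by_fold_change])

print(f"top-3 overlap: {overlap:.2f}")
print("consensus ranking:", consensus)
```

A low overlap between rankings obtained from similar statistics, or from resampled versions of the same data, is one simple indicator of the instability the abstract describes.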

    designGG: an R-package and web tool for the optimal design of genetical genomics experiments

    BACKGROUND: High-dimensional biomolecular profiling of genetically different individuals in one or more environmental conditions is an increasingly popular strategy for exploring the functioning of complex biological systems. The optimal design of such genetical genomics experiments in a cost-efficient and effective way is not trivial. RESULTS: This paper presents designGG, an R package for designing optimal genetical genomics experiments. A web implementation for designGG is available at http://gbic.biol.rug.nl/designGG. All software, including source code and documentation, is freely available. CONCLUSION: DesignGG allows users to intelligently select and allocate individuals to experimental units and conditions such as drug treatment. The user can maximize the power and resolution of detecting genetic, environmental and interaction effects in a genome-wide or local mode by giving more weight to genome regions of special interest, such as previously detected phenotypic quantitative trait loci. This will help to achieve high power and more accurate estimates of the effects of interesting factors, and thus yield a more reliable biological interpretation of data. DesignGG is applicable to linkage analysis of experimental crosses, e.g. recombinant inbred lines, as well as to association analysis of natural populations

    Expression quantitative trait loci are highly sensitive to cellular differentiation state

    Blood cell development from multipotent hematopoietic stem cells to specialized blood cells is accompanied by drastic changes in gene expression for which the triggers remain mostly unknown. Genetical genomics is an approach linking natural genetic variation to gene expression variation, thereby allowing the identification of genomic loci containing gene expression modulators (eQTLs). In this paper, we used a genetical genomics approach to analyze gene expression across four developmentally close blood cell types collected from a large number of genetically different but related mouse strains. We found that, while a significant number of eQTLs (365) had a consistent “static” regulatory effect on gene expression, an even larger number were found to be very sensitive to cell stage. As many as 1,283 eQTLs exhibited a “dynamic” behavior across cell types. By looking more closely at these dynamic eQTLs, we show that the sensitivity of eQTLs to cell stage is largely associated with gene expression changes in target genes. These results stress the importance of studying gene expression variation in well-defined cell populations. Only such studies will be able to reveal the important differences in gene regulation between different ce

    MultiMetEval: comparative and multi-objective analysis of genome-scale metabolic models

    Comparative metabolic modelling is emerging as a novel field, supported by the development of reliable and standardized approaches for constructing genome-scale metabolic models in high throughput. New software solutions are needed to allow efficient comparative analysis of multiple models in the context of multiple cellular objectives. Here, we present the user-friendly software framework Multi-Metabolic Evaluator (MultiMetEval), built upon SurreyFBA, which allows the user to compose collections of metabolic models that together can be subjected to flux balance analysis. Additionally, MultiMetEval implements functionalities for multi-objective analysis by calculating the Pareto front between two cellular objectives. Using a previously generated dataset of 38 actinobacterial genome-scale metabolic models, we show how these approaches can lead to exciting novel insights. Firstly, after incorporating several pathways for the biosynthesis of natural products into each of these models, comparative flux balance analysis predicted that species like Streptomyces that harbour the highest diversity of secondary metabolite biosynthetic gene clusters in their genomes do not necessarily have the metabolic network topology most suitable for compound overproduction. Secondly, multi-objective analysis of biomass production and natural product biosynthesis in these actinobacteria shows that the well-studied occurrence of discrete metabolic switches during the change of cellular objectives is inherent to their metabolic network architecture. Comparative and multi-objective modelling can lead to insights that could not be obtained by normal flux balance analyses. MultiMetEval provides a powerful platform that makes these analyses straightforward for biologists. Sources and binaries of MultiMetEval are freely available from https://github.com/PiotrZakrzewski/MetEval/downloads
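
The Pareto front between two objectives mentioned in this abstract can be sketched in a few lines of Python. This is not MultiMetEval or SurreyFBA code. It assumes the candidate solutions have already been computed (for example by flux balance analysis) and only filters out the dominated ones; the (biomass flux, product flux) pairs are hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated points when both objectives are maximized.

    A point p is dominated if some other, different point q is at least
    as good as p in both objectives.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    # Remove duplicates and sort by the first objective.
    return sorted(set(front))


# Hypothetical (biomass flux, natural-product flux) pairs, illustrative only.
candidates = [
    (1.0, 0.0), (0.8, 0.3), (0.5, 0.5), (0.4, 0.4),
    (0.2, 0.9), (0.0, 1.0), (0.6, 0.2),
]

front = pareto_front(candidates)
for biomass, product in front:
    print(f"biomass={biomass:.1f}  product={product:.1f}")
```

Points on the front represent trade-offs in which one objective cannot be improved without reducing the other. The quadratic scan is adequate for small candidate sets like this one.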