2,628 research outputs found

    Predicting protein function by machine learning on amino acid sequences ā€“ a critical evaluation

    Get PDF
    Copyright @ 2007 Al-Shahib et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Background: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear if this ability would be transferable to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins. Results: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species also differ to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that in the case of highly specialized proteomes classifiers from a different, but more conventional, species may in fact outperform the endogenous species-specific classifier. Conclusion: We conclude that there is very good prospect of successfully predicting the function of yet uncharacterized proteins using machine learning classifiers trained on proteins of known function

    What is Systems Biology?

    Get PDF
    Systems biology is increasingly popular, but to many biologists it remains unclear what this new discipline actually encompasses. This brief personal perspective starts by outlining the asthetic qualities that motivate systems biologists, discusses which activities do not belong to the core of systems biology, and finally explores the crucial link with synthetic biology. It concludes by attempting to define systems biology as the research endeavor that aims at providing the scientific foundation for successful synthetic biology

    Incorporating peak grouping information for alignment of multiple liquid chromatography-mass spectrometry datasets

    Get PDF
    Motivation: The combination of liquid chromatography and mass spectrometry (LC/MS) has been widely used for large-scale comparative studies in systems biology, including proteomics, glycomics and metabolomics. In almost all experimental design, it is necessary to compare chromatograms across biological or technical replicates and across sample groups. Central to this is the peak alignment step, which is one of the most important but challenging preprocessing steps. Existing alignment tools do not take into account the structural dependencies between related peaks that co-elute and are derived from the same metabolite or peptide. We propose a direct matching peak alignment method for LC/MS data that incorporates related peaks information (within each LC/MS run) and investigate its effect on alignment performance (across runs). The groupings of related peaks necessary for our method can be obtained from any peak clustering method and are built into a pairwise peak similarity score function. The similarity score matrix produced is used by an approximation algorithm for the weighted matching problem to produce the actual alignment result.<p></p> Results: We demonstrate that related peak information can improve alignment performance. The performance is evaluated on a set of benchmark datasets, where our method performs competitively compared to other popular alignment tools.<p></p> Availability: The proposed alignment method has been implemented as a stand-alone application in Python, available for download at http://github.com/joewandy/peak-grouping-alignment.<p></p&gt

    GeneRank: Using search engine technology for the analysis of microarray experiments

    Get PDF
    Copyright @ 2005 Morrison et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Background: Interpretation of simple microarray experiments is usually based on the fold-change of gene expression between a reference and a "treated" sample where the treatment can be of many types from drug exposure to genetic variation. Interpretation of the results usually combines lists of differentially expressed genes with previous knowledge about their biological function. Here we evaluate a method ā€“ based on the PageRank algorithm employed by the popular search engine Google ā€“ that tries to automate some of this procedure to generate prioritized gene lists by exploiting biological background information. Results: GeneRank is an intuitive modification of PageRank that maintains many of its mathematical properties. It combines gene expression information with a network structure derived from gene annotations (gene ontologies) or expression profile correlations. Using both simulated and real data we find that the algorithm offers an improved ranking of genes compared to pure expression change rankings. Conclusion: Our modification of the PageRank algorithm provides an alternative method of evaluating microarray experimental results which combines prior knowledge about the underlying network. GeneRank offers an improvement compared to assessing the importance of a gene based on its experimentally observed fold-change alone and may be used as a basis for further analytical developments

    The latent process decomposition of cDNA microarray data sets

    Get PDF
    We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) which enables the probabilistic analysis of cDNA microarray data and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called latent process decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, the ability to objectively assess the optimal number of sample clusters. Second, the ability to represent samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately which amounts to two distinct reduced space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster and, thus, for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several microarray data sets for cancer. For these data sets it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes which are known to be medically significant. To illustrate its wider applicability, we also illustrate its performance on a microarray data set for yeast

    A structured approach for the engineering of biochemical network models, illustrated for signalling pathways

    Get PDF
    http://dx.doi.org/10.1093/bib/bbn026Quantitative models of biochemical networks (signal transduction cascades, metabolic pathways, gene regulatory circuits) are a central component of modern systems biology. Building and managing these complex models is a major challenge that can benefit from the application of formal methods adopted from theoretical computing science. Here we provide a general introduction to the field of formal modelling, which emphasizes the intuitive biochemical basis of the modelling process, but is also accessible for an audience with a background in computing science and/or model engineering. We show how signal transduction cascades can be modelled in a modular fashion, using both a qualitative approach { Qualitative Petri nets, and quantitative approaches { Continuous Petri Nets and Ordinary Differential Equations. We review the major elementary building blocks of a cellular signalling model, discuss which critical design decisions have to be made during model building, and present ..
    • ā€¦
    corecore