2,628 research outputs found

Predicting protein function by machine learning on amino acid sequences – a critical evaluation

Author: Al-Shahib A
Breitling R
Gilbert D
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

Copyright © 2007 Al-Shahib et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear whether this ability would transfer to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins. Results: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species differ to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that, in the case of highly specialized proteomes, classifiers from a different but more conventional species may in fact outperform the endogenous species-specific classifier. Conclusion: We conclude that there is a very good prospect of successfully predicting the function of as yet uncharacterized proteins using machine learning classifiers trained on proteins of known function.

University of Groningen

University of Birmingham Research Portal

Directory of Open Access Journals

Enlighten

The University of Manchester - Institutional Repository

Brunel University Research Archive

Crossref

Proceedings - University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

PubMed Central

University of Groningen Digital Archive

Dissertations of the University of Groningen

What is Systems Biology?

Author: Rainer Breitling
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2010

Systems biology is increasingly popular, but to many biologists it remains unclear what this new discipline actually encompasses. This brief personal perspective starts by outlining the aesthetic qualities that motivate systems biologists, discusses which activities do not belong to the core of systems biology, and finally explores the crucial link with synthetic biology. It concludes by attempting to define systems biology as the research endeavor that aims at providing the scientific foundation for successful synthetic biology.

Crossref

Directory of Open Access Journals

PubMed Central

Incorporating peak grouping information for alignment of multiple liquid chromatography-mass spectrometry datasets

Author: Breitling Rainer
Daly Ronan
Rogers Simon
Wandy Joe
Publication venue: Oxford University Press (OUP)
Publication date: 02/02/2015

Motivation: The combination of liquid chromatography and mass spectrometry (LC/MS) has been widely used for large-scale comparative studies in systems biology, including proteomics, glycomics and metabolomics. In almost all experimental designs, it is necessary to compare chromatograms across biological or technical replicates and across sample groups. Central to this is the peak alignment step, which is one of the most important but challenging preprocessing steps. Existing alignment tools do not take into account the structural dependencies between related peaks that co-elute and are derived from the same metabolite or peptide. We propose a direct matching peak alignment method for LC/MS data that incorporates related peaks information (within each LC/MS run) and investigate its effect on alignment performance (across runs). The groupings of related peaks necessary for our method can be obtained from any peak clustering method and are built into a pairwise peak similarity score function. The similarity score matrix produced is used by an approximation algorithm for the weighted matching problem to produce the actual alignment result. Results: We demonstrate that related peak information can improve alignment performance. The performance is evaluated on a set of benchmark datasets, where our method performs competitively compared to other popular alignment tools. Availability: The proposed alignment method has been implemented as a stand-alone application in Python, available for download at http://github.com/joewandy/peak-grouping-alignment.
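
The matching step mentioned in this abstract can be pictured with a small sketch. The abstract does not say which approximation algorithm the authors use, so the greedy heuristic below (a standard 1/2-approximation for maximum weight matching) is a stand-in, not the published method. The function name `greedy_matching`, the `threshold` parameter and the example scores are hypothetical; the score matrix is assumed to be precomputed, with rows for peaks in one run and columns for peaks in another.

```python
def greedy_matching(scores, threshold=0.0):
    """Greedy approximate maximum-weight matching on a similarity matrix.

    scores[i][j] is an assumed, precomputed similarity between peak i of
    run A and peak j of run B. Pairs scoring at or below `threshold` are
    never matched. This is an illustrative stand-in, not the paper's code.
    """
    # Collect every candidate pair that clears the threshold.
    candidates = [
        (score, i, j)
        for i, row in enumerate(scores)
        for j, score in enumerate(row)
        if score > threshold
    ]
    # Highest score first; ties broken by index for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_a, used_b, pairs = set(), set(), []
    for _score, i, j in candidates:
        # Accept a pair only if neither peak is already aligned.
        if i not in used_a and j not in used_b:
            pairs.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return sorted(pairs)


if __name__ == "__main__":
    example_scores = [[0.9, 0.8], [0.85, 0.1]]
    print(greedy_matching(example_scores))
    print(greedy_matching(example_scores, threshold=0.5))
```

On the example matrix the greedy choice takes the 0.9 pair first and is then left with the 0.1 pair, whereas the optimal matching would pair the 0.8 and 0.85 entries. That gap is the price of the simple heuristic, and a higher threshold leaves the weak pair unaligned instead.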

PubMed Central

Enlighten

The University of Manchester - Institutional Repository

GeneRank: Using search engine technology for the analysis of microarray experiments

Author: Breitling R
Gilbert D
Higham D
Morrison JL
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

Copyright © 2005 Morrison et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background: Interpretation of simple microarray experiments is usually based on the fold-change of gene expression between a reference and a "treated" sample where the treatment can be of many types from drug exposure to genetic variation. Interpretation of the results usually combines lists of differentially expressed genes with previous knowledge about their biological function. Here we evaluate a method – based on the PageRank algorithm employed by the popular search engine Google – that tries to automate some of this procedure to generate prioritized gene lists by exploiting biological background information. Results: GeneRank is an intuitive modification of PageRank that maintains many of its mathematical properties. It combines gene expression information with a network structure derived from gene annotations (gene ontologies) or expression profile correlations. Using both simulated and real data we find that the algorithm offers an improved ranking of genes compared to pure expression change rankings. Conclusion: Our modification of the PageRank algorithm provides an alternative method of evaluating microarray experimental results which combines prior knowledge about the underlying network. GeneRank offers an improvement compared to assessing the importance of a gene based on its experimentally observed fold-change alone and may be used as a basis for further analytical developments.
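
To give a feel for the kind of ranking this abstract describes, here is a minimal sketch of a PageRank-style iteration in which the usual uniform "teleport" term is replaced by each gene's normalized absolute expression change. The abstract does not spell out the update rule, so the exact form used here is an assumption modelled on standard PageRank. The function name `generank_sketch`, the default `damping` value and the toy network are hypothetical.

```python
def generank_sketch(adjacency, expression_change, damping=0.5, iterations=200):
    """PageRank-style gene ranking (illustrative sketch, assumed update rule).

    adjacency[i][j] is 1 if genes i and j are linked in an undirected
    annotation or correlation network, else 0. expression_change holds
    one fold-change-like value per gene.
    """
    n = len(expression_change)
    total = sum(abs(x) for x in expression_change)
    # Normalized expression evidence; fall back to uniform if all zero.
    if total > 0:
        base = [abs(x) / total for x in expression_change]
    else:
        base = [1.0 / n] * n

    degree = [sum(row) for row in adjacency]
    rank = base[:]
    for _ in range(iterations):
        new_rank = []
        for j in range(n):
            # Each neighbour i shares its rank equally among its links.
            inflow = sum(
                adjacency[i][j] * rank[i] / degree[i]
                for i in range(n)
                if degree[i] > 0
            )
            new_rank.append((1 - damping) * base[j] + damping * inflow)
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy chain network 0 - 1 - 2 where only gene 0 changes expression.
    chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print(generank_sketch(chain, [2.0, 0.0, 0.0]))
```

With `damping=0` the ranking reduces to the pure expression-change ranking. With a positive value, genes that show no change themselves still gain rank from well-connected, strongly changing neighbours, which is the behaviour the abstract contrasts with fold-change alone.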

University of Strathclyde Institutional Repository

University of Groningen

Directory of Open Access Journals

Edinburgh Research Explorer

Enlighten

The University of Manchester - Institutional Repository

Brunel University Research Archive

Proceedings - University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

PubMed Central

Dissertations of the University of Groningen


An introduction to Biomodel engineering, illustrated for signal transduction pathways

Author: Breitling R
Donaldson R
Gilbert D
Heiner M
Publication venue: WMC
Publication date: 01/01/2009

BioModel Engineering is the science of designing, constructing and analyzing computational models of biological systems. It is inspired by concepts from software engineering and computing science. This paper illustrates a major theme in BioModel Engineering, namely that identifying a quantitative model of a dynamic system means building the structure, finding an initial state, and fitting parameters. In our approach, the structure is obtained by piecewise construction of models from modular parts, the initial state is obtained by analysis of the structure, and parameter fitting comprises determining the rate parameters of the kinetic equations. We illustrate this with an example in the area of intracellular signalling pathways.

Brunel University Research Archive

The latent process decomposition of cDNA microarray data sets

Author: Breitling R
Campbell C
Girolami M
Rogers S
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2005

We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) which enables the probabilistic analysis of cDNA microarray data and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called latent process decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, it can objectively assess the optimal number of sample clusters. Second, it can represent samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately, which amounts to two distinct reduced space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster and, thus, for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several microarray data sets for cancer. For these data sets it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes which are known to be medically significant. To demonstrate its wider applicability, we also illustrate its performance on a microarray data set for yeast.

University of Groningen

Enlighten

The University of Manchester - Institutional Repository

CUED - Cambridge University Engineering Department

CiteSeerX

Crossref

Proceedings - University of Groningen

ARTS repository - University of Groningen

UCL Discovery

Explore Bristol Research

Dissertations of the University of Groningen

La puesta en escena de lo trágico. La hermenéutica de lo trágico según Paul Ricoeur (The staging of the tragic: the hermeneutics of the tragic according to Paul Ricoeur)

Author: Breitling Andris
Publication venue: Universidade da Coruna
Publication date: 01/01/2003

Repositorio da Universidade da Coruña

A structured approach for the engineering of biochemical network models, illustrated for signalling pathways

Author: Breitling R
Gilbert D
Heiner M
Orton R
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2008

http://dx.doi.org/10.1093/bib/bbn026

Quantitative models of biochemical networks (signal transduction cascades, metabolic pathways, gene regulatory circuits) are a central component of modern systems biology. Building and managing these complex models is a major challenge that can benefit from the application of formal methods adopted from theoretical computing science. Here we provide a general introduction to the field of formal modelling, which emphasizes the intuitive biochemical basis of the modelling process, but is also accessible for an audience with a background in computing science and/or model engineering. We show how signal transduction cascades can be modelled in a modular fashion, using both a qualitative approach (Qualitative Petri Nets) and quantitative approaches (Continuous Petri Nets and Ordinary Differential Equations). We review the major elementary building blocks of a cellular signalling model, discuss which critical design decisions have to be made during model building, and present ..

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Enlighten

The University of Manchester - Institutional Repository

University of Groningen Digital Archive

Brunel University Research Archive

Dissertations of the University of Groningen