1,108 research outputs found

Predicting protein function by machine learning on amino acid sequences – a critical evaluation

Author: Al-Shahib A
Breitling R
Gilbert D
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

Copyright © 2007 Al-Shahib et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear if this ability would be transferable to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins. Results: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species also differ to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that in the case of highly specialized proteomes classifiers from a different, but more conventional, species may in fact outperform the endogenous species-specific classifier. Conclusion: We conclude that there is a very good prospect of successfully predicting the function of as yet uncharacterized proteins using machine learning classifiers trained on proteins of known function.

University of Groningen

University of Birmingham Research Portal

Directory of Open Access Journals

Enlighten

The University of Manchester - Institutional Repository

Brunel University Research Archive

Crossref

Proceedings - University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

PubMed Central

University of Groningen Digital Archive

Dissertations of the University of Groningen

Recommended from our members

An introduction to Biomodel engineering, illustrated for signal transduction pathways

Author: Breitling R
Donaldson R
Gilbert D
Heiner M
Publication venue: WMC
Publication date: 01/01/2009

BioModel Engineering is the science of designing, constructing and analyzing computational models of biological systems. It is inspired by concepts from software engineering and computing science. This paper illustrates a major theme in BioModel Engineering, namely that identifying a quantitative model of a dynamic system means building the structure, finding an initial state, and fitting the parameters. In our approach, the structure is obtained by piecewise construction of models from modular parts, the initial state is obtained by analysis of the structure, and parameter fitting comprises determining the rate parameters of the kinetic equations. We illustrate this with an example in the area of intracellular signalling pathways.

Brunel University Research Archive

GeneRank: Using search engine technology for the analysis of microarray experiments

Author: Breitling R
Gilbert D
Higham D
Morrison JL
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

Copyright © 2005 Morrison et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background: Interpretation of simple microarray experiments is usually based on the fold-change of gene expression between a reference and a "treated" sample, where the treatment can be of many types, from drug exposure to genetic variation. Interpretation of the results usually combines lists of differentially expressed genes with previous knowledge about their biological function. Here we evaluate a method, based on the PageRank algorithm employed by the popular search engine Google, that tries to automate some of this procedure to generate prioritized gene lists by exploiting biological background information. Results: GeneRank is an intuitive modification of PageRank that maintains many of its mathematical properties. It combines gene expression information with a network structure derived from gene annotations (gene ontologies) or expression profile correlations. Using both simulated and real data we find that the algorithm offers an improved ranking of genes compared to pure expression change rankings. Conclusion: Our modification of the PageRank algorithm provides an alternative method of evaluating microarray experimental results which combines prior knowledge about the underlying network. GeneRank offers an improvement compared to assessing the importance of a gene based on its experimentally observed fold-change alone and may be used as a basis for further analytical developments.
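The idea in this abstract, a PageRank-style ranking seeded with expression changes rather than a uniform prior, can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything below is an assumption for illustration: the function name `generank`, the normalization of the expression changes, the default damping factor `d=0.5`, and the fixed-point iteration itself are one plausible reading of "a modification of PageRank", not the authors' confirmed implementation.

```python
def generank(adjacency, expression_change, d=0.5, iterations=100):
    """PageRank-style gene ranking seeded with expression changes (illustrative sketch).

    adjacency: symmetric 0/1 matrix (list of lists) describing the gene network.
    expression_change: one fold-change value per gene; at least one must be nonzero.
    d: damping factor; d=0 reproduces a ranking by expression change alone.
    """
    n = len(expression_change)
    # Assumption: use absolute changes, scaled to sum to 1.
    total = sum(abs(x) for x in expression_change)
    ex = [abs(x) / total for x in expression_change]
    # Number of neighbours per gene; isolated genes get 1 to avoid division by zero.
    degree = [sum(row) or 1 for row in adjacency]
    rank = ex[:]
    for _ in range(iterations):
        # Each gene keeps a share of its own evidence and receives a share
        # of each neighbour's current rank, divided by that neighbour's degree.
        rank = [
            (1 - d) * ex[j]
            + d * sum(adjacency[i][j] * rank[i] / degree[i] for i in range(n))
            for j in range(n)
        ]
    return rank


# Gene 0 changes strongly, gene 1 is its neighbour, gene 2 is isolated.
network = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(generank(network, [2.0, 0.0, 0.0]))
```

In this toy network, gene 1 shows no expression change of its own but still ranks above the isolated gene 2, because it is connected to the strongly changed gene 0. That is the kind of effect a network-aware ranking is meant to capture compared with fold-change alone.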

University of Strathclyde Institutional Repository

University of Groningen

Directory of Open Access Journals

Edinburgh Research Explorer

Enlighten

The University of Manchester - Institutional Repository

Brunel University Research Archive

Proceedings - University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

PubMed Central

Dissertations of the University of Groningen

The latent process decomposition of cDNA microarray data sets

Author: Breitling R
Campbell C
Girolami M
Rogers S
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) which enables the probabilistic analysis of cDNA microarray data and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called latent process decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, the ability to objectively assess the optimal number of sample clusters. Second, the ability to represent samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately, which amounts to two distinct reduced space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster and, thus, for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several microarray data sets for cancer. For these data sets it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes which are known to be medically significant. To illustrate its wider applicability, we also demonstrate its performance on a microarray data set for yeast.

University of Groningen

Enlighten

The University of Manchester - Institutional Repository

CUED - Cambridge University Engineering Department

CiteSeerX

Crossref

Proceedings - University of Groningen

ARTS repository - University of Groningen

UCL Discovery

Explore Bristol Research

Dissertations of the University of Groningen

A structured approach for the engineering of biochemical network models, illustrated for signalling pathways

Author: Alves
Breitling
Brown
D. Gilbert
Fisher
Huang
Hucka
Kholodenko
Kolch
Levchenko
M. Heiner
Mendes
Orton
R. Breitling
R. Orton
Schoeberl
Wiley
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2008

http://dx.doi.org/10.1093/bib/bbn026

Quantitative models of biochemical networks (signal transduction cascades, metabolic pathways, gene regulatory circuits) are a central component of modern systems biology. Building and managing these complex models is a major challenge that can benefit from the application of formal methods adopted from theoretical computing science. Here we provide a general introduction to the field of formal modelling, which emphasizes the intuitive biochemical basis of the modelling process, but is also accessible for an audience with a background in computing science and/or model engineering. We show how signal transduction cascades can be modelled in a modular fashion, using both a qualitative approach (Qualitative Petri nets) and quantitative approaches (Continuous Petri Nets and Ordinary Differential Equations). We review the major elementary building blocks of a cellular signalling model, discuss which critical design decisions have to be made during model building, and present ..

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Enlighten

The University of Manchester - Institutional Repository

University of Groningen Digital Archive

Brunel University Research Archive

Dissertations of the University of Groningen

Probabilistic assignment of formulas to mass peaks in metabolomics experiments

Author: Breitling R
Girolami M
Rogers S
Scheltema RA
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/02/2009

Motivation: High-accuracy mass spectrometry is a popular technology for high-throughput measurements of cellular metabolites (metabolomics). One of the major challenges is the correct identification of the observed mass peaks, including the assignment of their empirical formula, based on the measured mass. Results: We propose a novel probabilistic method for the assignment of empirical formulas to mass peaks in high-throughput metabolomics mass spectrometry measurements. The method incorporates information about possible biochemical transformations between the empirical formulas to assign higher probability to formulas that could be created from other metabolites in the sample. In a series of experiments, we show that the method performs well and provides greater insight than assignments based on mass alone. In addition, we extend the model to incorporate isotope information to achieve even more reliable formula identification.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

UCL Discovery

Enlighten

The University of Manchester - Institutional Repository

CUED - Cambridge University Engineering Department

Dissertations of the University of Groningen

Identification of modifiers of alpha-synuclein inclusion in a C. elegans model by genome-wide RNAi

Author: Breitling R.
Ham T. van
Hofstra R.
Nollen E.
Plasterk R.
Thijssen K.

The silicon trypanosome

Author: BARBARA M. BAKKER
CHRISTINE CLAYTON
Duffieux
HANS V. WESTERHOFF
Hornberg
KEITH MATTHEWS
MARK GIROLAMI
MICHAEL P. BARRETT
PAUL A. M. MICHELS
R. LUISE KRAUTH-SIEGEL
RAINER BREITLING
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2010

African trypanosomes have emerged as promising unicellular model organisms for the next generation of systems biology. They offer unique advantages, due to their relative simplicity, the availability of all standard genomics techniques and a long history of quantitative research. Reproducible cultivation methods exist for morphologically and physiologically distinct life-cycle stages. The genome has been sequenced, and microarrays, RNA-interference and high-accuracy metabolomics are available. Furthermore, the availability of extensive kinetic data on all glycolytic enzymes has led to the early development of a complete, experiment-based dynamic model of an important biochemical pathway. Here we describe the achievements of trypanosome systems biology so far and outline the necessary steps towards the ambitious aim of creating a "Silicon Trypanosome", a comprehensive, experiment-based, multi-scale mathematical model of trypanosome physiology. We expect that, in the long run, the quantitative modelling enabled by the Silicon Trypanosome will play a key role in selecting the most suitable targets for developing new anti-parasite drugs.

VU Research Portal

University of Groningen

Edinburgh Research Explorer

Enlighten

The University of Manchester - Institutional Repository

CUED - Cambridge University Engineering Department

CiteSeerX

Crossref

Proceedings - University of Groningen

ARTS repository - University of Groningen

UCL Discovery

DIAL UCLouvain

Dissertations of the University of Groningen

Metabolomics to unveil and understand phenotypic diversity between pathogen populations

Visceral leishmaniasis is caused by a parasite called Leishmania donovani, which every year infects about half a million people and claims several thousand lives. Existing treatments are now becoming less effective due to the emergence of drug resistance. Improving our understanding of the mechanisms used by the parasite to adapt to drugs and achieve resistance is crucial for developing future treatment strategies. Unfortunately, the biological mechanism whereby Leishmania acquires drug resistance is poorly understood. Recent years have brought new technologies with the potential to greatly increase our understanding of drug resistance mechanisms. The latest mass spectrometry techniques allow the metabolome of parasites to be studied rapidly and in great detail. We have applied this approach to determine the metabolome of drug-sensitive and drug-resistant parasites isolated from patients with leishmaniasis. The data show that there are wholesale differences between the isolates and that the membrane composition has been drastically modified in drug-resistant parasites compared with drug-sensitive parasites. Our findings demonstrate that untargeted metabolomics has great potential to identify major metabolic differences between closely related parasite strains and thus should find many applications in distinguishing parasite phenotypes of clinical relevance.

University of Groningen

University of Strathclyde Institutional Repository

Directory of Open Access Journals

Enlighten

The University of Manchester - Institutional Repository

Public Library of Science (PLOS)

Crossref

Proceedings - University of Groningen

ARTS repository - University of Groningen

PubMed Central

Institutional Repository Universiteit Antwerpen

University of Groningen Digital Archive

Dissertations of the University of Groningen

Liver Enzymes: Interaction Analysis of Smoking with Alcohol Consumption or BMI, Comparing AST and ALT to γ-GT

Author: AM Strasak
C Meisinger
CE Ruhl
Christoph Drath
D Robinson
DH Lee
E Fabbrini
EG Giannini
FH Steffensen
H Brenner
Hermann Brenner
JB Whitfield
K Tanaka
L Thomas
LA Adams
LP Breitling
Lutz P. Breitling
M Nishimura
Man-Fung Yuen
PI Alatalo
R Teschke
S Furukawa
SG Wannamethee
TP Whitehead
V Arndt
Volker Arndt
WP Koh
Publication venue: Public Library of Science
Publication date: 22/11/2011

A detrimental interaction between smoking and alcohol consumption with respect to serum γ-glutamyltransferase (γ-GT) has recently been described. The underlying mechanisms remain unknown. The present work aimed to provide further insights by examining similar interactions pertaining to aspartate and alanine transaminase (AST, ALT), routine liver markers less prone to enzyme induction. [...] <0.0001). The interactions were all in the same directions as for γ-GT, i.e. synergistic with alcohol and opposite with BMI. The patterns of interaction between smoking and alcohol consumption or BMI with respect to AST and ALT resembled those observed for γ-GT. This renders enzyme induction a less probable mechanism for these associations, whereas it might implicate exacerbated hepatocellular vulnerability and injury.

Public Library of Science (PLOS)

Crossref

PubMed Central