3,917 research outputs found
Recommended from our members
Characterisation of FAD-family folds using a machine learning approach
Flavin adenine dinucleotide (FAD) and its derivatives play a crucial role in
biological processes. They are major organic cofactors and electron carriers
in both enzymatic activities and biochemical pathways. We have analysed
the relationships between sequence and structure of FAD-containing proteins
using a machine learning approach. Decision trees were generated using the
C4.5 algorithm as a means of automatically generating rules from biological
databases (TOPS, CATH and PDB). These rules were then used as
background knowledge for an ILP system to characterise the four different
classes of FAD-family folds classified in Dym and Eisenberg (2001). These
FAD-family folds are: glutathione reductase (GR), ferredoxin reductase (FR),
p-cresol methylhydroxylase (PCMH) and pyruvate oxidase (PO). Each FADfamily
was characterised by a set of rules. The “knowledge patterns”
generated from this approach are a set of rules containing conserved sequence
motifs, secondary structure sequence elements and folding information.
Every rule was then verified using statistical evaluation on the measured
significance of each rule. We show that this machine learning approach is
capable of learning and discovering interesting patterns from large biological
databases and can generate “knowledge patterns” that characterise the FADcontaining
proteins, and at the same time classify these proteins into four
different families
Inferring the function of genes from synthetic lethal mutations
Techniques for detecting synthetic lethal mutations in double gene deletion experiments are emerging as powerful tool for analysing genes in parallel or overlapping pathways with a shared function. This paper introduces a logic-based approach that uses synthetic lethal mutations for mapping genes of unknown function to enzymes in a known metabolic network. We show how such mappings can be automatically computed by a logical learning system called eXtended Hybrid Abductive Inductive Learning (XHAIL)
Inductive queries for a drug designing robot scientist
It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments
Qualitative System Identification from Imperfect Data
Experience in the physical sciences suggests that the only realistic means of
understanding complex systems is through the use of mathematical models.
Typically, this has come to mean the identification of quantitative models
expressed as differential equations. Quantitative modelling works best when the
structure of the model (i.e., the form of the equations) is known; and the
primary concern is one of estimating the values of the parameters in the model.
For complex biological systems, the model-structure is rarely known and the
modeler has to deal with both model-identification and parameter-estimation. In
this paper we are concerned with providing automated assistance to the first of
these problems. Specifically, we examine the identification by machine of the
structural relationships between experimentally observed variables. These
relationship will be expressed in the form of qualitative abstractions of a
quantitative model. Such qualitative models may not only provide clues to the
precise quantitative model, but also assist in understanding the essence of
that model. Our position in this paper is that background knowledge
incorporating system modelling principles can be used to constrain effectively
the set of good qualitative models. Utilising the model-identification
framework provided by Inductive Logic Programming (ILP) we present empirical
support for this position using a series of increasingly complex artificial
datasets. The results are obtained with qualitative and quantitative data
subject to varying amounts of noise and different degrees of sparsity. The
results also point to the presence of a set of qualitative states, which we
term kernel subsets, that may be necessary for a qualitative model-learner to
learn correct models. We demonstrate scalability of the method to biological
system modelling by identification of the glycolysis metabolic pathway from
data
- …