2,220 research outputs found
Qualitative System Identification from Imperfect Data
Experience in the physical sciences suggests that the only realistic means of
understanding complex systems is through the use of mathematical models.
Typically, this has come to mean the identification of quantitative models
expressed as differential equations. Quantitative modelling works best when the
structure of the model (i.e., the form of the equations) is known; and the
primary concern is one of estimating the values of the parameters in the model.
For complex biological systems, the model-structure is rarely known and the
modeler has to deal with both model-identification and parameter-estimation. In
this paper we are concerned with providing automated assistance to the first of
these problems. Specifically, we examine the identification by machine of the
structural relationships between experimentally observed variables. These
relationship will be expressed in the form of qualitative abstractions of a
quantitative model. Such qualitative models may not only provide clues to the
precise quantitative model, but also assist in understanding the essence of
that model. Our position in this paper is that background knowledge
incorporating system modelling principles can be used to constrain effectively
the set of good qualitative models. Utilising the model-identification
framework provided by Inductive Logic Programming (ILP) we present empirical
support for this position using a series of increasingly complex artificial
datasets. The results are obtained with qualitative and quantitative data
subject to varying amounts of noise and different degrees of sparsity. The
results also point to the presence of a set of qualitative states, which we
term kernel subsets, that may be necessary for a qualitative model-learner to
learn correct models. We demonstrate scalability of the method to biological
system modelling by identification of the glycolysis metabolic pathway from
data
Using a logical model to predict the growth of yeast
<p>Abstract</p> <p>Background</p> <p>A logical model of the known metabolic processes in <it>S. cerevisiae </it>was constructed from iFF708, an existing Flux Balance Analysis (FBA) model, and augmented with information from the KEGG online pathway database. The use of predicate logic as the knowledge representation for modelling enables an explicit representation of the structure of the metabolic network, and enables logical inference techniques to be used for model identification/improvement.</p> <p>Results</p> <p>Compared to the FBA model, the logical model has information on an additional 263 putative genes and 247 additional reactions. The correctness of this model was evaluated by comparison with iND750 (an updated FBA model closely related to iFF708) by evaluating the performance of both models on predicting empirical minimal medium growth data/essential gene listings.</p> <p>Conclusion</p> <p>ROC analysis and other statistical studies revealed that use of the simpler logical form and larger coverage results in no significant degradation of performance compared to iND750.</p
Combining inductive logic programming, active learning and robotics to discover the function of genes
The paper is addressed to AI workers with an interest in biomolecular genetics and also to biomolecular geneticists interested in what AI tools may do for them. The authors are engaged in a collaborative enterprise aimed at partially automating some aspects of scientific work. These aspects include the processes of forming hypotheses, devising trials to discriminate between these competing hypotheses, physically performing these trials and then using the results of these trials to converge upon an accurate hypothesis. As a potential component of the reasoning carried out by an "artificial scientist" this paper describes ASE-Progol, an Active Learning system which uses Inductive Logic Programming to construct hypothesised first-order theories and uses a CART-like algorithm to select trials for eliminating ILP derived hypotheses. In simulated yeast growth tests ASE-Progol was used to rediscover how genes participate in the aromatic amino acid pathway of Saccharomyces cerevisiae. The cost of the chemicals consumed in converging upon a hypothesis with an accuracy of around 88% was reduced by five orders of magnitude when trials were selected by ASE-Progol rather than being sampled at random. While the naive strategy of always choosing the cheapest trial from the set of candidate trials led to lower cumulative costs than ASE-Progol, both the naive strategy and the random strategy took significantly longer to converge upon a final hypothesis than ASE-Progol. For example to reach an accuracy of 80%, ASE-Progol required 4 days while random sampling required 6 days and the naive strategy required 10 days
Logic Programs as Declarative and Procedural Bias in Inductive Logic Programming
Machine Learning is necessary for the development of Artificial Intelligence, as pointed out by Turing in his 1950 article ``Computing Machinery and Intelligence''. It is in the same article that Turing suggested the use of computational logic and background knowledge for learning. This thesis follows a logic-based machine learning approach called Inductive Logic Programming (ILP), which is advantageous over other machine learning approaches in terms of relational learning and utilising background knowledge. ILP uses logic programs as a uniform representation for hypothesis, background knowledge and examples, but its declarative bias is usually encoded using metalogical statements. This thesis advocates the use of logic programs to represent declarative and procedural bias, which results in a framework of single-language representation.
We show in this thesis that using a logic program called the top theory as declarative bias leads to a sound and complete multi-clause learning system MC-TopLog. It overcomes the entailment-incompleteness of Progol, thus outperforms Progol in terms of predictive accuracies on learning grammars and strategies for playing Nim game. MC-TopLog has been applied to two real-world applications funded by Syngenta, which is an agriculture company.
A higher-order extension on top theories results in meta-interpreters, which allow the introduction of new predicate symbols. Thus the resulting ILP system Metagol can do predicate invention, which is an intrinsically higher-order logic operation. Metagol also leverages the procedural semantic of Prolog to encode procedural bias, so that it can outperform both its ASP version and ILP systems without an equivalent procedural bias in terms of efficiency and accuracy. This is demonstrated by the experiments on learning Regular, Context-free and Natural grammars. Metagol is also applied to non-grammar learning tasks involving recursion and predicate invention, such as learning a definition of staircases and robot strategy learning. Both MC-TopLog and Metagol are based on a -directed framework, which is different from other multi-clause learning systems based on Inverse Entailment, such as CF-Induction, XHAIL and IMPARO. Compared to another -directed multi-clause learning system TAL, Metagol allows the explicit form of higher-order assumption to be encoded in the form of meta-rules.Open Acces
Studying the Functional Genomics of Stress Responses in Loblolly Pine With the Expresso Microarray Experiment Management System
Conception, design, and implementation of cDNA microarray experiments present a
variety of bioinformatics challenges for biologists and computational scientists. The multiple
stages of data acquisition and analysis have motivated the design of Expresso, a
system for microarray experiment management. Salient aspects of Expresso include
support for clone replication and randomized placement; automatic gridding, extraction of
expression data from each spot, and quality monitoring; flexible methods of combining
data from individual spots into information about clones and functional categories; and the
use of inductive logic programming for higher-level data analysis and mining. The
development of Expresso is occurring in parallel with several generations of microarray
experiments aimed at elucidating genomic responses to drought stress in loblolly pine
seedlings. The current experimental design incorporates 384 pine cDNAs replicated and
randomly placed in two specific microarray layouts. We describe the design of Expresso as
well as results of analysis with Expresso that suggest the importance of molecular
chaperones and membrane transport proteins in mechanisms conferring successful
adaptation to long-term drought stress
Model Revision from Temporal Logic Properties in Computational Systems Biology
International audienceSystems biologists build models of bio-molecular processes from knowledge acquired both at the gene and protein levels, and at the phenotype level through experiments done in wildlife and mutated organisms. In this chapter, we present qualitative and quantitative logic learning tools, and illustrate how they can be useful to the modeler. We focus on biochemical reaction models written in the Systems Biology Markup Language SBML, and interpreted in the Biochemical Abstract Machine BIOCHAM. We first present a model revision algorithm for inferring reaction rules from biological properties expressed in temporal logic. Then we discuss the representations of kinetic models with ordinary differential equations (ODEs) and with stochastic logic programs (SLPs), and describe a parameter search algorithm for finding parameter values satisfying quantitative temporal properties. These methods are illustrated by a simple model of the cell cycle control, and by an application to the modelling of the conditions of synchronization in period of the cell cycle by the circadian cycle
- ā¦