The paper is addressed to AI workers with an interest in biomolecular genetics and also to biomolecular geneticists interested in what AI tools may do for them. The authors are engaged in a collaborative enterprise aimed at partially automating some aspects of scientific work. These aspects include the processes of forming hypotheses, devising trials to discriminate between these competing hypotheses, physically performing these trials and then using the results of these trials to converge upon an accurate hypothesis. As a potential component of the reasoning carried out by an "artificial scientist", this paper describes ASE-Progol, an Active Learning system which uses Inductive Logic Programming (ILP) to construct hypothesised first-order theories and uses a CART-like algorithm to select trials for eliminating ILP-derived hypotheses. In simulated yeast growth tests, ASE-Progol was used to rediscover how genes participate in the aromatic amino acid pathway of Saccharomyces cerevisiae. The cost of the chemicals consumed in converging upon a hypothesis with an accuracy of around 88% was reduced by five orders of magnitude when trials were selected by ASE-Progol rather than being sampled at random. While the naive strategy of always choosing the cheapest trial from the set of candidate trials led to lower cumulative costs than ASE-Progol, both the naive strategy and the random strategy took significantly longer to converge upon a final hypothesis than ASE-Progol. For example, to reach an accuracy of 80%, ASE-Progol required 4 days while random sampling required 6 days and the naive strategy required 10 days.
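The contrast between informative and merely cheap trials can be illustrated with a minimal sketch. This is not ASE-Progol's CART-like algorithm: it is a simplified greedy heuristic, and the hypothesis names, trial names, costs and the scoring rule (hypotheses eliminated in the worst case, per unit cost) are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy data: each hypothesis predicts growth (True) or
# no growth (False) for each candidate trial. Not taken from the paper.
PREDICTIONS: Dict[str, Dict[str, bool]] = {
    "h1": {"t1": True, "t2": True, "t3": True},
    "h2": {"t1": True, "t2": False, "t3": True},
    "h3": {"t1": True, "t2": True, "t3": False},
    "h4": {"t1": True, "t2": False, "t3": False},
}
# Hypothetical chemical cost of running each trial.
COST: Dict[str, float] = {"t1": 1.0, "t2": 5.0, "t3": 2.0}


def worst_case_eliminated(trial: str, hypotheses: List[str]) -> int:
    """Number of hypotheses ruled out whichever outcome is observed."""
    positives = sum(1 for h in hypotheses if PREDICTIONS[h][trial])
    return min(positives, len(hypotheses) - positives)


def select_informative(trials: List[str], hypotheses: List[str]) -> str:
    """Assumed heuristic: maximise guaranteed eliminations per unit cost."""
    return max(trials, key=lambda t: worst_case_eliminated(t, hypotheses) / COST[t])


def select_cheapest(trials: List[str]) -> str:
    """Naive strategy: always choose the cheapest candidate trial."""
    return min(trials, key=lambda t: COST[t])


def eliminate(hypotheses: List[str], trial: str, outcome: bool) -> List[str]:
    """Keep only the hypotheses consistent with the observed outcome."""
    return [h for h in hypotheses if PREDICTIONS[h][trial] == outcome]


if __name__ == "__main__":
    trials = list(COST)
    hypotheses = list(PREDICTIONS)
    print(select_informative(trials, hypotheses))
    print(select_cheapest(trials))
    print(eliminate(hypotheses, "t3", True))
```

In this toy setting the cheapest trial is one on which every hypothesis agrees, so running it eliminates nothing. That is one plausible reason a cheapest-first strategy could keep cumulative cost low while taking longer to converge.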

Bryant, CH

Kell, DB

King, RD

Muggleton, SH

Oliver, SG

Reiser, P

The University of Manchester - Institutional Repository

Combining inductive logic programming, active learning and robotics to discover the function of genes

We aim to partially automate some aspects of scientific work, namely the processes of forming hypotheses, devising trials to discriminate between these competing hypotheses, physically performing these trials and then using the results of these trials to converge upon an accurate hypothesis. We have developed ASE-Progol, an Active Learning system which uses Inductive Logic Programming (ILP) to construct hypothesised first-order theories and uses a CART-like algorithm to select trials for eliminating ILP-derived hypotheses. We have developed a novel form of learning curve which, in contrast to the form of learning curve normally used in Active Learning, allows one to compare the costs incurred by different learning strategies. We plan to combine ASE-Progol with a standard laboratory robot to create a general automated approach to Functional Genomics. As a first step towards this goal, we are using ASE-Progol to rediscover how genes participate in the aromatic amino acid pathway of Saccharomyces cerevisiae. Our approach involves auxotrophic mutant trials. To date, ASE-Progol has conducted such trials in silico. However, we describe how they will be performed automatically in vitro by a standard laboratory robot designed for these sorts of liquid handling tasks, namely the Beckman/Coulter Biomek 2000. Although our work to date has been limited to trials conducted in silico, the results have been encouraging. Parts of the model were removed and the ability of ASE-Progol to efficiently recover the performance of the model was measured. The cost of the chemicals consumed in converging upon a hypothesis with an accuracy in the range 46-88% was reduced if trials were selected by ASE-Progol rather than if they were sampled at random (without replacement). To reach an accuracy in the range 46-80%, ASE-Progol incurs experimental costs five orders of magnitude lower than random sampling.
ASE-Progol requires less time to converge upon a hypothesis with an accuracy in the range 74-87% than if trials are sampled at random (without replacement) or selected using the naive strategy of always choosing the cheapest trial from the set of candidate trials. For example, to reach an accuracy of 80%, ASE-Progol requires 4 days while random sampling requires 6 days and the naive strategy requires 10 days.
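One way to read the idea of a learning curve that compares costs is to plot accuracy against cumulative experimental cost rather than against the number of trials. The following is a minimal sketch under that assumption. The function names and the numbers in the usage example are hypothetical and are not results from the paper.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def cost_learning_curve(
    trial_costs: List[float], accuracies: List[float]
) -> List[Tuple[float, float]]:
    """Pair the cumulative cost after each trial with the accuracy reached."""
    if len(trial_costs) != len(accuracies):
        raise ValueError("one accuracy value is needed per trial")
    return list(zip(accumulate(trial_costs), accuracies))


def cost_to_reach(
    curve: List[Tuple[float, float]], target: float
) -> Optional[float]:
    """Cumulative cost at which accuracy first meets the target, if ever."""
    for cumulative_cost, accuracy in curve:
        if accuracy >= target:
            return cumulative_cost
    return None


if __name__ == "__main__":
    # Made-up costs and accuracies, for illustration only.
    curve = cost_learning_curve([1.0, 2.0, 0.5], [0.5, 0.7, 0.85])
    print(curve)
    print(cost_to_reach(curve, 0.8))
```

Computing such a curve for each strategy and comparing `cost_to_reach` at a common target accuracy gives a direct, cost-based comparison between strategies.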

Bryant, C.H.

Muggleton, S.H.

Oliver, S.G.

Kell, D.B.

Reiser, P.

King, R.D.

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes

University of Salford Institutional Repository

Combining inductive logic programming, active learning and robotics to discover the function of genes

Publikationer från Linköpings universitet

https://salford-repository.worktribe.com/file/1468885/1/cis01012.pdf
