Predicting Positive p53 Cancer Rescue Regions Using Most Informative
                    Positive (MIP) Active Learning

Authors: A Friedler; A Petitjean; A Ventura; AC Joerger; AC Martin; AL Cuff; AN Bullock; AR Fersht; BG Buchanan; CL Brooks; DA Case; DA Cohn; EF Pettersen; F Francois; F Glaser; G Dantas; G. Wesley Hatfield; IH Witten; J Feng; James M. Briggs; JM Lambert; JS Huston; K Otsuka; Kirsty Salmon; L Itti; Linda Hall; Lydia Ho; M Hollstein; M Saar-Tsechansky; MA Hearst; N Roy; NE Sharpless; NG Karaguler; P Baldi; Peter Kaiser; PV Nikolova; R Jones; Richard H. Lathrop; RJ Fox; RK Brachmann; Roberta Baronio; S Kato; S Lain; SA Danziger; Samuel A. Danziger; SM Leach; TE Baroni; VJ Bykov; W Wang; W Xue; Y Cho

Publication date: 1 January 2009
Publisher: Public Library of Science

Abstract

Many protein engineering problems involve finding mutations that produce proteins with a particular function. Computational active learning is an attractive approach to discover desired biological activities. Traditional active learning techniques have been optimized to iteratively improve classifier accuracy, not to quickly discover biologically significant results. We report here a novel active learning technique, Most Informative Positive (MIP), which is tailored to biological problems because it seeks novel and informative positive results. MIP active learning differs from traditional active learning methods in two ways: (1) it preferentially seeks Positive (functionally active) examples; and (2) it may be effectively extended to select gene regions suitable for high throughput combinatorial mutagenesis. We applied MIP to discover mutations in the tumor suppressor protein p53 that reactivate mutated p53 found in human cancers. This is an important biomedical goal because p53 mutants have been implicated in half of all human cancers, and restoring active p53 in tumors leads to tumor regression. MIP found Positive (cancer rescue) p53 mutants in silico using 33% fewer experiments than traditional non-MIP active learning, with only a minor decrease in classifier accuracy. Applying MIP to in vivo experimentation yielded immediate Positive results. Ten different p53 mutations found in human cancers were paired in silico with all possible single amino acid rescue mutations, from which MIP was used to select a Positive Region predicted to be enriched for p53 cancer rescue mutants. 
In vivo assays showed that the predicted Positive Region: (1) had significantly more (p<0.01) new strong cancer rescue mutants than control regions (Negative, and non-MIP active learning); (2) had slightly more new strong cancer rescue mutants than an Expert region selected for purely biological considerations; and (3) rescued for the first time the previously unrescuable p53 cancer mutant P152L.
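
The abstract names the two ideas behind MIP (prefer candidates that are both informative and likely Positive, then extend the choice to a contiguous gene region) but does not give the scoring formula. The Python sketch below is one possible reading, not the authors' method. It assumes a classifier that outputs a probability of Positive for each candidate mutant. The score (predicted probability times binary entropy), the function names, and the fixed-width window used for region selection are all hypothetical choices made for illustration.

```python
import math


def binary_entropy(p):
    """Uncertainty of a predicted probability in bits (0 at p=0 or p=1, 1 at p=0.5)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mip_like_score(p_positive):
    """Hypothetical score: informativeness weighted by the chance of being Positive.

    Assumption: this product is an illustrative stand-in, not the published criterion.
    """
    return p_positive * binary_entropy(p_positive)


def rank_candidates(predictions):
    """Order candidate labels from highest to lowest hypothetical score.

    predictions: dict mapping a candidate label to its predicted P(Positive).
    """
    return sorted(predictions, key=lambda m: mip_like_score(predictions[m]), reverse=True)


def best_region(position_scores, width):
    """Find the contiguous window of `width` positions with the largest summed score.

    Returns (start_index, summed_score). Assumption: a region is a fixed-width window.
    """
    if width <= 0 or width > len(position_scores):
        raise ValueError("width must be between 1 and the number of positions")
    total = sum(position_scores[:width])
    best_start, best_total = 0, total
    for start in range(1, len(position_scores) - width + 1):
        # Slide the window one position: add the entering score, drop the leaving one.
        total += position_scores[start + width - 1] - position_scores[start - 1]
        if total > best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Hypothetical candidates: A and B are equally uncertain, but B is more likely Positive.
ranking = rank_candidates({"mutant_A": 0.3, "mutant_B": 0.7, "mutant_C": 0.05})
region_start, region_total = best_region([0.1, 0.5, 0.9, 0.2], width=2)
```

Selecting purely by uncertainty would tie `mutant_A` and `mutant_B`, since 0.3 and 0.7 have the same entropy. Weighting by the predicted probability breaks the tie in favor of the likelier Positive, which is the preference the abstract describes.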