
    Active machine learning-driven experimentation to determine compound effects on protein patterns

    Abstract: High-throughput screening determines the effects of many conditions on a given biological target. Currently, estimating the effects of those conditions on other targets requires either strong modeling assumptions (e.g., similarities among targets) or separate screens. Ideally, data-driven experimentation could be used to learn accurate models for many conditions and targets without performing all possible experiments. We have previously described an active machine learning algorithm that can iteratively choose small sets of experiments to learn models of multiple effects. We now show that, with no prior knowledge and with liquid-handling robotics and automated microscopy under its control, this learner accurately learned the effects of 48 chemical compounds on the subcellular localization of 48 proteins while performing only 29% of all possible experiments. The results represent the first practical demonstration of the utility of active learning-driven biological experimentation in which the set of possible phenotypes is unknown in advance.
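The abstract above describes an iterative loop: select a small batch of (compound, protein) experiments, run them, update a predictive model, and repeat until a budget is reached. A minimal sketch of that loop is below, assuming a toy coverage-based selection heuristic and a simple row-mode imputation model; the paper's actual learner and phenotype model are much richer, and all names and sizes here are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 48 x 48 ground-truth matrix of phenotype labels
# (compounds x proteins); in the paper each entry comes from microscopy.
truth = rng.integers(0, 4, size=(48, 48))
observed = np.zeros_like(truth, dtype=bool)

def impute(truth, observed):
    """Toy model: predict each unobserved entry as the most common
    observed phenotype in its row, falling back to the global mode."""
    flat = truth[observed]
    global_mode = np.bincount(flat).argmax() if flat.size else 0
    pred = np.empty_like(truth)
    for i in range(truth.shape[0]):
        row_vals = truth[i, observed[i]]
        mode = np.bincount(row_vals).argmax() if row_vals.size else global_mode
        pred[i] = np.where(observed[i], truth[i], mode)
    return pred

batch_size = 96
while observed.sum() < 0.29 * truth.size:  # stop at ~29% of experiments
    # Selection heuristic: prefer cells in sparsely covered rows/columns.
    score = observed.sum(1, keepdims=True) + observed.sum(0, keepdims=True)
    score = np.where(observed, np.inf, score)  # never re-run an experiment
    picks = np.unravel_index(np.argsort(score, axis=None)[:batch_size],
                             score.shape)
    observed[picks] = True  # "run" the selected batch

pred = impute(truth, observed)
accuracy = (pred == truth).mean()
```

The point of the sketch is the structure of the loop, not the heuristic: any model that scores unobserved cells can be dropped into the selection step.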

    Deciding when to stop: efficient experimentation to learn to predict drug-target interactions.

    <p>BACKGROUND: Active learning is a powerful tool for guiding an experimentation process. Instead of doing all possible experiments in a given domain, active learning can be used to pick the experiments that will add the most knowledge to the current model. For drug discovery and development in particular, active learning has been shown to reduce the number of experiments needed to obtain high-confidence predictions. However, in practice, it is crucial to have a method to evaluate the quality of the current predictions and to decide when to stop the experimentation process. Only by applying reliable stopping criteria to active learning can time and costs in the experimental process actually be saved.</p> <p>RESULTS: We compute active learning traces on simulated drug-target matrices in order to determine a regression model for the accuracy of the active learner. By analyzing the performance of the regression model on simulated data, we design stopping criteria for previously unseen experimental matrices. We demonstrate on four previously characterized drug effect data sets that applying the stopping criteria can save up to 40% of the total experiments while retaining highly accurate predictions.</p> <p>CONCLUSIONS: We show that active learning accuracy can be predicted using simulated data, resulting in substantial savings in the number of experiments required to make accurate drug-target predictions.</p>
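The stopping-criterion idea in this abstract can be sketched as follows: fit a regression model on simulated traces, where observable trace features (here, fraction of experiments observed and a prediction-stability score) predict the true accuracy, which is known only in simulation; then stop a real run once the predicted accuracy crosses a target. The features, the linear model, and all constants below are illustrative stand-ins for the paper's richer setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated training traces: in simulation we know the true accuracy.
n_traces = 200
frac_observed = rng.uniform(0.05, 0.6, n_traces)
stability = np.clip(frac_observed + rng.normal(0, 0.05, n_traces), 0, 1)
true_accuracy = np.clip(0.5 + 0.8 * frac_observed
                        + rng.normal(0, 0.03, n_traces), 0, 1)

# Least-squares regression from trace features to accuracy.
X = np.column_stack([np.ones(n_traces), frac_observed, stability])
coef, *_ = np.linalg.lstsq(X, true_accuracy, rcond=None)

def should_stop(frac, stab, target=0.9):
    """Stop experimenting once predicted accuracy reaches the target.
    On an unseen run we only ever see the features, never true accuracy."""
    return float(coef @ np.array([1.0, frac, stab])) >= target
```

Early in a run the predicted accuracy is low and experimentation continues; once coverage and stability rise, the criterion fires and the remaining experiments are saved.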

    Efficient modeling and active learning discovery of biological responses.

    High-throughput and high-content screening involve determining the effect of many compounds on a given target. As currently practiced, screening for each new target typically makes little use of information from screens of prior targets. Further, choices of compounds to advance to drug development are made without significant screening against off-target effects. The overall drug development process could be made more effective, as well as less expensive and time-consuming, if the potential effects of all compounds on all possible targets could be considered, yet the cost of such full experimentation would be prohibitive. In this paper, we describe a potential solution: probabilistic models that can be used to predict results for unmeasured combinations, and active learning algorithms for efficiently selecting which experiments to perform in order to build those models and for determining when to stop. Using simulated and experimental data, we show that our approaches can produce powerful predictive models without exhaustive experimentation and can learn them much faster than by selecting experiments at random.
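The claim that informed selection learns "much faster than by selecting experiments at random" can be illustrated with a deliberately simple toy: each target (row) responds identically across all conditions, so one observation per row yields a perfect row-wise model. A coverage-driven selector then needs exactly one experiment per target, while uniform random selection wastes draws on already-covered rows. Everything below is an illustrative construction, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(3)

n_targets, n_conditions = 20, 10
# Ground truth: every condition gives the same response for a given target.
truth = np.tile(np.arange(n_targets)[:, None] % 3, (1, n_conditions))

def experiments_to_perfect(selector):
    """Count experiments until every target has at least one observation
    (which here suffices for a perfect row-wise model)."""
    observed = np.zeros((n_targets, n_conditions), dtype=bool)
    count = 0
    while not all(observed[i].any() for i in range(n_targets)):
        i, j = selector(observed)
        observed[i, j] = True
        count += 1
    return count

def active(observed):
    # Coverage heuristic: an unobserved cell in the least-covered row.
    i = int(np.argmin(observed.sum(1)))
    j = int(np.argmin(observed[i]))  # first unobserved column in that row
    return i, j

def random_pick(observed):
    free = np.argwhere(~observed)
    return tuple(free[rng.integers(len(free))])

n_active = experiments_to_perfect(active)
n_random = experiments_to_perfect(random_pick)
```

The active selector finishes in exactly `n_targets` experiments; random selection needs at least that many and, by a coupon-collector argument, almost always substantially more.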

    Active learning performance for different model designs.

    <p>Performance was measured as the difference in the number of batches to achieve (A,B) 100% or (C,D) 90% accuracy between active and random learning. (A,C) Greedy Merge, (B,D) B-Clustering. Warmer colors indicate greater experiment savings with an active learner.</p>

    Learning performance dependence on model design: structure learning and imputation rule choice.

    <p>(A) Each model design was evaluated with both active and random learners on two simulated 100 target x 100 condition datasets, each having eight phenotypes, 80% responsiveness and 40% uniqueness. For each model design the best average accuracy for either the active or random learner is plotted at each batch. For six cases displaying superlinear performance, structure learning methods are indicated in color, with different design variations plotted as separate lines and with filled circles to indicate batches where the active learner had higher accuracy: Greedy Merge (blue), a ‘strict’ variation of Greedy Merge (red), and B-Clustering (green, one design). These each had both Target Equivalence Class and Three-Point Imputation rules. (B) The difference in random and active learner accuracies for the superlinear model designs with structure learning method plotted by color as above; filled circles at tails indicate that the active learner had reached 100% accuracy.</p>

    Probability of approximate correctness over a broad range of data.

    <p>(A) The empirical density of the correspondence between the predicted accuracy score and the true (latent) accuracy; lighter colors indicate greater frequency. (B) Confidence (in units of probability) per level set of predicted accuracy score. (C) Per-batch and (1% binned) predicted accuracy score confidences; color indicates confidence (in units of probability).</p>

    Example learning curves.

    <p>Mean learning rates for active (solid) and random (dashed) learners across structure learning methods, Full Greedy Merge (blue) and B-Clustering (green). Data from experiments in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0083996#pone-0083996-g003" target="_blank">Figure 3</a> for (A) (λ<sub>r</sub>=90%, λ<sub>u</sub>=25%); (B) (λ<sub>r</sub>=10%, λ<sub>u</sub>=70%).</p>

    Active Learning Process.

    <p>(A) An experiment is a combination of a target and a condition; observed experiments (filled circles) associate a target and condition with a vector encoding an experiment result. (B) Phenotypes (filled colored circles) are identified by cluster analysis of the experiment results. (C) From the arrangement of phenotypes across targets and conditions, a small set of correlations ϕ (distributions of phenotypes across targets) are identified which are then used to impute unobserved experiments. (D) A batch of experiments (filled grey circles) is selected based in part on predictions (outlined colored circles) from the identified correlations. The process (B-D) is repeated until a desired goal is met.</p>
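Steps (B)-(D) of the caption above (group targets by shared phenotype patterns, then impute unobserved cells from groupmates) can be sketched in miniature. The sketch below greedily merges targets that never disagree on any co-observed condition and fills each group's gaps from its members' observations; the actual Greedy Merge method and its imputation rules are more involved, and the tiny matrix (-1 marking unobserved cells) is purely illustrative.

```python
import numpy as np

# Toy target x condition matrix of phenotype labels; -1 = unobserved.
pheno = np.array([
    [0, -1,  2, -1],
    [0,  1, -1,  3],
    [-1, 1,  2,  3],
    [2, -1,  0,  1],
])

def compatible(a, b):
    """Two targets can merge if they never disagree where both observed."""
    both = (a >= 0) & (b >= 0)
    return bool(np.all(a[both] == b[both]))

# Greedy merge: fold each target into the first fully compatible group.
groups = []  # each group is a list of row indices
for i in range(len(pheno)):
    for g in groups:
        if all(compatible(pheno[i], pheno[j]) for j in g):
            g.append(i)
            break
    else:
        groups.append([i])

# Imputation: within each group, fill every condition from any member
# that observed it; cells unobserved by the whole group stay -1.
imputed = pheno.copy()
for g in groups:
    block = pheno[g]
    for c in range(pheno.shape[1]):
        vals = block[:, c][block[:, c] >= 0]
        if vals.size:
            imputed[g, c] = vals[0]
```

Here the first three targets merge into one group whose members jointly cover all four conditions, so all their cells become predictable; the fourth target conflicts with them and remains its own group with one cell still unobserved.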