Machine Aided Biological Discovery and Design

Abstract

Advances in biotechnology and the life sciences are primarily driven by biologists conducting rigorous experimentation. However, biology is often too complex, with intractable combinatorial search spaces and functional landscapes, to comprehensively explore, understand, and engineer via iterative biological experimentation alone. Next-generation sequencing technologies have made it possible to measure biology at high throughput, giving observational insight into these complexities. Further, in recent years, it has become possible both to manipulate biological systems with fine-grained control and to directly synthesize large libraries of DNA molecules with specified sequences, providing an unprecedented ability to engineer biology. We explore the thesis that computational methods built with experimental considerations in mind and trained on carefully selected high-throughput experimental data can drive advances in the life sciences by making accurate predictions, which can then be used to iteratively generate hypotheses and design biological sequences for further experimental validation. To test this thesis, we introduce and apply computational approaches for modeling cellular differentiation trajectories, identifying nonspecific antibodies, and designing diverse libraries of biological sequences that reflect desired objectives. First, we introduce a generative machine learning model for inferring cellular developmental landscapes from cross-sectional sequencing of in vitro differentiation time series. We validate this model against ground-truth lineage tracing experiments, and we show its ability to conduct in silico simulations of cellular differentiation trajectories under perturbations. Next, we present a computational framework for using sequencing data from therapeutic discovery campaigns to identify nonspecific antibody therapeutics in large candidate pools.
We show that this approach bypasses and outperforms costly combinatorial affinity selection experiments and allows the use of only single-target selection data to identify pairwise nonspecificity. Finally, we introduce an algorithm for the rational design of high-diversity synthetic antibody libraries using machine learning models and stochastic optimization. We show how this algorithm can be used to develop large libraries optimized for targets or developability characteristics, leading to more promising candidates from affinity selection.

Ph.D.
