The cells of all living organisms across every domain of life contain a heritable DNA genome that encodes the information needed to recapitulate their structure, function, and behavior. The development of programmable tools capable of editing DNA in living cells has enabled a revolution in the biological and biomedical sciences. Despite the enormous potential of these tools, their application to personalized medicine has been challenging because editing efficiency varies widely across different sequences. In this thesis, I describe the development of computational tools for predicting and analyzing the outcomes of therapeutic genome editing experiments. I show that these tools enable the rapid development of editing strategies for correcting both common and rare pathogenic mutations.
In Chapter 2, I describe the development and exploratory data analysis of pooled lentiviral screens for rapidly assessing the outcomes of prime editing experiments. First, I detail the design and construction of paired gRNA–target site libraries for high-throughput evaluation of editing efficiencies for both pegRNAs and nsgRNAs. Then, I use the results from paired pegRNA–target site screens to characterize the sequence determinants of mammalian mismatch repair. I show that mismatch repair efficiency depends on both the specific mismatched bases and the length of uninterrupted mismatches. Using data from paired nsgRNA–target site screens, I show that prime editing efficiency with PE3 systems is not correlated with predicted Cas9-nuclease efficiency scores, motivating the development of predictive machine learning models specific to complementary-strand nicking.
In Chapter 3, I formulate mechanistic machine learning, a paradigm for performing machine learning on chemical systems wherein domain knowledge about reaction mechanisms can be directly incorporated into the underlying structure of data-driven models. Using mechanistic machine learning, I describe the development of OptiPrime, a model of prime editing efficiency, and show that its exquisite predictive performance is dependent on its mechanistic formulation. Additionally, I show that the intermediate values computed by OptiPrime are physically interpretable and can be used to accurately predict the outcomes of prime editing experiments with complementary-strand nicking guides (i.e., PE3) and with paired prime editing guide RNAs (i.e., twinPE).
Next, in Chapter 4, I demonstrate several prospective use cases of OptiPrime for developing therapeutic approaches to correct pathogenic mutations in human and mouse models of disease. Using cystic fibrosis as a test case, I show that OptiPrime can be used to generate pegRNA sequences that achieve high editing efficiencies at three common pathogenic mutations in CFTR, including one that resulted in double the editing efficiency of a pegRNA that required 3 years to hand-optimize. I then show that OptiPrime-generated sequences can be used directly in primary cells for correction of pathogenic mutations in mouse models of Alport syndrome and KIF1A-associated neurological disorder. Moreover, I show several "nonconventional" use cases for OptiPrime, including T cell engineering in primary human cells, generating a pair of pegRNAs capable of installing a recombinase landing site that enabled over 10% integration efficiency into CFTR intron 1, and combining OptiPrime with SpliceAI to correct a cause of HLA class II immunodeficiency.
In Chapter 5, I describe the development and application of powTNRka, a dynamic programming algorithm for assessing the outcomes of base and prime editing experiments at highly repetitive genomic loci. PowTNRka enabled the development of base editing and prime editing strategies in the trinucleotide repeat tracts of HTT and FXN, the genes associated with Huntington’s disease and Friedreich’s ataxia, respectively. Base editing was able to abate somatic repeat expansion in HTT and FXN in both in vitro and in vivo models, providing a potential strategy for preventing repeats from reaching pathogenic length. Moreover, prime editing was able to precisely excise repeats at HTT and FXN in models that contained pathogenic numbers of trinucleotide repeats. In an in vivo model of Friedreich’s ataxia, prime editing-mediated repeat excision resulted in successful restoration of FXN transcript levels.
Lastly, in Chapter 6, I provide a brief outlook on the state of current research at the intersection of computation and genome editing technologies, along with future research directions that will further pave the path for the field’s continued development.
Chemistry and Chemical Biolog