338 research outputs found

    Machine Learning Guided Exploration of an Empirical Ribozyme Fitness Landscape

    Okinawa Institute of Science and Technology Graduate University. Doctor of Philosophy. The fitness landscape of a biomolecule is a representation of its activity as a function of its sequence. Properties of a fitness landscape determine how evolution proceeds. Therefore, the distribution of functional variants and, more importantly, the connectivity of these variants within the sequence space are important scientific questions. Exploration of these spaces, however, is impeded by the combinatorial explosion of the sequence space. High-throughput experimental methods have recently reduced this impediment, but only modestly. Better computational methods are needed to fully utilize the rich information in these experimental data and to better understand the properties of the fitness landscape. In this work, I seek to improve this exploration process by combining data from massively parallel experimental assays with smart library design using advanced computational techniques. I focus on an artificial RNA enzyme, or ribozyme, that can catalyze a ligation reaction between two RNA fragments. This chemistry is analogous to that of modern RNA polymerase enzymes and therefore represents an important reaction in the origin of life. In the first chapter, I discuss the background to this work in the context of the evolutionary theory of fitness landscapes and its implications in biotechnology. In chapter 2, I explore the use of processes borrowed from the field of evolutionary computation to solve optimization problems using real experimental sequence-activity data. In chapter 3, I investigate the use of supervised machine learning models to extract information on epistatic interactions from the dataset collected during multiple rounds of directed evolution. I investigate and experimentally validate the extent to which a deep learning model can be used to guide a completely computational evolutionary algorithm towards distant regions of the fitness landscape. In the final chapter, I perform a comprehensive experimental assay of the combinatorial region explored by the deep learning-guided evolutionary algorithm. Using this dataset, I analyze higher-order epistasis and attempt to explain the increased predictability of the region sampled by the algorithm. Finally, I provide the first experimental evidence of a large RNA ‘neutral network’. Altogether, this work represents the most comprehensive experimental and computational study of the RNA ligase ribozyme fitness landscape to date, providing important insights into the evolutionary search space possibly explored during the earliest stages of life. (Doctoral thesis)

    Experimental exploration of a ribozyme neutral network using evolutionary algorithm and deep learning

    A neutral network connects all genotypes with equivalent phenotypes in a fitness landscape and plays an important role in the mutational robustness and evolvability of biomolecules. In contrast to earlier theoretical works, evidence of large neutral networks has been lacking in recent experimental studies of fitness landscapes. This suggests that evolution could be constrained globally. Here, we demonstrate that a deep learning-guided evolutionary algorithm can efficiently identify neutral genotypes within the sequence space of an RNA ligase ribozyme. Furthermore, we measure the activities of all 2^16 (65,536) variants connecting two active ribozymes that differ by 16 mutations and analyze mutational interactions (epistasis) up to the 16th order. We discover an extensive network of neutral paths linking the two genotypes and reveal that these paths might be predicted using only information from lower-order interactions. Our experimental evaluation of over 120,000 ribozyme sequences provides important empirical evidence that neutral networks can increase the accessibility and predictability of the fitness landscape.
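
The combinatorial region between two genotypes can be illustrated with a short sketch. It is not taken from the paper: the function name `intermediates` and the toy sequences are hypothetical. It only shows why two sequences that differ at 16 positions are connected by 2^16 = 65,536 variants, since each differing position independently takes the nucleotide of one parent or the other.

```python
from itertools import product


def intermediates(seq_a: str, seq_b: str):
    """Yield every sequence that takes each differing position from seq_a or seq_b.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    # Positions at which the two parent sequences disagree.
    diff = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    # Each differing position is a binary choice: keep seq_a (0) or use seq_b (1).
    for choice in product((0, 1), repeat=len(diff)):
        seq = list(seq_a)
        for pos, use_b in zip(diff, choice):
            if use_b:
                seq[pos] = seq_b[pos]
        yield "".join(seq)


# Toy example (made-up sequences): two differences give 2**2 = 4 variants.
parent_a = "ACGUACGU"
parent_b = "AGGUACCU"
toy_variants = list(intermediates(parent_a, parent_b))

# Sixteen differences give 2**16 variants, including both parents.
n_variants_16 = sum(1 for _ in intermediates("A" * 16, "C" * 16))
```

The enumeration includes both parent sequences, so the mutational paths between them run entirely through this set.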

    SPARCS: a web server to analyze (un)structured regions in coding RNA sequences.

    More than a simple carrier of the genetic information, messenger RNA (mRNA) coding regions can also harbor functional elements that evolved to control different post-transcriptional processes, such as mRNA splicing, localization and translation. Functional elements in RNA molecules are often encoded by secondary structure elements. In this article, we introduce Structural Profile Assignment of RNA Coding Sequences (SPARCS), an efficient method to analyze the (secondary) structure profile of protein-coding regions in mRNAs. First, we develop a novel algorithm that enables us to sample uniformly the sequence landscape preserving the dinucleotide frequency and the encoded amino acid sequence of the input mRNA. Then, we use this algorithm to generate a set of artificial sequences that is used to estimate the Z-score of classical structural metrics such as the sum of base pairing probabilities and the base pairing entropy. Finally, we use these metrics to predict structured and unstructured regions in the input mRNA sequence. We applied our methods to study the structural profile of the ASH1 genes and recovered key structural elements. A web server implementing this discovery pipeline is available at http://csb.cs.mcgill.ca/sparcs together with the source code of the sampling algorithm.
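
The Z-score step can be sketched as follows. This is a minimal illustration and not the SPARCS implementation: `toy_metric` is a crude hypothetical stand-in for a structural metric, and `naive_shuffles` is a plain random shuffle that preserves neither the dinucleotide frequency nor the encoded amino acid sequence, unlike the sampling algorithm described in the abstract.

```python
import random
import statistics


def z_score(observed: float, background: list[float]) -> float:
    """Standard Z-score of an observed value against background values."""
    mu = statistics.fmean(background)
    sigma = statistics.stdev(background)  # sample standard deviation
    if sigma == 0:
        raise ValueError("background has zero variance")
    return (observed - mu) / sigma


def toy_metric(seq: str) -> float:
    """Hypothetical stand-in metric: count of positions i whose base could
    pair (Watson-Crick) with the base at the mirrored position."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    half = len(seq) // 2
    return float(sum((seq[i], seq[-1 - i]) in pairs for i in range(half)))


def naive_shuffles(seq: str, n: int, seed: int = 0) -> list[str]:
    """Assumption: plain mononucleotide shuffling, used here only as a
    placeholder for a constraint-preserving sampler."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        chars = list(seq)
        rng.shuffle(chars)
        out.append("".join(chars))
    return out


sequence = "GGGAAACUUUCCC"  # made-up example sequence
background_seqs = naive_shuffles(sequence, n=200)
background_vals = [toy_metric(s) for s in background_seqs]
z = z_score(toy_metric(sequence), background_vals)
```

A Z-score far from zero would indicate that the input sequence scores unusually on the metric relative to the randomized background; how meaningful that is depends entirely on how well the background sampler controls for sequence composition.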

    corRna: a web server for predicting multiple-point deleterious mutations in structural RNAs

    RNA molecules can achieve a broad range of regulatory functions through specific structures that are in turn determined by their sequence. The prediction of mutations changing the structural properties of RNA sequences (a.k.a. deleterious mutations) is therefore useful for conducting mutagenesis experiments and synthetic biology applications. While brute force approaches can be used to analyze single-point mutations, this strategy does not scale well to multiple mutations. In this article, we present corRna, a web server for predicting multiple-point deleterious mutations in structural RNAs. corRna uses our RNAmutants framework to efficiently explore the RNA mutational landscape. It also enables users to apply search heuristics to improve the quality of the predictions. We show that corRna predictions correlate with mutagenesis experiments on the hepatitis C virus cis-acting replication element and match the accuracy of previous approaches on a large test set in a much lower execution time. We illustrate the new perspectives offered by corRna by predicting five-point deleterious mutations, an insight that could not be achieved by previous methods. corRna is available at: http://corrna.cs.mcgill.ca

    A global sampling approach to designing and reengineering RNA secondary structures

    The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign. Funding: National Science Foundation (U.S.). Graduate Research Fellowship Program; Natural Sciences and Engineering Research Council of Canada (NSERC) (RGPIN ) (386596-10); Fonds québécois de la recherche sur la nature et les technologies (PR-146375); National Institutes of Health (U.S.) (Grant GM081871); Natural Sciences and Engineering Research Council of Canada (NSERC); National Institutes of Health (U.S.

    Artificial evolution with Binary Decision Diagrams: a study in evolvability in neutral spaces

    This thesis develops a new approach to evolving Binary Decision Diagrams, and uses it to study evolvability issues. For reasons that are not yet fully understood, current approaches to artificial evolution fail to exhibit the evolvability so readily exhibited in nature. To be able to apply evolvability to artificial evolution the field must first understand and characterise it; this will then lead to systems which are much more capable than they are currently. An experimental approach is taken. Carefully crafted, controlled experiments elucidate the mechanisms and properties that facilitate evolvability, focusing on the roles and interplay between neutrality, modularity, gradualism, robustness and diversity. Evolvability is found to emerge under gradual evolution as a biased distribution of functionality within the genotype-phenotype map, which serves to direct phenotypic variation. Neutrality facilitates fitness-conserving exploration, completely alleviating local optima. Population diversity, in conjunction with neutrality, is shown to facilitate the evolution of evolvability. The search is robust, scalable, and insensitive to the absence of initial diversity. The thesis concludes that gradual evolution in a search space that is free of local optima by way of neutrality can be a viable alternative to problematic evolution on multi-modal landscapes

    The evolution of the bacterial chemotaxis network

    Advances in biomolecular technology allow us to sequence entire genomes, but how genes and molecular networks influence the emergence and evolution of phenotypic traits is still unclear. Different fields in biology and medicine are working hard to unravel the relationship between the genome and phenotypes. In this thesis, a new (mechanistic) approach combining systems biology and evolutionary biology is explored to tackle the genotype-phenotype problem. The chemotaxis network of Escherichia coli is used as a model system for its relatively simple network configuration associated with a complex trait such as chemotactic performance. A mathematical model was developed and in silico evolutionary experiments were performed with different environmental conditions. The results show that due to the complexity of the genomic architecture, most individual gene loci have an inconsistent relationship with fitness. In other words, direct relationships between genes and phenotypes are far more complex than just a linear correlation. The reconstruction of the fitness landscape shows that its structure is highly heterogeneous and there are cases in which mutations have unpredictable and inconsistent effects. Another result shows that contrary to static environments, fluctuating environments facilitate the exploration of the fitness landscape. The results in this thesis show the potential of the evolutionary-systems-biology approach, which could help to understand how complex diseases (e.g. cancer or diabetes) develop or how bacteria evolve to become drug resistant

    Investigating Evolutionary Innovation in Yeast Heat Shock Protein 90

    The Heat Shock Protein 90 (Hsp90) is an essential and highly conserved chaperone that facilitates the maturation of a wide array of client proteins, including many kinases. These clients in turn regulate a wide array of cellular processes, such as signal transduction and transcriptional reprogramming. As a result, the activity of Hsp90 has the potential to influence physiology, which in turn may influence the ability to adapt to new environments. Previous studies using a deep mutational scanning approach (EMPIRIC) identified multiple substitutions within a 9 amino acid substrate-binding loop of yeast Hsp90 that provide a growth advantage for yeast under elevated salinity conditions and incur costs of adaptation under alternate environments. These results demonstrate that genetic alterations to a small region of Hsp90 can contribute to evolutionary change and promote adaptation to specific environments. However, because Hsp90 is a large, highly dynamic and multi-functional protein, the adaptive potential and evolutionary constraints of Hsp90 across diverse environments require further investigation. In this dissertation I used a modified version of EMPIRIC to examine the impact of environmental stress on the adaptive potential, costs and evolutionary constraints for a 118 amino acid functional region of the middle domain of yeast Hsp90 under endogenous expression levels and for the entire Hsp90 protein sequence under low expression levels. Endogenous Hsp90 expression levels were used to observe how environment may affect Hsp90 mutant fitness effects in nature, while low expression levels were used as a sensitive readout of Hsp90 function and fitness. In general, I found that mutations within the middle domain of Hsp90 have similar fitness effects across many environments, whereas under low Hsp90 expression the fitness effects of Hsp90 mutants differed between environments. Under individual conditions multiple variants provided a growth advantage; however, these variants exhibited growth defects in other environments, indicating costs of adaptation. When comparing experimental results to 261 extant eukaryotic sequences, I found that natural variants of Hsp90 support growth in all environments. I identified protein regions that are enriched in beneficial, deleterious and costly mutations and that coincide with residues involved in co-chaperone-client-binding interactions, stabilization of Hsp90 client-binding interfaces, stabilization of Hsp90 interdomains and ATPase chaperone activity. In summary, this thesis uncovers the adaptive potential, costs of adaptation and evolutionary constraints of Hsp90 mutations across several environments. These results complement and extend known structural and functional information, highlighting potential adaptive mechanisms. Furthermore, this work elucidates the impact environment can have on shaping Hsp90 evolution and suggests that fluctuating environments may have played a role in the long-term evolution of Hsp90.