982 research outputs found

    Computational Fitness Landscape for All Gene-Order Permutations of an RNA Virus

    How does the growth of a virus depend on the linear arrangement of genes in its genome? Answering this question may enhance our basic understanding of virus evolution and advance applications of viruses as live attenuated vaccines, gene-therapy vectors, or anti-tumor therapeutics. We used a mathematical model for vesicular stomatitis virus (VSV), a prototype RNA virus that encodes five genes (N-P-M-G-L), to simulate the intracellular growth of all 120 possible gene-order variants. Simulated yields of virus infection varied by 6,000-fold and were found to be most sensitive to gene-order permutations that increased levels of the L gene transcript or reduced levels of the N gene transcript, the lowest and highest expressed genes of the wild-type virus, respectively. Effects of gene order on virus growth also depended upon the host-cell environment, reflecting different resources for protein synthesis and different cell susceptibilities to infection. Moreover, by computationally deleting intergenic attenuations, which define a key mechanism of transcriptional regulation in VSV, the variation in growth associated with the 120 gene-order variants was drastically narrowed from 6,000- to 20-fold, and many variants produced higher progeny yields than wild-type. These results suggest that regulation by intergenic attenuation preceded or co-evolved with the fixation of the wild-type gene order in the evolution of VSV. In summary, our models have begun to reveal how gene functions, gene regulation, and genomic organization of viruses interact with their host environments to define processes of viral growth and evolution.

    MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification

    Continuous improvements in next-generation sequencing technologies have led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists and whose analysis requires huge computational effort. The classification of species has emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple-alignment-, phylogenetic-tree-, statistical-, and character-based methods.

    Improving the Ribozyme Toolbox: From Structure-Function Insights to Synthetic Biology Applications

    Self-cleaving ribozymes are a naturally occurring class of catalytically active RNA molecules which cleave their own phosphate backbone. In nature, self-cleaving ribozymes are best known for their role in processing concatamers of viral genomes into monomers during viral replication in some RNA viruses, but to a lesser degree have also been implicated in mRNA regulation and processing in bacteria and eukaryotes. In addition to their biological relevance, these RNA enzymes have been harnessed as important biomolecular tools with a variety of applications in fields such as bioengineering. Self-cleaving ribozymes are relatively small and easy to generate in the lab using common molecular biology approaches, and have therefore been accessible and well-exploited model systems used to interrogate RNA sequence-structure-function relationships. Furthermore, self-cleaving ribozymes are also being implemented as parts in the development of various biomolecular tools such as biosensors and gene regulatory elements. While much progress has been made in these areas, there are still challenges associated with the performance and implementation of such tools. The work contained in this dissertation aims to address several of these challenges and improve the ribozyme toolbox in several diverse areas. Chapter one provides an introduction to pertinent background information for this dissertation. Chapter two aims to improve the ribozyme toolbox by providing and analyzing new high-throughput sequence-structure-function data sets on five different self-cleaving ribozymes, and identifying how trends in epistasis relate to distinct structural elements. Chapter three uses such high-throughput data to train machine learning models that accurately predict the functional effects of higher-order mutations in functional RNAs, which have historically been difficult to predict.
Finally, in chapter four, I developed a biologically relevant platform to study the real-time performance and kinetics of self-cleaving ribozyme-based gene regulatory elements directly at the site of transcription in mammalian cells.

    On the networked architecture of genotype spaces and its critical effects on molecular evolution

    Evolutionary dynamics is often viewed as a subtle process of change accumulation that causes a divergence among organisms and their genomes. However, this interpretation is an inheritance of a gradualistic view that has been challenged at the macroevolutionary, ecological and molecular level. Actually, when the complex architecture of genotype spaces is taken into account, the evolutionary dynamics of molecular populations becomes intrinsically non-uniform, sharing deep qualitative and quantitative similarities with slowly driven physical systems: nonlinear responses analogous to critical transitions, sudden state changes or hysteresis, among others. Furthermore, the phenotypic plasticity inherent to genotypes transforms classical fitness landscapes into multiscapes where adaptation in response to an environmental change may be very fast. The quantitative nature of adaptive molecular processes is deeply dependent on a network-of-networks multilayered structure of the map from genotype to function that we begin to unveil. This work has been supported by the Spanish Ministerio de Economía y Competitividad and FEDER funds of the EU through grants ViralESS (FIS2014-57686-P) and VARIANCE (FIS2015-64349-P). J.A. is supported through grant no. SEV-2013-0347. P.C. is supported through the European Union's YEI funds.

    Algorithms for Analysis of Heterogeneous Cancer and Viral Populations Using High-Throughput Sequencing Data

    Next-generation sequencing (NGS) technologies have made giant leaps in recent years. Short-read samples now reach millions of reads, and the number of samples has grown enormously in the wake of the COVID-19 pandemic. These data can expose essential aspects of disease transmission and development and reveal the key to its treatment. At the same time, single-cell sequencing has progressed from dozens to tens of thousands of cells per sample. These technological advances bring new challenges for computational biology and require the development of scalable, robust methods to deal with a wide range of problems, from epidemiology to cancer studies. The first part of this work focuses on processing virus NGS data. It proposes algorithms that facilitate the initial data analysis steps by filtering genetically related sequencing data, along with a tool for investigating intra-host virus diversity, which is vital for biomedical research and epidemiology. The second part addresses single-cell data in cancer studies. It develops evolutionary cancer models involving new quantitative parameters of cancer subclones to better understand the underlying processes of cancer development.

    Improved Computational Prediction of Function and Structural Representation of Self-Cleaving Ribozymes with Enhanced Parameter Selection and Library Design

    Biomolecules could be engineered to solve many societal challenges, including disease diagnosis and treatment, environmental sustainability, and food security. However, our limited understanding of how mutational variants alter molecular structures and functional performance has constrained the potential of important technological advances, such as high-throughput sequencing and gene editing. Ribonucleic acid (RNA) sequences are thought to play a central role within many of these challenges. Their continual discovery throughout all domains of life is evidence of their significant biological importance (Weinreb et al., 2016). The self-cleaving ribozyme is a class of noncoding RNA (ncRNA) that has been useful for relating sequence variants to structural features and their associated catalytic activities. Self-cleaving ribozymes possess tractable sequence spaces, perform easily identifiable catalytic functions, and have well-documented structures. The determination of a self-cleaving ribozyme’s structure and catalytic activity within the laboratory is typically a slow and expensive process. Most current explorations of structure and function come from these empirical processes. Computational approaches to the prediction of catalytic activity and structure are fast and inexpensive, but have failed either to achieve atomic accuracy or to correctly identify all base-pair interactions (Watkins et al., 2018). One prominent impediment to computational approaches is the lack of existing structural and functional data typically required by predictive models (Jumper et al., 2021). Using data from deep-mutational scanning experiments and high-throughput sequencing technology, it is possible to computationally map mutational variants to their observed catalytic activity for a range of self-cleaving ribozymes. The resulting map reveals important base-pairing relationships that, in turn, facilitate accurate predictions of higher-order variants.
Using sequence data from three experimental replicates of five model self-cleaving ribozymes, I will identify and map all single and double mutation variants to their observed cleavage activity. These mappings will be used to identify structural features within each ribozyme. Next, I will show within a training tool how observed cleavage for multiple reaction times can be used to identify the catalytic rates of our model ribozymes. Finally, I will predict the functional activity for model ribozyme variants of various mutational orders using machine learning models trained only on functionally labeled sequence variants. Together, these three dissertation chapters represent the kind of analysis needed to further the implementation of more accurate structural and functional prediction algorithms.

    Prevalence of Epistasis in the Evolution of Influenza A Surface Proteins

    The surface proteins of human influenza A viruses experience positive selection to escape both human immunity and, more recently, antiviral drug treatments. In bacteria and viruses, immune-escape and drug-resistant phenotypes often appear through a combination of several mutations that have epistatic effects on pathogen fitness. However, the extent and structure of epistasis in influenza viral proteins have not been systematically investigated. Here, we develop a novel statistical method to detect positive epistasis between pairs of sites in a protein, based on the observed temporal patterns of sequence evolution. The method rests on the simple idea that a substitution at one site should rapidly follow a substitution at another site if the sites are positively epistatic. We apply this method to the surface proteins hemagglutinin and neuraminidase of influenza A virus subtypes H3N2 and H1N1. Compared to a non-epistatic null distribution, we detect substantial amounts of epistasis and determine the identities of putatively epistatic pairs of sites. In particular, using sequence data alone, our method identifies epistatic interactions between specific sites in neuraminidase that have recently been demonstrated, in vitro, to confer resistance to the drug oseltamivir; these epistatic interactions are responsible for widespread drug resistance among H1N1 viruses circulating today. This experimental validation demonstrates the predictive power of our method to identify epistatic sites of importance for viral adaptation and public health. We conclude that epistasis plays a large role in shaping the molecular evolution of influenza viruses. In particular, sites with , which would normally not be identified as positively selected, can facilitate viral adaptation through epistatic interactions with their partner sites. 
Knowledge of specific interactions among sites in influenza proteins may help us to predict the course of antigenic evolution and, consequently, to select more appropriate vaccines and drugs.

    Eco-Evolutionary Implications of Environmental Change Across Heterogeneous Landscapes

    Species use a variety of mechanisms to adapt to environmental change. These range from spatially tracking optimal environments, to phenotypically plastic responses and evolutionary adaptation. Due to increases in anthropogenic influence on environments, characteristics of change such as their duration and magnitude are undergoing fundamental shifts away from the natural disturbance regimes that shaped species’ evolution. This dissertation uses empirical data and simulation models to examine the ecological and evolutionary consequences of environmental change across real, heterogeneous landscapes for multiple species, with an emphasis on anthropogenic changes. I used landscape genetics to evaluate the effects of urbanization on two native amphibian species, spotted salamanders (Ambystoma maculatum) and wood frogs (Lithobates sylvaticus). Population isolation was positively associated with local urbanization and lessened genetic diversity for both species. Resistance surface modelling revealed connectivity was diminished by developed land cover, light roads, interstates, and topography for both species, plus secondary roads and rivers for wood frogs, highlighting the influence of anthropogenic landscape features relative to natural features. Further study of a subset of wood frog populations revealed adaptive evolution associated with urban environments. I identified a set of 37 loci with the capacity to correctly reassign individuals into rural or urban populations with 87.5 and 93.8% accuracy, respectively. I developed an agent-based model to examine how gene flow, rates of change, and strength of landscape spatial and temporal autocorrelation influence abundance outcomes for species experiencing an environmental shift. 
Analysis of 36 environmental scenarios suggests that environmental variation, which is an emergent property of landscape autocorrelation, is negatively associated with the magnitude and duration of abundance declines following environmental change. Higher levels of gene flow lessened this effect, particularly in abrupt change scenarios, although gradual changes also resulted in demographic costs. Lastly, I used an investigation of an emerging disease in American lobsters (Homarus americanus) to study within-generation responses to environmental pressures. Using whole-transcriptome shotgun sequencing, I identified eight differentially expressed unigenes associated with the disease and seven related to environmental differences. Collectively, my dissertation provides numerous examples of how anthropogenically induced environmental change can direct ecological and evolutionary processes.