
    Boosting forward-time population genetic simulators through genotype compression

    Background: Forward-time population genetic simulations play a central role in deriving and testing evolutionary hypotheses. Such simulations may be data-intensive, depending on the settings of the various parameters controlling them. In particular, for certain settings, the data footprint may quickly exceed the memory of a single compute node. Results: We develop a novel and general method for addressing the memory issue inherent in forward-time simulations by compressing and decompressing, in real time, active and ancestral genotypes, while carefully accounting for the time overhead. We propose a general graph data structure for compressing the genotype space explored during a simulation run, along with efficient algorithms for constructing and updating compressed genotypes that support both mutation and recombination. We tested the performance of our method in very large-scale simulations. The results show that our method not only scales well but also overcomes memory issues that would cripple existing tools. Conclusions: As evolutionary analyses are increasingly performed on genomes, pathways, and networks, particularly in the era of systems biology, scaling population genetic simulators to handle large-scale simulations is crucial. We believe our method offers a significant step in that direction. Further, the techniques we provide are generic and can be integrated with existing population genetic simulators to boost their performance in terms of memory usage.
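
The abstract does not spell out the graph structure, so the following is only a minimal sketch of the general idea under our own assumptions: each genotype is stored as a node holding a pointer to its parent plus only the sites that differ (a mutation node), or pointers to two parents plus a crossover point (a recombination node). The full sequence is rebuilt on demand by walking back to a single stored root. The class `GenotypeGraph`, its methods, and the single-breakpoint recombination model are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One genotype in a hypothetical compressed genotype graph."""
    parent: Optional[int]                      # primary parent id (None only for the root)
    mutations: Dict[int, str] = field(default_factory=dict)  # site -> allele, relative to parent
    other_parent: Optional[int] = None         # second parent, set only for recombinants
    breakpoint: Optional[int] = None           # sites >= breakpoint are copied from other_parent


class GenotypeGraph:
    """Stores the root sequence once; every other genotype is a small diff."""

    def __init__(self, root: str):
        self.root = root
        self.nodes: Dict[int, Node] = {0: Node(parent=None)}

    def _add(self, node: Node) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = node
        return node_id

    def mutate(self, parent: int, site: int, allele: str) -> int:
        # A point mutation costs one dictionary entry, not a full sequence copy.
        return self._add(Node(parent=parent, mutations={site: allele}))

    def recombine(self, left: int, right: int, breakpoint: int) -> int:
        # Single crossover: prefix from `left`, suffix from `right`.
        return self._add(Node(parent=left, other_parent=right, breakpoint=breakpoint))

    def decompress(self, node_id: int) -> str:
        node = self.nodes[node_id]
        if node.parent is None:
            seq = list(self.root)
        elif node.other_parent is None:
            seq = list(self.decompress(node.parent))
        else:
            left = self.decompress(node.parent)
            right = self.decompress(node.other_parent)
            seq = list(left[:node.breakpoint] + right[node.breakpoint:])
        for site, allele in node.mutations.items():
            seq[site] = allele
        return "".join(seq)


if __name__ == "__main__":
    g = GenotypeGraph("AAAAAAAA")
    a = g.mutate(0, 2, "C")      # AACAAAAA
    b = g.mutate(0, 6, "G")      # AAAAAAGA
    r = g.recombine(a, b, 4)     # AACA + AAGA
    print(g.decompress(r))       # AACAAAGA
```

The trade-off this illustrates is memory for time: storage grows with the number of mutation events rather than with population size times sequence length, while reading a genotype costs a walk back to the root. A practical version would cache or periodically materialize frequently accessed genotypes, and would replace the recursion, which can hit Python's recursion limit on long lineages.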

    ncDNA and drift drive binding site accumulation

    Background: The number of transcription factor binding sites (TFBS) in an organism's genome positively correlates with the complexity of the organism's regulatory network. However, how TFBS arise and accumulate in genomes, and how regulatory network complexity affects the organism's fitness, remain far from understood. The availability of TFBS data from many organisms provides an opportunity to explore these issues, particularly from an evolutionary perspective. Results: We analyzed TFBS data from five model organisms (E. coli K12, S. cerevisiae, C. elegans, D. melanogaster, and A. thaliana) and found a positive correlation between the amount of non-coding DNA (ncDNA) in the organism's genome and regulatory complexity. Based on this finding, we hypothesize that the amount of ncDNA, combined with the population size, can explain the patterns of regulatory complexity across organisms. To test this hypothesis, we devised a genome-based regulatory pathway model and subjected it to the forces of evolution through population genetic simulations. The results support our hypothesis, showing that neutral evolutionary forces alone can explain TFBS patterns, and that selection on the regulatory network function does not alter this finding. Conclusions: The cis-regulome is not a clean functional network crafted by adaptive forces alone, but instead a data source filled with the noise of non-adaptive forces. From a regulatory perspective, this evolutionary noise manifests as complexity at both the binding site and pathway levels, which has significant implications for many directions in microbiology, genetics, and synthetic biology.
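
One simple way to see why more ncDNA could mean more binding sites without any selection is to count chance matches. The sketch below is an illustration only, not the model used in the study: it assumes binding sites are exact matches to a single fixed motif of length k, on one strand, in sequence with independent, uniformly distributed bases. Under those assumptions the expected number of chance matches in L bases is (L - k + 1) / 4^k, which grows linearly with L. The function names are hypothetical; the Monte Carlo routine checks the formula empirically.

```python
import random


def expected_chance_sites(ncdna_length: int, motif_length: int) -> float:
    """Expected exact matches of one fixed motif on one strand, assuming
    i.i.d. uniform bases (a simplifying assumption, not real genome composition)."""
    positions = max(ncdna_length - motif_length + 1, 0)
    return positions / 4 ** motif_length


def count_matches(seq: str, motif: str) -> int:
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def simulated_chance_sites(ncdna_length: int, motif: str,
                           trials: int = 200, seed: int = 1) -> float:
    """Average match count over random uniform sequences of the given length."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seq = "".join(rng.choice("ACGT") for _ in range(ncdna_length))
        total += count_matches(seq, motif)
    return total / trials


if __name__ == "__main__":
    for length in (10_000, 1_000_000):
        # 6-bp motif: about 2.44 chance sites per 10 kb, about 244.14 per 1 Mb
        print(length, f"{expected_chance_sites(length, 6):.2f}")
    print(simulated_chance_sites(2_000, "ACG"), expected_chance_sites(2_000, 3))
```

Real transcription factors recognize degenerate motifs (usually modeled with position weight matrices) and genomes are not compositionally uniform, so absolute numbers would differ. The linear scaling with ncDNA length is the point of the illustration.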

    A Sequence-Based, Population Genetic Model of Regulatory Pathway Evolution

    Complex phenotypes with a genetic cause are understood through many processes, including regulatory pathways, but our evolutionary understanding of these critical structures is undermined by poor models that fail to preserve the underlying sequence structure and to incorporate population genetics. In response, this thesis builds a pathway model of evolution from its underlying sequence structure and validates it against a pertinent problem in genome evolution that uniquely leverages the developed model. Specifically, my model preserves sequence characteristics through a novel data structure and through pathway-level mutation and recombination rates that are functions of sequence properties. The utility of the model is validated with a study quantifying the advantages and disadvantages of expansive non-coding DNA regions for the establishment of optimal pathways. Because the model presented in this thesis rectifies many fundamental problems in previous models, it may serve as a critical tool for future work in pathway evolution.
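
The abstract states that pathway-level mutation and recombination rates are functions of sequence properties but does not give the formulas. As a hedged illustration of what such a mapping can look like, the sketch below uses two textbook relations that we have chosen, not ones taken from the thesis: the probability that a binding site of k bases receives at least one point mutation is 1 - (1 - μ)^k for a per-base rate μ, and the recombination fraction between two sites separated by d Morgans follows Haldane's map function, r = (1 - e^(-2d)) / 2. The function names and the uniform-map parameter `morgans_per_bp` are assumptions.

```python
import math


def site_mutation_probability(site_length: int, per_base_rate: float) -> float:
    """P(at least one point mutation in a binding site per generation),
    assuming independent mutations at each base with rate per_base_rate."""
    return 1.0 - (1.0 - per_base_rate) ** site_length


def recombination_fraction(distance_bp: int, morgans_per_bp: float) -> float:
    """Haldane map function r = (1 - exp(-2d)) / 2, with d in Morgans.
    Assumes no crossover interference and a uniform recombination map."""
    d = distance_bp * morgans_per_bp
    return 0.5 * (1.0 - math.exp(-2.0 * d))


if __name__ == "__main__":
    # Longer sites present a larger mutational target.
    print(site_mutation_probability(10, 1e-8))
    # Sites 1 Mb apart at roughly 1 cM/Mb (1e-8 Morgans per bp).
    print(recombination_fraction(1_000_000, 1e-8))
```

With these choices, sequence length directly sets the mutational target of a site, and the physical distance between sites (for example, the amount of intervening ncDNA) sets how often they are separated by recombination, which saturates at 0.5 for unlinked sites.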

    Population Regulomics: Applying population genetics to the cis-regulome

    Population genetics provides a mathematical and computational framework for understanding and modeling evolutionary processes, and so it is vital for the investigation of biological systems. In its current state, molecular population genetics is exclusively focused on molecular sequences (DNA, RNA, or amino acid sequences), and all application-ready simulators and analytic measures work only on sequence data. Consequently, in the early 2000s, when technologies became available to sequence entire genomes, population genetic approaches were naturally applied to mine signatures of selection and conservation, resulting in the subfield of population genomics. Nearly every current genome project applies population genomic techniques to identify functional information and genome structure. Recent technologies have ushered in a similar wave of genetic information, this time focusing on biological mechanisms operating above the genome, most notably gene regulation (regulatory networks). In this work, I develop a molecular population genetics approach for gene regulation, called population regulomics, which includes simulators and analytic measurements that operate on populations of regulatory networks. I conducted extensive data analyses to connect the genome with the cis-regulome, developed computationally efficient simulators, and adapted population genetic measurements on sequences to the regulatory network. By connecting genomic information to cis-regulation, we may apply the wealth of knowledge at the genome level to patterns observed at the regulatory level whose evolutionary origins are unknown. I demonstrate that, by applying population regulomics to the E. coli cis-regulatory network, we are able for the first time to quantify the evolutionary origins of topological patterns and to reveal the surprising amount of neutral signal in the bacterial cis-regulome.
Since regulatory networks play a central role in cellular functioning and, consequently, organismal fitness, this new subfield of population regulomics promises to shed the light of evolution on regulatory mechanisms and, more broadly, on the genetic mechanisms underlying the various phenotypes.
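
To make "simulators that operate on populations of regulatory networks" concrete, here is a toy sketch under heavy simplifying assumptions of our own; it is not the simulator described in this work. Each individual's cis-regulatory network is reduced to a set of (transcription factor, target gene) edges, edges are gained or lost at fixed per-generation probabilities standing in for binding-site birth and death, and reproduction is neutral Wright-Fisher sampling (every individual is equally likely to be a parent). The edge encoding, the `gain`/`loss` parameters, and all function names are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[int, int]  # (transcription factor index, target gene index); hypothetical encoding


def mutate(net: FrozenSet[Edge], n_tfs: int, n_genes: int,
           gain: float, loss: float, rng: random.Random) -> FrozenSet[Edge]:
    """Absent edges appear with probability `gain`; present edges vanish with probability `loss`."""
    edges = set()
    for tf in range(n_tfs):
        for gene in range(n_genes):
            if (tf, gene) in net:
                if rng.random() >= loss:   # edge survives
                    edges.add((tf, gene))
            elif rng.random() < gain:      # new binding site arises by chance
                edges.add((tf, gene))
    return frozenset(edges)


def wright_fisher(pop_size: int, generations: int, n_tfs: int = 3, n_genes: int = 10,
                  gain: float = 1e-3, loss: float = 1e-3,
                  seed: int = 0) -> List[FrozenSet[Edge]]:
    """Neutral Wright-Fisher dynamics on a population of regulatory networks."""
    rng = random.Random(seed)
    population: List[FrozenSet[Edge]] = [frozenset() for _ in range(pop_size)]
    for _ in range(generations):
        # No selection: parents are drawn uniformly with replacement (pure drift).
        parents = [rng.choice(population) for _ in range(pop_size)]
        population = [mutate(p, n_tfs, n_genes, gain, loss, rng) for p in parents]
    return population


def edge_frequencies(population: List[FrozenSet[Edge]]) -> Dict[Edge, float]:
    """Population frequency of each regulatory edge (analogous to allele frequency)."""
    counts: Dict[Edge, int] = {}
    for net in population:
        for edge in net:
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / len(population) for edge, c in counts.items()}


if __name__ == "__main__":
    pop = wright_fisher(pop_size=100, generations=200)
    print(sorted(edge_frequencies(pop).items()))
```

Treating each edge's population frequency like an allele frequency is what lets sequence-level measures (for example, frequency spectra) be carried over to networks. Adding selection would mean weighting the parent draw by a network-level fitness function.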

    Indirect and suboptimal control of gene expression is widespread in bacteria

    Gene regulation in bacteria is usually described as an adaptive response to an environmental change, so that genes are expressed when they are required. We instead propose that most genes are under indirect control: their expression responds to signal(s) that are not directly related to the genes' function. Indirect control should perform poorly in artificial conditions, and we show that gene regulation is often maladaptive in the laboratory. In Shewanella oneidensis MR-1, 24% of genes are detrimental to fitness in some conditions, and detrimental genes tend to be highly expressed instead of being repressed when not needed. In diverse bacteria, there is little correlation between when genes are important for optimal growth or fitness and when those genes are upregulated. Two common types of indirect control are constitutive expression and regulation by growth rate; these occur for genes with diverse functions and often seem to be suboptimal. Because genes that have closely related functions can have dissimilar expression patterns, regulation may be suboptimal in the wild as well as in the laboratory.