The Capacity of Some Pólya String Models
We study random string-duplication systems, which we call Pólya string
models. These are motivated by DNA storage in living organisms, and certain
random mutation processes that affect their genome. Unlike previous works that
study the combinatorial capacity of string-duplication systems, or various
string statistics, this work provides exact capacity or bounds on it, for
several probabilistic models. In particular, we study the capacity of noisy
string-duplication systems, including the tandem-duplication, end-duplication,
and interspersed-duplication systems. Interesting connections are drawn between
some systems and the signature of random permutations, as well as to the beta
distribution common in population genetics.
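As a rough illustration of the duplication systems named in this abstract, the following minimal sketch simulates two of them. The abstract does not define the rules, so the definitions here are assumptions: a tandem duplication copies a uniformly chosen substring of fixed length k and inserts the copy directly after the original, and an end duplication appends the copy to the end of the string. The function names (tandem_duplicate, end_duplicate, evolve) are hypothetical, and the sketch does not compute capacity.

```python
import random


def tandem_duplicate(s, k, rng):
    """Assumed rule: copy a uniformly chosen length-k substring and
    insert the copy immediately after the original occurrence."""
    i = rng.randrange(len(s) - k + 1)  # start index of the chosen substring
    return s[:i + k] + s[i:i + k] + s[i + k:]


def end_duplicate(s, k, rng):
    """Assumed rule: copy a uniformly chosen length-k substring and
    append the copy to the end of the string."""
    i = rng.randrange(len(s) - k + 1)
    return s + s[i:i + k]


def evolve(seed, k, steps, step_fn, rng_seed=0):
    """Apply `steps` random duplications to `seed` (requires len(seed) >= k)."""
    rng = random.Random(rng_seed)  # seeded so that runs are reproducible
    s = seed
    for _ in range(steps):
        s = step_fn(s, k, rng)
    return s


if __name__ == "__main__":
    print(evolve("ACGT", 1, 10, tandem_duplicate))
    print(evolve("ACGT", 2, 10, end_duplicate))
```

Each step lengthens the string by exactly k symbols, so after n steps the string has length len(seed) + n*k under either assumed rule.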
Formation of regulatory modules by local sequence duplication
Turnover of regulatory sequence and function is an important part of
molecular evolution. But what are the modes of sequence evolution leading to
rapid formation and loss of regulatory sites? Here, we show that a large
fraction of neighboring transcription factor binding sites in the fly genome
have formed from a common sequence origin by local duplications. This mode of
evolution is found to produce regulatory information: duplications can seed new
sites in the neighborhood of existing sites. Duplicate seeds evolve
subsequently by point mutations, often towards binding a different factor than
their ancestral neighbor sites. These results are based on a statistical
analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome,
and a comparison set of intergenic regulatory sequence in Saccharomyces
cerevisiae. In fly regulatory modules, pairs of binding sites show
significantly enhanced sequence similarity up to distances of about 50 bp. We
analyze these data in terms of an evolutionary model with two distinct modes of
site formation: (i) evolution from independent sequence origin and (ii)
divergent evolution following duplication of a common ancestor sequence. Our
results suggest that pervasive formation of binding sites by local sequence
duplications distinguishes the complex regulatory architecture of higher
eukaryotes from the simpler architecture of unicellular organisms.
The capacity of some Pólya string models
We study random string-duplication systems, called Pólya string models, motivated by certain random mutation processes in the genome of living organisms. Unlike previous works that study the combinatorial capacity of string-duplication systems, or peripheral properties such as symbol frequency, this work provides exact capacity or bounds on it, for several probabilistic models. In particular, we give the exact capacity of the random tandem-duplication system, and the end-duplication system, and bound the capacity of the complement tandem-duplication system. Interesting connections are drawn between the former and the beta distribution common to population genetics, as well as between the latter system and signatures of random permutations.
Evolution of new regulatory functions on biophysically realistic fitness landscapes
Regulatory networks consist of interacting molecules with a high degree of
mutual chemical specificity. How can these molecules evolve when their function
depends on maintenance of interactions with cognate partners and simultaneous
avoidance of deleterious "crosstalk" with non-cognate molecules? Although
physical models of molecular interactions provide a framework in which
co-evolution of network components can be analyzed, most theoretical studies
have focused on the evolution of individual alleles, neglecting the network. In
contrast, we study the elementary step in the evolution of gene regulatory
networks: duplication of a transcription factor followed by selection for TFs
to specialize their inputs as well as the regulation of their downstream genes.
We show how to coarse grain the complete, biophysically realistic
genotype-phenotype map for this process into macroscopic functional outcomes
and quantify the probability of attaining each. We determine which evolutionary
and biophysical parameters bias evolutionary trajectories towards fast
emergence of new functions and show that this can be greatly facilitated by the
availability of "promiscuity-promoting" mutations that affect TF specificity.
Adaptive evolution of transcription factor binding sites
The regulation of a gene depends on the binding of transcription factors to
specific sites located in the regulatory region of the gene. The generation of
these binding sites and of cooperativity between them are essential building
blocks in the evolution of complex regulatory networks. We study a theoretical
model for the sequence evolution of binding sites by point mutations. The
approach is based on biophysical models for the binding of transcription
factors to DNA. Hence we derive empirically grounded fitness landscapes, which
enter a population genetics model including mutations, genetic drift, and
selection. We show that the selection for factor binding generically leads to
specific correlations between nucleotide frequencies at different positions of
a binding site. We demonstrate the possibility of rapid adaptive evolution
generating a new binding site for a given transcription factor by point
mutations. The evolutionary time required is estimated in terms of the neutral
(background) mutation rate, the selection coefficient, and the effective
population size. The efficiency of binding site formation is seen to depend on
two joint conditions: the binding site motif must be short enough and the
promoter region must be long enough. These constraints on promoter architecture
are indeed seen in eukaryotic systems. Furthermore, we analyse the adaptive
evolution of genetic switches and of signal integration through binding
cooperativity between different sites. Experimental tests of this picture
involving the statistics of polymorphisms and phylogenies of sites are
discussed.
Evolution of regulatory complexes: a many-body system
The recent advent of large-scale genomic sequence data and improvement of sequencing technologies have enabled population genetics to advance from a mostly abstract theoretical basis to a quantitative molecular description. However, functional units in DNA are typically combinations of interacting nucleotide segments, and evolutionary forces acting on these segments can result in very complicated population dynamics. The goal is to formulate these interactions in such a way that the macroscopic features are independent of the microscopic details, as in statistical mechanics.
In this thesis, I discuss the evolutionary dynamics of regulatory sequences, which control the production of protein in cells. One of the primary forms of regulation occurs through interactions of proteins called transcription factors with binding sites in the DNA sequence, and the strength of these interactions influences the individual's fitness in the population. What makes this an ideal model system for quantitative analysis of genomic evolution is the possibility of inferring this relationship.
Compared to prokaryotes and yeast, gene regulation is much more complex in higher eukaryotes. Regulatory information is organized in modules with multiple binding sites that are linked to a common function. In Chapter 2, we show that binding site complexes are commonly formed by local sequence duplications, as opposed to forming from scratch by single point mutations. We also show that the underlying regulatory grammar is in tune with this mechanism such that the duplication events confer an adaptive advantage.
Regulatory complexes resemble a many-particle system whose function emerges from the collective dynamics of its elements. In Chapter 3, we develop a thermodynamic framework to characterize the effective affinity of site complexes to multiple transcription factors with cooperative binding. These affinities are the phenotype, or trait of binding complexes on which selection acts, and we characterize their evolution. From the yeast genome polymorphism data, we infer a fitness landscape as a function of binding affinity by using the novel method developed in Chapter 4. This method of quantitative trait analysis can deal with long-range correlations between sites which arise in asexual populations. Our fitness landscape quantitatively predicts the amount of conservation of the phenotype, as well as the amount of compensatory changes between sites.
Our results open a new avenue to understand the regulatory "grammar" of eukaryotic genomes based on quantitative evolution models. They prove that a combination of theoretical models, high-throughput experimental measurements, and analysis of genomic variation is necessary for a proper quantitative understanding of biological systems.
From in vitro evolution to protein structure
At the nanoscale, the machinery of life is mainly composed of macromolecules and macromolecular complexes that, through their shapes, create a network of interconnected mechanisms of biological processes. The relationship between the shape and function of a biological molecule is the foundation of structural biology, which aims at studying the structure of a protein or a macromolecular complex to unveil the molecular mechanism through which it exerts its function. What about the reverse: is it possible to deduce a protein's structure by exploiting the function for which it was naturally selected? To this aim we developed a method, called CAMELS (Coupling Analysis by Molecular Evolution Library Sequencing), able to obtain the structural features of a protein from an artificial selection based on that protein's function. With CAMELS we tried to reconstruct the TEM-1 beta-lactamase fold exclusively by generating and sequencing large libraries of mutational variants. Theoretically, with this method it is possible to reconstruct the structure of a protein regardless of the species of origin or the phylogenetic time of emergence, provided a functional phenotypic selection of the protein is available. CAMELS allows us to obtain protein structures without needing to purify the protein beforehand.