thesis

Statistics and Evolution of Functional Genomic Sequence

Abstract

In this thesis, three separate problems in genomics are addressed using methods related to statistical mechanics. The goal of the project discussed in the first chapter is the elucidation of post-transcriptional gene regulation imposed by microRNAs, a recently discovered class of tiny non-coding RNAs. A probabilistic algorithm for the computational identification of genes regulated by microRNAs is introduced; it was developed based on experimental data and statistical analysis of whole-genome data. In particular, applying this algorithm to multiple alignments of groups of related species allows for the specific and sensitive detection of genes targeted by microRNAs on a genome-wide level. Examination of clade-specific predictions and cross-clade comparison yields deeper insights into microRNA biology and first clues about the long-term evolution of microRNA regulation, which are discussed in detail.
Modeling the evolutionary dynamics of microsatellites, an abundant class of repetitive sequence in eukaryotic genomes, was the objective of the second project and is discussed in chapter two. Motivated by the putative functionality of some of these elements and by the difficulty of constructing correct sequence alignments that reflect the evolutionary relationships between microsatellites, a neutral model for microsatellite evolution is developed and tested in the fruit fly Drosophila melanogaster. Evolutionary rates predicted by the model are compared to independent measurements of these rates from multiple alignments of three closely related Drosophila species. The model is applied separately to genomic sequence categories with different functional annotations in order to assess the varying influence of selective constraint among these categories.
In the last chapter, a general population genetic model is introduced that allows transcription factor binding site stability to be determined as a function of selection strength, mutation rate and effective population size at arbitrary values of these parameters. The analytical solution of this model gives the probability that a binding site is functional. The model is used to compute the population fraction of functional binding sites at fixed selection pressure across a variety of different taxa. The results lead to the conclusion that a decreasing effective population size, such as that observed at the evolutionary transition from prokaryotes to eukaryotes, could result in a loss of binding site stability. An extension of the model is used to assess the compensatory effect of the emergence of multiple binding sites for the same transcription factor as a means of maintaining the existing regulatory relationship.
