17 research outputs found

    Parallel Genetic Ensemble Feature Selection

    Get PDF

    Heterogeneous Genomic Molecular Clocks in Primates

    Get PDF
    Copyright: © 2006 Kim et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.DOI: 10.1371/journal.pgen.0020163Using data from primates, we show that molecular clocks in sites that have been part of a CpG dinucleotide in recent past (CpG sites) and non-CpG sites are of markedly different nature, reflecting differences in their molecular origins. Notably, single nucleotide substitutions at non-CpG sites show clear generation-time dependency, indicating that most of these substitutions occur by errors during DNA replication. On the other hand, substitutions at CpG sites occur relatively constantly over time, as expected from their primary origin due to methylation. Therefore, molecular clocks are heterogeneous even within a genome. Furthermore, we propose that varying frequencies of CpG dinucleotides in different genomic regions may have contributed significantly to conflicting earlier results on rate constancy of mammalian molecular clock. Our conclusion that different regions of genomes follow different molecular clocks should be considered when inferring divergence times using molecular data and in phylogenetic analysis

    Mutations of Different Molecular Origins Exhibit Contrasting Patterns of Regional Substitution Rate Variation

    Get PDF
    Transitions at CpG dinucleotides, referred to as “CpG substitutions”, are a major mutational input into vertebrate genomes and a leading cause of human genetic disease. The prevalence of CpG substitutions is due to their mutational origin, which is dependent on DNA methylation. In comparison, other single nucleotide substitutions (for example those occurring at GpC dinucleotides) mainly arise from errors during DNA replication. Here we analyzed high quality BAC-based data from human, chimpanzee, and baboon to investigate regional variation of CpG substitution rates

    Evolutionary impacts of DNA methylation on vertebrate genomes

    Get PDF
    DNA methylation is an epigenetic modification in which a methyl group is covalently added to the DNA. In vertebrate genomes methylation occurs almost exclusively at cytosines immediately followed by a guanine (CpG dinucleotides). Two important aspects of DNA methylation have inspired several recent scientific investigations including those in this dissertation. First, methylated cytosines are hotspots of point mutation due to a methylation-dependent mutation mechanism, which has caused a deficiency of CpGs in vertebrate genomes. Second, DNA methylation in promoters is linked with transcriptional silencing of the associated genes. This dissertation presents the results of four studies in which I investigated the impacts of DNA methylation on the neutral and functional evolution of vertebrate genomes. The results of the first two studies demonstrate that DNA methylation has profound impacts on both inter- and intra-genomic neutral substitution rate variation. The third and fourth studies demonstrate that DNA methylation has played critical roles in shaping the evolution of vertebrate promoters and gene regulation.Ph.D.Committee Chair: Dr. Soojin Yi; Committee Member: Dr. Eric Vigoda; Committee Member: Dr. James Thomas; Committee Member: Dr. John McDonald; Committee Member: Dr. Kirill Lobachev; Committee Member: Dr. Michael Goodisma

    Functional Relevance of CpG Island Length for Regulation of Gene Expression

    No full text
    CpG islands mark CpG-enriched regions in otherwise CpG-depleted vertebrate genomes. While the regulatory importance of CpG islands is widely accepted, it is little appreciated that CpG islands vary greatly in lengths. For example, CpG islands in the human genome vary ∼30-fold in their lengths. Here we report findings suggesting that the lengths of CpG islands have functional consequences. Specifically, we show that promoters associated with long CpG islands (long-CGI promoters) are distinct from other promoters. First, long-CGI promoters are uniquely associated with genes with an intermediate level of gene expression breadths. Notably, intermediate expression breadths require the most complex mode of gene regulation, from the standpoint of information content. Second, long-CGI promoters encode more RNA polymerase II (Polr2a) binding sites than other promoters. Third, the actual binding patterns of Polr2a occur in a more tissue-specific manner in long-CGI promoters compared to other CGI promoters. Moreover, long-CGI promoters contain the largest numbers of experimentally characterized transcription start sites compared to other promoters, and the types of transcription start sites in them are biased toward tissue-specific patterns of gene expression. Finally, long-CGI promoters are preferentially associated with genes involved in development and regulation. Together, these findings indicate that functionally relevant variations of CpG islands exist. By investigating consequences of certain CpG island traits, we can gain additional insights into the mechanism and evolution of regulatory complexity of gene expression

    Primate phylogenomics: developing numerous nuclear non-coding, non-repetitive markers for ecological and phylogenetic applications and analysis of evolutionary rate variation

    Get PDF
    © 2009 Peng et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2164/10/247DOI: 10.1186/1471-2164-10-247Background: Genetic analyses are often limited by the availability of appropriate molecular markers. Markers from neutrally evolving genomic regions may be particularly useful for inferring evolutionary histories because they escape the constraints of natural selection. For the majority of taxa however, obtaining such markers is challenging. Advances in genomics have the potential to alleviate the shortage of neutral markers. Here we present a method to develop numerous markers from putatively neutral regions of primate genomes. Results: We began with the available whole genome sequences of human, chimpanzee and macaque. Using computational methods, we identified a total of 280 potential amplicons from putatively neutral, non-coding, non-repetitive regions of these genomes. Subsequently we amplified, using experimental methods, many of these amplicons from diverse primate taxa, including a ringtailed lemur, which is distantly related to the genomic resources. Using a subset of 10 markers, we demonstrate the utility of the developed markers in phylogenetic and evolutionary rate analyses. Particularly, we uncovered substantial evolutionary rate variation among lineages, some of which are previously not reported. Conclusion: We successfully developed numerous markers from putatively neutral regions of primate genomes using a strategy combining computational and experimental methods. Applying these markers to phylogenetic and evolutionary rate variation analyses exemplifies the utility of these markers. Diverse ecological and evolutionary analyses will benefit from these markers. Importantly, methods similar to those presented here can be applied to other taxa in the near future
    corecore