90 research outputs found

    motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences

    Next-generation sequencing technology enables the identification of thousands of gene regulatory sequences in many cell types and organisms. We consider the problem of testing if two such sequences differ in their number of binding site motifs for a given transcription factor (TF) protein. Binding site motifs impart regulatory function by providing TFs the opportunity to bind to genomic elements and thereby affect the expression of nearby genes. Evolutionary changes to such functional DNA are hypothesized to be major contributors to phenotypic diversity within and between species, but despite the importance of TF motifs for gene expression, no method exists to test for motif loss or gain. Assuming that motif counts are binomially distributed, and allowing for dependencies between motif instances in evolutionarily related sequences, we derive the probability mass function of the difference in motif counts between two nucleotide sequences. We provide a method to numerically estimate this distribution from genomic data and show through simulations that our estimator is accurate. Finally, we introduce the R package motifDiverge that implements our methodology and illustrate its application to gene regulatory enhancers identified by a mouse developmental time course experiment. While this study was motivated by analysis of regulatory motifs, our results can be applied to any problem involving two correlated Bernoulli trials.
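
The idea of a probability mass function for the difference between two correlated counts can be illustrated with a small sketch. This is not the motifDiverge implementation (which is an R package) and simplifies the setting described above. It assumes both sequences contribute the same number `n` of paired trials, that pairs are independent of one another, and that the joint probabilities `p10 = P(X=1, Y=0)` and `p01 = P(X=0, Y=1)` are known. The function name `diff_pmf` and its parameters are hypothetical.

```python
from math import comb


def diff_pmf(n, p10, p01):
    """PMF of D = sum_i (X_i - Y_i) over n independent pairs (X_i, Y_i)
    of possibly correlated Bernoulli trials.

    p10 = P(X=1, Y=0): the pair contributes +1 to D.
    p01 = P(X=0, Y=1): the pair contributes -1 to D.
    The remaining mass (both 0 or both 1) contributes 0.
    Illustrative assumption: equal trial counts for both sequences.
    """
    if p10 < 0 or p01 < 0 or p10 + p01 > 1:
        raise ValueError("p10 and p01 must be non-negative and sum to at most 1")
    p_tie = 1.0 - p10 - p01
    pmf = {}
    for d in range(-n, n + 1):
        total = 0.0
        # k pairs contribute +1 and j = k - d pairs contribute -1;
        # the other n - k - j pairs are ties.
        for k in range(max(d, 0), n + 1):
            j = k - d
            if k + j > n:
                break
            total += (comb(n, k) * comb(n - k, j)
                      * p10 ** k * p01 ** j * p_tie ** (n - k - j))
        pmf[d] = total
    return pmf


if __name__ == "__main__":
    pmf = diff_pmf(5, 0.2, 0.1)
    # A two-sided tail probability for an observed difference of 3:
    p_value = sum(p for d, p in pmf.items() if abs(d) >= 3)
    print(p_value)
```

Correlation between the two trials in a pair enters only through `p10` and `p01`: the more often the two sequences agree, the more mass sits on ties and the narrower the distribution of the difference becomes.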

    Genomic analysis of the relationship between gene expression variation and DNA polymorphism in Drosophila simulans

    Background: Understanding how DNA sequence polymorphism relates to variation in gene expression is essential to connecting genotypic differences with phenotypic differences among individuals. Addressing this question requires linking population genomic data with gene expression variation. Results: Using whole genome expression data and recent light shotgun genome sequencing of six Drosophila simulans genotypes, we assessed the relationship between expression variation in males and females and nucleotide polymorphism across thousands of loci. By examining sequence polymorphism in gene features, such as untranslated regions and introns, we find that genes showing greater variation in gene expression between genotypes also have higher levels of sequence polymorphism in many gene features. Accordingly, X-linked genes, which have lower sequence polymorphism levels than autosomal genes, also show less expression variation than autosomal genes. We also find that sex-specifically expressed genes show higher local levels of polymorphism and divergence than both sex-biased and unbiased genes, and that they appear to have simpler regulatory regions. Conclusion: The gene-feature-based analyses and the X-to-autosome comparisons suggest that sequence polymorphism in cis-acting elements is an important determinant of expression variation. However, this relationship varies among the different categories of sex-biased expression, and trans factors might contribute more to male-specific gene expression than cis effects. Our analysis of sex-specific gene expression also shows that female-specific genes have been overlooked in analyses that only point to male-biased genes as having unusual patterns of evolution and that studies of sexually dimorphic traits need to recognize that the relationship between genetic and expression variation at these traits is different from the genome as a whole.

    Adaptive Gene Expression Divergence Inferred from Population Genomics

    Detailed studies of individual genes have shown that gene expression divergence often results from adaptive evolution of regulatory sequence. Genome-wide analyses, however, have yet to unite patterns of gene expression with polymorphism and divergence to infer population genetic mechanisms underlying expression evolution. Here, we combined genomic expression data, analyzed in a phylogenetic context, with whole genome light-shotgun sequence data from six Drosophila simulans lines and reference sequences from D. melanogaster and D. yakuba. These data allowed us to use molecular population genetics to test for neutral versus adaptive gene expression divergence on a genomic scale. We identified recent and recurrent adaptive evolution along the D. simulans lineage by contrasting sequence polymorphism within D. simulans to divergence from D. melanogaster and D. yakuba. Genes that evolved higher levels of expression in D. simulans have experienced adaptive evolution of the associated 3' flanking and amino acid sequence. Concomitantly, these genes are also decelerating in their rates of protein evolution, which is in agreement with the finding that highly expressed genes evolve slowly. Interestingly, adaptive evolution in 5' cis-regulatory regions did not correspond strongly with expression evolution. Our results provide a genomic view of the intimate link between selection acting on a phenotype and associated genic evolution.

    A High-Resolution Map of Human Evolutionary Constraint Using 29 Mammals

    The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ~60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease. Funding: National Human Genome Research Institute (U.S.); National Institute of General Medical Sciences (U.S.) (Grant number GM82901); National Science Foundation (U.S.), Postdoctoral Fellowship (Award 0905968); National Science Foundation (U.S.), Career (0644282); National Institutes of Health (U.S.) (R01-HG004037); Alfred P. Sloan Foundation; Austrian Science Fund, Erwin Schrodinger Fellowship.

    Chromatin remodelling complex dosage modulates transcription factor function in heart development

    Dominant mutations in cardiac transcription factor genes cause human inherited congenital heart defects (CHDs), but their molecular basis is not understood. Interactions between transcription factors and the Brg1/Brm-associated factor (BAF) chromatin remodelling complex suggest potential mechanisms; however, the role of BAF complexes in cardiogenesis is not known. In this study, we show that dosage of Brg1 is critical for mouse and zebrafish cardiogenesis. Disrupting the balance between Brg1 and disease-causing cardiac transcription factors, including Tbx5, Tbx20 and Nkx2-5, causes severe cardiac anomalies, revealing an essential allelic balance between Brg1 and these cardiac transcription factor genes. This suggests that the relative levels of transcription factors and BAF complexes are important for heart development, which is supported by reduced occupancy of Brg1 at cardiac gene promoters in Tbx5 haploinsufficient hearts. Our results reveal complex dosage-sensitive interdependence between transcription factors and BAF complexes, providing a potential mechanism underlying transcription factor haploinsufficiency, with implications for multigenic inheritance of CHDs.

    The western painted turtle genome, a model for the evolution of extreme physiological adaptations in a slowly evolving lineage

    Background: We describe the genome of the western painted turtle, Chrysemys picta bellii, one of the most widespread, abundant, and well-studied turtles. We place the genome into a comparative evolutionary context, and focus on genomic features associated with tooth loss, immune function, longevity, sex differentiation and determination, and the species' physiological capacities to withstand extreme anoxia and tissue freezing. Results: Our phylogenetic analyses confirm that turtles are the sister group to living archosaurs, and demonstrate an extraordinarily slow rate of sequence evolution in the painted turtle. The ability of the painted turtle to withstand complete anoxia and partial freezing appears to be associated with common vertebrate gene networks, and we identify candidate genes for future functional analyses. Tooth loss shares a common pattern of pseudogenization and degradation of tooth-specific genes with birds, although the rate of accumulation of mutations is much slower in the painted turtle. Genes associated with sex differentiation generally reflect phylogeny rather than convergence in sex determination functionality. Among gene families that demonstrate exceptional expansions or show signatures of strong natural selection, immune function and musculoskeletal patterning genes are consistently over-represented. Conclusions: Our comparative genomic analyses indicate that common vertebrate regulatory networks, some of which have analogs in human diseases, are often involved in the western painted turtle's extraordinary physiological capacities. As these regulatory pathways are analyzed at the functional level, the painted turtle may offer important insights into the management of a number of human health disorders.

    The 2006 NESCent Phyloinformatics Hackathon: A Field Report

    In December 2006, a group of 26 software developers from some of the most widely used life science programming toolkits and phylogenetic software projects converged on Durham, North Carolina, for a Phyloinformatics Hackathon, an intense five-day collaborative software coding event sponsored by the National Evolutionary Synthesis Center (NESCent). The goal was to help researchers integrate multiple phylogenetic software tools into automated workflows. Participants addressed deficiencies in interoperability between programs by implementing “glue code” and improving support for phylogenetic data exchange standards (particularly NEXUS) across the toolkits. The work was guided by use-cases compiled in advance by both developers and users, and the code was documented as it was developed. The resulting software is freely available for both users and developers through incorporation into the distributions of several widely used open-source toolkits. We explain the motivation for the hackathon, describe how it was organized, and discuss some of the outcomes and lessons learned. We conclude that hackathons are an effective mode of solving problems in software interoperability and usability, and are underutilized in scientific software development.