53 research outputs found
A comparative genomics approach to identifying the plasticity transcriptome
BACKGROUND: Neuronal activity regulates gene expression to control learning and memory, homeostasis of neuronal function, and pathological disease states such as epilepsy. A great deal of experimental evidence supports the involvement of two particular transcription factors in shaping the genomic response to neuronal activity and mediating plasticity: CREB and zif268 (egr-1, krox24, NGFI-A). The gene targets of these two transcription factors are of considerable interest, since they may help develop hypotheses about how neural activity is coupled to changes in neural function. RESULTS: We have developed a computational approach for identifying binding sites for these transcription factors within the promoter regions of annotated genes in the mouse, rat, and human genomes. By combining a robust search algorithm to identify discrete binding sites, a comparison of targets across species, and an analysis of binding site locations within promoter regions, we have defined a group of candidate genes that are strong CREB or zif268 targets and are thus likely to be regulated by neural activity. Our analysis revealed that CREB and zif268 share a disproportionate number of targets and that these common targets are dominated by transcription factors. CONCLUSION: These observations may enable a more detailed understanding of the regulatory networks that are induced by neural activity and contribute to the plasticity transcriptome. The target genes identified in this study will be a valuable resource for investigators who hope to define the functions of specific genes that underlie activity-dependent changes in neuronal properties.
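As a rough illustration of the kind of promoter scan the abstract describes (not the authors' search algorithm, which the abstract does not specify), the sketch below looks for exact consensus matches on both strands and then lists genes whose promoters carry sites for both factors. The consensus strings are the commonly cited CRE and EGR1/zif268 elements and are an assumption here, as are the function names and the toy promoter sequences.

```python
import re

# Commonly cited consensus sites; assumed for illustration, not taken from the abstract.
CONSENSUS = {
    "CREB": "TGACGTCA",     # full palindromic cAMP response element
    "zif268": "GCGTGGGCG",  # EGR1/zif268 response element
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(promoter, motif):
    """Return sorted (0-based position, strand) exact matches on both strands."""
    promoter = promoter.upper()
    hits = set()
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead pattern lets overlapping matches be reported.
        for match in re.finditer(f"(?={pattern})", promoter):
            hits.add((match.start(), strand))
    return sorted(hits)


def shared_targets(promoters):
    """Genes whose promoter holds at least one site for every factor."""
    return sorted(
        gene
        for gene, seq in promoters.items()
        if all(find_sites(seq, motif) for motif in CONSENSUS.values())
    )


# Hypothetical toy promoters, not real genomic sequence.
toy_promoters = {
    "geneA": "AATGACGTCAGGGCGTGGGCGTT",  # CRE plus zif268 site
    "geneB": "AATGACGTCAGG",             # CRE only
    "geneC": "CCCGCCCACGCAA",            # zif268 site on the minus strand only
}

print(shared_targets(toy_promoters))
```

A real analysis would score degenerate sites (for example with position weight matrices) and require conservation across the mouse, rat, and human orthologs, as the abstract indicates; exact string matching is used here only to keep the sketch short.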
Core and region-enriched networks of behaviorally regulated genes and the singing genome
Songbirds represent an important model organism for elucidating molecular mechanisms that link genes with complex behaviors, in part because they have discrete vocal learning circuits that have parallels with those that mediate human speech. We found that ~10% of the genes in the avian genome were regulated by singing, and we found a striking regional diversity of both basal and singing-induced programs in the four key song nuclei of the zebra finch, a vocal learning songbird. The region-enriched patterns were a result of distinct combinations of region-enriched transcription factors (TFs), their binding motifs, and presinging acetylation of histone 3 at lysine 27 (H3K27ac) enhancer activity in the regulatory regions of the associated genes. RNA interference manipulations validated the role of the calcium-response transcription factor (CaRF) in regulating genes preferentially expressed in specific song nuclei in response to singing. Thus, differential combinatorial binding of a small group of activity-regulated TFs and predefined epigenetic enhancer activity influences the anatomical diversity of behaviorally regulated gene networks.
A comparative genomics multitool for scientific discovery and conservation
A whole-genome alignment of 240 phylogenetically diverse species of eutherian mammal, including 131 previously uncharacterized species, from the Zoonomia Project provides data that support biological discovery, medical research and conservation. The Zoonomia Project is investigating the genomics of shared and specialized traits in eutherian mammals. Here we provide genome assemblies for 131 species, of which all but 9 are previously uncharacterized, and describe a whole-genome alignment of 240 species of considerable phylogenetic diversity, comprising representatives from more than 80% of mammalian families. We find that regions of reduced genetic diversity are more abundant in species at a high risk of extinction, discern signals of evolutionary selection at high resolution and provide insights from individual reference genomes. By prioritizing phylogenetic diversity and making data available quickly and without restriction, the Zoonomia Project aims to support biological discovery, medical research and the conservation of biodiversity. Peer reviewed.
Genome-Wide Identification of Calcium-Response Factor (CaRF) Binding Sites Predicts a Role in Regulation of Neuronal Signaling Pathways
Calcium-Response Factor (CaRF) was first identified as a transcription factor based on its affinity for a neuronal-selective calcium-response element (CaRE1) in the gene encoding Brain-Derived Neurotrophic Factor (BDNF). However, because CaRF shares no homology with other transcription factors, its properties and gene targets have remained unknown. Here we show that the DNA binding domain of CaRF has been highly conserved across evolution and that CaRF binds DNA directly in a sequence-specific manner in the absence of other eukaryotic cofactors. Using a binding site selection screen, we identify a high-affinity consensus CaRF response element (cCaRE) that shares significant homology with the CaRE1 element of Bdnf. In a genome-wide chromatin immunoprecipitation analysis (ChIP-Seq), we identified 176 sites of CaRF-specific binding (peaks) in neuronal genomic DNA. Of these peaks, 128 are within 10 kb of an annotated gene, and 60 are within 1 kb of an annotated transcriptional start site. At least 138 of the CaRF peaks contain a common 10-bp motif with strong statistical similarity to the cCaRE, and we provide evidence predicting that CaRF can bind independently to at least 64.5% of these motifs in vitro. Analysis of this set of putative CaRF targets suggests the enrichment of genes that regulate intracellular signaling cascades. Finally, we demonstrate that expression of a subset of these target genes is altered in the cortex of Carf knockout (KO) mice. Together, these data strongly support the characterization of CaRF as a unique transcription factor and provide the first insight into the program of CaRF-regulated transcription in neurons.
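The peak counts above (peaks within 1 kb of a transcriptional start site, for instance) come from comparing peak positions with gene annotations. A minimal sketch of one way to do that counting is shown below; it is not the authors' pipeline, and the function names, data layout, and coordinates are hypothetical. It covers only distance to the nearest TSS, not distance to a gene body.

```python
from bisect import bisect_left


def distance_to_nearest_tss(position, sorted_tss):
    """Distance in bp from a peak position to the closest TSS in a sorted list."""
    i = bisect_left(sorted_tss, position)
    # Only the TSS just left and just right of the insertion point can be nearest.
    neighbours = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(position - tss) for tss in neighbours)


def count_within(peaks, tss_by_chrom, window):
    """Count peaks whose nearest same-chromosome TSS lies within `window` bp."""
    count = 0
    for chrom, position in peaks:
        sorted_tss = tss_by_chrom.get(chrom)
        if not sorted_tss:
            continue  # no annotated TSS on this chromosome
        if distance_to_nearest_tss(position, sorted_tss) <= window:
            count += 1
    return count


# Hypothetical annotation and peak summits, for illustration only.
tss_by_chrom = {"chr1": [1_000, 50_000]}
peaks = [("chr1", 1_500), ("chr1", 8_000), ("chr1", 20_000),
         ("chr1", 49_500), ("chr2", 10)]

print(count_within(peaks, tss_by_chrom, 1_000))
print(count_within(peaks, tss_by_chrom, 10_000))
```

The TSS lists must be sorted per chromosome for the binary search to be valid; in practice, interval tools that work on BED files are commonly used for the same job.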
Relating enhancer genetic variation across mammals to complex phenotypes using machine learning
[INTRODUCTION] Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons. [RATIONALE] Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved. This is due to an overwhelming number of cases where nucleotides turn over at a high rate, but a similar combination of transcription factor binding sites and other sequence features can be maintained across millions of years of evolution, allowing the function of the enhancer to be conserved in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to make reliable sequence-based predictions of enhancer function across species in specific tissues and cell types. [RESULTS] To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved.
More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species’ phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety and other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value in applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity of >400,000 candidate enhancers in each of 222 mammals and their associations with the phenotypes we investigated.[CONCLUSION] TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. 
TACIT can be applied to any phenotype with enhancer activity data available from at least a few species in a relevant tissue or cell type and a whole-genome alignment available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), through the Pittsburgh Supercomputing Center Bridges and Bridges-2 Compute Clusters, which was supported by National Science Foundation grants TG-BIO200055 and ACI-1548562. Portions of this research were conducted on Lehigh University’s Research Computing infrastructure, which is partially supported by NSF award 2019035. Funding was provided by a Carnegie Mellon University Computational Biology Department Lane Fellowship (I.M.K.); NIH NIDA DP1DA046585 grant (D.E.S., M.E.W., X.Z., A.R.B., and A.R.P.); NSF grant 2046550 (I.M.K. and A.R.P.); an Alfred P. Sloan Foundation Research Fellowship (I.M.K., M.E.W., and A.R.P.); the Carnegie Mellon University Computational Biology Department (C.S.); NSF Graduate Research Fellowship Program grant DGE1252522 (A.J.L.); NSF Graduate Research Fellowship Program grant DGE1745016 (A.J.L.); a Carnegie Mellon University Summer Undergraduate Research Fellowship (D.E.S.); NIH NIDA Fellowship grant F30DA053020 (B.N.P.); NIH UG3-MH-120094 (K.P.); NSF grant 2022046 (D.P.G.); NIH NHGRI R01HG008742 grant (E.K.K.); and a Swedish Research Council Distinguished Professor Award (K.L.-T.). Peer reviewed.
Comparative genomics reveals insights into avian genome evolution and adaptation
Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.
Leveraging base-pair mammalian constraint to understand genetic variation and human disease
[INTRODUCTION] Thousands of genetic variants have been associated with human diseases and traits through genome-wide association studies (GWASs). Translating these discoveries into improved therapeutics requires discerning which variants among hundreds of candidates are causally related to disease risk. To date, only a handful of causal variants have been confirmed. Here, we leverage 100 million years of mammalian evolution to address this major challenge. [RATIONALE] We compared genomes from hundreds of mammals and identified bases with unusually few variants (evolutionarily constrained). Constraint is a measure of functional importance that is agnostic to cell type or developmental stage. It can be applied to investigate any heritable disease or trait and is complementary to resources using cell type– and time point–specific functional assays like Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx). [RESULTS] Using constraint calculated across placental mammals, 3.3% of bases in the human genome are significantly constrained, including 57.6% of coding bases. Most constrained bases (80.7%) are noncoding. Common variants (allele frequency ≥ 5%) and low-frequency variants (0.5% ≤ allele frequency < 5%) are depleted for constrained bases (1.85 versus 3.26% expected by chance, P < 2.2 × 10^−308). Pathogenic ClinVar variants are more constrained than benign variants (P < 2.2 × 10^−16).
The most constrained common variants are more enriched for disease single-nucleotide polymorphism (SNP)–heritability in 63 independent GWASs. The enrichment of SNP-heritability in constrained regions is greater (7.8-fold) than previously reported in mammals and is even higher in primates (11.1-fold). It exceeds the enrichment of SNP-heritability in nonsynonymous coding variants (7.2-fold) and fine-mapped expression quantitative trait loci (eQTL)–SNPs (4.8-fold). The enrichment peaks near constrained bases, with a log-linear decrease of SNP-heritability enrichment as a function of the distance to a constrained base.
Zoonomia constraint scores improve functionally informed fine-mapping. Variants at sites constrained in mammals and primates have greater posterior inclusion probabilities and higher per-SNP contributions. In addition, using both constraint and functional annotations improves polygenic risk score accuracy across a range of traits. Finally, incorporating constraint information into the analysis of noncoding somatic variants in medulloblastomas identifies new candidate driver genes. [CONCLUSION] Genome-wide measures of evolutionary constraint can help discern which variants are functionally important. This information may accelerate the translation of genomic discoveries into the biological, clinical, and therapeutic knowledge that is required to understand and treat human disease. This work was funded by the Swedish Research Council and Knut and Alice Wallenberg Foundation, Swedish Cancer Society, Swedish Childhood Cancer Fund, National Institute of Mental Health (NIMH) U01MH116438, Gladstone Institutes, National Institute on Drug Abuse (NIDA) DP1DA04658501, NIDA F30DA053020, University College Dublin (UCD) Ad Astra Fellowship, and National Human Genome Research Institute (NHGRI) R01HG008742 and U41HG002371. S.G. was supported by National Institutes of Health (NIH) grants R00 HG010160 and R35 GM147789. Y.L. was supported by NIH U01 HG011720. Additional support was provided by the Australian National Health and Medical Research Council (1113400, 1173790, and 1177268). L.M.H. was supported by NIH grants MH118278, MH124839, and ES033630. P.F.S. was supported by the Swedish Research Council (Vetenskapsrådet, award D0886501). This study makes use of data from the UK Biobank (project ID 12505). Peer reviewed.
Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease
Alzheimer’s disease (AD) is a severe age-related neurodegenerative disorder characterized by accumulation of amyloid-β (Aβ) plaques and neurofibrillary tangles, synaptic and neuronal loss, and cognitive decline. Several genes have been implicated in AD, but chromatin state alterations during neurodegeneration remain uncharacterized. Here, we profile transcriptional and chromatin state dynamics across early and late pathology in the hippocampus of an inducible mouse model of AD-like neurodegeneration. We find a coordinated downregulation of synaptic plasticity genes and regulatory regions, and upregulation of immune response genes and regulatory regions, which are targeted by factors that belong to the ETS family of transcriptional regulators, including PU.1. Human regions orthologous to increasing-level enhancers show immune cell-specific enhancer signatures as well as immune cell expression quantitative trait loci (eQTL), while decreasing-level enhancer orthologs show fetal-brain-specific enhancer activity. Notably, AD-associated genetic variants are specifically enriched in increasing-level enhancer orthologs, implicating immune processes in AD predisposition. Indeed, increasing enhancers overlap known AD loci lacking protein-altering variants and implicate additional loci that do not reach genome-wide significance. Our results reveal new insights into the mechanisms of neurodegeneration and establish the mouse as a useful model for functional studies of AD regulatory regions.
- …