15 research outputs found

    Identification of cis-regulatory sequence variations in individual genome sequences

    Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. For instance, disrupting variations in an HNF4A transcription factor binding site upstream of the Factor IX gene contribute causally to hemophilia B Leyden. Although clinical genome sequence analysis currently focuses on the identification of protein-altering variation, the impact of cis-regulatory mutations can be similarly strong. New technologies are now enabling genome sequencing beyond exomes, revealing variation across the non-coding 98% of the genome responsible for developmental and physiological patterns of gene activity. The capacity to identify causal regulatory mutations is improving, but predicting functional changes in regulatory DNA sequences remains a great challenge. Here we explore the existing methods and software for prediction of functional variation situated in the cis-regulatory sequences governing gene transcription and RNA processing.

    JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles.

    JASPAR (http://jaspar.genereg.net) is an open-access database storing curated, non-redundant transcription factor (TF) binding profiles representing transcription factor binding preferences as position frequency matrices for multiple species in six taxonomic groups. For this 2016 release, we expanded the JASPAR CORE collection with 494 new TF binding profiles (315 in vertebrates, 11 in nematodes, 3 in insects, 1 in fungi and 164 in plants) and updated 59 profiles (58 in vertebrates and 1 in fungi). The introduced profiles represent an 83% expansion and 10% update when compared to the previous release. We updated the structural annotation of the TF DNA binding domains (DBDs) following a published hierarchical structural classification. In addition, we introduced 130 transcription factor flexible models trained on ChIP-seq data for vertebrates, which capture dinucleotide dependencies within TF binding sites. This new JASPAR release is accompanied by a new web tool to infer JASPAR TF binding profiles recognized by a given TF protein sequence. Moreover, we provide the users with a Ruby module complementing the JASPAR API to ease programmatic access and use of the JASPAR collection of profiles. Finally, we provide the JASPAR2016 R/Bioconductor data package with the data of this release.
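
The abstract above describes binding profiles stored as position frequency matrices (PFMs). As a rough illustration of how such a matrix is commonly used, the sketch below converts a PFM into log-odds weights and scans a sequence for the best-scoring site. The matrix values, the pseudocount of 0.8, the uniform background of 0.25, and the function names `pfm_to_pwm`, `score_site` and `best_hit` are all assumptions made for this example; none of them come from JASPAR itself.

```python
import math

# Hypothetical 4-position PFM (base counts per position); NOT a real JASPAR profile.
PFM = {
    "A": [8, 0, 1, 0],
    "C": [1, 0, 0, 9],
    "G": [0, 10, 0, 1],
    "T": [1, 0, 9, 0],
}


def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Convert counts to log2-odds weights against a uniform background.

    The pseudocount and background values are illustrative assumptions.
    """
    length = len(next(iter(pfm.values())))
    pwm = {base: [] for base in pfm}
    for i in range(length):
        # Column total plus the pseudocount mass shared across the four bases.
        total = sum(pfm[base][i] for base in pfm) + pseudocount
        for base in pfm:
            freq = (pfm[base][i] + pseudocount * background) / total
            pwm[base].append(math.log2(freq / background))
    return pwm


def score_site(pwm, site):
    """Sum the per-position weights for a site of the same length as the matrix."""
    return sum(pwm[base][i] for i, base in enumerate(site))


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window on the given strand."""
    width = len(pwm["A"])
    hits = [
        (start, score_site(pwm, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]
    return max(hits, key=lambda hit: hit[1])


pwm = pfm_to_pwm(PFM)
start, score = best_hit(pwm, "TTAGTCAA")
print(start, round(score, 2))
```

The sketch scans one strand only and assumes the sequence contains only A, C, G and T; a practical scanner would also score the reverse complement and handle ambiguous bases.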

    A Conserved Noncoding Locus Regulates Random Monoallelic Xist Expression across a Topological Boundary

    cis-Regulatory communication is crucial in mammalian development and is thought to be restricted by the spatial partitioning of the genome in topologically associating domains (TADs). Here, we discovered that the Xist locus is regulated by sequences in the neighboring TAD. In particular, the promoter of the noncoding RNA Linx (LinxP) acts as a long-range silencer and influences the choice of X chromosome to be inactivated. This is independent of Linx transcription and independent of any effect on Tsix, the antisense regulator of Xist that shares the same TAD as Linx. Unlike Tsix, LinxP is well conserved across mammals, suggesting an ancestral mechanism for random monoallelic Xist regulation. When introduced in the same TAD as Xist, LinxP switches from a silencer to an enhancer. Our study uncovers an unsuspected regulatory axis for X chromosome inactivation and a class of cis-regulatory effects that may exploit TAD partitioning to modulate developmental decisions.
    Galupa et al. uncover elements important for Xist regulation in its neighboring TAD and reveal that these elements can influence gene regulation both within and between topological domains. These findings, in a context where dynamic, developmental expression is necessary, challenge current models for TAD-based gene-regulatory landscapes.

    Non-targeted transcription factors motifs are a systemic component of ChIP-seq datasets

    Background: The global effort to annotate the non-coding portion of the human genome relies heavily on chromatin immunoprecipitation data generated with high-throughput DNA sequencing (ChIP-seq). ChIP-seq is generally successful in detailing the segments of the genome bound by the immunoprecipitated transcription factor (TF); however, almost all datasets contain genomic regions devoid of the canonical motif for the TF. It remains to be determined whether these regions are related to the immunoprecipitated TF or whether, despite the use of controls, a portion of peaks can be attributed to other causes. Results: Analyses across hundreds of ChIP-seq datasets generated for sequence-specific DNA binding TFs reveal a small set of TF binding profiles for which predicted TF binding site motifs are repeatedly observed to be significantly enriched. Grouping related binding profiles, the set includes: CTCF-like, ETS-like, JUN-like, and THAP11 profiles. These frequently enriched profiles are termed ‘zingers’ to highlight their unanticipated enrichment in datasets for which they were not the targeted TF, and their potential impact on the interpretation and analysis of TF ChIP-seq data. Peaks with zinger motifs and lacking the ChIPped TF’s motif are observed to compose up to 45% of a ChIP-seq dataset. There is substantial overlap of zinger motif-containing regions between diverse TF datasets, suggesting a mechanism that is not TF-specific for the recovery of these regions. Conclusions: Based on the zinger regions’ proximity to cohesin-bound segments, a loading station model is proposed. Further study of zingers will advance understanding of gene regulation.

    Improving analysis of transcription factor binding sites within ChIP-Seq data based on topological motif enrichment

    Background: Chromatin immunoprecipitation (ChIP) coupled to high-throughput sequencing (ChIP-Seq) techniques can reveal DNA regions bound by transcription factors (TF). Analysis of the ChIP-Seq regions is now a central component in gene regulation studies. The need remains strong for methods to improve the interpretation of ChIP-Seq data and the study of specific TF binding sites (TFBS). Results: We introduce a set of methods to improve the interpretation of ChIP-Seq data, including the inference of mediating TFs based on TFBS motif over-representation analysis and the subsequent study of spatial distribution of TFBSs. TFBS over-representation analysis applied to ChIP-Seq data is used to detect which TFBSs arise more frequently than expected by chance. Visualization of over-representation analysis results with new composition-bias plots reveals systematic bias in over-representation scores. We introduce the BiasAway background generating software to resolve the problem. A heuristic procedure based on topological motif enrichment relative to the ChIP-Seq peaks’ local maximums highlights peaks likely to be directly bound by a TF of interest. The results suggest that on average two-thirds of a ChIP-Seq dataset’s peaks are bound by the ChIP’d TF; the origin of the remaining peaks remains undetermined. Additional visualization methods allow for the study of both inter-TFBS spatial relationships and motif-flanking sequence properties, as demonstrated in case studies for TBP and ZNF143/THAP11. Conclusions: Topological properties of TFBS within ChIP-Seq datasets can be harnessed to better interpret regulatory sequences. Using GC-content-corrected TFBS over-representation analysis, combined with visualization techniques and analysis of the topological distribution of TFBS, we can distinguish peaks likely to be directly bound by a TF. The new methods will empower researchers for exploration of gene regulation and TF binding.
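
The abstract above attributes bias in over-representation scores to sequence composition and addresses it with a background generator. The sketch below shows one generic way to build a GC-matched background: candidate sequences are binned by GC content, and one candidate is drawn from the matching bin for each foreground sequence. This is an illustrative assumption only and is not the BiasAway algorithm; the function names, the choice of ten bins, and the fixed seed are all hypothetical.

```python
import random
from collections import defaultdict


def gc_bin(seq, n_bins=10):
    """Assign a sequence to a GC-content bin using integer arithmetic."""
    seq = seq.upper()
    gc_count = seq.count("G") + seq.count("C")
    # A sequence that is 100% GC falls into the last bin.
    return min(gc_count * n_bins // len(seq), n_bins - 1)


def sample_gc_matched(foreground, candidates, n_bins=10, seed=0):
    """Draw one background sequence per foreground sequence from the same GC bin.

    Candidates are sampled without replacement; foreground sequences whose
    bin has run out of candidates are skipped.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for seq in candidates:
        pool[gc_bin(seq, n_bins)].append(seq)

    background = []
    for seq in foreground:
        bucket = pool[gc_bin(seq, n_bins)]
        if bucket:
            background.append(bucket.pop(rng.randrange(len(bucket))))
    return background


# Toy sequences invented for the example.
foreground = ["GGGCCCGGCC", "ATATATATGC"]
candidates = ["GCGCGCGCGC", "ATATTTAAGG", "ACGTACGTAC", "CCCCGGGGCG"]
background = sample_gc_matched(foreground, candidates)
print(background)
```

Motif counts in the foreground could then be compared against this background instead of an unmatched one, so that GC-rich motifs are not scored as enriched merely because the peaks themselves are GC-rich.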

    The SIN3A histone deacetylase complex is required for a complete transcriptional response to hypoxia

    Cells adapt to environmental changes, including fluctuations in oxygen levels, through the induction of specific gene expression programs. To identify genes regulated by hypoxia at the transcriptional level, we pulse-labeled HUVEC cells with 4-thiouridine and sequenced nascent transcripts. Then, we searched genome-wide binding profiles from the ENCODE project for factors that correlated with changes in transcription and identified binding of several components of the Sin3A co-repressor complex, including SIN3A, SAP30 and HDAC1/2, proximal to genes repressed by hypoxia. SIN3A interference revealed that it participates in the downregulation of 75% of the hypoxia-repressed genes in endothelial cells. Unexpectedly, it also blunted the induction of 47% of the upregulated genes, suggesting a role for this corepressor in gene induction. In agreement, ChIP-seq experiments showed that SIN3A preferentially localizes to the promoter region of actively transcribed genes and that SIN3A signal was enriched in hypoxia-repressed genes prior to exposure to the stimulus. Importantly, SIN3A occupancy was not altered by hypoxia in spite of changes in H3K27ac signal. In summary, our results reveal a prominent role for SIN3A in the transcriptional response to hypoxia and suggest a model where modulation of the associated histone deacetylase activity, rather than its recruitment, determines the transcriptional output.
    Funding: Ministerio de Ciencia e Innovacion (Spanish Ministry of Science and Innovation, MICINN) [SAF2011 24225 to L.d.P., SAF2014–53819-R to L.d.P., B.J.]; Canadian Institutes of Health Research (CIHR) [MOP-82875 to W.W.]; Natural Sciences and Engineering Research Council of Canada (NSERC) [RGPIN355532–10 to W.W.W.]; National Institutes of Health [1R01GM084875 to W.W.W.]; CIHR Fellowship (to R.W.H.); Michael Smith Foundation for Health Research Fellowship (to R.W.H.); Caja Madrid Foundation for Visiting Professor Fellowship (to L.d.P). Funding for open access charge: Spanish Ministry of Science and Innovation, MICINN, [SAF2014-53819-R].

    JASPAR 2014: An extensively expanded and updated open-access database of transcription factor binding profiles

    JASPAR (http://jaspar.genereg.net) is the largest open-access database of matrix-based nucleotide profiles describing the binding preference of transcription factors from multiple species. The fifth major release greatly expands the heart of JASPAR—the JASPAR CORE subcollection, which contains curated, non-redundant profiles—with 135 new curated profiles (74 in vertebrates, 8 in Drosophila melanogaster, 10 in Caenorhabditis elegans and 43 in Arabidopsis thaliana; a 30% increase in total) and 43 older updated profiles (36 in vertebrates, 3 in D. melanogaster and 4 in A. thaliana; a 9% update in total). The new and updated profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets. In addition, the web interface has been enhanced with advanced capabilities in browsing, searching and subsetting. Finally, the new JASPAR release is accompanied by a new BioPython package, a new R tool package and a new R/Bioconductor data package to facilitate access for both manual and automated methods.