
    JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles.

    JASPAR (http://jaspar.genereg.net) is an open-access database storing curated, non-redundant transcription factor (TF) binding profiles representing transcription factor binding preferences as position frequency matrices for multiple species in six taxonomic groups. For this 2016 release, we expanded the JASPAR CORE collection with 494 new TF binding profiles (315 in vertebrates, 11 in nematodes, 3 in insects, 1 in fungi and 164 in plants) and updated 59 profiles (58 in vertebrates and 1 in fungi). The introduced profiles represent an 83% expansion and a 10% update compared with the previous release. We updated the structural annotation of the TF DNA-binding domains (DBDs) following a published hierarchical structural classification. In addition, we introduced 130 transcription factor flexible models trained on ChIP-seq data for vertebrates, which capture dinucleotide dependencies within TF binding sites. This new JASPAR release is accompanied by a new web tool to infer the JASPAR TF binding profiles recognized by a given TF protein sequence. Moreover, we provide users with a Ruby module complementing the JASPAR API to ease programmatic access to and use of the JASPAR collection of profiles. Finally, we provide the JASPAR2016 R/Bioconductor data package with the data of this release.
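    JASPAR stores profiles as position frequency matrices (PFMs). To illustrate how such a matrix is typically turned into a scoring model, here is a minimal Python sketch that converts a PFM into a log2-odds position weight matrix and scans a sequence with it. The toy counts, the uniform background and the pseudocount of 0.25 per base are assumptions for illustration, not values or functions taken from JASPAR or its API.

```python
import math

# Toy position frequency matrix (PFM): one count list per nucleotide,
# one column per binding-site position. Hypothetical counts, not a JASPAR profile.
PFM = {
    "A": [8, 0, 0, 1],
    "C": [1, 9, 0, 0],
    "G": [0, 1, 10, 0],
    "T": [1, 0, 0, 9],
}


def pfm_to_pwm(pfm, background=None, pseudocount=0.25):
    """Convert counts into log2-odds scores against a background distribution."""
    bases = "ACGT"
    if background is None:
        background = {b: 0.25 for b in bases}  # assumed uniform background
    n_cols = len(pfm["A"])
    pwm = {b: [] for b in bases}
    for j in range(n_cols):
        column_total = sum(pfm[b][j] for b in bases)
        for b in bases:
            # The pseudocount keeps unobserved bases from producing log(0).
            p = (pfm[b][j] + pseudocount) / (column_total + 4 * pseudocount)
            pwm[b].append(math.log2(p / background[b]))
    return pwm


def score_site(pwm, site):
    """Sum the per-position log-odds scores of a site as long as the matrix."""
    return sum(pwm[base][j] for j, base in enumerate(site.upper()))


def scan(pwm, sequence):
    """Score every window on the forward strand; returns (start, score) pairs."""
    width = len(pwm["A"])
    return [(i, score_site(pwm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]
```

    For example, scanning "GGACGTAA" with pfm_to_pwm(PFM) gives the highest score to the window starting at index 2 ("ACGT"), the consensus of the toy matrix. Only the forward strand is scanned here; a practical scanner would also score the reverse complement.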

    Genome-wide nucleosome map and cytosine methylation levels of an ancient human genome.

    Epigenetic information is available from contemporary organisms but is difficult to trace back in evolutionary time. Here, we show that genome-wide epigenetic information can be gathered directly from next-generation sequence reads of DNA isolated from ancient remains. Using the genome sequence data generated from hair shafts of a 4000-yr-old Paleo-Eskimo belonging to the Saqqaq culture, we generate the first ancient nucleosome map coupled with a genome-wide survey of cytosine methylation levels. The validity of both the nucleosome map and the methylation levels was confirmed by the recovery of the expected signals at promoter regions, exon/intron boundaries and CTCF sites. The top-scoring nucleosome calls revealed distinct DNA positioning biases, attesting to nucleotide-level accuracy. The ancient methylation levels exhibited high conservation over time, clustering closely with those of modern hair tissues. Using ancient methylation information, we estimated the age at death of the Saqqaq individual and illustrated how epigenetic information can be used to infer ancient gene expression. Similar epigenetic signatures were found in other fossil material, such as 110,000- to 130,000-yr-old bones, supporting the contention that ancient epigenomic information can be reconstructed from the deep past. Our findings lay the foundation for extracting epigenomic information from ancient samples, allowing shifts in epialleles to be tracked through evolutionary time, and provide an original window into modern epigenomics.

    Identification of TNF-alpha-Responsive Promoters and Enhancers in the Intestinal Epithelial Cell Model Caco-2

    The Caco-2 cell line is one of the most important in vitro models for enterocytes and is used to study drug absorption and disease, including inflammatory bowel disease and cancer. To use the model optimally, it is necessary to map its functional entities. In this study, we generated genome-wide maps of active transcription start sites (TSSs) and active enhancers in Caco-2 cells with or without tumour necrosis factor (TNF)-α stimulation to mimic an inflammatory state. We found 520 promoters that significantly changed their usage level upon TNF-α stimulation; of these, 52% are not annotated. A subset of these could alter protein function through the exclusion of protein domains. Moreover, we located 890 transcribed enhancer candidates, of which ∼50% change in usage after TNF-α stimulation. These enhancers share motif enrichments with similarly responding gene promoters. As a case example, we characterize an enhancer that regulates the laminin-5 γ2-chain (LAMC2) gene through nuclear factor (NF)-κB binding. This report is the first to present comprehensive TSS and enhancer maps for Caco-2 cells, and it highlights many novel inflammation-specific promoters and enhancers.

    Transcriptional and epigenomic profiling identifies YAP signaling as a key regulator of intestinal epithelium maturation

    During intestinal organogenesis, equipotent epithelial progenitors mature into phenotypically distinct stem cells that are responsible for lifelong maintenance of the tissue. While the morphological changes associated with the transition are well characterized, the molecular mechanisms underpinning the maturation process are not fully understood. Here, we leverage intestinal organoid cultures to profile transcriptional, chromatin accessibility, DNA methylation, and three-dimensional (3D) chromatin conformation landscapes in fetal and adult epithelial cells. We observed prominent differences in gene expression and enhancer activity, which are accompanied by local changes in 3D organization, DNA accessibility, and methylation between the two cellular states. Using integrative analyses, we identified sustained Yes-Associated Protein (YAP) transcriptional activity as a major gatekeeper of the immature fetal state. We found the YAP-associated transcriptional network to be regulated at various levels of chromatin organization and likely to be coordinated by changes in extracellular matrix composition. Together, our work highlights the value of unbiased profiling of regulatory landscapes for the identification of key mechanisms underlying tissue maturation.

    Limitations and potentials of current motif discovery algorithms

    Computational methods for de novo identification of gene regulatory elements, such as transcription factor binding sites, have proved useful for deciphering genetic regulatory networks. However, despite the availability of a large number of algorithms, their strengths and weaknesses are not sufficiently understood. Here, we designed a comprehensive set of performance measures and benchmarked five modern sequence-based motif discovery algorithms on large datasets generated from Escherichia coli RegulonDB. We characterize the factors that affect prediction accuracy, scalability and reliability. Accuracy at the nucleotide and binding-site levels is very low, whereas motif-level accuracy is relatively high, which indicates that the algorithms can usually capture at least one correct motif in an input sequence. To exploit diverse predictions from multiple runs of one or more algorithms, we developed a consensus ensemble algorithm, which achieved a 6–45% improvement over the base algorithms by increasing both sensitivity and specificity. Our study illustrates the limitations and potentials of existing sequence-based motif discovery algorithms. Building on the revealed potentials, we discuss several promising directions for further improvement. Since sequence-based algorithms are the baseline for most modern motif discovery algorithms, this paper suggests that substantial improvements to them are possible.
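    The abstract contrasts nucleotide-level and binding-site-level accuracy. A minimal Python sketch of how such measures can be computed from known and predicted sites is given below, assuming half-open (start, end) intervals on a single sequence and standard confusion-matrix definitions. The rule that a site-level match needs an overlap of at least 25% of the shorter interval is a hypothetical choice, and these functions are not necessarily the exact measures designed in the paper.

```python
def positions(sites):
    """Expand half-open (start, end) intervals into the set of covered positions."""
    covered = set()
    for start, end in sites:
        covered.update(range(start, end))
    return covered


def nucleotide_level(known, predicted, seq_len):
    """Nucleotide-level sensitivity, specificity and positive predictive value."""
    k, p = positions(known), positions(predicted)
    tp = len(k & p)
    fp = len(p - k)
    fn = len(k - p)
    tn = seq_len - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
    }


def _overlap(a, b):
    """Number of positions shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def site_level(known, predicted, min_frac=0.25):
    """Site-level sensitivity and PPV under an assumed overlap rule."""
    def match(k, p):
        ov = _overlap(k, p)
        # Hypothetical rule: overlap must cover min_frac of the shorter interval.
        return ov > 0 and ov >= min_frac * min(k[1] - k[0], p[1] - p[0])

    found = sum(any(match(k, p) for p in predicted) for k in known)
    correct = sum(any(match(k, p) for k in known) for p in predicted)
    return {
        "sensitivity": found / len(known) if known else 0.0,
        "ppv": correct / len(predicted) if predicted else 0.0,
    }
```

    A prediction that is shifted by half a site length scores only 0.5 sensitivity at the nucleotide level yet still counts as a full hit at the site level, which is one way the gap between low nucleotide-level and higher coarse-level accuracy can arise.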

    Transcription factor site dependencies in human, mouse and rat genomes

    BACKGROUND: Transcription factors are known to frequently act together to regulate gene expression in eukaryotes. In this paper we describe a computational analysis of transcription factor site dependencies in the human, mouse and rat genomes. RESULTS: Our approach for quantifying the tendency of transcription factor binding sites to co-occur is based on a binding site scoring function that incorporates dependencies between positions, uses information about the structural class of each transcription factor (major/minor groove binder), and takes into account the possible effects of varying GC content of the sequences. Significant tendencies (dependencies) were detected with non-parametric statistical methodology (permutation tests). We evaluated the results in several ways: against reports in the literature (many of the significant dependencies between transcription factors have previously been confirmed experimentally); by checking that the dependencies between transcription factors are not biased by similarities between their DNA-binding sites; by showing that the number of dependent transcription factors belonging to the same functional and structural class is significantly higher than expected by chance; and with supporting evidence from GO clustering of target genes. From dependencies between pairs of transcription factor binding sites (second-order dependencies), higher-order dependencies (networks) can be constructed. Moreover, the results on transcription factor binding site dependencies can be used to predict groups of dependent transcription factors on a given promoter sequence. Our results, together with a scanning tool for predicting groups of dependent transcription factor binding sites, are available on the Internet. CONCLUSION: We show that computational analysis of transcription factor site dependencies is a valuable complement to experimental approaches for discovering transcription regulatory interactions and networks. Scanning promoter sequences with groups of dependent transcription factor binding sites improves the quality of transcription factor predictions.
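    The permutation tests mentioned above can be illustrated with a simplified Python sketch: given presence/absence vectors of two binding sites across a set of promoters, it shuffles one vector and estimates how often random placement yields at least the observed number of co-occurrences. This deliberately ignores the paper's position-dependent scoring function, structural classes and GC-content handling; the function names and the add-one p-value correction are assumptions for illustration.

```python
import random


def cooccurrence(a, b):
    """Count promoters in which both binding sites are present."""
    return sum(1 for x, y in zip(a, b) if x and y)


def permutation_pvalue(a, b, n_perm=10000, seed=0):
    """One-sided permutation p-value for co-occurrence of sites a and b.

    Shuffling b keeps the number of promoters carrying site b fixed while
    breaking any association with site a.
    """
    rng = random.Random(seed)
    observed = cooccurrence(a, b)
    shuffled = list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cooccurrence(a, shuffled) >= observed:
            extreme += 1
    # Add-one correction avoids reporting a p-value of exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)
```

    Because the smallest attainable p-value is 1 / (n_perm + 1), testing many transcription factor pairs calls for enough permutations to survive a multiple-testing correction.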

    WordCluster: detecting clusters of DNA words and genomic elements

    BACKGROUND: Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or on arbitrarily chosen distance thresholds. RESULTS: We introduce here an algorithm to detect clusters of DNA words (k-mers), or of any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method in a web server connected to a MySQL backend, which also determines co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and the outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. CONCLUSIONS: WordCluster appears to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The web server implementation of the method is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php and includes additional features such as detection of co-localization with gene regions and an annotation enrichment tool for functional analysis of overlapping genes.
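    WordCluster's exact statistic is not described in this abstract, so the following is only a simplified Python sketch of a distance-based clustering idea, not WordCluster's algorithm: consecutive copies separated by at most a hypothetical max_gap are chained into candidate clusters, and each candidate is scored with a binomial tail probability under an assumed model in which a copy starts at each base independently with the genome-wide density. No multiple-testing correction is applied.

```python
import math


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def find_clusters(positions, genome_length, max_gap, alpha=0.01):
    """Chain copies closer than max_gap and keep chains unlikely under random placement.

    Returns (first_position, last_position, n_copies, p_value) tuples.
    """
    pos = sorted(positions)
    if not pos:
        return []
    p = len(pos) / genome_length  # assumed per-base probability of a copy start
    chains, current = [], [pos[0]]
    for x in pos[1:]:
        if x - current[-1] <= max_gap:
            current.append(x)
        else:
            chains.append(current)
            current = [x]
    chains.append(current)

    results = []
    for chain in chains:
        if len(chain) < 2:
            continue
        width = chain[-1] - chain[0] + 1
        # Probability that a window of this width holds at least this many copies.
        p_value = binom_sf(len(chain), width, p)
        if p_value <= alpha:
            results.append((chain[0], chain[-1], len(chain), p_value))
    return results
```

    Chaining by consecutive distance avoids a fixed sliding window, in line with the motivation stated in the abstract, but the choice of max_gap still matters in this sketch; a method that derives the threshold from the distance distribution itself would remove that free parameter.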

    A computational evaluation of over-representation of regulatory motifs in the promoter regions of differentially expressed genes

    BACKGROUND: Observed co-expression of a group of genes is frequently attributed to co-regulation by shared transcription factors. This assumption has led to the hypothesis that promoters of co-expressed genes should share common regulatory motifs, which forms the basis for numerous computational tools that search for these motifs. While frequently explored in yeast, the validity of the underlying hypothesis has not been assessed systematically in mammals. This demonstrates the need for a systematic and quantitative evaluation of the degree to which co-expressed genes in mammals share over-represented motifs. RESULTS: We identified 33 experiments for human and mouse in the ArrayExpress Database in which transcription factors were manipulated and which exhibited a significant number of differentially expressed genes. We checked for over-representation of transcription factor binding sites in up- or down-regulated genes using the over-representation analysis tool oPOSSUM. In 25 out of 33 experiments, this procedure identified the binding matrices of the affected transcription factors. We also carried out de novo prediction of regulatory motifs shared by differentially expressed genes. Again, the detected motifs shared significant similarity with the matrices of the affected transcription factors. CONCLUSIONS: Our results support the claim that functional regulatory motifs are over-represented in sets of differentially expressed genes and that they can be detected with computational methods.
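    Over-representation of a binding site in a gene set is commonly assessed with a one-sided hypergeometric test (the upper tail of Fisher's exact test). The minimal Python sketch below computes that tail probability from exact binomial coefficients; it is not oPOSSUM's implementation, the function names are hypothetical, and it assumes the target genes are a subset of the background set.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the site."""
    total = math.comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, upper + 1))
    return hits / total


def overrepresentation(target_hits, target_size, background_hits, background_size):
    """One-sided p-value that a site is enriched in the target gene set."""
    return hypergeom_sf(target_hits, background_size, background_hits, target_size)
```

    For instance, overrepresentation(30, 100, 1000, 10000) asks how surprising it is that 30 of 100 differentially expressed genes carry a site found in 1,000 of 10,000 background genes, where about 10 would be expected by chance. Exact integer arithmetic keeps very small p-values from being lost to floating-point cancellation.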

    MDM2 Promoter SNP344T>A (rs1196333) Status Does Not Affect Cancer Risk

    The MDM2 proto-oncogene plays a key role in central cellular processes such as growth control and apoptosis, and the gene locus is frequently amplified in sarcomas. Two polymorphisms located in the MDM2 promoter P2 have been shown to affect cancer risk. One of these polymorphisms (SNP309T>G; rs2279744) facilitates Sp1 transcription factor binding to the promoter and is associated with increased cancer risk. In contrast, SNP285G>C (rs117039649), located 24 bp upstream of rs2279744 and in complete linkage disequilibrium with the SNP309G allele, reduces Sp1 recruitment and lowers cancer risk. Thus, fine-tuning of MDM2 expression has proven to be of significant importance for tumorigenesis. We assessed the potential functional effects of a third MDM2 promoter P2 polymorphism (SNP344T>A; rs1196333) located on the SNP309T allele. While in silico analyses indicated that SNP344A may modulate TFAP2A, SPIB and AP1 transcription factor binding, we found no effect of SNP344 status on MDM2 expression levels. Assessing the frequency of SNP344A in healthy Caucasians (n = 2,954) and in patients with ovarian (n = 1,927), breast (n = 1,271), endometrial (n = 895) or prostatic cancer (n = 641), we detected no significant difference in the distribution of this polymorphism between any of these cancer forms and healthy controls (6.1% in healthy controls, and 4.9%, 5.0%, 5.4% and 7.2% in the cancer groups, respectively). In conclusion, our findings provide no evidence that SNP344A affects MDM2 transcription or cancer risk.
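    The case-control comparison can be illustrated with a Pearson chi-square test on a 2×2 table of carrier counts, sketched below in Python. The abstract does not say which test the authors used; this version applies no continuity correction, assumes all row and column totals are non-zero, and takes counts supplied by the user rather than the published percentages. The function name is hypothetical.

```python
import math


def chi_square_2x2(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    table = [[case_carriers, case_noncarriers],
             [control_carriers, control_noncarriers]]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n  # expected count under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

    Reconstructing counts from the reported percentages and sample sizes would introduce rounding error, so exact counts from the study should be used where available; for small expected counts, Fisher's exact test is the usual alternative.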