95 research outputs found

    Sole-Search: an integrated analysis program for peak detection and functional annotation using ChIP-seq data

    Next-generation sequencing is revolutionizing the identification of transcription factor binding sites throughout the human genome. However, the bioinformatics analysis of large datasets collected using chromatin immunoprecipitation and high-throughput sequencing is often a roadblock that impedes researchers in their attempts to gain biological insights from their experiments. We have developed integrated peak-calling and analysis software (Sole-Search) which is available through a user-friendly interface and (i) converts raw data into a format for visualization on a genome browser, (ii) outputs ranked peak locations using a statistically based method that overcomes the significant problem of false positives, (iii) identifies the gene nearest to each peak, (iv) classifies the location of each peak relative to gene structure, (v) provides information such as the number of binding sites per chromosome and per gene and (vi) allows the user to determine overlap between two different experiments. In addition, the program performs an analysis of amplified and deleted regions of the input genome. This software is web-based and automated, allowing easy and immediate access to all investigators. We demonstrate the utility of our software by collecting, analyzing and comparing ChIP-seq data for six different human transcription factors/cell line combinations
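Steps (iii) and (vi) of the abstract above, nearest-gene assignment and overlap between two experiments, can be illustrated with a minimal Python sketch. This is not the Sole-Search implementation: the function names, the use of transcription start site (TSS) coordinates to represent genes, and the half-open interval convention are all assumptions made for illustration, and the sketch handles a single chromosome only.

```python
from bisect import bisect_left


def nearest_gene(peak_pos, tss_positions, gene_names):
    """Return (gene, distance) for the TSS closest to peak_pos.

    Hypothetical helper: tss_positions must be sorted ascending and
    gene_names must be the parallel list of gene identifiers.
    """
    if not tss_positions:
        raise ValueError("no genes supplied")
    i = bisect_left(tss_positions, peak_pos)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_positions)]
    best = min(candidates, key=lambda j: abs(tss_positions[j] - peak_pos))
    return gene_names[best], abs(tss_positions[best] - peak_pos)


def overlapping_peaks(peaks_a, peaks_b):
    """Return the peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (start, end) tuples, treated as half-open intervals
    (an assumption of this sketch, not taken from the abstract).
    """
    return [
        a for a in peaks_a
        if any(a[0] < b[1] and b[0] < a[1] for b in peaks_b)
    ]
```

The quadratic overlap check keeps the sketch short; for genome-scale peak lists a sorted sweep or an interval tree would be the more usual choice.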

    Decoding the genome with an integrative analysis tool: Combinatorial CRM Decoder

    The identification of genome-wide cis-regulatory modules (CRMs) and characterization of their associated epigenetic features are fundamental steps toward the understanding of gene regulatory networks. Although integrative analysis of available genome-wide information can provide new biological insights, the lack of novel methodologies has become a major bottleneck. Here, we present a comprehensive analysis tool called combinatorial CRM decoder (CCD), which utilizes the publicly available information to identify and characterize genome-wide CRMs in a species of interest. CCD first defines a set of the epigenetic features which is significantly associated with a set of known CRMs as a code called ‘trace code’, and subsequently uses the trace code to pinpoint putative CRMs throughout the genome. Using 61 genome-wide data sets obtained from 17 independent mouse studies, CCD successfully catalogued ∼12 600 CRMs (five distinct classes) including polycomb repressive complex 2 target sites as well as imprinting control regions. Interestingly, we discovered that ∼4% of the identified CRMs belong to at least two different classes named ‘multi-functional CRM’, suggesting their functional importance for regulating spatiotemporal gene expression. From these examples, we show that CCD can be applied to any potential genome-wide datasets and therefore will shed light on unveiling genome-wide CRMs in various species

    Molecular interactions between HNF4a, FOXA2 and GABP identified at regulatory DNA elements through ChIP-sequencing

    Gene expression is regulated by combinations of transcription factors, which can be mapped to regulatory elements on a genome-wide scale using ChIP experiments. In a previous ChIP-chip study of USF1 and USF2 we also found evidence of binding of GABP, FOXA2 and HNF4a within the enriched regions. Here, we have applied ChIP-seq for these transcription factors and identified 3064 peaks of enrichment for GABP, 7266 for FOXA2 and 18783 for HNF4a. Distal elements with USF2 signal were frequently also bound by HNF4a and FOXA2. GABP peaks were found at transcription start sites, whereas 94% of FOXA2 and 90% of HNF4a peaks were located at other positions. We developed a method to accurately define TFBS within peaks, and found the predicted sites to have an elevated conservation level compared to peak centers; however, the majority of binding sites were not evolutionarily conserved. An interaction between HNF4a and GABP was seen at TSS, with one-third of the HNF4a-positive promoters also being bound by GABP, and this interaction was verified by co-immunoprecipitations

    A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information

    Identification of DNA motifs from ChIP-seq/ChIP-chip [chromatin immunoprecipitation (ChIP)] data is a powerful method for understanding the transcriptional regulatory network. However, most established methods are designed for small sample sizes and are inefficient for ChIP data. Here we propose a new k-mer occurrence model to reflect the fact that functional DNA k-mers often cluster around ChIP peak summits. With this model, we introduced a new measure to discover functional k-mers. Using simulation, we demonstrated that our method is more robust against noise in ChIP data than available methods. A novel word clustering method is also implemented to group similar k-mers into position weight matrices (PWMs). Our method was applied to a diverse set of ChIP experiments to demonstrate its high sensitivity and specificity. Importantly, our method is much faster than several other methods for large sample sizes. Thus, we have developed an efficient and effective motif discovery method for ChIP experiments

    Efficient Double Fragmentation ChIP-seq Provides Nucleotide Resolution Protein-DNA Binding Profiles

    Immunoprecipitated crosslinked protein-DNA fragments typically range in size from several hundred to several thousand base pairs, with a significant part of chromatin being much longer than the optimal length for next-generation sequencing (NGS) procedures. Because these larger fragments may be non-random and represent relevant biology that may otherwise be missed, but also because they represent a significant fraction of the immunoprecipitated material, we designed a double-fragmentation ChIP-seq procedure. After conventional crosslinking and immunoprecipitation, chromatin is de-crosslinked and sheared a second time to concentrate fragments in the optimal size range for NGS. Besides the benefits of increased chromatin yields, the procedure also eliminates a laborious size-selection step. We show that the double-fragmentation ChIP-seq approach allows for the generation of biologically relevant genome-wide protein-DNA binding profiles from sub-nanogram amounts of TCF7L2/TCF4, TBP and H3K4me3 immunoprecipitated material. Although optimized for the AB/SOLiD platform, the same approach may be applied to other platforms

    A potential role for endogenous proteins as sacrificial sunscreens and antioxidants in human tissues

    Excessive ultraviolet radiation (UVR) exposure of the skin is associated with adverse clinical outcomes. Although both exogenous sunscreens and endogenous tissue components (including melanins and tryptophan-derived compounds) reduce UVR penetration, the role of endogenous proteins in absorbing environmental UV wavelengths is poorly defined. Having previously demonstrated that proteins which are rich in UVR-absorbing amino acid residues are readily degraded by broadband UVB-radiation (containing UVA, UVB and UVC wavelengths) here we hypothesised that UV chromophore (Cys, Trp and Tyr) content can predict the susceptibility of structural proteins in skin and the eye to damage by physiologically relevant doses (up to 15.4 J/cm²) of solar UVR (95% UVA, 5% UVB). We show that: i) purified suspensions of UV-chromophore-rich fibronectin dimers, fibrillin microfibrils and β- and γ-lens crystallins undergo solar simulated radiation (SSR)-induced aggregation and/or decomposition and ii) exposure to identical doses of SSR has minimal effect on the size or ultrastructure of UV chromophore-poor tropoelastin, collagen I, collagen VI microfibrils and α-crystallin. If UV chromophore content is a factor in determining protein stability in vivo, we would expect that the tissue distribution of Cys, Trp and Tyr-rich proteins would correlate with regional UVR exposure. From bioinformatic analysis of 244 key structural proteins we identified several biochemically distinct, yet UV chromophore-rich, protein families. The majority of these putative UV-absorbing proteins (including the late cornified envelope proteins, keratin associated proteins, elastic fibre-associated components and β- and γ-crystallins) are localised and/or particularly abundant in tissues that are exposed to the highest doses of environmental UVR, specifically the stratum corneum, hair, papillary dermis and lens. We therefore propose that UV chromophore-rich proteins are localised in regions of high UVR exposure as a consequence of an evolutionary pressure to express sacrificial protein sunscreens which reduce UVR penetration and hence mitigate tissue damage

    Joint Binding of OTX2 and MYC in Promotor Regions Is Associated with High Gene Expression in Medulloblastoma

    Both OTX2 and MYC are important oncogenes in medulloblastoma, the most common malignant brain tumor in childhood. Much is known about MYC binding to promoter regions, but OTX2 binding has hardly been investigated. We used ChIP-on-chip data to analyze the binding patterns of both transcription factors in D425 medulloblastoma cells. When combining the data for all promoter regions in the genome, OTX2 binding showed a remarkable bi-modal distribution pattern with peaks around −250 bp upstream and +650 bp downstream of the transcription start sites (TSSs). Indeed, 40.2% of all OTX2-bound TSSs had more than one significant OTX2-binding peak. This OTX2-binding pattern was very different from the TSS-centered single peak binding pattern observed for MYC and other known transcription factors. However, in individual promoter regions, OTX2 and MYC have a strong tendency to bind in proximity to each other. OTX2-binding sequences are depleted near TSSs in the genome, providing an explanation for the observed bi-modal distribution of OTX2 binding. This contrasts with the enrichment of E-box sequences at TSSs. Both OTX2 and MYC binding independently correlated with higher gene expression. Interestingly, genes of promoter regions with multiple OTX2 binding as well as MYC binding showed the highest expression levels in D425 cells and in primary medulloblastomas. Genes within this class of promoter regions were enriched for medulloblastoma and stem cell specific genes. Our data suggest an important functional interaction between OTX2 and MYC in regulating gene expression in medulloblastoma

    Global analysis of in vivo Foxa2-binding sites in mouse adult liver using massively parallel sequencing

    Foxa2 (HNF3β) is one of three closely related transcription factors that are critical to the development and function of the mouse liver. We have used chromatin immunoprecipitation and massively parallel Illumina 1G sequencing (ChIP–Seq) to create a genome-wide profile of in vivo Foxa2-binding sites in the adult liver. More than 65% of the ∼11.5 k genomic sites associated with Foxa2 binding mapped to extended gene regions of annotated genes, while more than 30% of intragenic sites were located within first introns. 20.5% of all sites were further than 50 kb from any annotated gene, suggesting an association with novel gene regions. QPCR analysis demonstrated a strong positive correlation between peak height and fold enrichment for Foxa2-binding sites. We measured the relationship between Foxa2 and liver gene expression by overlapping Foxa2-binding sites with a SAGE transcriptome profile, and found that 43.5% of genes expressed in the liver were also associated with Foxa2 binding. We also identified potential Foxa2-interacting transcription factors whose motifs were enriched near Foxa2-binding sites. Our comprehensive results for in vivo Foxa2-binding sites in the mouse liver will contribute to resolving transcriptional regulatory networks that are important for adult liver function

    Study of FoxA Pioneer Factor at Silent Genes Reveals Rfx-Repressed Enhancer at Cdx2 and a Potential Indicator of Esophageal Adenocarcinoma Development

    Understanding how silent genes can be competent for activation provides insight into development as well as cellular reprogramming and pathogenesis. We performed genomic location analysis of the pioneer transcription factor FoxA in the adult mouse liver and found that about one-third of the FoxA bound sites are near silent genes, including genes without detectable RNA polymerase II. Virtually all of the FoxA-bound silent sites are within conserved sequences, suggesting possible function. Such sites are enriched in motifs for transcriptional repressors, including for Rfx1 and type II nuclear hormone receptors. We found one such target site at a cryptic “shadow” enhancer 7 kilobases (kb) downstream of the Cdx2 gene, where Rfx1 restricts transcriptional activation by FoxA. The Cdx2 shadow enhancer exhibits a subset of regulatory properties of the upstream Cdx2 promoter region. While Cdx2 is ectopically induced in the early metaplastic condition of Barrett's esophagus, its expression is not necessarily present in progressive Barrett's with dysplasia or adenocarcinoma. By contrast, we find that Rfx1 expression in the esophageal epithelium becomes gradually extinguished during progression to cancer, i.e., expression of Rfx1 decreased markedly in dysplasia and adenocarcinoma. We propose that this decreased expression of Rfx1 could be an indicator of progression from Barrett's esophagus to adenocarcinoma and that similar analyses of other transcription factors bound to silent genes can reveal unanticipated regulatory insights into oncogenic progression and cellular reprogramming

    Integrated Expression Profiling and Genome-Wide Analysis of ChREBP Targets Reveals the Dual Role for ChREBP in Glucose-Regulated Gene Expression

    The carbohydrate response element binding protein (ChREBP), a basic helix-loop-helix/leucine zipper transcription factor, plays a critical role in the control of lipogenesis in the liver. To identify the direct targets of ChREBP on a genome-wide scale and provide more insight into the mechanism by which ChREBP regulates glucose-responsive gene expression, we performed chromatin immunoprecipitation-sequencing and gene expression analysis. We identified 1153 ChREBP binding sites and 783 target genes using the chromatin from HepG2, a human hepatocellular carcinoma cell line. A motif search revealed a refined consensus sequence (CABGTG-nnCnG-nGnSTG) to better represent critical elements of a functional ChREBP binding sequence. Gene ontology analysis shows that ChREBP target genes are particularly associated with lipid, fatty acid and steroid metabolism. In addition, other functional gene clusters related to transport, development and cell motility are significantly enriched. Gene set enrichment analysis reveals that ChREBP target genes are highly correlated with genes regulated by high glucose, providing a functional relevance to the genome-wide binding study. Furthermore, we have demonstrated that ChREBP may function as a transcriptional repressor as well as an activator