24 research outputs found

    A Mixture Modeling Framework for Differential Analysis of High-Throughput Data

    The invention of microarray and next-generation sequencing technologies has revolutionized genomics research; these platforms have generated massive amounts of data on gene expression, methylation, and protein-DNA interactions. A common theme among many biological problems that use high-throughput technologies is differential analysis. Despite this common theme, each data type has its own unique features, creating a “moving target” scenario. As such, methods designed specifically for one data type may not give satisfactory results when applied to another. To meet this challenge, so that both existing data types and data from future problems, platforms, or experiments can be analyzed, we propose a mixture modeling framework that is flexible enough to adapt automatically to any moving target. More specifically, the approach considers several classes of mixture models and provides a model-based procedure whose model adapts to the particular data being analyzed. We demonstrate the utility of the methodology by applying it to three types of real data: gene expression, methylation, and ChIP-seq. We also carried out simulations to gauge performance and showed that the approach can be more efficient than any individual model without inflating the type I error.
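
    The abstract does not specify which classes of mixture models the framework considers, so the following is only a generic illustration of the idea, not the authors' method. It fits a two-component Gaussian mixture to per-feature test statistics by EM, treating the component with the larger mean as the "differential" one. The function names, the Gaussian form, and the initialization are all assumptions made for this sketch.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(values, n_iter=200):
    """Fit a two-component Gaussian mixture by EM (illustrative sketch).

    Returns (weights, means, sds); component 0 starts at the smallest
    value and component 1 at the largest.
    """
    lo, hi = min(values), max(values)
    spread = (hi - lo) / 4 or 1.0  # fall back to 1.0 if all values are equal
    weights, means, sds = [0.5, 0.5], [lo, hi], [spread, spread]
    n = len(values)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in values:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and standard deviation.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has collapsed; keep previous estimates
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a zero-width component
    return weights, means, sds


def posterior_differential(x, weights, means, sds):
    """Posterior probability that x belongs to component 1 ("differential")."""
    dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
    total = sum(dens) or 1e-300
    return dens[1] / total
```

    In such a sketch, features whose posterior probability of belonging to the second component exceeds a chosen cutoff would be called differential; the cutoff and the choice of statistic are left open here.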

    Practical guidelines for the comprehensive analysis of ChIP-seq data.

    Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges, not only in sample preparation and sequencing but also in computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step we discuss some of the most frequently used software tools and highlight the associated challenges and problems. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.
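
    One of the listed steps, controlling the false discovery rate, can be illustrated with the Benjamini-Hochberg adjustment. The abstract does not say which procedure the guidelines recommend, so treating it as Benjamini-Hochberg is an assumption of this sketch, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Example: per-peak p-values from a hypothetical differential binding test.
peak_pvalues = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(peak_pvalues)
significant = [q <= 0.05 for q in adjusted]
```

    Peaks whose adjusted value falls at or below the chosen threshold (0.05 in the example) would be reported.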

    Integrated analysis identifies a class of androgen-responsive genes regulated by short combinatorial long-range mechanism facilitated by CTCF

    Recently, much attention has been given to elucidating how long-range gene regulation comes into play and how histone modifications and distal transcription factor binding contribute to this mechanism. Androgen receptor (AR), a key regulator of prostate cancer, has been shown to regulate its target genes via distal enhancers, leading to the hypothesis of global long-range gene regulation. However, despite the steady flow of newly generated data, the precise mechanism of AR-mediated long-range gene regulation is still largely unknown. In this study, we carried out an integrated analysis combining several types of high-throughput data, including genome-wide distribution data of H3K4 di-methylation (H3K4me2), CCCTC binding factor (CTCF), AR and FoxA1 cistrome data, as well as androgen-regulated gene expression data. We found that a subset of androgen-responsive genes was significantly enriched near AR/H3K4me2 overlapping regions and FoxA1 binding sites within the same CTCF block. Importantly, genes in this class were enriched in cancer-related pathways and were downregulated in clinical metastatic versus localized prostate cancer. Our results suggest a relatively short combinatorial long-range regulation mechanism facilitated by CTCF blocking. Under such a mechanism, H3K4me2, AR and FoxA1 within the same CTCF block combinatorially regulate a subset of distally located androgen-responsive genes involved in prostate carcinogenesis.

    BOG: R-package for Bacterium and virus analysis of Orthologous Groups

    BOG (Bacterium and virus analysis of Orthologous Groups) is a package for identifying groups of differentially regulated genes in light of gene functions for various viral and bacterial genomes. It is designed to identify Clusters of Orthologous Groups (COGs) that are enriched among genes that have undergone significant changes under different conditions. This would contribute to the detection of pathogens, an important research area relevant to uncovering bioterrorism, among other applications. The statistical analyses provided include the hypergeometric test, the Mann–Whitney rank sum test, and gene set enrichment analysis. Results are organized and presented in tabular and graphical forms for ease of understanding and dissemination. BOG is implemented as an R package, which is available from CRAN or can be downloaded from http://www.stat.osu.edu/~statgen/SOFTWARE/BOG/
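
    The hypergeometric analysis mentioned above can be sketched as an upper-tail enrichment test for a single COG. This is an independent Python illustration of the statistic, not BOG's R code or API; the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(total_genes, genes_in_cog, changed_genes, changed_in_cog):
    """Upper-tail hypergeometric p-value for one COG.

    Probability of observing at least `changed_in_cog` members of the COG
    among `changed_genes` genes drawn without replacement from a genome of
    `total_genes` genes, of which `genes_in_cog` belong to the COG.
    """
    upper = min(genes_in_cog, changed_genes)
    denominator = comb(total_genes, changed_genes)
    tail = sum(
        comb(genes_in_cog, k) * comb(total_genes - genes_in_cog, changed_genes - k)
        for k in range(changed_in_cog, upper + 1)
    )
    return tail / denominator


# Example: 10 genes, 5 in the COG, 5 significantly changed, all 5 in the COG.
p_value = hypergeom_enrichment_pvalue(10, 5, 5, 5)  # 1/252
```

    When many COGs are tested, the resulting p-values would normally be adjusted for multiple testing; how BOG handles this is not stated in the abstract.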

    Therapeutic modulation of the CD47-SIRPα axis in the pediatric tumor microenvironment: working up an appetite

    Evasion of immune surveillance is one of the hallmarks of cancer. Although the adaptive immune system has been targeted via checkpoint inhibition, many patients do not sustain durable remissions because of the heterogeneity of the tumor microenvironment, so additional strategies are needed. The innate immune system has its own set of checkpoints, and tumors have co-opted this system by expressing surface receptors that inhibit phagocytosis. One of these receptors, CD47, also known as the “don’t eat me” signal, has been found to be overexpressed by most cancer histologies and has been successfully targeted by antibodies blocking the receptor or its ligand, signal regulatory protein α (SIRPα). By enabling phagocytosis via antigen-presenting cells, interruption of CD47-SIRPα binding leads to earlier downstream activation of the adaptive immune system. Recent and ongoing clinical trials are demonstrating the safety and efficacy of CD47 blockade in combination with monoclonal antibodies, chemotherapy, or checkpoint inhibitors for adult cancer histologies. The aim of this review is to highlight the current literature and research on CD47, provide an impetus for investigation of its blockade in pediatric cancer histologies, and provide a rationale for new combination therapies in these patients.

    Identification of two types of GGAA-microsatellites and their roles in EWS/FLI binding and gene regulation in Ewing sarcoma

    Ewing sarcoma is a bone malignancy of children and young adults, frequently harboring the EWS/FLI chromosomal translocation. The resulting fusion protein is an aberrant transcription factor that uses highly repetitive GGAA-containing elements (microsatellites) to activate and repress thousands of target genes mediating oncogenesis. However, the mechanisms of EWS/FLI interaction with microsatellites and regulation of target gene expression are not clearly understood. Here, we profile genome-wide protein binding and gene expression. Using a combination of unbiased genome-wide computational and experimental analysis, we define GGAA-microsatellites in a Ewing sarcoma context. We identify two distinct classes of GGAA-microsatellites and demonstrate that EWS/FLI responsiveness is dependent on microsatellite length. At close-range “promoter-like” microsatellites, EWS/FLI binding and subsequent target gene activation are highly dependent on the number of GGAA-motifs. “Enhancer-like” microsatellites demonstrate length-dependent EWS/FLI binding, but minimal correlation for activated targets and none for repressed targets. Our data suggest EWS/FLI binds to “promoter-like” and “enhancer-like” microsatellites to mediate activation and repression of target genes through different regulatory mechanisms. Such characterization contributes valuable insight to EWS/FLI transcription factor biology and clarifies the role of GGAA-microsatellites on a global genomic scale. This may provide a unique perspective on the role of non-coding DNA in cancer susceptibility and therapeutic development.

    Characteristics of EWS/FLI-bound microsatellites.

    (A) Permutation test shows that the number of EWS/FLI binding sites that overlap with repeat regions (n = 8,256) with a minimum of 3 consecutive motifs is significantly higher than random chance (p < 0.001). Red line denotes the significance limit (α = 0.05). Gray bars represent the number of overlaps in the random regions with EWS/FLI binding sites in 1,000 permutations. The black line represents the mean of overlaps in random regions (EV_perm) and the green bar is the actual number of overlaps observed in repeat regions (Obs). (B) Boxplot of EWS/FLI fold-enrichment (relative to genomic background) and number of consecutive motifs in EWS/FLI-bound microsatellites showing a statistically significant increasing trend (p < 2.2 × 10⁻¹⁶). The blue line is the estimated LOESS regression line of the mean with the estimated 95% confidence bands (shaded region). (C) Boxplot of EWS/FLI fold-enrichment and total number of motifs in EWS/FLI-bound microsatellites showing a positive correlation (p = 1.9 × 10⁻¹⁰) and a non-linear trend (p < 0.05). The blue line is the estimated LOESS regression line of the mean with the estimated 95% confidence bands (shaded region). (D) Boxplot of EWS/FLI fold-enrichment and density showing a statistically significant positive correlation (p < 2.2 × 10⁻¹⁶). The blue line is the estimated LOESS regression line of the mean with the estimated 95% confidence bands (shaded region).
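
    The permutation test in panel A can be sketched as follows. The caption does not describe how the random regions were drawn, so this sketch assumes a single linear coordinate system and places each region uniformly at random while preserving its length; all names are hypothetical and the quadratic overlap count is for illustration only.

```python
import random


def count_overlapping_sites(sites, regions):
    """Count binding sites that overlap at least one region.

    Sites and regions are half-open (start, end) intervals.
    """
    return sum(
        any(s_start < r_end and r_start < s_end for r_start, r_end in regions)
        for s_start, s_end in sites
    )


def permutation_overlap_test(sites, regions, genome_length, n_perm=1000, seed=0):
    """Return (observed overlaps, mean overlaps under permutation, p-value)."""
    rng = random.Random(seed)
    observed = count_overlapping_sites(sites, regions)
    null_counts = []
    for _ in range(n_perm):
        # Assumption: each region is relocated uniformly, keeping its length.
        shuffled = []
        for r_start, r_end in regions:
            length = r_end - r_start
            new_start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((new_start, new_start + length))
        null_counts.append(count_overlapping_sites(sites, shuffled))
    # One-sided p-value with the usual +1 correction so it is never exactly 0.
    exceed = sum(count >= observed for count in null_counts)
    p_value = (1 + exceed) / (1 + n_perm)
    mean_null = sum(null_counts) / n_perm
    return observed, mean_null, p_value
```

    Here `mean_null` plays the role of EV_perm and `observed` the role of Obs in the caption. A real analysis would likely also respect chromosome boundaries and excluded regions, which this sketch ignores.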

    Schema and characteristics of repeat regions across genome.

    (A) Schema of repeat regions. Regions with only one type of motif are called pure repeat regions, while those with both GGAA and TTCC are called mixed repeat regions. Each repeat region (purple box) is separated by at least 20 bp of consecutive non-motif sequence. (B) Histogram of maximum number of consecutive motifs. (C) Histogram of total number of motifs. (D) Histogram of motif density of repeat regions. Bin width is 5%. (E) Histogram of length of repeat regions. Each bin is 100 bp wide (e.g., the first bin covers lengths of 0-100 bp). Bins with zero repeat regions are not shown. (F) The characteristics of repeat regions for pure and mixed repeat regions across the genome. Red line indicates the mean for each characteristic.
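
    The repeat-region definition in panel A lends itself to a short sketch: scan a sequence for GGAA and TTCC motifs and start a new region whenever the gap since the previous motif reaches 20 bp. The caption does not define density or how consecutive motifs are counted in mixed regions, so this sketch assumes density is the fraction of the region covered by motifs and that a consecutive run consists of back-to-back copies of the same motif. All names are hypothetical.

```python
import re

MOTIF = re.compile(r"GGAA|TTCC")
MOTIF_LENGTH = 4


def summarize_region(hits):
    """Summarize one repeat region from its list of (start, motif) hits."""
    start = hits[0][0]
    end = hits[-1][0] + MOTIF_LENGTH
    longest = run = 1
    for (prev_pos, prev_motif), (pos, motif) in zip(hits, hits[1:]):
        # Assumption: a run continues only for adjacent copies of the same motif.
        adjacent = pos == prev_pos + MOTIF_LENGTH and motif == prev_motif
        run = run + 1 if adjacent else 1
        longest = max(longest, run)
    return {
        "start": start,
        "end": end,
        "length": end - start,
        "total_motifs": len(hits),
        "max_consecutive": longest,
        # Assumption: density is the fraction of bases covered by motifs.
        "density": MOTIF_LENGTH * len(hits) / (end - start),
        "type": "pure" if len({motif for _, motif in hits}) == 1 else "mixed",
    }


def find_repeat_regions(sequence, min_gap=20):
    """Group motif hits into regions separated by >= min_gap non-motif bases."""
    regions, current = [], []
    for match in MOTIF.finditer(sequence.upper()):
        start = match.start()
        if current and start - (current[-1][0] + MOTIF_LENGTH) >= min_gap:
            regions.append(summarize_region(current))
            current = []
        current.append((start, match.group()))
    if current:
        regions.append(summarize_region(current))
    return regions


# Toy sequence: three GGAA copies, a 25-bp gap, then TTCC and GGAA 2 bp apart.
toy_regions = find_repeat_regions("GGAA" * 3 + "A" * 25 + "TTCC" + "AC" + "GGAA")
```

    On the toy sequence this yields one pure region with three consecutive motifs and one mixed region containing two motifs.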

    Correlation between EWS/FLI-bound microsatellites, GGAA-motif and gene expression.

    (A) Scatter plot of expression of activated genes and EWS/FLI fold-enrichment at promoter-like microsatellites showing a positive correlation (r = 0.46, p = 3.35 × 10⁻⁷). (B) Boxplot of EWS/FLI fold-enrichment and number of consecutive motifs at EWS/FLI-bound promoter-like microsatellites for activated genes showing a non-linear trend. Blue line is the estimated LOESS regression line of the mean with the estimated 95% confidence interval (shaded region). Overall, there is a statistically significant positive correlation (r = 0.43, p = 1.5 × 10⁻⁶). (C) Boxplot of EWS/FLI-activated gene expression and number of consecutive motifs at promoter-like EWS/FLI-bound microsatellites for gene activation showing a non-linear trend, as seen in EWS/FLI binding intensities, and a statistically significant positive correlation (r = 0.23, p = 0.01). The blue line is the estimated LOESS regression line of the mean with the estimated 95% confidence bands (shaded region). (D) Scatter plot of expression of activated genes and EWS/FLI fold-enrichment at enhancer-like microsatellites showing a positive correlation (r = 0.15, p = 3.5 × 10⁻⁴). (E) Boxplot of EWS/FLI fold-enrichment and number of consecutive motifs at EWS/FLI-bound enhancer-like microsatellites showing a positive correlation (r = 0.53, p = 2.2 × 10⁻¹⁶). Blue line is the estimated LOESS regression line of the mean, with the standard error of the prediction shown as the shaded region. (F) Boxplot of EWS/FLI fold-enrichment and number of consecutive motifs at EWS/FLI-bound enhancer-like microsatellites associated with gene repression showing a positive correlation (r = 0.40, p < 2.2 × 10⁻¹⁶). The blue line is the estimated LOESS regression line of the mean with the estimated 95% confidence bands (shaded region).