12 research outputs found

    msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding

    <div><p>Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at <a href="http://rajanil.github.io/msCentipede" target="_blank">http://rajanil.github.io/msCentipede</a>.</p></div>

    Modeling factor-specific DNase I cleavage profile and sequence bias in DNase cleavage increases prediction accuracy.

    <p>A: Modeling the DNase I cleavage profile at bound sites increases the prediction accuracy of msCentipede across a broad range of transcription factors. Each point on the plot corresponds to a different transcription factor. B: We show the ROC curves for transcription factor EBF1 for three different models of increasing complexity. We observe a substantial increase in accuracy when incorporating a multi-scale model for the factor-specific cleavage profile; however, the increase in accuracy when modeling the background cleavage rate using naked DNA data is rather modest. This holds true for a broad range of factors as shown in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0138030#pone.0138030.s004" target="_blank">S4 Fig</a>.</p>

    Illustration that DNase I cleavage profiles exhibit excess variation compared with a multinomial model.

    <p>For a set of 1000 SP1 motif instances with high ChIP-seq signal, we computed, for a 100bp window around each motif instance, the ratio of number of DNase I cuts mapped to the left half of the window to the number of DNase I cuts mapped to the entire window. The histogram of these ‘observed ratios’ is shown in orange. Under a multinomial model the number of reads mapping to each half of the window should have a binomial distribution, and we used this fact to simulate ‘expected ratios’ (gray line); see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0138030#pone.0138030.s009" target="_blank">S1 Methods</a> for more details. The observed ratios are clearly overdispersed compared with the expectation under a multinomial model.</p>
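
    The comparison described in this caption can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' procedure from S1 Methods: the function names, the toy Beta-distributed site probabilities, and the use of a single pooled left-half probability for the binomial simulation are all assumptions made for the example.

```python
import numpy as np

def left_half_ratios(cuts):
    """Return (ratio, total) per site, where ratio = left-half cuts / all cuts.

    cuts: (n_sites, window_len) array of cut counts (hypothetical input format).
    Sites with zero cuts are dropped because the ratio is undefined there.
    """
    cuts = np.asarray(cuts)
    half = cuts.shape[1] // 2
    totals = cuts.sum(axis=1)
    left = cuts[:, :half].sum(axis=1)
    keep = totals > 0
    return left[keep] / totals[keep], totals[keep]

def simulate_expected_ratios(totals, p_left, rng):
    """Simulate ratios assuming left-half counts are Binomial(total, p_left)."""
    return rng.binomial(totals, p_left) / totals

rng = np.random.default_rng(0)

# Synthetic stand-in for 1000 motif instances with 100bp windows.
# Each site gets its own left-half probability, which creates overdispersion.
n_sites, window = 1000, 100
half = window // 2
site_p = rng.beta(5, 5, size=n_sites)
site_totals = rng.poisson(200, size=n_sites) + 1
site_left = rng.binomial(site_totals, site_p)

cuts = np.zeros((n_sites, window), dtype=int)
uniform = np.full(half, 1.0 / half)
for i in range(n_sites):
    # Spread each half's cuts uniformly over its positions.
    cuts[i, :half] = rng.multinomial(site_left[i], uniform)
    cuts[i, half:] = rng.multinomial(site_totals[i] - site_left[i], uniform)

observed, totals = left_half_ratios(cuts)
# Assumption: one shared left-half probability, estimated by pooling all sites.
p_left = (observed * totals).sum() / totals.sum()
expected = simulate_expected_ratios(totals, p_left, rng)

print("variance of observed ratios:", observed.var())
print("variance of expected ratios:", expected.var())
```

    With site-to-site variation built into the toy data, the observed ratios spread more widely than the simulated binomial ones, which is the kind of excess variation the figure illustrates.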

    Accuracy of msCentipede, CENTIPEDE and PIQ across a range of transcription factors.

    <p>Each point corresponds to a different factor and accuracy is measured by area under the ROC curve. Blue points correspond to factors where msCentipede achieves higher accuracy than CENTIPEDE (top panels) or PIQ (bottom panels), and orange points correspond to a worse performance by msCentipede. A: The algorithms are compared using data from a single replicate. B: The algorithms are compared using data from multiple library replicates.</p>

    A Multivariate Genome-Wide Association Analysis of 10 LDL Subfractions, and Their Response to Statin Treatment, in 1868 Caucasians

    <div><p>We conducted a genome-wide association analysis of 7 subfractions of low density lipoproteins (LDLs) and 3 subfractions of intermediate density lipoproteins (IDLs) measured by gradient gel electrophoresis, and their response to statin treatment, in 1868 individuals of European ancestry from the Pharmacogenomics and Risk of Cardiovascular Disease study. Our analyses identified four previously implicated loci (SORT1, APOE, LPA, and CETP) as containing variants that are very strongly associated with lipoprotein subfractions (log<sub>10</sub> Bayes Factor > 15). Subsequent conditional analyses suggest that three of these (APOE, LPA and CETP) likely harbor multiple independently associated SNPs. Further, while different variants typically showed different characteristic patterns of association with combinations of subfractions, the two SNPs in CETP show strikingly similar patterns - both in our original data and in a replication cohort - consistent with a common underlying molecular mechanism. Notably, the CETP variants are very strongly associated with LDL subfractions, despite showing no association with total LDLs in our study, illustrating the potential value of the more detailed phenotypic measurements. In contrast with these strong subfraction associations, genetic association analysis of subfraction response to statins showed much weaker signals (none exceeding log<sub>10</sub> Bayes Factor of 6). However, two SNPs (in APOE and LPA) previously reported to be associated with LDL statin response do show some modest evidence for association in our data, and the subfraction response profiles at the LPA SNP are consistent with the LPA association, with response likely being due primarily to resistance of Lp(a) particles to statin therapy. An additional important feature of our analysis is that, unlike most previous analyses of multiple related phenotypes, we analyzed the subfractions jointly, rather than one at a time. Comparisons of our multivariate analyses with standard univariate analyses demonstrate that multivariate analyses can substantially increase power to detect associations. Software implementing our multivariate analysis methods is available at <a href="http://stephenslab.uchicago.edu/software.html" target="_blank">http://stephenslab.uchicago.edu/software.html</a>.</p></div>

    Decomposition of the gain (or loss) from a multivariate analysis of 12 phenotypes vs a univariate analysis of LDL-C into two components: one from using more detailed measurements, and one from using a multivariate analysis.

    <p>Plotted are (a) log<sub>10</sub> BF<sub>av</sub> (the BF based on multivariate analysis of all 12 phenotypes) vs log<sub>10</sub> BF<sub>ldl</sub> (the BF based on univariate analysis of LDL-C), (b) log<sub>10</sub> BF<sub>uni</sub> (the BF based on univariate analysis of all 12 phenotypes) vs log<sub>10</sub> BF<sub>ldl</sub>, and (c) log<sub>10</sub> BF<sub>av</sub> vs log<sub>10</sub> BF<sub>uni</sub>. SNPs are colored according to the nearest gene.</p>

    Effects of associated SNPs on LDL subfractions.

    <p>Solid lines show mean (normalized) phenotype for each subfraction (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0120758#pone.0120758.t001" target="_blank">Table 1</a> for abbreviations) by genotype class (reference homozygotes: red, heterozygotes: purple, non-reference homozygotes: sky blue; the proportion of individuals is shown next to the genotype; sky blue lines are omitted if the proportion is ≤ 0.01); dotted lines show ±2 standard errors. Results for secondary SNPs are based on residuals from regressing out top SNPs. Grey shading indicates posterior probability of association (either directly or indirectly) for each phenotype (> 0.9: dark grey, > 0.75: light grey, < 0.75: white; raw numbers given at top of figure). Note that, because the minor alleles have opposite effects on total HDL-C at the two SNPs in CETP, the <i>y</i>-axis is reversed for the secondary SNP to emphasize the similar shapes of the curves.</p>

    Replication of two independent associations in CETP (top SNP rs247616 and secondary SNP rs11076175).

    <p>Dashed lines show estimated effect sizes on (normalized) phenotype for each subfraction in our study (red) and JUPITER (blue); dotted lines show ±2 standard errors. Note that the y-axis is reversed for the secondary SNP, as in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0120758#pone.0120758.g002" target="_blank">Fig 2</a>.</p>