Additional file 1 of Transcription factor-binding k-mer analysis clarifies the cell type dependency of binding specificities and cis-regulatory SNPs in humans
Additional file 1:
Figure S1 Filtering of ChIP-seq samples. A: Schematic overview of ChIP-seq sample filtering. B: Violin plot showing the AUROC of the prediction of the top 10% PWM-supported k-mers based on the MOCCS2score. The red violin plot represents all CTCF ChIP-seq samples, the green plot represents soft-filtered CTCF ChIP-seq samples, and the blue plot represents hard-filtered CTCF ChIP-seq samples. High-quality ChIP-seq samples with high AUROC scores were retained after hard filtering. C: Distribution of each quality control metric of ChIP-seq sample filtering for samples that passed the hard filter (pink) and others (blue). D: Bar plots display the number of ChIP-seq samples that passed through the soft and hard filters. Bars are colored according to cell type classes or TFs.
Figure S2 Simulation of significant k-mer detection. A: The procedure for generating simulated datasets. Simulated data generated by embedding a “true significant k-mer” within random sequences was applied to MOCCS2 and the q-values of the MOCCS2score were calculated for each k-mer. B: Parameters for each simulation condition from #1 to #5. α is the percentage of input sequences containing embedded “true significant k-mers”, N is the number of peaks in a ChIP-seq sample, and σ is the standard deviation of the embedded “true significant k-mers” from the center of the peak. C: Simulation results for significant k-mer detection. The sensitivity, specificity, and FDR for detecting “true significant k-mers” are shown for different parameter settings.
Figure S3 Number of peaks and significant k-mers in MOCCS profiles. A: Number of peaks in MOCCS profiles. The x-axis represents the log-transformed number of peaks with a base of 10 and the y-axis represents the number of ChIP-seq samples. B: Relationship between the number of peaks and significant k-mers in MOCCS profiles (left, q < 0.05; right, q < 0.01).
Figure S4 Similarities in MOCCS profiles and peak locations for sample pairs of same or different TFs. A: Comparison of k-sim Jaccard, Pearson and peak overlap indices (a-c: groups of the same cell types). B: Two-dimensional density plot of k-sim Jaccard or Pearson with the peak overlap index (a-c: groups of the same cell types). C: Correlation coefficient of k-sim Jaccard or Pearson with the peak overlap index in each group. The y-axis indicates Spearman’s correlation coefficient. Red and blue indicate k-sim Pearson and Jaccard values, respectively (a-c: groups of the same cell types).
Figure S5 Similarities in MOCCS profiles and peak locations for sample pairs of same/different cell types. A: Comparison of the k-sim Jaccard, Pearson, and peak overlap indices (a, d, and e: groups of the same TFs). B: Two-dimensional density plot of k-sim Jaccard or Pearson with the peak overlap index (a, d, and e: groups of the same TFs). C: Correlation coefficient of k-sim Jaccard or Pearson with the peak overlap index in each group. The y-axis indicates Spearman’s correlation coefficient. Red and blue indicate k-sim Pearson and Jaccard values, respectively (a, d, and e: groups of the same TFs).
Figure S6 Heat maps of cell type-dependent TFs. The heat map color indicates the k-sim Jaccard value for the 33 cell type-dependent TFs. The color labels of the heat maps indicate the cell type classes. Cell type classes with only a single ChIP-seq sample were excluded from the visualization. Asterisks indicate the statistical significance of ChIP-seq samples with the same and different cell type classes (Mann–Whitney U test, p < 0.05).
Figure S7 Violin plots of all cell type-dependent TFs. The y-axis indicates the k-sim Jaccard value. The same and different groups were arranged along the x-axis. Asterisks indicate the statistical significance of ChIP-seq samples with the same and different cell type classes (Mann–Whitney U test, p < 0.05).
Figure S8 Simulation of differential k-mer detection. A: Simulated data processing. Simulated data with an embedded “true differential k-mer” and “true significant k-mer” was prepared by embedding a “true” k-mer within α% of a randomly generated sample of 2W + 1 bp (W = 350) DNA sequences and applied to MOCCS2. “True significant k-mers” were embedded following a normal distribution whose mean was W + 1 and whose standard deviation was σ. “True differential k-mers” were embedded in S1 (or S2), similar to “true significant k-mers,” and were embedded in S2 (or S1) following a uniform distribution whose mean was 1 and whose standard deviation was (2 × W + 1) − (k − 1). Note that we set k = 6. B: Parameters for each simulation condition from #1 to #5. L is the number of differential k-mers and m is the number of significant k-mers.
Figure S9 ΔMOCCS2score profiles were consistent with the in vitro SNP-SELEX and PWM motif fold change. A: Spearman’s correlation coefficient between PBS (SNP-SELEX) and ΔMOCCS2score in each TF for the original and permuted data. Red points indicate the original Spearman’s correlation coefficient, and blue points indicate the permuted data. B: Difference in ΔMOCCS2score profile consistency among the positions of SNPs in k-mers. The kth SNP position indicates the kth allele on the left side of the k-mer. C: The ΔMOCCS2score is consistent with the PWM motif fold change.
Figure S10 Number of peak-overlapping GWAS-SNPs with significant ΔMOCCS2scores. Number of peak-overlapping GWAS-SNPs in each ChIP-seq sample. Each bar represents a ChIP-seq sample, and the y-axis represents the number of peak-overlapping GWAS-SNPs. The red fraction represents the number of peak-overlapping GWAS-SNPs with significant ΔMOCCS2scores (q < 0.05), and the gray fraction represents the number of GWAS-SNPs with non-significant ΔMOCCS2scores.
Figure S11 Prediction of SNP-affected TFs and cell type classes using ΔMOCCS2score profiles.
Top ChIP-seq samples with high ΔMOCCS2scores in each phenotype (IBD, inflammatory bowel disease; CD, Crohn’s disease; MS, multiple sclerosis; SLE, systemic lupus erythematosus). The ΔMOCCS2score was calculated for each SNP and ChIP-seq sample. Bar graph colors represent TFs or cell type classes.
Figure S12 Association between the allele frequency and ΔMOCCS2score. Association between the allele frequency and (A) the absolute values of the ΔMOCCS2score or (B) the ratio of SNPs with significant ΔMOCCS2scores in each phenotype (IBD, inflammatory bowel disease; CD, Crohn’s disease; MS, multiple sclerosis; SLE, systemic lupus erythematosus).
Figure S13 Accuracy of detecting canonical motifs using MOCCS2score for different k. AUROC for detecting canonical PWM motifs using the MOCCS2score for different values of k. The x-axis represents the ratio of PWM-supported k-mers in all k-mers and the y-axis represents the AUROC. The colors of the violin plots represent the different k values.
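The simulated-data procedure described for Figures S2 and S8 (random sequences of 2W + 1 bp, with a “true” k-mer embedded in a fraction α of them at a normally distributed position around the center) can be illustrated roughly as follows. This is only a minimal sketch and not the authors' code: the function name `simulate_sequences`, the use of α as a fraction rather than a percentage, the choice to center the k-mer on the drawn position, and the clamping at the sequence ends are all assumptions made for illustration.

```python
import random


def simulate_sequences(n_seqs, kmer, alpha, sigma, W=350, seed=0):
    """Generate n_seqs random DNA sequences of length 2W + 1.

    A fraction `alpha` (0..1, hypothetical convention) of the sequences
    receives `kmer`, placed around a position drawn from a normal
    distribution with mean W + 1 (1-based center) and s.d. `sigma`.
    """
    rng = random.Random(seed)
    length = 2 * W + 1
    k = len(kmer)
    seqs = []
    for _ in range(n_seqs):
        seq = [rng.choice("ACGT") for _ in range(length)]
        if rng.random() < alpha:
            # Draw the 1-based center position, then convert to a 0-based start.
            center = round(rng.gauss(W + 1, sigma))
            start = center - 1 - k // 2
            # Assumption: clamp so the k-mer always fits inside the sequence.
            start = max(0, min(length - k, start))
            seq[start:start + k] = kmer
        seqs.append("".join(seq))
    return seqs


if __name__ == "__main__":
    demo = simulate_sequences(n_seqs=5, kmer="ACGTAC", alpha=0.5, sigma=20)
    for s in demo:
        print(len(s), s.find("ACGTAC"))
```

Sequences generated this way could then be passed to a k-mer enrichment tool to check whether the embedded k-mer is recovered; the scoring step itself is not sketched here.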
The nonlinear ARX model of the IEGs.
<p>(A) The simulation result of the nonlinear ARX model (solid lines) together with the experimental results in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0057037#pone-0057037-g001" target="_blank">Figure 1B</a> (dots). The colour codes are the same as in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0057037#pone-0057037-g001" target="_blank">Figure 1B</a>. The experimental data in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0057037#pone-0057037-g001" target="_blank">Figure 1B</a> and <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0057037#pone.0057037.s002" target="_blank">Figure S2</a> were used for parameter estimation of the nonlinear ARX model. (B) The systems identified by the nonlinear ARX model. The upstream dependency (selected inputs), Hill functions, and frequency response curves of the nonlinear ARX model are shown. The selected inputs, pERK (solid line), pCREB (dotted line), pJNK (dashed line), and c-FOS (dash-dotted line), are numbered.</p>
System identification by the nonlinear ARX model.
<p>(A) The modeling scheme of the nonlinear ARX model. Upstream dependency was determined by the lag order number, <i>m</i>. For example, if <i>m</i> = 0, the upstream signal is not transmitted downstream; otherwise, the signal is transmitted downstream. The signals of the selected upstream molecules were transformed successively by a Hill function and a linear ARX model, which characterise a system with a switch-like (solid line) or graded (dotted line) dose response, and with temporal filters such as a low-pass filter (dotted line) or a filter with an inverse notch (solid line), respectively (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0057037#s2" target="_blank">Materials and methods</a>). (B) Temporal signal transformation in the nonlinear ARX model. As an example, signal transformation in the nonlinear ARX model of c-FOS is shown. pERK and pCREB were selected as upstream molecules, but pp38 and pJNK were not (<i>m</i> = 0). The signals of pERK and pCREB were transformed by the Hill equations. The Hill-transformed signals were then temporally transformed by the linear ARX model. The sum of the signals transformed by the linear ARX model was c-FOS, the final output of the nonlinear ARX model of c-FOS.</p>
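The cascade described in this caption (each selected input passes through a Hill function, then a linear ARX filter, and the filtered signals are summed to give the output) can be sketched as below. This is an illustrative sketch under stated assumptions, not the published model: the function names, the parameter layout, the standard ARX sign convention y[t] = −Σ aᵢ·y[t−1−i] + Σ bⱼ·x[t−1−j], the zero initial conditions, and the use of an empty `b` list to stand for lag order m = 0 are all hypothetical choices.

```python
def hill(u, K, n):
    """Hill function u^n / (K^n + u^n) for u >= 0; larger n is more switch-like."""
    un = u ** n
    return un / (K ** n + un)


def arx_simulate(x, a, b):
    """Linear ARX filter with zero initial conditions (assumed convention):
    y[t] = -sum_i a[i] * y[t-1-i] + sum_j b[j] * x[t-1-j]
    """
    y = [0.0] * len(x)
    for t in range(len(x)):
        acc = 0.0
        for i, ai in enumerate(a):
            if t - 1 - i >= 0:
                acc -= ai * y[t - 1 - i]
        for j, bj in enumerate(b):
            if t - 1 - j >= 0:
                acc += bj * x[t - 1 - j]
        y[t] = acc
    return y


def narx_output(inputs, params):
    """Sum the Hill -> ARX transformed signals of all selected inputs.

    inputs: dict name -> list of samples (equally spaced in time).
    params: dict name -> {"K", "n", "a", "b"}; an empty "b" is used here
            to represent an unselected input (lag order m = 0).
    """
    length = len(next(iter(inputs.values())))
    total = [0.0] * length
    for name, series in inputs.items():
        p = params[name]
        if not p["b"]:  # m = 0: this upstream signal is not transmitted
            continue
        transformed = [hill(u, p["K"], p["n"]) for u in series]
        filtered = arx_simulate(transformed, p["a"], p["b"])
        total = [s + f for s, f in zip(total, filtered)]
    return total


if __name__ == "__main__":
    inputs = {"pERK": [0.0, 1.0, 1.0, 0.5], "pJNK": [1.0, 1.0, 1.0, 1.0]}
    params = {
        "pERK": {"K": 0.5, "n": 4.0, "a": [-0.5], "b": [1.0]},
        "pJNK": {"K": 1.0, "n": 1.0, "a": [], "b": []},  # not selected
    }
    print(narx_output(inputs, params))
```

Keeping the static nonlinearity (Hill) separate from the dynamic part (ARX) mirrors the two-stage description in the caption and makes it easy to inspect the dose response and the temporal filter independently.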
The selected inputs and parameters of the Hill function and frequency response curves of the nonlinear ARX model.
<p>The selected inputs and parameters of the Hill function and frequency response curves of the nonlinear ARX model.</p>
The selective expression of EGR1 in response to pulsatile ERK phosphorylation.
<p>(A) The step (5 ng/ml, red), pulse (5 ng/ml, 6 min, blue), and pulsatile NGF stimulation (0.5 ng/ml, 6 min with 12-min intervals, four times, green) were given as indicated by bars (top), and pERK, pCREB, EGR1, and c-FOS were measured in experiments (dots). Using the experimental data of pERK and pCREB as the selected inputs, the outputs (c-FOS and EGR1) were simulated by the nonlinear ARX model (solid lines). (B) Interval dependency of EGR1 and c-FOS expression. The pulsatile NGF stimulation (0.5 ng/ml, 15-min duration for each pulse) with the indicated intervals was given, and pERK, EGR1, and c-FOS expression were measured in experiments. The area under the curve (AUC) (0–480 min) of EGR1 and c-FOS is shown in bars. The intervals are indicated by the colour codes. Bars represent means ± S.D. (n = 4). Note that a 15-min pulse duration was used in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0057037#pone-0057037-g004" target="_blank">Figure 4B</a> because of the technical limitation of probe numbers of the automated liquid-handling robots, and pulsatile stimulation with 6-min pulse duration and 12-min intervals was available at most four times (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0057037#pone-0057037-g004" target="_blank">Figure 4A</a>).</p>
Temporal Decoding of MAP Kinase and CREB Phosphorylation by Selective Immediate Early Gene Expression
<div><p>A wide range of growth factors encode information into specific temporal patterns of MAP kinase (MAPK) and CREB phosphorylation, which are further decoded by expression of immediate early gene products (IEGs) to exert biological functions. However, the IEG decoding system remains unknown. We built a data-driven model based on time courses of MAPK and CREB phosphorylation and IEG expression in response to various growth factors to identify how the signal is processed. We found that IEG expression uses common decoding systems regardless of growth factors, and that expression of each IEG differs in upstream dependency, switch-like response, and linear temporal filters. Pulsatile ERK phosphorylation was selectively decoded by expression of EGR1 rather than c-FOS. Conjunctive NGF and PACAP stimulation was selectively decoded by synergistic JUNB expression through a switch-like response to c-FOS. Thus, specific temporal patterns and combinations of MAPK and CREB phosphorylation can be decoded by selective IEG expression via distinct temporal filters and switch-like responses. The data-driven modeling is versatile for analysis of signal processing and does not require detailed prior knowledge of pathways.</p> </div>
Conjunctive stimulation of NGF and PACAP induced synergistic JUNB expression through switch-like response to c-FOS.
<p>Step stimulation with NGF alone (5 ng/ml, red), PACAP alone (100 nM, blue), or both NGF and PACAP (violet) was given, and pERK, pCREB, c-FOS, JUNB, and FOSB were measured in experiments (dots). The simulation results of the nonlinear ARX model are shown (solid lines). Black dots indicate the sum of the IEG responses to NGF alone and to PACAP alone, and arrows indicate the difference from the sum.</p>
System identification reveals temporal decoding systems of MAP kinase and CREB phosphorylation by selective IEG expression.
<p>We performed system identification of the temporal decoding of MAP kinase and CREB phosphorylation by selective expression of immediate early genes such as c-FOS, EGR1, c-JUN and JUNB, using time series data and the nonlinear ARX model. We found that the expression of each IEG has a distinct upstream dependency, and that there are distinct switch-like responses and temporal filters for decoding upstream signals. For example, pulsatile ERK phosphorylation was decoded by selective expression of EGR1 rather than c-FOS, and conjunctive NGF and PACAP stimulation was decoded by synergistic JUNB expression through a switch-like response to c-FOS.</p>
The comparison of NARX and ODE modeling frameworks.
<p>The comparison of NARX and ODE modeling frameworks.</p>
Prediction and validation of the identified system by pharmacological perturbation.
<p>(<b>A</b>) The predictive simulation result and experimental result by PACAP stimulation in the presence (black) or absence (blue) of trametinib. Lines, simulation; dots, experimental and recovered data. Experimental and recovered data of pERK and pCREB, and the simulated data of c-Jun, c-Fos, Egr1, FosB, and JunB are given as <i>Inputs</i>, and simulation was performed using the NARX model in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005913#pcbi.1005913.g005" target="_blank">Fig 5</a> (see “Simulation of the integrated NARX model” section in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005913#sec008" target="_blank">Materials and methods</a>). In the experiment, PC12 cells were treated in the absence (blue dots) or in the presence (black dots) of trametinib (10 μM) added at 30 min before stimulation with PACAP (100 nM). Note that the PACAP stimulation data are used, as in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005913#pcbi.1005913.g004" target="_blank">Fig 4</a>. (<b>B</b>) Simulation using experimental and recovered data as <i>Inputs</i>. For each set of the <i>Inputs</i> (left panel for each) and <i>Outputs</i> (right panel for each), the unequally spaced time series data were recovered (pluses) (right panel for each), and the responses of <i>Outputs</i> were simulated by the NARX model identified in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005913#pcbi.1005913.g005" target="_blank">Fig 5A–5C</a> (solid lines) (right panel for each).</p>