CPI-EM outperforms STAP and the peak distance detector in detecting cooperatively bound TF pairs across different datasets, even though STAP can better detect cooperatively bound target TF peaks that have high intensities.
<p><b>(A)</b> The auROCs of CPI-EM and STAP are shown in orange and sky blue, respectively. The auROC of the chance detector, which is always 0.5, is shown by a dashed line. The datasets marked with an asterisk (*) are those where STAP was numerically unstable (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0199771#sec002" target="_blank">Materials and methods</a>). The complete ROC curves for STAP and CPI-EM are shown in Fig M in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0199771#pone.0199771.s001" target="_blank">S1 Appendix</a> and those of the peak distance detector in Fig P in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0199771#pone.0199771.s001" target="_blank">S1 Appendix</a>. <b>(B)</b> CPI-EM detects more cooperative interactions amongst low intensity target TF peaks but STAP detects more such interactions amongst higher intensity target TF peaks. On the <i>x</i>-axis, cooperatively bound FOXA1-HNF4A and RTG3-GCN4 peak pairs are divided into ten bins based on the intensity of the target TF, with the 10th bin having the highest intensity target TF peaks. The <i>y</i>-axis represents the percentage of cooperative peak pairs actually detected by CPI-EM (orange) or STAP (sky blue) in each bin at a false positive rate of 40%.</p>
A schematic of the use of the CPI-EM algorithm and ChIP-seq from knockout data to separately identify cooperatively bound transcription factor pairs.
<p>ChIP-seq experiments carried out on two TFs, A and B, yield a list of locations that are bound by both TFs, along with peak intensities at each location. From this data, there are two ways in which we find genomic locations that are cooperatively bound by A and B. <b>(A)</b> A method for inferring these locations from a ChIP-seq of A carried out after B is genetically deleted. Locations where a peak of A either disappears altogether, or is reduced in intensity after knocking out B are labelled as cooperatively bound. In contrast, locations where a peak of A either remains unchanged or increases in intensity are labelled as non-cooperatively bound (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0199771#sec002" target="_blank">Materials and methods</a>). <b>(B)</b> Steps in predicting cooperatively bound locations are shown, where the numbers correspond to those in the section "The ChIP-seq Peak Intensity-Expectation Maximisation (CPI-EM) algorithm" in Materials and Methods. (1) The input to CPI-EM consists of a list of genomic locations where a peak of A overlaps a peak of B by at least a single base pair. Note that the ChIP-seq of A after B is knocked out is not an input to the algorithm. (2) Each of these overlapping intensity pairs is fit to a model that consists of a sum of two probability functions. These functions specify the probabilities of observing a particular peak intensity pair given that it comes from a cooperatively or non-cooperatively bound region. These probabilities are computed by fitting the model to the input data using the expectation-maximization algorithm (see Section H in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0199771#pone.0199771.s001" target="_blank">S1 Appendix</a>). (3) Bayes' formula is applied to the probabilities computed in step (2) to find the probability of each peak intensity pair being cooperatively bound. 
(4) Each peak pair whose cooperative binding probability from step (3) is greater than a threshold <i>α</i> is declared cooperatively bound. We compare this list of predicted locations with the list of cooperatively bound locations inferred from knockout data in order to compute the number of correct and incorrect inferences made by CPI-EM.</p>
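Steps (2) to (4) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: it assumes each mixture component is a product of two independent log-normal densities (one per TF), which makes the fit equivalent to a two-component diagonal Gaussian mixture on log-intensities. The function names (`cpi_em_sketch`, `fit_component`, `log_gauss`), the initialisation, and the synthetic data are all hypothetical. The initialisation treats the weaker half of target peaks as provisionally cooperative, following the observation that cooperatively bound target TFs tend to be more weakly bound.

```python
import math
import random


def log_gauss(v, mu, var):
    """Log of a Gaussian density; applied to log-intensities below."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mu) ** 2 / var)


def fit_component(logs, weights):
    """Weighted mean and variance per dimension (M-step for one component)."""
    total = max(sum(weights), 1e-12)
    params = []
    for d in (0, 1):
        mu = sum(w * v[d] for w, v in zip(weights, logs)) / total
        var = sum(w * (v[d] - mu) ** 2 for w, v in zip(weights, logs)) / total
        params.append((mu, max(var, 1e-6)))  # floor avoids a collapsed component
    return params


def cpi_em_sketch(pairs, alpha=0.5, n_iter=100):
    """pairs: (target_intensity, partner_intensity) tuples, all > 0.

    Returns the per-pair cooperative binding probabilities and the
    boolean calls obtained by thresholding them at alpha.
    """
    logs = [(math.log(x), math.log(y)) for x, y in pairs]
    n = len(logs)
    # Assumed initialisation: weaker target peaks start as "cooperative".
    order = sorted(range(n), key=lambda i: logs[i][0])
    p_coop = [0.0] * n
    for rank, i in enumerate(order):
        p_coop[i] = 0.9 if rank < n // 2 else 0.1
    for _ in range(n_iter):
        # M-step: refit both components and the mixing weight.
        coop = fit_component(logs, p_coop)
        non = fit_component(logs, [1.0 - p for p in p_coop])
        prior = min(max(sum(p_coop) / n, 1e-9), 1.0 - 1e-9)
        # E-step: Bayes' formula, evaluated stably in log space.
        for i, v in enumerate(logs):
            l1 = math.log(prior) + sum(log_gauss(v[d], *coop[d]) for d in (0, 1))
            l0 = math.log(1.0 - prior) + sum(log_gauss(v[d], *non[d]) for d in (0, 1))
            m = max(l1, l0)
            e1, e0 = math.exp(l1 - m), math.exp(l0 - m)
            p_coop[i] = e1 / (e1 + e0)
    return p_coop, [p > alpha for p in p_coop]


# Synthetic data (made up for illustration): cooperative pairs have weak targets.
rng = random.Random(0)
coop_pairs = [(rng.lognormvariate(1.0, 0.3), rng.lognormvariate(3.0, 0.3)) for _ in range(200)]
non_pairs = [(rng.lognormvariate(3.0, 0.3), rng.lognormvariate(3.0, 0.3)) for _ in range(200)]
p_coop, calls = cpi_em_sketch(coop_pairs + non_pairs)
```

Because the Jacobian factor 1/(xy) of the log-normal density is shared by both components, it cancels in Bayes' formula, which is why the sketch can work directly on log-intensities.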
Cooperatively bound target TFs are significantly more weakly bound than non-cooperatively bound target TFs.
<p><b>(A)</b> Box-plots of peak intensity distributions of cooperatively (orange) and non-cooperatively (gray) bound TF pairs, with target TFs on the left and partner TFs on the right. ****, *** and ** indicate p-values of &lt;10<sup>−4</sup>, 10<sup>−3</sup> and 10<sup>−2</sup> from a Wilcoxon rank sum test. The whiskers of the box plot are the 5th and 95th percentiles of the distributions shown. <b>(B) ChIP-seq peak intensity distributions can be approximated by a Log-normal distribution.</b> Marginal peak intensity distributions of FOXA1 and HNF4A peaks (in filled black and orange circles), with fitted Log-normal distributions (solid black and orange lines). These, and similar distributions for the other TF pairs, were better approximated by a Log-normal distribution, which was evident from the higher log-likelihood value associated with a Log-normal fit, compared to a Gaussian or Gamma distribution (Table H in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0199771#pone.0199771.s001" target="_blank">S1 Appendix</a>). Alongside the marginal intensity distributions of FOXA1 and HNF4A is a scatter plot of (FOXA1,HNF4A) peak intensity pairs from cooperatively and non-cooperatively bound regions. The scatter points are colored according to the density of points in that region, with darker shades indicating a higher density. Both cooperative and non-cooperative FOXA1 and HNF4A peaks are shown. The density of points in the scatter plot was computed using the Gaussian kernel density estimation procedure in the Python SciPy library.</p>
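The log-likelihood comparison described in panel (B) can be illustrated with a small sketch. It is an assumption-laden example, not the procedure from S1 Appendix: it compares only the log-normal and Gaussian fits (the Gamma fit needs a numerical optimiser and is omitted), it uses closed-form maximum-likelihood estimates, and the intensities are synthetic. The function names are hypothetical.

```python
import math
import random


def gaussian_loglik(xs):
    """Maximised log-likelihood of a Gaussian fit (MLE mean and variance)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def lognormal_loglik(xs):
    """Maximised log-likelihood of a log-normal fit.

    A log-normal density is a Gaussian density in log(x) divided by x,
    so its log-likelihood is the Gaussian one on log(x) minus sum(log(x)).
    """
    logs = [math.log(x) for x in xs]
    return gaussian_loglik(logs) - sum(logs)


# Synthetic, right-skewed "peak intensities" (made up for illustration).
rng = random.Random(0)
intensities = [rng.lognormvariate(2.0, 0.8) for _ in range(2000)]
ll_lognormal = lognormal_loglik(intensities)
ll_gaussian = gaussian_loglik(intensities)
print(ll_lognormal > ll_gaussian)  # the better-fitting family has the higher value
```

Both fits here have two free parameters, so comparing raw log-likelihoods is a fair like-for-like comparison in this sketch.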
CPI-EM applied to ChIP-seq datasets from <i>M. musculus</i> (FOXA1-HNF4A), <i>S. cerevisiae</i> (RTG3-GCN4) and early-exponential phase cultures of <i>E. coli</i> (CRP-FIS).
<p>For each dataset, CPI-EM computes a list of cooperative binding probabilities at all the locations bound by the TF pair under consideration. <b>Top row: The fraction of cooperatively bound pairs, as determined from knockout data, that fall into each cooperative binding probability bin.</b> The bins are equally spaced with a width of 0.1 and the heights of the bars within each histogram add up to 1. <b>Bottom row:</b> <b>Receiver operating characteristic (ROC) curves that evaluate the performance of CPI-EM in detecting cooperatively bound pairs.</b> The curve is generated by calculating, for each value of <i>α</i> between 0 and 1, the true and false positive rate of the algorithm. The true positive rate (<i>TPR</i>(<i>α</i>)) is the ratio of the number of cooperatively bound regions detected (when <i>p</i><sub><i>coop</i></sub> is compared to a threshold of <i>α</i>) to the total number of regions that are found to be cooperatively bound from the knockout data. The false positive rate (<i>FPR</i>(<i>α</i>)) is the ratio of the number of non-cooperatively bound regions mistakenly detected as cooperatively bound (when <i>p</i><sub><i>coop</i></sub> is compared to a threshold of <i>α</i>) to the total number of regions that are found to be non-cooperatively bound from the knockout data. Smaller values of <i>α</i> give a higher TPR, but at the cost of a higher FPR. The area under the ROC curve (auROC) is a measure of detection performance, whose value cannot exceed 1, which corresponds to a perfect detector. Given the auROC of two different algorithms, the one with a higher auROC is better, on average, at detecting cooperative binding.</p>
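The TPR, FPR and auROC definitions above translate directly into code. The sketch below is a minimal illustration under stated assumptions: the function names (`roc_points`, `auroc`), the threshold grid, and the toy probabilities and labels are hypothetical and do not come from the paper's datasets. A pair is counted as detected when its probability is strictly greater than <i>α</i>.

```python
def roc_points(p_coop, is_coop, thresholds):
    """Return one (FPR, TPR) point per threshold alpha.

    p_coop  : cooperative binding probabilities, one per peak pair
    is_coop : knockout-derived labels (True = cooperatively bound)
    """
    n_pos = sum(is_coop)
    n_neg = len(is_coop) - n_pos
    points = []
    for alpha in thresholds:
        tp = sum(1 for p, y in zip(p_coop, is_coop) if p > alpha and y)
        fp = sum(1 for p, y in zip(p_coop, is_coop) if p > alpha and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy example (made-up values): three cooperative and three non-cooperative pairs.
toy_p = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [True, True, True, False, False, False]
alphas = [i / 100 for i in range(101)]
toy_auroc = auroc(roc_points(toy_p, toy_labels, alphas))
```

In this toy example, 8 of the 9 possible (cooperative, non-cooperative) pairs are ranked correctly, so the auROC is 8/9, which is above the chance value of 0.5 and below the perfect-detector value of 1.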