Scaling success: Linking public breeding with private enterprise
PeakRegressor Identifies Composite Sequence Motifs Responsible for STAT1 Binding Sites and Their Potential rSNPs
Identifying true transcription factor binding sites from sequence motif information, such as motif pattern, location and combination, is an important question in bioinformatics. We present “PeakRegressor,” a system that identifies binding motifs by combining DNA sequence data with ChIP-Seq data. PeakRegressor uses L1-norm log linear regression to predict peak values from binding motif candidates. Our approach predicts the peak values of STAT1 and RNA Polymerase II with correlation coefficients as high as 0.65 and 0.66, respectively. Using PeakRegressor, we identified composite motifs for STAT1, as well as potential regulatory SNPs (rSNPs) involved in regulating the transcription levels of neighboring genes. In addition, we show that among five regression methods, L1-norm log linear regression achieves the best performance with respect to binding motif identification, biological interpretability and computational efficiency.
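The abstract names L1-norm log linear regression but does not give its exact formulation. The following is a minimal sketch, assuming the response is the logarithm of each peak value, the features are per-peak occurrence counts of motif candidates, and the L1 penalty is fitted as a lasso by cyclic coordinate descent. `fit_l1_log_linear`, `predict_peaks`, the penalty `lam` and the toy data are hypothetical illustrations, not PeakRegressor's code.

```python
import math


def soft_threshold(rho: float, lam: float) -> float:
    """Proximal step for the L1 penalty: shrink rho towards zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def fit_l1_log_linear(X, peaks, lam=0.05, n_iter=200):
    """Lasso fit of log(peak value) on motif counts by cyclic coordinate descent.

    Minimises (1/2n) * ||y - b0 - X w||^2 + lam * ||w||_1 with y = log(peaks).
    X is a list of rows of motif-candidate counts; peaks must be positive.
    Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    y = [math.log(v) for v in peaks]
    y_mean = sum(y) / n
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    # Centre features and response so the intercept can be recovered at the end.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual for the starting point w = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a motif with a constant count carries no signal
            # Correlation of feature j with the partial residual (its own term added back).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                w[j] = new_wj
    b0 = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return b0, w


def predict_peaks(b0, w, X):
    """Back-transform the log-linear predictor to the peak-value scale."""
    return [math.exp(b0 + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


# Toy data (hypothetical): motif 0 drives the peak value, motif 1 is irrelevant.
X = [[a, b] for a in range(4) for b in (0, 1)]
peaks = [math.exp(1.0 + 0.8 * a) for a, _ in X]
b0, w = fit_l1_log_linear(X, peaks, lam=0.05)
predicted = predict_peaks(b0, w, X)
```

On this toy input the penalty leaves the irrelevant motif's weight at exactly zero and shrinks the informative weight from 0.8 to 0.76. That sparsity is what keeps a selected motif list short and interpretable; in practice `lam` would be tuned, for example by cross-validation.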
Different regression methods and their correlation coefficients averaged over the test sets.
List of putative STAT1 binding motifs identified by linear least squares regression.
The classical GAS motifs are shown in boldface.
List of putative RNA Polymerase II binding motifs identified by PeakRegressor.
The known Downstream Promoter Element and Initiator site motifs are shown in boldface.
List of putative STAT1 binding motifs identified by partial least squares regression.
The classical GAS motifs are shown in boldface.
Influence of the peak filtering methods on the correlation coefficients between peak values and their predicted values in the test dataset.
The correlation coefficients are averaged over 30-fold cross-validation.
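The caption does not say how folds were formed or which model was fitted inside each fold. This minimal sketch, assuming contiguous folds and a one-feature least-squares model as a stand-in for PeakRegressor's regression, shows the averaging step: one Pearson correlation between held-out peak values and predictions per fold, then the mean across folds. `cv_mean_correlation`, `fit_simple_ols` and the toy data are hypothetical.

```python
import math
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_simple_ols(X, y):
    """Stand-in model: ordinary least squares on the first feature only."""
    xs = [row[0] for row in X]
    mx, my = mean(xs), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sum((a - mx) ** 2 for a in xs)
    intercept = my - slope * mx
    return lambda row: intercept + slope * row[0]


def cv_mean_correlation(X, y, fit, k):
    """Mean over k folds of corr(held-out values, predictions of a model fit on the rest)."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predictions = [model(X[i]) for i in test_idx]
        scores.append(pearson([y[i] for i in test_idx], predictions))
    return mean(scores)


# Toy data (hypothetical): a linear trend plus alternating noise.
# Real peaks would normally be shuffled before being split into folds.
X = [[i] for i in range(90)]
y = [2.0 * i + 0.5 * (-1) ** i for i in range(90)]
score = cv_mean_correlation(X, y, fit_simple_ols, k=30)
```

Each held-out fold needs at least two peaks with differing values for its correlation to be defined, so 30-fold cross-validation presupposes a reasonably large filtered peak set.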
STAT1 regression results with two filtering methods: Q-value (right) and promoter proximity (left).
The correlation coefficients between peak values and their predicted values on the test data are 0.65 for Q-value filtering and 0.41 for promoter proximity filtering.
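The captions name two peak-filtering methods but give no thresholds and no definition of promoter proximity. This is a minimal sketch, assuming Q-value filtering keeps peaks at or below a significance cutoff and promoter-proximity filtering keeps peaks whose summit lies within a fixed distance of an annotated transcription start site (TSS). The `Peak` record, both cutoffs and the toy coordinates are hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Peak:
    chrom: str
    summit: int      # genomic coordinate of the peak summit
    q_value: float   # FDR-adjusted significance from the peak caller
    value: float     # peak value used as the regression target


def filter_by_q_value(peaks, max_q=0.01):
    """Keep peaks whose Q-value does not exceed max_q (hypothetical cutoff)."""
    return [p for p in peaks if p.q_value <= max_q]


def filter_by_promoter_proximity(peaks, tss_by_chrom, max_dist=1000):
    """Keep peaks with an annotated TSS within max_dist bp on the same chromosome.

    tss_by_chrom maps chromosome name -> sorted list of TSS coordinates.
    """
    kept = []
    for p in peaks:
        tss = tss_by_chrom.get(p.chrom, [])
        i = bisect.bisect_left(tss, p.summit)
        # The nearest TSS is the last one before the summit or the first at/after it.
        nearest = tss[max(i - 1, 0): i + 1]
        if any(abs(t - p.summit) <= max_dist for t in nearest):
            kept.append(p)
    return kept


peaks = [
    Peak("chr1", 1_000, 0.001, 50.0),
    Peak("chr1", 5_000, 0.2, 30.0),
    Peak("chr2", 200, 0.005, 12.0),
]
tss_by_chrom = {"chr1": [900, 8_000], "chr2": [5_000]}
by_q = filter_by_q_value(peaks)
by_promoter = filter_by_promoter_proximity(peaks, tss_by_chrom)
```

Because the two filters select different peak subsets, the downstream regression is trained and tested on different data under each.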
Schematic view of the workflow of PeakRegressor.
PeakRegressor takes ChIP-Seq data as input and outputs a list of transcription factor binding motif (TFBM) candidates and their weights that yield the best regression accuracy.
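The caption states that PeakRegressor outputs TFBM candidates with weights, but not how the candidates are enumerated. This minimal sketch shows one common way to build the input for such a regression. It assumes candidates are all k-mers, each merged with its reverse complement, and that features are per-peak occurrence counts. `count_matrix` and the toy sequences are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Represent a k-mer and its reverse complement by the lexicographically smaller one."""
    return min(kmer, revcomp(kmer))


def candidate_motifs(k):
    """All canonical k-mers over ACGT, in sorted order."""
    return sorted({canonical("".join(t)) for t in product("ACGT", repeat=k)})


def count_matrix(sequences, k):
    """Return (motifs, rows) where rows[i][j] counts motif j in peak sequence i."""
    motifs = candidate_motifs(k)
    index = {m: j for j, m in enumerate(motifs)}
    rows = []
    for seq in sequences:
        seq = seq.upper()
        row = [0] * len(motifs)
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N or other ambiguity codes
                row[index[canonical(kmer)]] += 1
        rows.append(row)
    return motifs, rows


motifs, X = count_matrix(["ACGTN", "TTTT"], 2)
```

Each row of `X` could then be paired with the corresponding peak value as one observation of the regression; the weight learned for each column ranks that candidate motif.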
List of putative STAT1 binding motifs identified by principal component regression.
The classical GAS motifs are shown in boldface.