Comparison of different TTE predictors on the independent test set.
<p>(<b>A</b>) ROCs of five different methods. The values in brackets are the average auROCs of each method. (<b>B</b>) Precision-recall curves of five different methods. The values in brackets are the average auPRCs of each method.</p>
Classification performance of BEAN.
<p>(<b>A</b>) ROCs of different SVM kernel functions. (<b>B</b>) ROCs of different feature extraction methods. (<b>C</b>) ROCs of classification models using all 1600 features and the 100 top-weighted features. The values in brackets are the auROCs of each model. All of the above results are based on Wang’s data.</p>
Sequence position distribution of <i>k</i>-spaced amino acid pairs.
<p>(<b>A</b>) Each point represents the overall frequency of the 50 most positively weighted amino acid pairs occurring in the N-terminal sequences of TTEs (red triangles) or non-TTEs (blue circles). Trend lines are drawn using <i>loess</i> smoothing for the points from TTEs (red) and non-TTEs (blue), respectively. (<b>B–D</b>) Position density distribution of pairs [SN], [T.V] and [VA] in TTEs (red solid line) and non-TTEs (blue dotted line). The horizontal axis in (<b>B–D</b>) is the same as in (<b>A</b>).</p>
Gene differential expression distribution of prediction results.
<p>The vertical axis represents the fold change in gene expression level when <i>R. solanacearum</i> is cultured in tomato (in planta) relative to when it is cultured in rich medium (CPG). The number in brackets is the number of genes within each score interval. A statistically significant expression difference is observed between genes with SVM scores <0 and genes with SVM scores ≥1.0 (Mann-Whitney <i>U</i>-test, <i>p</i>-value <0.01).</p>
Overview of the proposed TTE predictor BEAN.
<p>A full-length sequence is used to construct its profile (PSSM) via an HHblits search. Only residues 2–51 of the N-terminus are used to compute the profile-based <i>k</i>-spaced amino acid pair composition. The resulting 1600-dimensional feature vectors are then taken as input to train a linear SVM classification model. Through a parameter transformation of the established model, we obtained the weight of each <i>k</i>-spaced amino acid pair and analyzed the evolutionary conservation and sequence position distribution of each pair. We also used BEAN to scan a pathogen genome and identify TTE candidates.</p>
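The feature-extraction step described in this caption can be illustrated with a minimal sketch. It is not the BEAN implementation: it counts pairs directly on the raw sequence rather than on an HHblits profile, and it assumes that the 1600 dimensions arise from 400 ordered amino acid pairs at four gap sizes (k = 0 to 3), which the caption does not state. The function name and its parameters are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered amino acid pairs, in a fixed order.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def kspaced_pair_composition(seq, k_values=(0, 1, 2, 3), start=1, end=51):
    """Sequence-based k-spaced amino acid pair composition.

    Assumption: k_values=(0, 1, 2, 3) gives 4 * 400 = 1600 features.
    seq[start:end] selects residues 2..51 in 1-based numbering.
    """
    window = seq[start:end]
    features = []
    for k in k_values:
        counts = dict.fromkeys(PAIRS, 0)
        # Number of positions at which a pair separated by k residues fits.
        n_positions = max(len(window) - k - 1, 0)
        for i in range(n_positions):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip non-standard residues such as 'X'
                counts[pair] += 1
        denom = n_positions or 1  # avoid division by zero on short inputs
        features.extend(counts[p] / denom for p in PAIRS)
    return features


# Toy input: a start methionine followed by 25 repeats of "SN".
toy_features = kspaced_pair_composition("M" + "SN" * 25)
```

In this layout, the first 400 entries correspond to adjacent pairs such as [SN] (k = 0) and the next 400 to one-gap pairs such as [T.V] (k = 1). A profile-based variant would replace the integer counts with products of PSSM probabilities at the two positions.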
The ROC curves measuring the discriminative capability of the ubiquitination site indicators.
<p>The indicators include the sequence pattern, the structural propensities (local conformation, residue propensities in the microenvironment, accessibility and centrality) and their combination. For the combination, individual indicators were combined by a weighted summing scheme (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0083167#pone.0083167.s009" target="_blank">Table S2</a> for the weights). The AUC values were calculated according to the structural propensities, the likelihood scores derived via five-fold cross-validation of the corresponding models or their combinations (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0083167#pone.0083167.s010" target="_blank">Text S1</a> for details). The larger the AUC value, the stronger the indicator.</p>
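As a rough illustration of the weighted summing scheme and the AUC comparison mentioned in this caption, the sketch below combines two per-site indicator scores and evaluates each with a pairwise-ranking AUC. The indicator names, scores, labels and weights are made up for the example; the actual weights are those of Table S2 and are not reproduced here.

```python
def combine_indicators(scores, weights):
    """Weighted sum of per-site indicator scores.

    scores:  dict mapping indicator name -> list of scores (one per site)
    weights: dict mapping indicator name -> weight (hypothetical values)
    """
    n_sites = len(next(iter(scores.values())))
    return [
        sum(weights[name] * scores[name][i] for name in weights)
        for i in range(n_sites)
    ]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (site) or 0 (non-site).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up toy data: two sites followed by two non-sites.
labels = [1, 1, 0, 0]
indicator_scores = {
    "sequence": [0.9, 0.4, 0.5, 0.1],
    "structure": [0.2, 0.8, 0.3, 0.1],
}
toy_weights = {"sequence": 0.5, "structure": 0.5}  # hypothetical weights

combined = combine_indicators(indicator_scores, toy_weights)
auc_sequence = auc(indicator_scores["sequence"], labels)
auc_combined = auc(combined, labels)
```

On this toy data the combined score separates sites from non-sites better than the sequence indicator alone, which is the kind of comparison the ROC curves summarize. The quadratic pairwise loop is chosen for readability; a rank-based computation would scale better on real data.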
The two-sample logo illustration of the context (sequence neighbors) of ubiquitination sites.
<p>(A) The positional residue pattern; (B) the secondary structure pattern and (C) the local conformation (structural alphabet) pattern where a seven-group color palette is used: helix (red), helix-like (orange), strand (blue), highly curved coil (yellow), moderately curved coil (violet) and flat coil (green). See also <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0083167#pone-0083167-t001" target="_blank">Tables 1</a> and <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0083167#pone-0083167-t002" target="_blank">2</a> for the description of the secondary structure type and structural alphabet state, respectively.</p>
The residue usage in the proximal context and the microenvironments.
<p>(A–D) Radar diagrams illustrating (A) the average residue frequencies in the proximal context (sequence neighbors within the ±6 residue range around the central lysine); (B) the average residue propensities in the first shell (C<sub>β</sub> distance, 0–7.5 Å from the central lysine); (C) the average residue propensities in the second shell (C<sub>β</sub> distance, 7.5–11.5 Å); (D) the average residue propensities in the third shell (C<sub>β</sub> distance, 11.5–15.5 Å).</p>
The accessibility and centrality of the ubiquitination sites.
<p>(A) Distribution of RSA for Ubsites, Non-Ubsites and Acetsites. The median values are indicated as vertical dashed lines. (B) Boxplot depicting the difference in the maximum protrusion index CX between Ubsites and Non-Ubsites. The whisker range (dashed lines) is doubled to avoid displaying too many outliers. (C) Two-dimensional probability density plots illustrating the propensity for two network parameters of Ubsites (left) and Non-Ubsites (right). Note that the ranges and color schemes of these two plots have been unified to allow a direct comparison.</p>
Two case studies illustrating the prediction performance of different features at a controlled false positive rate of 3%.
<p>Panel A shows the predicted catalytic residues of TrpG (the small domain of anthranilate synthase; PDB entry: 1QDL), and panel B gives the predictions for diaminopimelate (DAP) epimerase (PDB entry: 1BWZ). Top parts: Protein structures are represented by cartoon ribbons and the corresponding catalytic residues are highlighted by ball-and-stick models, as seen in the insets. Lower parts: The blue triangles represent the sequence positions of the catalytic residues. For the prediction results of each feature, the sequence positions of the predicted catalytic residues are marked using colored bars, with a higher score corresponding to a more saturated color. The black bars denote catalytic residues that the corresponding feature failed to predict.</p>
- …