Top ten mutations for each RC dataset according to weights derived from the initial linear SVR.
<p>Along with each mutation, its influence on RC compared to the wild-type is listed ("dec." for "decreasing", "inc." for "increasing"), as well as its position in the feature ranking of the other dataset. With the exception of RT A158S, PR I64L, PR P39S, RT Q207E, RT E122K, RT S162C, and RT T39E, all of these mutations are known to be associated with HIV drug resistance and/or fitness <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0009044#pone.0009044-Dykes1" target="_blank">[20]</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0009044#pone.0009044-Shafer1" target="_blank">[31]</a>. In total, the two feature rankings consist of 878 mutations from the Monogram dataset and 1018 mutations from the Erlangen dataset; the difference is mainly due to the fact that fewer sequence positions are included in the Monogram genotypes. Note that the mutation RT E122K does not occur in the Monogram ranking. In the Monogram dataset, lysine (K), not the wild-type glutamic acid (E), is the consensus amino acid at position 122 of the RT sequence, so that E122K was removed from the training dataset in the input coding phase. The clear dominance of RC-decreasing mutations in the Monogram dataset may be partly due to the stronger bias towards low-RC samples in this dataset (median measured RC of 38.45%, compared to 46.47% in the Erlangen dataset; see also <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0009044#pone-0009044-g001" target="_blank">Figure 1</a>).</p>
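<p>A minimal sketch of how such a table could be assembled from two sets of linear-model weights. It assumes that mutations are ranked by the absolute value of their weight and that a positive weight means "inc." and a negative weight "dec."; the source does not state either convention. The function names and all weight values are hypothetical placeholders, not results from the study.</p>

```python
def rank_mutations(weights):
    """Map each mutation to its 1-based rank by decreasing absolute weight.

    Assumption: importance is measured by |weight|; the source only says the
    ranking is derived from the weights of a linear SVR.
    """
    ordered = sorted(weights, key=lambda m: abs(weights[m]), reverse=True)
    return {mutation: rank for rank, mutation in enumerate(ordered, start=1)}


def top_mutations(weights, other_weights, k=10):
    """Return (mutation, direction, rank in other dataset) for the top k.

    The rank in the other dataset is None when the mutation is not a feature
    there, as described for RT E122K in the Monogram ranking.
    """
    own_rank = rank_mutations(weights)
    other_rank = rank_mutations(other_weights)
    rows = []
    for mutation in sorted(own_rank, key=own_rank.get)[:k]:
        # Assumption: the sign of the weight gives the direction of the effect.
        direction = "inc." if weights[mutation] > 0 else "dec."
        rows.append((mutation, direction, other_rank.get(mutation)))
    return rows


# Invented placeholder weights, for illustration only.
erlangen_weights = {"RT E122K": 0.5, "PR I64L": -0.9, "RT A158S": 0.2}
monogram_weights = {"PR I64L": -0.3, "RT A158S": -0.7}

for row in top_mutations(erlangen_weights, monogram_weights, k=2):
    print(row)
```

<p>Returning None for a missing mutation keeps "absent from the other ranking" distinct from "ranked last", which matters for features dropped during input coding.</p>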
Leave-one-out cross-validation results for the Monogram data (left) and Erlangen data (right).
<p>Spearman correlations of true and predicted RC values are ρ = 0.546 (Monogram) and ρ = 0.542 (Erlangen). For the Erlangen data, seven outliers with very high measured RC would appear further to the right side of the plot, but are not shown.</p>
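<p>For reference, a Spearman correlation between measured and predicted values can be computed as the Pearson correlation of their ranks. The sketch below is a plain-Python illustration with tied values given their average rank; the function names and the example numbers are hypothetical and are not taken from either dataset.</p>

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over all entries tied with the current value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman_rho(true_values, predicted_values):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(true_values), average_ranks(predicted_values))


# Invented example values standing in for measured and predicted RC (%).
measured = [12.0, 25.0, 38.0, 47.0, 90.0]
predicted = [30.0, 20.0, 50.0, 45.0, 60.0]
print(spearman_rho(measured, predicted))
```

<p>Because only ranks enter the statistic, a few samples with very high measured RC affect it far less than they would affect a Pearson correlation on the raw values.</p>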