Algorithm 1: Pseudo code for computing the AUC-PR based on the continuous interpolation.
<p>Initially, we choose the classification threshold such that the number of true positives is equal to the total number of positives. We then iterate as long as the number of true positives and, hence, recall is greater than zero. In each iteration, we determine the new point by choosing the next existing score as the classification threshold. Unless this threshold leads to an identical number of true positives, we compute the interpolation parameters defined by <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0092209#pone.0092209.e110" target="_blank">equation (6)</a> and set the borders of the integration. We use these values to compute the AUC between the two current points, and proceed with the while-loop. After termination of the loop, the accumulated value is the AUC-PR.</p>
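The loop described above can be illustrated with a minimal Python sketch. It is not the authors' Algorithm 1: it walks the thresholds in order of increasing recall rather than decreasing recall, and it does not reproduce equation (6). Instead it assumes that the number of false positives changes linearly with the number of true positives between two adjacent points, which gives a closed-form integral of precision over recall. The function name `auc_pr_continuous` and the convention that the background weight of a data point is one minus its foreground weight are assumptions made for this example.

```python
import math

def auc_pr_continuous(scores, fg_weights):
    """Sketch of AUC-PR with a continuous interpolation between PR points.

    fg_weights[i] is the foreground weight of data point i (0 or 1 for
    unweighted data); the background weight is assumed to be 1 - fg_weights[i].
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(fg_weights)
    if total_pos <= 0:
        raise ValueError("total foreground weight must be positive")

    auc = 0.0
    tp_a = fp_a = 0.0  # current point A, as (weighted) TP and FP counts
    i = 0
    while i < len(order):
        # Next existing score becomes the threshold; absorb all ties at once.
        tp_b, fp_b = tp_a, fp_a
        score = scores[order[i]]
        while i < len(order) and scores[order[i]] == score:
            w = fg_weights[order[i]]
            tp_b += w
            fp_b += 1.0 - w
            i += 1
        # An identical number of true positives adds no area; skip it.
        if tp_b > tp_a:
            s = (fp_b - fp_a) / (tp_b - tp_a)  # FP gained per TP gained
            a = 1.0 + s
            c = fp_a - s * tp_a
            # Integral of t / (a*t + c) for t from tp_a to tp_b.
            seg = (tp_b - tp_a) / a
            if c != 0.0:
                seg -= c / (a * a) * (
                    math.log(a * tp_b + c) - math.log(a * tp_a + c)
                )
            auc += seg / total_pos
        tp_a, fp_a = tp_b, fp_b
    return auc
```

For unweighted data, passing weights of 0 and 1 reduces the weighted counts to ordinary true-positive and false-positive counts.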
PR and ROC curves and respective AUC values for weighted and unweighted data.
<p>Panel (a) shows a histogram of foreground weights for all data points. The dashed line indicates the threshold used to separate foreground and background data points in the unweighted case. Panel (b) presents a histogram of classification scores. Within the bars of the histogram, we visualize the number of data points from the foreground (green) and background (red) class according to the unweighted case. Panel (c) presents classification performance using unweighted data computed from the classification scores presented in panel (b). Panel (d) visualizes the relationship between classification scores and weights for the hypothetical good, permuted, and bad classifiers. All three orderings of classification scores share the same underlying distribution as shown in panel (b). Panel (e) shows the clearly distinguishable classification performance of the three classifiers as measured by ROC and PR curves using weighted data. The corresponding AUC values are listed in panel (f).</p>
Precision recall curves for data set with 100 data points and class ratio 1 to 4.
<p>The blue and the red curve indicate estimators of the best and the worst curve, respectively. The gray curves represent 1,000 PR curves based on random score-based classifications, which are also summarized by the green boxplots. The pink dashed line indicates the level of the class ratio.</p>
Mean results for AUC-ROC and AUC-PR on PBM data sets using unweighted or weighted test data.
<p>The team name and the ranking are depicted on the abscissa, while the mean results for AUC-ROC and AUC-PR are depicted on the ordinate. Teams are displayed in the order of the original ranking of Weirauch <i>et al. </i><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0092209#pone.0092209-Weirauch1" target="_blank">[27]</a>.</p>
Comparison of PR curves using unweighted and weighted test data for one exemplary data set (11) of [27].
<p>In panel (a), we plot the predicted log-intensity values of classifiers A, D, and E against the measured log-intensity values. Panel (b) visualizes the class border in the unweighted case (red line) and the weights of the foreground class in the weighted case. In panel (c), we show the PR curves of the three classifiers using unweighted (left) and weighted (right) test data.</p>
Differences of AUC-PR between the interpolations for varying size of the foreground data set.
<p>Panel (a) depicts the results for 10 bins, equivalent to at most 10 different classification scores, whereas panel (b) depicts the results for 1,000 bins.</p>
Comparison of ranking classifiers by AUC-PR using unweighted and weighted test data for query 29 from [18].
<p>The AUC-PR for unweighted test data is depicted in black, whereas the AUC-PR for weighted test data is depicted in red.</p>
Comparison of AUC-PR values for different classification thresholds.
<p>In panel (a), we consider unweighted test data and plot the AUC-PR values for a threshold of mean intensity plus four times standard deviation (ordinate) against the AUC-PR values for a threshold of mean intensity plus four times standard deviation (abscissa). In panel (b), we consider weighted test data and plot the AUC-PR values in analogy to panel (a). We find a substantially greater Pearson correlation between the AUC-PR values for the different thresholds for weighted data compared to unweighted data.</p>
Binary confusion matrix.
<p>The confusion matrix can be computed for weighted and unweighted data. For unweighted data, each data point contributes with a weight of one, whereas for weighted data, each data point contributes with its specific weight for the given class.</p>
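As a rough illustration of the description above, the following Python sketch accumulates the four confusion-matrix entries from per-point weights. The function name `weighted_confusion_matrix`, the rule that a data point is predicted as foreground when its score is at least the threshold, and the convention that the background weight equals one minus the foreground weight are all assumptions made for this example, not details taken from the source.

```python
def weighted_confusion_matrix(scores, fg_weights, threshold):
    """Return (TP, FP, FN, TN) as sums of class weights.

    fg_weights[i] is the foreground weight of data point i; the background
    weight is assumed to be 1 - fg_weights[i]. Unweighted data is the special
    case where every foreground weight is exactly 0 or 1.
    """
    tp = fp = fn = tn = 0.0
    for score, w in zip(scores, fg_weights):
        if score >= threshold:
            # Predicted foreground: foreground weight counts as true positive,
            # background weight as false positive.
            tp += w
            fp += 1.0 - w
        else:
            # Predicted background: foreground weight counts as false negative,
            # background weight as true negative.
            fn += w
            tn += 1.0 - w
    return tp, fp, fn, tn
```

With weights restricted to 0 and 1, each data point adds exactly one to a single entry, which matches the unweighted case in which every data point contributes with a weight of one.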
Classification for unweighted and weighted data.
<p>The entries of a confusion matrix have been calculated for a classification threshold of 1.5. In the case of unweighted data, the class label is if and otherwise .</p>