Surrogate regret bounds for generalized classification performance metrics
We consider optimization of generalized performance metrics for binary
classification by means of surrogate losses. We focus on a class of metrics,
which are linear-fractional functions of the false positive and false negative
rates (examples of which include the $F_\beta$-measure, Jaccard similarity
coefficient, AM measure, and many others). Our analysis concerns the following
two-step procedure. First, a real-valued function $f$ is learned by minimizing
a surrogate loss for binary classification on the training sample. It is
assumed that the surrogate loss is a strongly proper composite loss function
(examples of which include logistic loss, squared-error loss, exponential loss,
etc.). Then, given $f$, a threshold $\hat{\theta}$ is tuned on a separate
validation sample, by direct optimization of the target performance metric. We
show that the regret of the resulting classifier (obtained from thresholding
$f$ at $\hat{\theta}$) measured with respect to the target metric is
upper-bounded by the regret of $f$ measured with respect to the surrogate loss.
We also extend our results to cover multilabel classification and provide
regret bounds for micro- and macro-averaging measures. Our findings are further
analyzed in a computational study on both synthetic and real data sets.
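The second step of the procedure described above (tuning a threshold on validation scores by directly optimizing the target metric) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `f_beta` and `tune_threshold`, the toy scores, and the choice of candidate thresholds (every distinct validation score) are assumptions made for this example, and the scores are taken as given rather than learned with a surrogate loss.

```python
def f_beta(y_true, y_pred, beta=1.0):
    """F_beta computed from true positives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = (1 + beta ** 2) * tp + beta ** 2 * fn + fp
    return (1 + beta ** 2) * tp / denom if denom > 0 else 0.0


def tune_threshold(scores, y_true, beta=1.0):
    """Pick the threshold that maximizes F_beta on a validation sample.

    Assumption: candidate thresholds are the distinct validation scores,
    and an instance is predicted positive when its score >= threshold.
    """
    best_theta, best_value = None, -1.0
    for theta in sorted(set(scores)):
        y_pred = [1 if s >= theta else 0 for s in scores]
        value = f_beta(y_true, y_pred, beta)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value


# Hypothetical validation scores (outputs of a previously learned function).
val_scores = [0.1, 0.4, 0.35, 0.8]
val_labels = [0, 0, 1, 1]
theta, value = tune_threshold(val_scores, val_labels)
```

Scanning only the observed scores is sufficient here because the predictions, and hence the metric, change only when the threshold crosses one of them.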
An analysis of chaining in multi-label classification
The idea of classifier chains has recently been introduced as a promising technique for multi-label classification. However, despite being intuitively appealing and showing strong performance in empirical studies, still very little is known about the main principles underlying this type of method. In this paper, we provide a detailed probabilistic analysis of classifier chains from a risk minimization perspective, thereby helping to gain a better understanding of this approach. As a main result, we clarify that the original chaining method seeks to approximate the joint mode of the conditional distribution of label vectors in a greedy manner. Based on a theoretical regret analysis, we conclude that this approach can perform quite poorly in terms of subset 0/1 loss. Therefore, we present an enhanced inference procedure for which the worst-case regret can be upper-bounded far more tightly. In addition, we show that a probabilistic variant of chaining, which can be utilized for any loss function, becomes tractable by using Monte Carlo sampling. Finally, we present experimental results confirming the validity of our theoretical findings.
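The gap between greedy chaining and the true joint mode can be seen on a toy distribution. The sketch below is an idealized illustration, not the paper's code: it assumes the chain has access to the exact conditional probabilities, and the distribution `joint` and the function names are invented for this example.

```python
def greedy_chain(joint, n_labels):
    """Greedy inference: fix labels one at a time, each time choosing the
    value with the larger conditional probability given the prefix so far."""
    prefix = ()
    for _ in range(n_labels):
        k = len(prefix) + 1
        # Unnormalized mass of each extension; both share the same
        # normalizer, so comparing masses equals comparing conditionals.
        mass = {
            b: sum(p for y, p in joint.items() if y[:k] == prefix + (b,))
            for b in (0, 1)
        }
        prefix += (max((0, 1), key=lambda b: mass[b]),)
    return prefix


def joint_mode(joint):
    """Exhaustive inference: the label vector with the highest joint probability."""
    return max(joint, key=joint.get)


# Hypothetical conditional distribution over two labels for one instance.
joint = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.35, (1, 1): 0.25}

greedy = greedy_chain(joint, n_labels=2)  # first label: P(y1=1) = 0.6 wins
mode = joint_mode(joint)
```

On this distribution the greedy path commits to the first label being 1 and ends at a vector with probability 0.35, while the joint mode has probability 0.4, so the greedy prediction is suboptimal for subset 0/1 loss.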
Generation of Exhaustive Set of Rules within Dominance-based Rough Set Approach
Rough set theory has proved to be a useful mathematical tool for the analysis of vague descriptions of objects. One extension of the classic theory is the Dominance-based Rough Set Approach (DRSA), which allows the analysis of preference-ordered data. The analysis ends with a set of decision rules induced from rough approximations of decision classes. The role of the decision rules is to explain the analysed phenomena, but they may also be applied in classifying new, unseen objects. There are several strategies of decision rule induction. One of them consists in generating the exhaustive set of minimal rules. In this paper we present an algorithm based on Boolean reasoning techniques that follows this strategy within DRSA.
On the Bayes-optimality of F-measure maximizers
The F-measure, which was originally introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate of the F-measure leads to a high worst-case regret. Subsequently, we perform a similar type of analysis for F-measure maximizing algorithms, showing that such algorithms are approximate, while relying on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm which is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a quadratic (with respect to the number of binary responses) number of parameters of the joint distribution. We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems.
Set-Valued Prediction in Multi-Class Classification
In cases of uncertainty, a multi-class classifier preferably returns a set of
candidate classes instead of predicting a single class label with little
guarantee. More precisely, the classifier should strive for an optimal balance
between the correctness (the true class is among the candidates) and the
precision (the candidates are not too many) of its prediction. We formalize
this problem within a general decision-theoretic framework that unifies most of
the existing work in this area. In this framework, uncertainty is quantified in
terms of conditional class probabilities, and the quality of a predicted set is
measured in terms of a utility function. We then address the problem of finding
the Bayes-optimal prediction, i.e., the subset of class labels with highest
expected utility. For this problem, which is computationally challenging as
there are exponentially (in the number of classes) many predictions to choose
from, we propose efficient algorithms that can be applied to a broad family of
utility functions. Our theoretical results are complemented by experimental
studies, in which we analyze the proposed algorithms in terms of predictive
accuracy and runtime efficiency.
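To illustrate how the exponential search over subsets can collapse to a linear scan, the sketch below assumes a particular utility shape that is not stated in the abstract: the utility is g(k) when the true class is in a predicted set of size k, and 0 otherwise. Under that assumption the expected utility of a set is g(k) times its probability mass, so for each size k the best set is the k most probable classes. The names `best_set`, `probs` and `g` are hypothetical.

```python
def best_set(probs, g):
    """Return the subset of classes with highest expected utility, assuming
    utility g(|set|) if the true class is in the set and 0 otherwise.

    Only the top-k sets need checking: for a fixed size k, expected utility
    g(k) * mass is maximized by the k most probable classes.
    """
    ranked = sorted(probs, key=probs.get, reverse=True)
    best, best_u, mass = [], float("-inf"), 0.0
    for k, c in enumerate(ranked, start=1):
        mass += probs[c]          # probability that the true class is in the top-k set
        u = g(k) * mass           # expected utility of the top-k set
        if u > best_u:
            best, best_u = ranked[:k], u
    return set(best), best_u


# Hypothetical conditional class probabilities for one instance.
probs = {"a": 0.5, "b": 0.4, "c": 0.1}

# Precision-like utility 1/k: favors a single class here.
single, _ = best_set(probs, lambda k: 1.0 / k)

# A milder, made-up penalty on set size: favors returning two candidates.
pair, pair_u = best_set(probs, lambda k: 1.0 - 0.2 * (k - 1))
```

The two calls show the trade-off between correctness and precision: a harsher penalty on set size yields a single label, while a milder one yields a two-class candidate set.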