Quantifying Aleatoric and Epistemic Uncertainty in Machine Learning: Are Conditional Entropy and Mutual Information Appropriate Measures?
This short note is a critical discussion of the quantification of aleatoric
and epistemic uncertainty in terms of conditional entropy and mutual
information, respectively, which has recently been proposed in machine learning
and has become quite common since then. More generally, we question the idea of
an additive decomposition of total uncertainty into its aleatoric and epistemic
constituents. Comment: 7 pages, 3 figures
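For reference, the additive decomposition questioned in this note is usually computed from an ensemble or from posterior samples. Total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average (conditional) entropy of the individual predictions, and epistemic uncertainty is their difference, i.e. the mutual information between the label and the model parameters. Below is a minimal NumPy sketch of that standard computation. It assumes M sampled predictive distributions over K classes are already available, and the function names are hypothetical. It illustrates the measure being criticized, not an alternative proposed by the note.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats; terms with p = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    logp = np.log(np.where(p > 0, p, 1.0))  # log(1) = 0 masks the zero entries
    return -np.sum(p * logp, axis=axis)

def uncertainty_decomposition(member_probs):
    """member_probs: array of shape (M, K), one class distribution per
    ensemble member / posterior sample (hypothetical input format).
    Returns (total, aleatoric, epistemic) in nats."""
    member_probs = np.asarray(member_probs, dtype=float)
    mean_probs = member_probs.mean(axis=0)
    total = entropy(mean_probs)               # H[p(y | x)]
    aleatoric = entropy(member_probs).mean()  # E_theta H[p(y | x, theta)]
    epistemic = total - aleatoric             # mutual information I(y; theta | x)
    return total, aleatoric, epistemic
```

Two confident members that disagree completely, `[[1, 0], [0, 1]]`, yield total = ln 2 with all of it attributed to the epistemic part. Two members that both predict `[0.5, 0.5]` yield the same total with all of it attributed to the aleatoric part. The note questions exactly this kind of additive split.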
Reliable Multi-label Classification: Prediction with Partial Abstention
In contrast to conventional (single-label) classification, the setting of
multi-label classification (MLC) allows an instance to belong to several classes
simultaneously. Thus, instead of selecting a single class label, predictions
take the form of a subset of all labels. In this paper, we study an extension
of the setting of MLC, in which the learner is allowed to partially abstain
from a prediction, that is, to deliver predictions on some but not necessarily
all class labels. We propose a formalization of MLC with abstention in terms of
a generalized loss minimization problem and present first results for the case
of the Hamming loss, rank loss, and F-measure, both theoretical and
experimental. Comment: 19 pages, 12 figures
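To make partial abstention concrete for the Hamming loss, one simple option is to charge a fixed cost c for every label on which the learner abstains. This is an assumed penalty, and the paper's generalized loss may be parameterized differently. Under that assumption, predicting label i has expected Hamming loss min(p_i, 1 - p_i), where p_i is the label's marginal probability. The per-label Bayes-optimal rule is therefore to abstain whenever that quantity exceeds c. The sketch below uses hypothetical function names and assumes calibrated marginals.

```python
import numpy as np

def predict_with_abstention(marginals, c):
    """Per-label decisions under Hamming loss with an assumed constant
    abstention cost c (0 <= c <= 0.5). Returns 1 (relevant),
    0 (irrelevant) or -1 (abstain) for each label."""
    p = np.asarray(marginals, dtype=float)
    risk_predict = np.minimum(p, 1.0 - p)  # expected loss of the better guess
    decision = (p >= 0.5).astype(int)
    decision[risk_predict > c] = -1        # abstaining is cheaper than guessing
    return decision

def expected_loss(marginals, decision, c):
    """Expected (normalized) Hamming loss of a partial prediction."""
    p = np.asarray(marginals, dtype=float)
    loss = np.where(decision == 1, 1.0 - p, p)  # error prob. of each prediction
    loss = np.where(decision == -1, c, loss)    # fixed cost for abstentions
    return loss.mean()
```

With marginals `[0.9, 0.55, 0.1, 0.3]` and `c = 0.2`, the rule commits on the first and third labels and abstains on the two uncertain ones, giving decisions `[1, -1, 0, -1]` and an expected loss of 0.15.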
An analysis of chaining in multi-label classification
The idea of classifier chains has recently been introduced as a promising technique for multi-label classification. However, despite being intuitively appealing and showing strong performance in empirical studies, still very little is known about the main principles underlying this type of method. In this paper, we provide a detailed probabilistic analysis of classifier chains from a risk minimization perspective, thereby helping to gain a better understanding of this approach. As a main result, we clarify that the original chaining method seeks to approximate the joint mode of the conditional distribution of label vectors in a greedy manner. As a result of a theoretical regret analysis, we conclude that this approach can perform quite poorly in terms of subset 0/1 loss. Therefore, we present an enhanced inference procedure for which the worst-case regret can be upper-bounded far more tightly. In addition, we show that a probabilistic variant of chaining, which can be utilized for any loss function, becomes tractable by using Monte Carlo sampling. Finally, we present experimental results confirming the validity of our theoretical findings.
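The greedy behaviour described above can be seen on a two-label toy chain. The sketch below assumes the chain's conditional probabilities P(y_i = 1 | x, y_1..y_{i-1}) are exposed through a hypothetical `cond_prob(i, prefix)` function for one fixed instance. In practice these would come from the trained per-label classifiers. The sketch contrasts greedy chain inference with an exhaustive search for the joint mode, which is only a reference point and not the paper's enhanced inference procedure. It also adds a simple Monte Carlo decision rule that works with an arbitrary loss.

```python
import itertools
import random

def cond_prob(i, prefix):
    """Hypothetical toy chain: P(y_i = 1 | x, y_<i = prefix) for one fixed x."""
    if i == 0:
        return 0.6
    return 0.5 if prefix[0] == 1 else 0.0

def joint_prob(y, cond=cond_prob):
    """Chain rule: P(y | x) = prod_i P(y_i | x, y_<i)."""
    prob = 1.0
    for i, yi in enumerate(y):
        p1 = cond(i, tuple(y[:i]))
        prob *= p1 if yi == 1 else 1.0 - p1
    return prob

def greedy_chain(m, cond=cond_prob):
    """Greedy chaining: fix each label to its locally most probable value."""
    y = []
    for i in range(m):
        y.append(1 if cond(i, tuple(y)) >= 0.5 else 0)
    return tuple(y)

def exact_mode(m, cond=cond_prob):
    """Exhaustive search over all 2^m label vectors (small m only)."""
    return max(itertools.product((0, 1), repeat=m),
               key=lambda y: joint_prob(y, cond))

def sample_chain(m, n_samples, cond=cond_prob, seed=0):
    """Ancestral sampling of label vectors along the chain."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        y = []
        for i in range(m):
            y.append(1 if rng.random() < cond(i, tuple(y)) else 0)
        samples.append(tuple(y))
    return samples

def mc_decision(samples, loss):
    """Among sampled vectors, pick the one with lowest empirical expected loss."""
    return min(set(samples),
               key=lambda h: sum(loss(h, y) for y in samples) / len(samples))

def subset_zero_one(h, y):
    return float(h != y)
```

On this toy chain, greedy inference returns `(1, 1)`, whose joint probability is 0.3, while the joint mode is `(0, 0)` with probability 0.4. Committing to y_1 = 1 early hides the fact that the probability mass behind y_1 = 1 is split between two label vectors. With subset 0/1 loss, `mc_decision` reduces to picking the most frequent sampled vector. Swapping in another loss function needs no change to the sampler.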
- …