F-measure Maximization in Multi-Label Classification with Conditionally Independent Label Subsets
We discuss a method to improve the exact F-measure maximization algorithm
called GFM, proposed in (Dembczynski et al. 2011) for multi-label
classification, assuming the label set can be partitioned into
conditionally independent subsets given the input features. If the labels were
all independent, the estimation of only m parameters (m denoting the number
of labels) would suffice to derive Bayes-optimal predictions in O(m^2)
operations. In the general case, m^2 + 1 parameters are required by GFM to
solve the problem in O(m^3) operations. In this work, we show that the number
of parameters can be reduced further to m^2/n, in the best case, assuming the
label set can be partitioned into n conditionally independent subsets. As
this label partition needs to be estimated from the data beforehand, we
first use the procedure proposed in (Gasse et al. 2015) that finds such a
partition and then infer the required parameters locally in each label subset.
The latter are aggregated and serve as input to GFM to form the Bayes-optimal
prediction. We show on a synthetic experiment that the reduction in the number
of parameters brings about significant benefits in terms of performance.
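To make the special case concrete, the following is a minimal sketch of Bayes-optimal F-measure prediction when all labels are independent, the setting where m parameters (the marginal label probabilities) suffice. It relies on the known fact that an optimal prediction then consists of the top-k labels by marginal probability for some k, and estimates each candidate's expected F1 by Monte Carlo; the function names are illustrative, not from the paper, and this is not the GFM algorithm itself.

```python
import numpy as np

def expected_f1(pred, marginals, n_samples=20000, seed=0):
    # Monte Carlo estimate of E[F1] when labels are independent Bernoullis
    # with the given marginals; F1 of (empty truth, empty prediction) is
    # taken to be 1 by convention.
    rng = np.random.default_rng(seed)
    y = rng.random((n_samples, len(marginals))) < marginals  # sampled label vectors
    tp = (y & pred).sum(axis=1)
    denom = y.sum(axis=1) + pred.sum()
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 1.0)
    return float(f1.mean())

def best_topk_prediction(marginals):
    # Under label independence, a Bayes-optimal F1 prediction selects the
    # k labels with the highest marginals for some k in {0, ..., m}, so a
    # linear search over m + 1 candidate predictions suffices.
    m = len(marginals)
    order = np.argsort(marginals)[::-1]
    best = np.zeros(m, dtype=bool)
    best_val = expected_f1(best, marginals)      # k = 0: empty prediction
    for k in range(1, m + 1):
        pred = np.zeros(m, dtype=bool)
        pred[order[:k]] = True
        val = expected_f1(pred, marginals)
        if val > best_val:
            best, best_val = pred, val
    return best, best_val
```

For marginals such as (0.9, 0.8, 0.1), the search settles on predicting the two high-probability labels, since adding the third lowers expected F1 more than the occasional true positive gains.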
Conformal Rule-Based Multi-label Classification
We advocate the use of conformal prediction (CP) to enhance rule-based
multi-label classification (MLC). In particular, we highlight the mutual
benefit of CP and rule learning: Rules have the ability to provide natural
(non-)conformity scores, which are required by CP, while CP suggests a way to
calibrate the assessment of candidate rules, thereby supporting better
predictions and more elaborate decision making. We illustrate the potential
usefulness of calibrated conformity scores in a case study on lazy multi-label
rule learning.
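The calibration step the abstract refers to can be sketched as split conformal prediction over per-label nonconformity scores. The function below is an illustrative sketch, not the authors' procedure: it assumes some rule model already supplies a nonconformity score per candidate label, and returns the set of labels whose scores are conformal at level 1 − α.

```python
import numpy as np

def conformal_label_set(cal_scores, test_scores, alpha=0.1):
    # Split conformal prediction with generic nonconformity scores (e.g.
    # scores derived from rule (non-)conformity): cal_scores[i] is the score
    # of calibration example i at its true label, test_scores[c] the score of
    # candidate label c for the new instance.
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))   # finite-sample correction
    q = np.sort(cal_scores)[min(k, n) - 1]    # k-th smallest calibration score
    # Keep every candidate whose nonconformity does not exceed the threshold;
    # under exchangeability the set covers the true label w.p. >= 1 - alpha.
    return [c for c, s in enumerate(test_scores) if s <= q]
```

The same threshold q can equally be read as a calibrated yardstick for judging candidate rules, which is the mutual benefit the abstract highlights.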
ENDER: A Statistical Framework for Boosting Decision Rules
Induction of decision rules plays an important role in machine learning.
The main advantage of decision rules is their simplicity and human-interpretable form.
Moreover, they are capable of modeling complex interactions between attributes. In
this paper, we thoroughly analyze a learning algorithm, called ENDER, which constructs
an ensemble of decision rules. This algorithm is tailored for regression and
binary classification problems. It uses the boosting approach for learning, which can
be treated as generalization of sequential covering. Each new rule is fitted by focusing
on examples which were the hardest to classify correctly by the rules already present
in the ensemble. We consider different loss functions and minimization techniques
often encountered in the boosting framework. The minimization techniques are used
to derive impurity measures which control construction of single decision rules. Properties
of four different impurity measures are analyzed with respect to the trade-off
between misclassification (discrimination) and coverage (completeness) of the rule.
Moreover, we consider regularization consisting of shrinking and sampling.
Finally, we compare the ENDER algorithm with other well-known decision rule
learners such as SLIPPER, LRI and RuleFit.
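The boosting view of rule induction described above can be sketched compactly: each new rule is fitted to the current residuals, i.e. to the examples the ensemble so far handles worst, and its response is shrunk before being added. The code below is a minimal sketch in the spirit of ENDER for squared-error regression with single-condition rules; it is not the published algorithm, and all names are illustrative.

```python
import numpy as np

def fit_rule_ensemble(X, y, n_rules=20, shrinkage=0.5):
    # Boosting as a generalization of sequential covering: every rule is a
    # single "feature <= / > threshold" condition with a constant response,
    # fitted to the residuals of the ensemble built so far.
    base = y.mean()                       # default rule: constant prediction
    pred = np.full(len(y), base)
    rules = []
    for _ in range(n_rules):
        resid = y - pred                  # negative gradient of squared loss
        best = None
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                for op in (np.less_equal, np.greater):
                    cover = op(X[:, j], t)
                    if not cover.any():
                        continue
                    resp = resid[cover].mean()
                    gain = cover.sum() * resp ** 2   # impurity (SSE) reduction
                    if best is None or gain > best[0]:
                        best = (gain, j, t, op, resp)
        _, j, t, op, resp = best
        rules.append((j, t, op, shrinkage * resp))   # shrinking as regularization
        pred += np.where(op(X[:, j], t), shrinkage * resp, 0.0)

    def predict(X_new):
        out = np.full(len(X_new), base)
        for j, t, op, resp in rules:
            out += np.where(op(X_new[:, j], t), resp, 0.0)
        return out
    return predict
```

Swapping the squared-error impurity for another loss-derived impurity measure, or subsampling the rows before each rule search, recovers the loss/regularization variations the paper analyzes.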
Bipartite Ranking through Minimization of Univariate Loss
Minimization of the rank loss or, equivalently, maximization of the AUC in bipartite ranking calls for minimizing the number of disagreements between pairs of instances. Since the complexity of this problem is inherently quadratic in the number of training examples, it is tempting to ask how much is actually lost by minimizing a simple univariate loss function, as done by standard classification methods, as a surrogate. In this paper, we first note that minimization of 0/1 loss is not an option, as it may yield an arbitrarily high rank loss. We show, however, that better results can be achieved by means of a weighted (cost-sensitive) version of 0/1 loss. Yet, the real gain is obtained through margin-based loss functions, for which we are able to derive proper bounds, not only for rank risk but, more importantly, also for rank regret. The paper is completed with an experimental study in which we address specific questions raised by our theoretical analysis.
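The two losses contrasted in the abstract can be written down directly. The sketch below (illustrative names, not the paper's code) computes the rank loss, i.e. the fraction of misordered positive-negative pairs, equal to 1 − AUC, by explicit pair enumeration, which makes the quadratic cost visible, alongside a cost-sensitive 0/1 loss that weights each class error equally, as in the weighted surrogate the paper advocates over plain 0/1 loss.

```python
import numpy as np

def rank_loss(scores, labels):
    # Fraction of (positive, negative) pairs ranked incorrectly; ties count
    # one half. Pair enumeration is O(n_pos * n_neg), hence quadratic overall.
    pos = scores[labels == 1][:, None]
    neg = scores[labels == 0][None, :]
    return float(((pos < neg) + 0.5 * (pos == neg)).mean())

def balanced_01_loss(scores, labels, threshold=0.0):
    # Cost-sensitive 0/1 loss: false negatives and false positives are
    # averaged per class, so each class's errors carry equal total weight.
    y_hat = scores > threshold
    fnr = float(np.mean(y_hat[labels == 1] == 0))   # missed positives
    fpr = float(np.mean(y_hat[labels == 0] == 1))   # false alarms
    return 0.5 * (fnr + fpr)
```

A univariate loss such as the balanced one above is evaluated per instance in linear time, which is exactly why bounding the rank loss (and rank regret) in terms of it is attractive.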