51 research outputs found

    F-measure Maximization in Multi-Label Classification with Conditionally Independent Label Subsets

    We discuss a method to improve the exact F-measure maximization algorithm called GFM, proposed in (Dembczynski et al. 2011) for multi-label classification, assuming the label set can be partitioned into conditionally independent subsets given the input features. If the labels were all independent, the estimation of only m parameters (m denoting the number of labels) would suffice to derive Bayes-optimal predictions in O(m^2) operations. In the general case, m^2 + 1 parameters are required by GFM to solve the problem in O(m^3) operations. In this work, we show that the number of parameters can be reduced further to m^2/n, in the best case, assuming the label set can be partitioned into n conditionally independent subsets. As this label partition needs to be estimated from the data beforehand, we first use the procedure proposed in (Gasse et al. 2015) that finds such a partition, and then infer the required parameters locally in each label subset. The latter are aggregated and serve as input to GFM to form the Bayes-optimal prediction. We show on a synthetic experiment that the reduction in the number of parameters brings about significant benefits in terms of performance.
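The parameter counts in the abstract can be checked with a quick back-of-the-envelope sketch (the function names and the example sizes below are illustrative, not part of GFM):

```python
def gfm_param_count(m):
    """Parameters required by plain GFM for m labels: m^2 + 1."""
    return m * m + 1

def partitioned_param_count(subset_sizes):
    """Parameters when the label set splits into conditionally independent
    subsets: sum of m_j^2 over subsets, roughly m^2 / n for n equal parts."""
    return sum(s * s for s in subset_sizes)

m = 12
print(gfm_param_count(m))                  # 145 = m^2 + 1
print(partitioned_param_count([4, 4, 4]))  # 48 = m^2 / n for n = 3
```

With three equal subsets of 4 labels, 48 parameters replace the 145 that plain GFM would estimate, which is the m^2/n best case described above.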

    Conformal Rule-Based Multi-label Classification

    We advocate the use of conformal prediction (CP) to enhance rule-based multi-label classification (MLC). In particular, we highlight the mutual benefit of CP and rule learning: rules have the ability to provide natural (non-)conformity scores, which are required by CP, while CP suggests a way to calibrate the assessment of candidate rules, thereby supporting better predictions and more elaborate decision making. We illustrate the potential usefulness of calibrated conformity scores in a case study on lazy multi-label rule learning.
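The calibration step that CP contributes can be sketched as standard split conformal prediction over a generic nonconformity score (this sketch is ours; the paper's rule-derived scores would simply be plugged in):

```python
import numpy as np

def conformal_p_value(calibration_scores, test_score):
    """Split conformal p-value: the smoothed fraction of calibration
    nonconformity scores at least as large as the test score.
    A rule learner could supply its natural (non-)conformity score here."""
    cal = np.asarray(calibration_scores)
    return (np.sum(cal >= test_score) + 1) / (len(cal) + 1)

p = conformal_p_value([1.0, 2.0, 3.0, 4.0], 2.5)  # (2 + 1) / 5 = 0.6
```

A small p-value means the candidate label assignment is unusually nonconforming relative to the calibration data, which is the calibrated assessment the abstract refers to.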

    ENDER: A Statistical Framework for Boosting Decision Rules

    Induction of decision rules plays an important role in machine learning. The main advantage of decision rules is their simplicity and human-interpretable form. Moreover, they are capable of modeling complex interactions between attributes. In this paper, we thoroughly analyze a learning algorithm, called ENDER, which constructs an ensemble of decision rules. This algorithm is tailored for regression and binary classification problems. It uses the boosting approach for learning, which can be treated as a generalization of sequential covering. Each new rule is fitted by focusing on examples which were the hardest to classify correctly by the rules already present in the ensemble. We consider different loss functions and minimization techniques often encountered in the boosting framework. The minimization techniques are used to derive impurity measures which control the construction of single decision rules. Properties of four different impurity measures are analyzed with respect to the trade-off between misclassification (discrimination) and coverage (completeness) of the rule. Moreover, we consider regularization consisting of shrinking and sampling. Finally, we compare the ENDER algorithm with other well-known decision rule learners such as SLIPPER, LRI and RuleFit.
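The core loop described above, each new rule fitted to the examples the current ensemble handles worst, can be sketched as follows (a toy of ours under exponential loss with axis-aligned threshold rules, not ENDER itself):

```python
import numpy as np

def fit_rule_ensemble(X, y, n_rules=3, shrinkage=0.5):
    """Toy boosting-of-rules sketch: greedily add threshold rules of the
    form "feature j <= t", each chosen under exponential-loss weights so
    that examples the current ensemble misranks count the most. The
    shrinkage factor plays the role of the shrinking regularization."""
    F = np.zeros(len(y))                      # ensemble scores, y in {-1, +1}
    rules = []
    for _ in range(n_rules):
        w = np.exp(-y * F)                    # hardest examples weigh most
        best = None
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                covered = X[:, j] <= t
                vote = np.sum(w * y * covered)  # weighted vote of covered examples
                if best is None or abs(vote) > abs(best[0]):
                    best = (vote, j, t, covered)
        vote, j, t, covered = best
        response = shrinkage * np.sign(vote)    # shrunken rule response
        F = F + response * covered
        rules.append((j, t, response))
    return rules, F
```

On a tiny 1-D dataset with negatives at small feature values, three such rules already separate the classes; real ENDER differs in the impurity measures, loss functions, and the default-rule handling discussed in the paper.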

    Bipartite Ranking through Minimization of Univariate Loss

    Minimization of the rank loss or, equivalently, maximization of the AUC in bipartite ranking calls for minimizing the number of disagreements between pairs of instances. Since the complexity of this problem is inherently quadratic in the number of training examples, it is tempting to ask how much is actually lost by minimizing a simple univariate loss function, as done by standard classification methods, as a surrogate. In this paper, we first note that minimization of 0/1 loss is not an option, as it may yield an arbitrarily high rank loss. We show, however, that better results can be achieved by means of a weighted (cost-sensitive) version of 0/1 loss. Yet, the real gain is obtained through margin-based loss functions, for which we are able to derive proper bounds, not only for rank risk but, more importantly, also for rank regret. The paper is completed with an experimental study in which we address specific questions raised by our theoretical analysis.
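As a concrete reference point for the quantity being minimized, the rank loss over positive/negative pairs can be computed directly (a naive quadratic-time sketch of ours, which is exactly the cost the univariate surrogates avoid):

```python
import numpy as np

def rank_loss(scores, y):
    """Fraction of positive/negative pairs ranked in the wrong order
    (ties count half); AUC = 1 - rank_loss. Quadratic in the number of
    examples, hence the appeal of univariate surrogate losses."""
    pos = scores[y == 1]
    neg = scores[y == 0]
    disagree = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return disagree / (len(pos) * len(neg))

scores = np.array([0.9, 0.8, 0.3, 0.1])
labels = np.array([1, 0, 1, 0])
# one of the four pos/neg pairs is misordered: rank_loss = 0.25, AUC = 0.75
```

Any classifier trained with a univariate loss still induces scores, so this pairwise criterion can be evaluated on its output, which is how the surrogate question in the abstract is posed.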
