Learning preferences for large scale multi-label problems
Although the majority of machine learning approaches aim to solve binary classification problems, several real-world applications require specialized algorithms able to handle many different classes, as in single-label multi-class and multi-label classification problems. The Label Ranking framework generalizes these settings: it aims to map instances from the input space to a total order over the set of possible labels. However, such algorithms are generally more complex than binary ones, and their application to large-scale datasets can be intractable. The main contribution of this work is a novel general online preference-based label ranking framework, able to solve binary, multi-class, multi-label and ranking problems. A comparison with other baselines shows its effectiveness and efficiency on a real-world large-scale multi-label task.
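To make the preference-based idea concrete, here is a minimal sketch of an online pairwise label ranker: one perceptron-style weight vector per label, updated whenever an observed preference between two labels is violated. This is an illustrative assumption, not the paper's actual algorithm; all names and the update rule are hypothetical.

    import numpy as np

    class OnlinePairwisePerceptron:
        """Illustrative online preference-based label ranker (not the paper's method).

        One weight vector per label; a preference "a over b" on instance x
        triggers a perceptron-style update when the current scores violate it.
        """

        def __init__(self, n_features, n_labels, lr=1.0):
            self.W = np.zeros((n_labels, n_features))
            self.lr = lr

        def rank(self, x):
            # Labels sorted by decreasing score give the predicted total order.
            return np.argsort(-(self.W @ x))

        def update(self, x, preferences):
            # preferences: iterable of (a, b) pairs meaning "label a preferred over b".
            for a, b in preferences:
                if self.W[a] @ x <= self.W[b] @ x:   # preference violated
                    self.W[a] += self.lr * x
                    self.W[b] -= self.lr * x

Binary, multi-class and multi-label problems all reduce to this setting by generating preferences of relevant labels over irrelevant ones, which is what makes a single preference-based learner applicable to all of them.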
Random forests with random projections of the output space for high dimensional multi-label classification
We adapt the idea of random projections applied to the output space, so as to enhance tree-based ensemble methods in the context of multi-label classification. We show how learning time complexity can be reduced without affecting computational complexity and accuracy of predictions. We also show that random output space projections may be used to reach different bias-variance tradeoffs, over a broad panel of benchmark problems, and that this may lead to improved accuracy while significantly reducing the computational burden of the learning stage.
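As a rough illustration of the technique (a sketch under assumptions, not necessarily the authors' exact procedure; in particular their decoding may differ), one can project the label matrix, fit a multi-output tree ensemble on the projected targets, and decode predictions with the projection's pseudo-inverse:

    import numpy as np
    from sklearn.datasets import make_multilabel_classification
    from sklearn.ensemble import ExtraTreesRegressor
    from sklearn.random_projection import GaussianRandomProjection

    def fit_projected_forest(X, Y, n_components=10, seed=0):
        """Train a tree ensemble on a random projection of the label matrix Y.

        Learning cost now scales with n_components instead of the number of labels.
        """
        proj = GaussianRandomProjection(n_components=n_components, random_state=seed)
        Z = proj.fit_transform(Y)                  # (n_samples, n_components)
        forest = ExtraTreesRegressor(n_estimators=100, random_state=seed).fit(X, Z)
        return forest, proj

    def predict_label_scores(forest, proj, X):
        # Decode by multiplying with the pseudo-inverse of the projection matrix.
        Z_hat = forest.predict(X)
        P = proj.components_                       # (n_components, n_labels)
        return Z_hat @ np.linalg.pinv(P).T         # (n_samples, n_labels)

    # Toy demo with synthetic multi-label data.
    X, Y = make_multilabel_classification(n_samples=300, n_classes=40, random_state=0)
    forest, proj = fit_projected_forest(X, Y, n_components=10)
    scores = predict_label_scores(forest, proj, X[:5])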
Food Ingredients Recognition through Multi-label Learning
Automatically constructing a food diary that tracks the ingredients consumed can help people follow a healthy diet. We tackle the problem of food ingredient recognition as a multi-label learning problem. We propose a method for adapting a high-performing state-of-the-art CNN to act as a multi-label predictor, learning recipes in terms of their lists of ingredients. We show that, given a picture, our model is able to predict its list of ingredients even if the recipe corresponding to the picture has never been seen by the model. We make public two new datasets suitable for this purpose. Furthermore, we show that a model trained on a high variability of recipes and ingredients generalizes better to new data, and we visualize how it specializes each of its neurons to different ingredients.
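The adaptation described amounts to replacing a CNN's softmax classifier with independent per-label sigmoid outputs. Below is a minimal PyTorch sketch, assuming a ResNet-50 backbone and a hypothetical ingredient vocabulary size; the paper's exact architecture and datasets are not reproduced here.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Swap the single-label head of a pretrained CNN for a multi-label head.
    n_ingredients = 1000                       # hypothetical vocabulary size
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone.fc = nn.Linear(backbone.fc.in_features, n_ingredients)

    criterion = nn.BCEWithLogitsLoss()         # one sigmoid per ingredient

    def train_step(images, targets, optimizer):
        # targets: float tensor of shape (batch, n_ingredients) with 0/1 entries.
        optimizer.zero_grad()
        logits = backbone(images)
        loss = criterion(logits, targets)
        loss.backward()
        optimizer.step()
        return loss.item()

Because each ingredient gets its own sigmoid, the model can output ingredient combinations (recipes) never seen during training, which is what enables the generalization behaviour the abstract describes.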
A System for Multi-label Classification of Learning Objects
The rapid evolution of e-learning is closely linked to international efforts on the standardization of Learning Objects (LOs), which provide ubiquitous access to multiple, distributed educational resources in many repositories. This article presents a system that enables the retrieval and classification of LOs and provides individualized help with selecting learning materials, so that the most suitable choice can be made among many alternatives. For this classification, a special multi-label data mining method designed for LO ranking tasks is used. The system presents the results to the end user ranked by position. The learning process is supervised, using the two major tasks in supervised learning from multi-label data: multi-label classification and label ranking.
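The paper's specific data-mining method is not detailed in the abstract; the following scikit-learn sketch only illustrates how the two tasks it names fit together: per-label probabilities from a one-vs-rest classifier yield both a label ranking and a multi-label classification. The toy data stands in for LO features.

    import numpy as np
    from sklearn.datasets import make_multilabel_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier

    # Toy stand-in for learning-object features and labels.
    X, Y = make_multilabel_classification(n_samples=200, n_classes=8,
                                          n_labels=3, random_state=0)

    # One binary model per label; predicted probabilities double as a ranking.
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

    proba = clf.predict_proba(X[:5])        # marginal relevance per label
    ranking = np.argsort(-proba, axis=1)    # label ranking, most relevant first
    bipartition = proba >= 0.5              # multi-label classification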
On Aggregation in Ensembles of Multilabel Classifiers
While a variety of ensemble methods for multilabel classification have been proposed in the literature, the question of how to aggregate the predictions of the individual members of the ensemble has received little attention so far. In this paper, we introduce a formal framework of ensemble multilabel classification, in which we distinguish two principal approaches: "predict then combine" (PTC), where the ensemble members first make loss-minimizing predictions which are subsequently combined, and "combine then predict" (CTP), which first aggregates information such as marginal label probabilities from the individual ensemble members, and then derives a prediction from this aggregation. While both approaches generalize voting techniques commonly used for multilabel ensembles, they make it possible to explicitly take the target performance measure into account. Therefore, concrete instantiations of CTP and PTC can be tailored to concrete loss functions. Experimentally, we show that standard voting techniques are indeed outperformed by suitable instantiations of CTP and PTC, and provide some evidence that CTP performs well for decomposable loss functions, whereas PTC is the better choice for non-decomposable losses.
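A minimal numpy sketch of the two aggregation schemes, instantiated here for Hamming loss (for which thresholding marginals at 0.5 is the loss-minimizing prediction); the paper's framework is more general and covers other losses:

    import numpy as np

    def ptc_hamming(member_probs):
        """Predict then combine: each member thresholds its own marginals
        (loss-minimizing for Hamming loss), then predictions are majority-voted."""
        votes = (member_probs >= 0.5).mean(axis=0)   # fraction of positive votes
        return votes >= 0.5

    def ctp_hamming(member_probs):
        """Combine then predict: average the members' marginal label
        probabilities first, then derive one prediction from the aggregate."""
        return member_probs.mean(axis=0) >= 0.5

    # member_probs: shape (n_members, n_labels), marginal label probabilities.
    probs = np.array([[0.9, 0.9, 0.2],
                      [0.6, 0.3, 0.1],
                      [0.4, 0.4, 0.3]])
    # The two schemes can disagree: on the second label, PTC votes it out
    # (only one member predicts it) while CTP keeps it (average prob > 0.5).
    print(ptc_hamming(probs), ctp_hamming(probs))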
Why do Sequence Signatures Predict Enzyme Mechanism? Homology versus Chemistry
We identify, firstly, InterPro sequence signatures representing evolutionary relatedness and, secondly, signatures identifying specific chemical machinery. Thus, we predict the chemical mechanisms of enzyme-catalysed reactions from “catalytic” and “non-catalytic” subsets of InterPro signatures. We first scanned our 249 sequences with InterProScan and then used the MACiE database to identify those amino acid residues which are important for catalysis. The sequences were mutated in silico to replace these catalytic residues with glycine, and then scanned again with InterProScan. Those signature matches from the original scan which disappeared on mutation were called “catalytic”. Mechanism was predicted using all signatures, only the 78 “catalytic” signatures, or only the 519 “non-catalytic” signatures. The non-catalytic signatures gave results indistinguishable from those for the whole feature set, with precision of 0.991 and sensitivity of 0.970. The catalytic signatures alone gave less impressive predictivity, with precision and sensitivity of 0.791 and 0.735, respectively. These results show that our successful prediction of enzyme mechanism is mostly by homology rather than by identifying catalytic machinery.
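The in silico mutation step is straightforward to sketch. The following Python toy replaces InterProScan with a stand-in function so that the “signatures lost on mutation” set difference can be shown end to end; all identifiers and the toy scanner are fictional.

    def mutate_catalytic_residues(sequence, catalytic_positions):
        """Replace catalytic residues with glycine (in silico mutation).
        Positions are 0-based indices into the amino-acid sequence."""
        residues = list(sequence)
        for pos in catalytic_positions:
            residues[pos] = "G"
        return "".join(residues)

    def catalytic_signatures(sequence, catalytic_positions, scan):
        """Signatures that disappear after mutation are labelled 'catalytic'.
        `scan` stands in for an InterProScan call returning signature IDs."""
        before = scan(sequence)
        after = scan(mutate_catalytic_residues(sequence, catalytic_positions))
        return before - after          # matches lost on mutation

    # Fictional stand-in for InterProScan: flags a made-up motif signature.
    toy_scan = lambda seq: {"IPR_TOY_MOTIF"} if "HDS" in seq else set()
    print(catalytic_signatures("MAHDSKL", [2, 3, 4], toy_scan))  # {'IPR_TOY_MOTIF'}

In the real pipeline the catalytic positions come from MACiE and the scans from InterProScan; the set difference is the same.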
Exploiting Anti-monotonicity of Multi-label Evaluation Measures for Inducing Multi-label Rules
Exploiting dependencies between labels is considered to be crucial for multi-label classification. Rules are able to expose label dependencies such as implications, subsumptions or exclusions in a human-comprehensible and interpretable manner. However, the induction of rules with multiple labels in the head is particularly challenging, as the number of label combinations which must be taken into account for each rule grows exponentially with the number of available labels. To overcome this limitation, algorithms for exhaustive rule mining typically use properties such as anti-monotonicity or decomposability in order to prune the search space. In the present paper, we examine whether commonly used multi-label evaluation metrics satisfy these properties and are therefore suited to pruning the search space over multi-label heads.
To appear in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2018. See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3074 for further information.
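For intuition, here is a toy depth-first search over multi-label heads that prunes with an anti-monotone measure. Support is used as the measure purely for illustration; the paper's question is precisely which multi-label evaluation metrics license this kind of pruning.

    def search_heads(labels, quality, threshold):
        """Enumerate multi-label heads depth-first, pruning with anti-monotonicity:
        if quality(H) falls below the threshold and quality is anti-monotone,
        no superset of H can reach it, so the whole branch is cut."""
        results = []

        def expand(head, remaining):
            if head:
                q = quality(head)
                if q < threshold:
                    return                  # prune: supersets cannot recover
                results.append((head, q))
            for i, label in enumerate(remaining):
                expand(head | {label}, remaining[i + 1:])

        expand(frozenset(), list(labels))
        return results

    # Toy anti-monotone quality: support of the head in a small label dataset.
    rows = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
    support = lambda head: sum(head <= r for r in rows) / len(rows)
    print(search_heads({"a", "b", "c"}, support, threshold=0.5))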