Learning preferences for large scale multi-label problems
Although the majority of machine learning approaches aim to solve binary classification problems, several real-world applications require specialized algorithms able to handle many different classes, as in single-label multi-class and multi-label classification problems. The Label Ranking framework generalizes these settings: it aims to map instances from the input space to a total order over the set of possible labels. However, such algorithms are generally more complex than binary ones, and their application to large-scale datasets can be intractable. The main contribution of this work is a novel general online preference-based label ranking framework, able to solve binary, multi-class, multi-label and ranking problems. A comparison with other baselines shows its effectiveness and efficiency on a real-world large-scale multi-label task.
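To make the preference-based idea concrete, here is a minimal sketch of an online pairwise label ranker: one perceptron-style weight vector per label, updated whenever an observed preference between two labels is violated. This is an illustrative assumption, not the paper's actual algorithm; all names and the update rule are hypothetical.

    import numpy as np

    class OnlinePairwisePerceptron:
        """Illustrative online preference-based label ranker (not the paper's method).

        One weight vector per label; a preference "a over b" on instance x
        triggers a perceptron-style update when the current scores violate it.
        """

        def __init__(self, n_features, n_labels, lr=1.0):
            self.W = np.zeros((n_labels, n_features))
            self.lr = lr

        def rank(self, x):
            # Labels sorted by decreasing score give the predicted total order.
            return np.argsort(-(self.W @ x))

        def update(self, x, preferences):
            # preferences: iterable of (a, b) pairs meaning "label a preferred over b".
            for a, b in preferences:
                if self.W[a] @ x <= self.W[b] @ x:   # preference violated
                    self.W[a] += self.lr * x
                    self.W[b] -= self.lr * x

Binary, multi-class and multi-label problems all reduce to this setting by generating preferences of relevant labels over irrelevant ones, which is what makes a single preference-based learner applicable to all of them.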
Random forests with random projections of the output space for high dimensional multi-label classification
We adapt the idea of random projections applied to the output space, so as to enhance tree-based ensemble methods in the context of multi-label classification. We show how learning time complexity can be reduced without affecting computational complexity and accuracy of predictions. We also show that random output space projections may be used to reach different bias-variance tradeoffs, over a broad panel of benchmark problems, and that this may lead to improved accuracy while significantly reducing the computational burden of the learning stage.
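As a rough illustration of the technique (a sketch under assumptions, not necessarily the authors' exact procedure; in particular their decoding may differ), one can project the label matrix, fit a multi-output tree ensemble on the projected targets, and decode predictions with the projection's pseudo-inverse:

    import numpy as np
    from sklearn.datasets import make_multilabel_classification
    from sklearn.ensemble import ExtraTreesRegressor
    from sklearn.random_projection import GaussianRandomProjection

    def fit_projected_forest(X, Y, n_components=10, seed=0):
        """Train a tree ensemble on a random projection of the label matrix Y.

        Learning cost now scales with n_components instead of the number of labels.
        """
        proj = GaussianRandomProjection(n_components=n_components, random_state=seed)
        Z = proj.fit_transform(Y)                  # (n_samples, n_components)
        forest = ExtraTreesRegressor(n_estimators=100, random_state=seed).fit(X, Z)
        return forest, proj

    def predict_label_scores(forest, proj, X):
        # Decode by multiplying with the pseudo-inverse of the projection matrix.
        Z_hat = forest.predict(X)
        P = proj.components_                       # (n_components, n_labels)
        return Z_hat @ np.linalg.pinv(P).T         # (n_samples, n_labels)

    # Toy demo with synthetic multi-label data.
    X, Y = make_multilabel_classification(n_samples=300, n_classes=40, random_state=0)
    forest, proj = fit_projected_forest(X, Y, n_components=10)
    scores = predict_label_scores(forest, proj, X[:5])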
Food Ingredients Recognition through Multi-label Learning
Automatically constructing a food diary that tracks the ingredients consumed can help people follow a healthy diet. We tackle the problem of food ingredient recognition as a multi-label learning problem. We propose a method for adapting a high-performing state-of-the-art CNN to act as a multi-label predictor, learning recipes in terms of their lists of ingredients. We show that, given a picture, our model is able to predict its list of ingredients even if the recipe corresponding to the picture has never been seen by the model. We make public two new datasets suitable for this purpose. Furthermore, we show that a model trained on a high variability of recipes and ingredients generalizes better to new data, and we visualize how it specializes each of its neurons to different ingredients.
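The adaptation described amounts to replacing a CNN's softmax classifier with independent per-label sigmoid outputs. Below is a minimal PyTorch sketch, assuming a ResNet-50 backbone and a hypothetical ingredient vocabulary size; the paper's exact architecture and datasets are not reproduced here.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Swap the single-label head of a pretrained CNN for a multi-label head.
    n_ingredients = 1000                       # hypothetical vocabulary size
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone.fc = nn.Linear(backbone.fc.in_features, n_ingredients)

    criterion = nn.BCEWithLogitsLoss()         # one sigmoid per ingredient

    def train_step(images, targets, optimizer):
        # targets: float tensor of shape (batch, n_ingredients) with 0/1 entries.
        optimizer.zero_grad()
        logits = backbone(images)
        loss = criterion(logits, targets)
        loss.backward()
        optimizer.step()
        return loss.item()

Because each ingredient gets its own sigmoid, the model can output ingredient combinations (recipes) never seen during training, which is what enables the generalization behaviour the abstract describes.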
A System for Multi-label Classification of Learning Objects
The rapid evolution of e-learning is closely linked to international efforts on the standardization of Learning Objects (LOs), which provide ubiquitous access to multiple, distributed educational resources in many repositories. This article presents a system that enables the retrieval and classification of LOs and provides individualized help with selecting learning materials, so that the most suitable choice can be made among many alternatives. For this classification, a special multi-label data mining method designed for LO ranking tasks is used. The system presents the results to the end user ranked by position. The learning process is supervised, using the two major tasks in supervised learning from multi-label data: multi-label classification and label ranking.
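The paper's specific data-mining method is not detailed in the abstract; the following scikit-learn sketch only illustrates how the two tasks it names fit together: per-label probabilities from a one-vs-rest classifier yield both a label ranking and a multi-label classification. The toy data stands in for LO features.

    import numpy as np
    from sklearn.datasets import make_multilabel_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier

    # Toy stand-in for learning-object features and labels.
    X, Y = make_multilabel_classification(n_samples=200, n_classes=8,
                                          n_labels=3, random_state=0)

    # One binary model per label; predicted probabilities double as a ranking.
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

    proba = clf.predict_proba(X[:5])        # marginal relevance per label
    ranking = np.argsort(-proba, axis=1)    # label ranking, most relevant first
    bipartition = proba >= 0.5              # multi-label classification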
On Aggregation in Ensembles of Multilabel Classifiers
While a variety of ensemble methods for multilabel classification have been proposed in the literature, the question of how to aggregate the predictions of the individual members of the ensemble has received little attention so far. In this paper, we introduce a formal framework of ensemble multilabel classification, in which we distinguish two principal approaches: "predict then combine" (PTC), where the ensemble members first make loss-minimizing predictions which are subsequently combined, and "combine then predict" (CTP), which first aggregates information such as marginal label probabilities from the individual ensemble members, and then derives a prediction from this aggregation. While both approaches generalize voting techniques commonly used for multilabel ensembles, they make it possible to explicitly take the target performance measure into account. Therefore, concrete instantiations of CTP and PTC can be tailored to concrete loss functions. Experimentally, we show that standard voting techniques are indeed outperformed by suitable instantiations of CTP and PTC, and provide some evidence that CTP performs well for decomposable loss functions, whereas PTC is the better choice for non-decomposable losses.
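A minimal numpy sketch of the two aggregation schemes, instantiated here for Hamming loss (for which thresholding marginals at 0.5 is the loss-minimizing prediction); the paper's framework is more general and covers other losses:

    import numpy as np

    def ptc_hamming(member_probs):
        """Predict then combine: each member thresholds its own marginals
        (loss-minimizing for Hamming loss), then predictions are majority-voted."""
        votes = (member_probs >= 0.5).mean(axis=0)   # fraction of positive votes
        return votes >= 0.5

    def ctp_hamming(member_probs):
        """Combine then predict: average the members' marginal label
        probabilities first, then derive one prediction from the aggregate."""
        return member_probs.mean(axis=0) >= 0.5

    # member_probs: shape (n_members, n_labels), marginal label probabilities.
    probs = np.array([[0.9, 0.9, 0.2],
                      [0.6, 0.3, 0.1],
                      [0.4, 0.4, 0.3]])
    # The two schemes can disagree: on the second label, PTC votes it out
    # (only one member predicts it) while CTP keeps it (average prob > 0.5).
    print(ptc_hamming(probs), ctp_hamming(probs))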
Why do Sequence Signatures Predict Enzyme Mechanism? Homology versus Chemistry
We identify, firstly, InterPro sequence signatures representing evolutionary relatedness and, secondly, signatures identifying specific chemical machinery. Thus, we predict the chemical mechanisms of enzyme-catalysed reactions from “catalytic” and “non-catalytic” subsets of InterPro signatures. We first scanned our 249 sequences with InterProScan and then used the MACiE database to identify those amino acid residues which are important for catalysis. The sequences were mutated in silico to replace these catalytic residues with glycine, and then scanned again with InterProScan. Those signature matches from the original scan which disappeared on mutation were called “catalytic”. Mechanism was predicted using all signatures, only the 78 “catalytic” signatures, or only the 519 “non-catalytic” signatures. The non-catalytic signatures gave results indistinguishable from those for the whole feature set, with precision of 0.991 and sensitivity of 0.970. The catalytic signatures alone gave less impressive predictivity, with precision and sensitivity of 0.791 and 0.735, respectively. These results show that our successful prediction of enzyme mechanism is mostly by homology rather than by identifying catalytic machinery.
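The in silico mutation step is straightforward to sketch. The following Python toy replaces InterProScan with a stand-in function so that the “signatures lost on mutation” set difference can be shown end to end; all identifiers and the toy scanner are fictional.

    def mutate_catalytic_residues(sequence, catalytic_positions):
        """Replace catalytic residues with glycine (in silico mutation).
        Positions are 0-based indices into the amino-acid sequence."""
        residues = list(sequence)
        for pos in catalytic_positions:
            residues[pos] = "G"
        return "".join(residues)

    def catalytic_signatures(sequence, catalytic_positions, scan):
        """Signatures that disappear after mutation are labelled 'catalytic'.
        `scan` stands in for an InterProScan call returning signature IDs."""
        before = scan(sequence)
        after = scan(mutate_catalytic_residues(sequence, catalytic_positions))
        return before - after          # matches lost on mutation

    # Fictional stand-in for InterProScan: flags a made-up motif signature.
    toy_scan = lambda seq: {"IPR_TOY_MOTIF"} if "HDS" in seq else set()
    print(catalytic_signatures("MAHDSKL", [2, 3, 4], toy_scan))  # {'IPR_TOY_MOTIF'}

In the real pipeline the catalytic positions come from MACiE and the scans from InterProScan; the set difference is the same.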
Exploiting Anti-monotonicity of Multi-label Evaluation Measures for Inducing Multi-label Rules
Exploiting dependencies between labels is considered to be crucial for multi-label classification. Rules are able to expose label dependencies such as implications, subsumptions or exclusions in a human-comprehensible and interpretable manner. However, the induction of rules with multiple labels in the head is particularly challenging, as the number of label combinations which must be taken into account for each rule grows exponentially with the number of available labels. To overcome this limitation, algorithms for exhaustive rule mining typically use properties such as anti-monotonicity or decomposability in order to prune the search space. In the present paper, we examine whether commonly used multi-label evaluation metrics satisfy these properties and are therefore suited to pruning the search space over multi-label heads.
To appear in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2018. See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3074 for further information.
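For intuition, here is a toy depth-first search over multi-label heads that prunes with an anti-monotone measure. Support is used as the measure purely for illustration; the paper's question is precisely which multi-label evaluation metrics license this kind of pruning.

    def search_heads(labels, quality, threshold):
        """Enumerate multi-label heads depth-first, pruning with anti-monotonicity:
        if quality(H) falls below the threshold and quality is anti-monotone,
        no superset of H can reach it, so the whole branch is cut."""
        results = []

        def expand(head, remaining):
            if head:
                q = quality(head)
                if q < threshold:
                    return                  # prune: supersets cannot recover
                results.append((head, q))
            for i, label in enumerate(remaining):
                expand(head | {label}, remaining[i + 1:])

        expand(frozenset(), list(labels))
        return results

    # Toy anti-monotone quality: support of the head in a small label dataset.
    rows = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
    support = lambda head: sum(head <= r for r in rows) / len(rows)
    print(search_heads({"a", "b", "c"}, support, threshold=0.5))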