
66 research outputs found

Evaluation Measures for Hierarchical Classification: a unified view and novel approaches

Author: Androutsopoulos Ion
Gaussier Eric
Kosmopoulos Aris
Paliouras Georgios
Partalas Ioannis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2013

Hierarchical classification addresses the problem of classifying items into a hierarchy of classes. An important issue in hierarchical classification is the evaluation of different classification algorithms, which is complicated by the hierarchical relations among the classes. Several evaluation measures have been proposed for hierarchical classification, each using the hierarchy in a different way. This paper studies the problem of evaluation in hierarchical classification by analyzing and abstracting the key components of the existing performance measures. It also proposes two alternative generic views of hierarchical evaluation and introduces two corresponding novel measures. The proposed measures, along with the state-of-the-art ones, are empirically tested on three large datasets from the domain of text classification. The empirical results illustrate the undesirable behavior of existing approaches and how the proposed methods overcome most of these problems across a range of cases.

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

LSHTC: A Benchmark for Large-Scale Text Classification

Author: Amini Massih-Reza
Androutsopoulos Ion
Artieres Thierry
Baskiotis Nicolas
Gallinari Patrick
Gaussier Eric
Kosmopoulos Aris
Paliouras George
Partalas Ioannis
Publication date: 15/03/2015

LSHTC is a series of challenges that aims to assess the performance of classification systems in large-scale classification with a large number of classes (up to hundreds of thousands). This paper describes the datasets that have been released along with the LSHTC series. The paper details the construction of the datasets and the design of the tracks, as well as the evaluation measures that we implemented, and gives a quick overview of the results. All of these datasets are available online, and runs may still be submitted on the online server of the challenges.

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

HAL AMU

Maximum-Margin Framework for Training Data Synchronization in Large-Scale Hierarchical Classification

Author: Amini Massih-Reza
Babbar Rohit
Gaussier Eric
Partalas Ioannis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

In the context of supervised learning, the training data for large-scale hierarchical classification consist of (i) a set of input-output pairs, and (ii) a hierarchy structure defining parent-child relations among class labels. It is often the case that the hierarchy structure given a priori is not optimal for achieving high classification accuracy. This is especially true for web taxonomies such as the Yahoo! directory, which consist of tens of thousands of classes. Furthermore, an important goal of hierarchy design is to render better navigability and browsing. In this work, we propose a maximum-margin framework for automatically adapting the given hierarchy by using the set of input-output pairs to yield a new hierarchy. The proposed method is not only theoretically justified but also provides a more principled approach than the hierarchy flattening techniques proposed earlier, which are ad hoc and empirical in nature. The empirical results on publicly available large-scale datasets demonstrate that classification with the new hierarchy leads to better or comparable generalization performance than the hierarchy flattening techniques.

Crossref

Hal - Université Grenoble Alpes

Learning Taxonomy Adaptation in Large-scale Classification

Author: Amblard Cécile
Amini Massih-Reza
Babbar Rohit
Gaussier Eric
Partalas Ioannis
Publication venue: Microtome Publishing
Publication date: 01/05/2016


Hal - Université Grenoble Alpes

Greedy and linear ensembles of machine learning methods outperform single approaches for QSPR regression problems

Author: Arnott
Bergstrom
Bhat
Box
Breiman
Brown
Brown
Cai
Cherkasov
Dudek
Fang
Fühner
Galton
Gedeck
Glick
Golbraikh
Helfert
Hermundstad
Hopfinger
Hughes
Jorgensen
Karatzoglou
Karthikeyan
Kubat
Kuhn
Kvålseth
Liaw
Lin
Lipinski
Lipinski
Liu
Llinàs
Lusci
Mardia
McDonagh
Mitchell
Muggleton
Nantasenamat
Needham
Nigsch
Noble
Oshiro
Palmer
Pao
Partalas
Ran
Rasmussen
Schroeter
Schwaighofer
Sebastiani
Spatola
Surowiecki
Svetnik
Team
Tesauro
Tipping
Tropsha
Walton
Williams
Williams
Yang
Publication venue: 'Wiley'
Publication date: 01/09/2015

The application of Machine Learning to cheminformatics is a large and active field of research, but there exist few papers which discuss whether ensembles of different Machine Learning methods can improve upon the performance of their component methodologies. Here we investigated a variety of methods, including kernel-based, tree, linear, neural networks, and both greedy and linear ensemble methods. These were all tested against a standardised methodology for regression with data relevant to the pharmaceutical development process. This investigation focused on QSPR problems within drug-like chemical space. We aimed to investigate which methods perform best, and how the ‘wisdom of crowds’ principle can be applied to ensemble predictors. It was found that no single method performs best for all problems, but that a dynamic, well-structured ensemble predictor would perform very well across the board, usually providing an improvement in performance over the best single method. Its use of weighting factors allows the greedy ensemble to acquire a bigger contribution from the better performing models, and this helps the greedy ensemble generally to outperform the simpler linear ensemble. Choice of data pre-processing methodology was found to be crucial to performance of each method too.

Crossref

University of St. Andrews - Pure

St Andrews Research Repository

Reinforcement learning in agent systems

Author: Partalas Ioannis
Παρτάλας Ιωάννης
Publication venue: 'National Documentation Centre (EKT)'
Publication date: 01/01/2009

This dissertation pertains to the area of Machine Learning and especially to the subfield of Reinforcement Learning. Reinforcement learning is an appealing solution to problems with limited environmental feedback. The reinforcement learning framework provides the appropriate tools for solving complex problems, unlike other machine learning frameworks where correctly labeled examples are necessary. For example, the environment in which an autonomous agent will act may be unknown. Despite the research efforts and the successes in reinforcement learning, several research topics are still open. The contribution of this thesis is two-fold: a) it concerns the deployment of methods in multiagent systems and b) it uses the reinforcement learning framework to solve complex problems like focused crawling and ensemble pruning. Firstly, the thesis concentrates on the problem of coordinating a group of autonomous agents in order to achieve a common goal. To deal with this problem, a reinforcement learning method is presented that is based on learning a set of strategies. Using a fusion procedure, the individual decisions are combined in order to follow a common strategy. Additionally, a method is presented for accelerating the learning procedure in reinforcement learning agents that can be applied in both multiagent and single-agent systems. The proposed method uses multiple mapping functions in order to transfer knowledge from a source task to a more complicated target task. Furthermore, an approach for focused crawling is proposed, which is based on the reinforcement learning framework. More specifically, the proposed approach learns a policy for selecting an appropriate classifier for link scoring during the crawling process. This is an indirect way to address the problem of focused crawling, without requiring the direct learning of whether or not to follow a link.
Finally, the problem of pruning an ensemble of classifiers is modeled as a reinforcement learning problem. The agent learns a policy for whether or not to insert a classifier into the ensemble. The proposed approach uses a general definition of the reward function, which allows different instantiations of the performance metric and of the performance evaluation method, depending on the requirements of the domain or the preferences of the data analyst. Additionally, a taxonomy of the existing ensemble pruning methods is proposed.

This dissertation belongs to the field of machine learning, and in particular to the subfield of reinforcement learning. Reinforcement learning is a fundamental approach to training autonomous agents that receive limited information from the environment for their training. The reinforcement learning framework makes it possible to solve complex problems without the external supervision required by other families of machine learning methods (e.g. classifier learning). This matters because the environment in which an autonomous agent will operate is often not known in advance. Despite extensive research activity and successes in various fields, many research questions remain open in reinforcement learning. This dissertation follows two directions: a) the development of methods for multiagent systems and b) the modeling of complex problems through reinforcement learning. First, the dissertation focuses on the problem of coordinating a group of agents to achieve a common goal. To address this problem, a reinforcement learning method is proposed that is based on learning a set of strategies. Through a procedure that fuses the individual decisions, the agents jointly select the strategy they will follow.
In addition, a method is proposed for accelerating reinforcement learning in agents, which can be applied both to systems in which multiple agents act and to systems in which a single agent acts. The proposed method uses multiple mapping functions to transfer knowledge from a source task to a more difficult target task. Furthermore, a method is proposed that combines reinforcement learning with the selection of classifiers from an ensemble. More specifically, the proposed approach aims to learn an optimal behavior for selecting the appropriate classifier for scoring links during focused crawling of the World Wide Web. This combination of reinforcement learning and a classifier ensemble makes it possible to train an agent in a vast and changing environment such as the Web. Extending this philosophy, a reinforcement learning method is proposed for pruning an ensemble of classifiers in order to improve predictive performance. The agent learns a policy for whether or not to insert each classifier into the ensemble. The proposed model uses a general definition of the reward, so that different performance metrics can be used depending on the requirements of the domain. A taxonomy of the classifier pruning methods presented in the literature is also proposed.

Hellenic National Archive of Doctoral Dissertations