Search CORE


Tree-structured multiclass probability estimators

Author: Leathart Timothy Matthew
Publication venue: The University of Waikato
Publication date: 10/09/2019

Nested dichotomies are used as a method of transforming a multiclass classification problem into a series of binary problems. A binary tree structure is constructed over the label space that recursively splits the set of classes into subsets, and a binary classification model learns to discriminate between the two subsets of classes at each node. Several distinct nested dichotomy structures can be built in an ensemble for superior performance. In this thesis, we introduce two new methods for constructing more accurate nested dichotomies. Random-pair selection is a subset selection method that aims to group similar classes together in a non-deterministic fashion, making it easy to construct accurate ensembles. Multiple subset evaluation takes this, and other subset selection methods, further by evaluating several different splits and choosing the best-performing one. Finally, we also discuss the calibration of the probability estimates produced by nested dichotomies. We observe that nested dichotomies systematically produce under-confident predictions, even if the binary classifiers are well calibrated, and especially when the number of classes is high. Furthermore, substantial performance gains can be made when probability calibration methods are also applied to the internal models.
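To make the mapping from tree to class probabilities concrete, here is a minimal Python sketch, assuming each internal node holds a binary model that returns the probability of the left subset. Each class probability is the product of the binary estimates along the root-to-leaf path, which is the standard way nested dichotomies produce multiclass estimates. The `Node` class and `class_probabilities` function are hypothetical names, and the constant lambdas stand in for trained binary classifiers; this is not the thesis's implementation. For the toy tree, P(a) = 0.6, P(b) = 0.4 × 0.25 = 0.1 and P(c) = 0.4 × 0.75 = 0.3, up to floating-point rounding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional, Sequence


@dataclass
class Node:
    """A node of a nested dichotomy (hypothetical structure for illustration)."""
    classes: FrozenSet[str]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Binary model at an internal node: returns P(left subset | x).
    p_left: Optional[Callable[[Sequence[float]], float]] = None


def class_probabilities(node: Node, x: Sequence[float], prob: float = 1.0) -> Dict[str, float]:
    """Return P(class | x) for every class below `node`.

    Each class probability is the product of the binary estimates along
    the path from the root to that class's leaf.
    """
    if node.left is None or node.right is None:
        (label,) = node.classes  # a leaf holds exactly one class
        return {label: prob}
    p = node.p_left(x)
    out = class_probabilities(node.left, x, prob * p)
    out.update(class_probabilities(node.right, x, prob * (1.0 - p)))
    return out


# Toy three-class tree: {a} vs {b, c} at the root, then {b} vs {c}.
# Constant lambdas stand in for trained binary classifiers.
inner = Node(frozenset({"b", "c"}), Node(frozenset({"b"})), Node(frozenset({"c"})), lambda x: 0.25)
root = Node(frozenset({"a", "b", "c"}), Node(frozenset({"a"})), inner, lambda x: 0.6)

probs = class_probabilities(root, [0.0])
print(probs)
```

Because every split divides the probability mass of its parent between two children, the leaf probabilities always sum to one.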

Research Commons@Waikato

Ensembles of nested dichotomies with multiple subset evaluation

Author: A Beygelzimer
G Brier
HL Harter
J Demšar
J Fox
J Fürnkranz
J Royston
JJ Rodríguez
L Breiman
L Dong
LI Kuncheva
M Hall
M Meilă
MM Duarte-Villaseñor
R Rifkin
T Hastie
T Leathart
TG Dietterich
V Melnikov
Y LeCun
Publication venue: Springer Science and Business Media LLC
Publication date: 10/09/2018

A system of nested dichotomies (NDs) is a method of decomposing a multiclass problem into a collection of binary problems. Such a system recursively applies binary splits to divide the set of classes into two subsets, and trains a binary classifier for each split. Many methods have been proposed to perform this split, each with various advantages and disadvantages. In this paper, we present a simple, general method for improving the predictive performance of NDs produced by any subset selection technique that employs randomness to construct the subsets. We provide a theoretical expectation for performance improvements, as well as empirical results showing that our method improves the root mean squared error of NDs, regardless of whether they are employed as an individual model or in an ensemble setting.
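A rough sketch of evaluating several randomized splits at a node and keeping the best one, assuming a hypothetical `score` function (for example, the accuracy of a binary model trained on that split) and a simple randomized split generator standing in for whichever randomized subset selection method is used. The names `random_split`, `best_of_lambda` and `toy_score`, and the parameter `lam` (number of candidate splits), are illustrative and not taken from the paper.

```python
import random
from typing import Callable, FrozenSet, Sequence, Tuple

Split = Tuple[FrozenSet[int], FrozenSet[int]]


def random_split(classes: Sequence[int], rng: random.Random) -> Split:
    """Randomly divide `classes` (at least two) into two non-empty subsets.

    A simple stand-in for a randomized subset selection method.
    """
    shuffled = list(classes)
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)  # both sides stay non-empty
    return frozenset(shuffled[:cut]), frozenset(shuffled[cut:])


def best_of_lambda(
    classes: Sequence[int],
    score: Callable[[Split], float],
    lam: int,
    rng: random.Random,
) -> Split:
    """Draw `lam` candidate splits and keep the highest-scoring one."""
    candidates = [random_split(classes, rng) for _ in range(lam)]
    return max(candidates, key=score)


# Hypothetical score: reward splits that keep classes 0 and 1 together and
# classes 2 and 3 together (standing in for a binary model's accuracy).
def toy_score(split: Split) -> float:
    left, _ = split
    together = [{0, 1}, {2, 3}]
    return sum(1.0 for pair in together if pair <= left or pair.isdisjoint(left))


classes = [0, 1, 2, 3]
single = best_of_lambda(classes, toy_score, lam=1, rng=random.Random(0))
best = best_of_lambda(classes, toy_score, lam=10, rng=random.Random(0))
print(sorted(single[0]), sorted(single[1]), toy_score(single))
print(sorted(best[0]), sorted(best[1]), toy_score(best))
```

With a fixed seed, the first candidate drawn for `lam=10` is exactly the split a single draw would produce, so the selected split can never score lower on `score`; this mirrors the intuition that keeping the best of several random splits should not hurt in expectation.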

arXiv.org e-Print Archive

Crossref

Research Commons@Waikato

Automated Machine Learning for Multi-Label Classification

Author: Wever Marcel
Publication date: 01/01/2021

Open Access LMU

On calibration of nested dichotomies

Author: A Beygelzimer
A Kumar
AH Murphy
CC Chang
F Pedregosa
J Fox
J Platt
K Dembczyński
L Dong
O Russakovsky
P Mahé
R Rifkin
T Hastie
T Leathart
TG Dietterich
V Melnikov
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Nested dichotomies (NDs) are used as a method of transforming a multiclass classification problem into a series of binary problems. A tree structure is induced that recursively splits the set of classes into subsets, and a binary classification model learns to discriminate between the two subsets of classes at each node. In this paper, we demonstrate that these NDs typically exhibit poor probability calibration, even when the binary base models are well calibrated. We also show that this problem is exacerbated when the binary models are poorly calibrated. We discuss the effectiveness of different calibration strategies and show that accuracy and log-loss can be significantly improved by calibrating both the internal base models and the full ND structure, especially when the number of classes is high.
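One common post-hoc fix for systematically under-confident multiclass outputs is temperature scaling, sketched below under the assumption that the full ND's class-probability vectors and the true labels are available on held-out data. This generic method is chosen for illustration only; the abstract does not say it is one of the strategies the paper evaluates. `apply_temperature`, `log_loss` and `fit_temperature` are hypothetical helper names, and the grid search over the temperature `t` is a simplification of fitting it by numerical optimization.

```python
import math
from typing import List, Optional, Sequence


def apply_temperature(probs: Sequence[float], t: float) -> List[float]:
    """Temperature-scale a probability vector: p_i ** (1 / t), renormalized.

    Equivalent to softmax(log(p) / t); t < 1 sharpens, t > 1 flattens.
    """
    powered = [p ** (1.0 / t) for p in probs]
    z = sum(powered)
    return [p / z for p in powered]


def log_loss(rows: Sequence[Sequence[float]], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(max(row[y], eps)) for row, y in zip(rows, labels)) / len(labels)


def fit_temperature(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    grid: Optional[Sequence[float]] = None,
) -> float:
    """Pick the temperature on a grid that minimizes held-out log-loss."""
    grid = grid or [0.1 * k for k in range(1, 31)]  # 0.1 ... 3.0
    return min(grid, key=lambda t: log_loss([apply_temperature(r, t) for r in rows], labels))


# Under-confident toy outputs: the top class gets probability 0.5
# but is correct 4 times out of 5.
rows = [[0.5, 0.3, 0.2]] * 5
labels = [0, 0, 0, 0, 1]
t = fit_temperature(rows, labels)
calibrated = [apply_temperature(r, t) for r in rows]
print(t, log_loss(rows, labels), log_loss(calibrated, labels))
```

Because the toy predictions are under-confident, the fitted temperature falls below 1, sharpening the distributions and lowering the log-loss.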

Crossref

Research Commons@Waikato

A Comparative Study of Machine Learning Classifiers for Credit Card Fraud Detection

Author: Nur-E-Arefin Md.
Publication venue: Walter de Gruyter GmbH
Publication date: 11/10/2021

Nowadays, credit card transactions are gaining popularity with the growth of e-commerce and show tremendous opportunity for the future. Given this surge in credit card transactions, securing them is an urgent need. Although vendors and card-issuing authorities are dedicated to securing the details of these transactions, researchers are searching for new approaches and techniques to ensure the absolute security that the times demand. Alongside other technologies, machine learning and computational intelligence can play a vital role in detecting credit card fraud. For detecting credit card anomalies, this paper analyzes and compares several popular classifier algorithms, focusing on their performance. The UCSD-FICO Data Mining Contest 2009 dataset was used to measure the performance of the classifiers. The final results of the experiment suggest that (1) meta and tree classifiers perform better than other types of classifiers, and (2) although the classification accuracy rate is high, the fraud detection success rate is low. Finally, the fraud detection rate should be taken into consideration when assessing the performance of the classifiers in a credit card fraud detection system.
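The gap between a high accuracy and a low fraud detection rate follows from class imbalance. The hypothetical numbers below (not from the paper) illustrate it: with 20 fraudulent transactions out of 1,000, a classifier that catches only 5 of them and raises 3 false alarms still reaches 98.2% accuracy, while its detection rate (recall on the fraud class) is only 25%. The function name `accuracy_and_detection_rate` is illustrative.

```python
from typing import Sequence, Tuple


def accuracy_and_detection_rate(
    y_true: Sequence[int], y_pred: Sequence[int], fraud: int = 1
) -> Tuple[float, float]:
    """Return (overall accuracy, fraud detection rate = recall on the fraud class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    n_fraud = sum(t == fraud for t in y_true)
    caught = sum(t == fraud and p == fraud for t, p in zip(y_true, y_pred))
    detection = caught / n_fraud if n_fraud else float("nan")
    return correct / len(y_true), detection


# Hypothetical data: 20 frauds among 1,000 transactions.
y_true = [1] * 20 + [0] * 980
# The classifier flags 5 of the frauds, misses 15, and raises 3 false alarms.
y_pred = [1] * 5 + [0] * 15 + [1] * 3 + [0] * 977
acc, det = accuracy_and_detection_rate(y_true, y_pred)
print(f"accuracy={acc:.3f} detection rate={det:.2f}")  # accuracy=0.982 detection rate=0.25
```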

International Journal of Innovative Technology and Interdisciplinary Sciences (IJITIS)

Fuzzy Rules in Data Mining: From Fuzzy Associations to Gradual Dependencies

Author: Bonissone Piero P.
Hüllermeier Eyke
Kacprz Janusz
Magdalena Luis
Trillas Enric
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Open Access LMU

Proceedings. 27. Workshop Computational Intelligence, Dortmund, 23. - 24. November 2017

Author: Hoffmann Frank
Hüllermeier E.
Mikut Ralf
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2017

These proceedings contain the contributions to the 27th Workshop Computational Intelligence. The main topics are methods, applications, and tools for fuzzy systems, artificial neural networks, evolutionary algorithms, and data mining techniques, as well as the comparison of methods on industrial and benchmark problems.

KITopen

Open Access LMU

Learning error-correcting representations for multi-class problems

Author: Bautista Martín Miguel Ángel
Publication venue: Edicions de la Universitat de Barcelona
Publication date: 01/01/2016

[eng] Real life is full of multi-class decision tasks. In the Pattern Recognition field, several methodologies have been proposed to deal with binary problems, obtaining satisfying results in terms of performance. However, extending very powerful binary classifiers to the multi-class case is a complex task. The Error-Correcting Output Codes (ECOC) framework has proven to be a very powerful tool for combining binary classifiers to tackle multi-class problems. However, most combinations of binary classifiers in the ECOC framework overlook the underlying structure of the multi-class problem. In addition, it is still unclear how the error correction of an ECOC design is distributed among the different classes. In this dissertation, we are interested in tackling critical problems of the ECOC framework, such as defining the number of classifiers needed to tackle a multi-class problem, adapting the ECOC coding to multi-class data, and distributing error correction among different pairs of categories. To deal with these issues, this dissertation describes several proposals. 1) We define a new representation for ECOC coding matrices that expresses pair-wise codeword separability and allows for a deeper understanding of how error correction is distributed among classes. 2) We study the effect of using a logarithmic number of binary classifiers to treat the multi-class problem in order to obtain very efficient models. 3) To search for very compact ECOC coding matrices that take into account the distribution of multi-class data, we use Genetic Algorithms that respect the constraints of the ECOC framework. 4) We propose a discrete factorization algorithm that finds an ECOC configuration that allocates error-correcting capabilities to the classes that are more prone to errors.
The proposed methodologies are evaluated on different real and synthetic data sets: the UCI Machine Learning Repository, handwritten symbols, traffic signs from a Mobile Mapping System, and Human Pose Recovery. The results of this thesis show that significant performance improvements are obtained over traditional ECOC coding designs when the proposed ECOC coding designs are taken into account.

[spa] In everyday life, multi-class decision tasks arise constantly. In the field of Pattern Recognition, many binary classification methods have been proposed, achieving highly satisfactory results in terms of performance. However, extending these sophisticated binary classifiers to the multi-class setting is a complex task. In this area, Error-Correcting Code (ECC) strategies have proven to be a very powerful tool for combining binary classifiers. Nevertheless, most architectures for combining binary classifiers neglect the structure of the multi-class problem. In addition, analyzing how error correction is distributed among classes is still an open problem. In this doctoral thesis, we focus on critical problems of error-correcting codes: defining the number of classifiers needed to tackle an arbitrary multi-class problem, adapting the binary problems to the multi-class problem, and distributing error correction among classes. To address these questions, this doctoral thesis describes several proposals. 1) We define a new representation for ECCs that expresses the separability between pairs of codewords and gives a better understanding of how error correction is distributed among the different classes. 2) We study the effect of using a logarithmic number of binary classifiers to treat the multi-class problem in order to obtain very efficient models.
3) To find very efficient models that take into account the structure of the multi-class problem, we use genetic algorithms that respect the constraints of ECCs. 4) We propose a discrete matrix factorization algorithm that finds ECCs with a configuration that allocates error correction to the categories most prone to errors. The proposed methodologies are evaluated on various real and synthetic problems, for example: the UCI Machine Learning Repository, handwritten symbol recognition, traffic sign classification, and human pose recognition. The results obtained in this thesis show significant performance improvements over traditional ECC designs when the different proposals are taken into account.
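The ECOC ideas above (one codeword per class, pair-wise codeword separability, and decoding the binary classifiers' outputs back to a class) can be sketched in a few lines of Python. This is a minimal illustration assuming binary {0, 1} codes and Hamming-distance decoding; ternary codes and other decoders are not covered, and the function names are hypothetical. The 7-column matrix is the standard exhaustive code for four classes: its minimum pair-wise distance of 4 lets it correct any single binary classifier error, whereas a logarithmic 2-column code has distance 1 and corrects none.

```python
from itertools import combinations
from typing import Sequence

Code = Sequence[Sequence[int]]  # one codeword (row) per class, one column per binary classifier


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(code: Code) -> int:
    """Smallest Hamming distance between any two class codewords."""
    return min(hamming(a, b) for a, b in combinations(code, 2))


def decode(code: Code, outputs: Sequence[int]) -> int:
    """Return the class whose codeword is closest to the binary classifiers' outputs."""
    return min(range(len(code)), key=lambda i: hamming(code[i], outputs))


exhaustive = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]
logarithmic = [[0, 0], [0, 1], [1, 0], [1, 1]]  # ceil(log2(4)) = 2 classifiers

d = min_pairwise_distance(exhaustive)
print(d, (d - 1) // 2)  # 4 1: corrects any single classifier error
print(min_pairwise_distance(logarithmic))  # 1

noisy = [1, 0, 1, 1, 0, 0, 1]  # class 2's codeword with the first bit flipped
print(decode(exhaustive, noisy))  # 2
```

The trade-off the thesis studies is visible here: the logarithmic code needs far fewer binary classifiers, but its codewords are too close together to absorb any classifier error.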

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa

Diposit Digital de la Universitat de Barcelona