21 research outputs found

    Fuzzy Rough Sets for Self-Labelling: an Exploratory Analysis

    Semi-supervised learning incorporates aspects of both supervised and unsupervised learning. In semi-supervised classification, only some data instances have associated class labels, while others are unlabelled. One particular group of semi-supervised classification approaches is the family of self-labelling techniques, which attempt to assign class labels to the unlabelled instances using class predictions derived from the labelled part of the data. In this paper, the applicability and suitability of fuzzy rough set theory for the task of self-labelling is investigated. An important preparatory experimental study is presented that evaluates how accurately different fuzzy rough set models can predict the classes of unlabelled data instances for semi-supervised classification. The predictions are made either by considering only the labelled data instances or by involving the unlabelled data instances as well. A stability analysis of the predictions also helps to provide further insight into the characteristics of the different fuzzy rough models. Our study shows that the ordered weighted average based fuzzy rough model performs best in terms of both accuracy and stability. Our conclusions offer a solid foundation and rationale that will allow the construction of a fuzzy rough self-labelling technique. They also provide an understanding of the applicability of fuzzy rough sets for the task of semi-supervised classification in general.
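    The generic self-labelling loop the abstract describes can be sketched in a few lines. The sketch below is an illustration only: it uses a plain 1-NN rule with a distance threshold as the base predictor, not the fuzzy rough models the paper evaluates, and the data points are made up.

    ```python
    import math

    def self_label(labelled, unlabelled, threshold=1.0):
        """Generic self-labelling loop: repeatedly move unlabelled points
        whose nearest labelled neighbour lies within `threshold` into the
        labelled pool, taking that neighbour's class as the new label.
        (A stand-in for the fuzzy rough predictors studied in the paper.)"""
        labelled = list(labelled)
        remaining = list(unlabelled)
        changed = True
        while changed and remaining:
            changed = False
            for x in list(remaining):
                nearest = min(labelled, key=lambda p: math.dist(p[0], x))
                if math.dist(nearest[0], x) <= threshold:
                    labelled.append((x, nearest[1]))  # accept the prediction
                    remaining.remove(x)
                    changed = True
        return labelled, remaining

    labelled = [((0.0, 0.0), "a"), ((5.0, 5.0), "b")]
    unlabelled = [(0.5, 0.2), (4.8, 5.1), (2.5, 2.5)]
    grown, left = self_label(labelled, unlabelled)
    ```

    Points far from every labelled instance (here `(2.5, 2.5)`) stay unlabelled, which mirrors why prediction accuracy and stability on the unlabelled part matter before building a full self-labelling technique.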

    Dynamic affinity-based classification of multi-class imbalanced data with one-versus-one decomposition: a fuzzy rough set approach

    Class imbalance occurs when data elements are unevenly distributed among classes, which poses a challenge for classifiers. The core focus of the research community has been on binary-class imbalance, although there is a recent trend toward the general case of multi-class imbalanced data. The IFROWANN method, a classifier based on fuzzy rough set theory, stands out for its performance on two-class imbalanced problems. In this paper, we consider its extension to multi-class data by combining it with one-versus-one decomposition. The latter transforms a multi-class problem into two-class sub-problems. Binary classifiers are applied to these sub-problems, after which their outcomes are aggregated into one prediction. We enhance the integration of IFROWANN in the decomposition scheme in two steps. First, we propose an adaptive weight setting for the binary classifier, addressing the varying characteristics of the sub-problems. We call this modified classifier IFROWANN-WIR. Second, we develop a new dynamic aggregation method called WV–FROST that combines the predictions of the binary classifiers with the global class affinity before making a final decision. In a meticulous experimental study, we show that our complete proposal outperforms the state-of-the-art on a wide range of multi-class imbalanced datasets.
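    The one-versus-one scheme the abstract builds on is easy to sketch: train one binary rule per pair of classes and aggregate the pairwise votes. The sketch below uses a plain 1-NN rule as the binary base classifier and simple majority voting; it stands in for, and is much simpler than, IFROWANN-WIR with WV–FROST aggregation.

    ```python
    import math
    from collections import defaultdict
    from itertools import combinations

    def ovo_predict(train, x):
        """One-versus-one decomposition: solve a binary sub-problem per
        class pair, then aggregate the pairwise votes into one prediction."""
        classes = sorted({y for _, y in train})
        votes = defaultdict(float)
        for a, b in combinations(classes, 2):
            # Restrict the training set to the two classes of this sub-problem.
            sub = [(p, y) for p, y in train if y in (a, b)]
            # Binary base classifier: a plain 1-NN here (a stand-in for
            # IFROWANN-WIR, which the paper uses instead).
            winner = min(sub, key=lambda p: math.dist(p[0], x))[1]
            votes[winner] += 1.0
        return max(votes, key=votes.get)

    train = [((0.0, 0.0), "a"), ((5.0, 0.0), "b"), ((0.0, 5.0), "c")]
    print(ovo_predict(train, (0.4, 0.3)))  # nearest prototype favours class "a"
    ```

    The paper's contribution lives in the two places this sketch keeps trivial: the per-pair weighting of the binary classifier and the dynamic, affinity-aware aggregation of the votes.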

    EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data

    Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a challenge to standard machine learning techniques. We propose a new hybrid method specifically tailored to handle class imbalance, called EPRENNID. It performs an evolutionary prototype reduction focused on providing diverse solutions to prevent the method from overfitting the training set. It also allows an explicit reduction of the underrepresented class, which the most common preprocessing solutions for class imbalance usually protect. As part of the experimental study, we show that the proposed prototype reduction method outperforms state-of-the-art preprocessing techniques. The preprocessing step yields multiple prototype sets that are later used in an ensemble, performing a weighted voting scheme with the nearest neighbor classifier. EPRENNID is experimentally shown to significantly outperform previous proposals.
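    The final stage the abstract describes, a weighted vote over nearest neighbor predictions from several prototype sets, can be sketched as below. The prototype sets and weights are hard-coded toy values here; in EPRENNID they would come from the evolutionary prototype reduction step.

    ```python
    import math
    from collections import defaultdict

    def knn_label(prototypes, x):
        """1-NN prediction against a single prototype set."""
        return min(prototypes, key=lambda p: math.dist(p[0], x))[1]

    def ensemble_predict(prototype_sets, weights, x):
        """Weighted vote over the 1-NN predictions of several prototype
        sets, as in the ensemble stage the abstract describes."""
        votes = defaultdict(float)
        for protos, w in zip(prototype_sets, weights):
            votes[knn_label(protos, x)] += w
        return max(votes, key=votes.get)

    # Two toy prototype sets for a minority ("min") / majority ("maj") problem.
    set1 = [((0.0, 0.0), "min"), ((5.0, 5.0), "maj")]
    set2 = [((1.0, 0.0), "min"), ((4.0, 5.0), "maj")]
    print(ensemble_predict([set1, set2], [0.6, 0.4], (0.5, 0.5)))
    ```

    Keeping several diverse prototype sets, rather than a single reduced set, is what lets the ensemble avoid overfitting the training data.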

    Dealing with Imbalanced and Weakly Labeled Data in Machine Learning using Fuzzy Set and Rough Set Methods

    This thesis focuses on classification. The goal is to predict the class label of elements (that is, assign them to a category) based on a previously provided dataset of known observations. Traditionally, a number of features are measured for all observations, such that each can be described by a feature vector (collecting the values for all features) and an associated outcome, if the latter is known. In the classic iris dataset, for example, each observation corresponds to an iris plant and is described by its values for four features representing biological properties of the flower. The associated class label is the specific family of irises the sample belongs to, and the prediction task is to categorize a plant into the correct family based on its feature values. A classification algorithm does so based on its training set of labelled instances, that is, a provided set of iris flowers for which both the feature values and class labels are known. One of the most intuitive classifiers is the nearest neighbour algorithm. To classify a new element, this method locates the most similar training instance (the nearest neighbour) and assigns the target to the class to which this neighbour belongs. Other methods build an explicit classification model from the training set, for example in the format of a decision tree.
    Doctoral thesis, University of Granada (Programa Oficial de Doctorado en Tecnologías de la Información y la Comunicación). This doctorate was supported by the Special Research Fund (Bijzonder Onderzoeksfonds) of Ghent University. The research stays at the University of Granada (Spain) were funded by the Research Foundation Flanders (Fonds voor Wetenschappelijk Onderzoek Vlaanderen). The experiments in this thesis were partly run on the Hercules computing infrastructure of the University of Granada.
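    The nearest neighbour rule described in the abstract fits in a few lines. In this sketch the four measurements per flower are illustrative values in the style of the iris data, not the actual dataset.

    ```python
    import math

    # Toy training set in the style of the iris data: four measurements
    # (sepal length/width, petal length/width) plus a family label.
    # The numbers are illustrative, not taken from the real dataset.
    training = [
        ((5.1, 3.5, 1.4, 0.2), "setosa"),
        ((7.0, 3.2, 4.7, 1.4), "versicolor"),
        ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ]

    def nearest_neighbour(train, x):
        """Assign x the class of its most similar (closest) training instance."""
        return min(train, key=lambda p: math.dist(p[0], x))[1]

    print(nearest_neighbour(training, (5.0, 3.4, 1.5, 0.2)))
    ```

    The same feature-vector representation underlies the model-building classifiers the abstract mentions, such as decision trees; only the way the training set is turned into predictions differs.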