21 research outputs found

    Fuzzy Rough Sets for Self-Labelling: an Exploratory Analysis

    Semi-supervised learning incorporates aspects of both supervised and unsupervised learning. In semi-supervised classification, only some data instances have associated class labels, while others are unlabelled. One particular group of semi-supervised classification approaches is the family of self-labelling techniques, which attempt to assign class labels to the unlabelled instances using class predictions derived from the labelled part of the data. In this paper, the applicability and suitability of fuzzy rough set theory for the task of self-labelling is investigated. An important preparatory experimental study is presented that evaluates how accurately different fuzzy rough set models can predict the classes of unlabelled data instances for semi-supervised classification. The predictions are made either by considering only the labelled data instances or by involving the unlabelled data instances as well. A stability analysis of the predictions also helps to provide further insight into the characteristics of the different fuzzy rough models. Our study shows that the ordered weighted average based fuzzy rough model performs best in terms of both accuracy and stability. Our conclusions offer a solid foundation and rationale that will allow the construction of a fuzzy rough self-labelling technique. They also provide an understanding of the applicability of fuzzy rough sets for the task of semi-supervised classification in general.
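    The generic self-labelling loop the abstract describes can be sketched in a few lines. The sketch below is an illustration only: it uses a plain 1-NN rule with a distance threshold as the base predictor, not the fuzzy rough models the paper evaluates, and the data points are made up.

    ```python
    import math

    def self_label(labelled, unlabelled, threshold=1.0):
        """Generic self-labelling loop: repeatedly move unlabelled points
        whose nearest labelled neighbour lies within `threshold` into the
        labelled pool, taking that neighbour's class as the new label.
        (A stand-in for the fuzzy rough predictors studied in the paper.)"""
        labelled = list(labelled)
        remaining = list(unlabelled)
        changed = True
        while changed and remaining:
            changed = False
            for x in list(remaining):
                nearest = min(labelled, key=lambda p: math.dist(p[0], x))
                if math.dist(nearest[0], x) <= threshold:
                    labelled.append((x, nearest[1]))  # accept the prediction
                    remaining.remove(x)
                    changed = True
        return labelled, remaining

    labelled = [((0.0, 0.0), "a"), ((5.0, 5.0), "b")]
    unlabelled = [(0.5, 0.2), (4.8, 5.1), (2.5, 2.5)]
    grown, left = self_label(labelled, unlabelled)
    ```

    Points far from every labelled instance (here `(2.5, 2.5)`) stay unlabelled, which mirrors why prediction accuracy and stability on the unlabelled part matter before building a full self-labelling technique.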

    Dynamic affinity-based classification of multi-class imbalanced data with one-versus-one decomposition: a fuzzy rough set approach

    Class imbalance occurs when data elements are unevenly distributed among classes, which poses a challenge for classifiers. The core focus of the research community has been on binary-class imbalance, although there is a recent trend toward the general case of multi-class imbalanced data. The IFROWANN method, a classifier based on fuzzy rough set theory, stands out for its performance on two-class imbalanced problems. In this paper, we consider its extension to multi-class data by combining it with one-versus-one decomposition. The latter transforms a multi-class problem into two-class sub-problems. Binary classifiers are applied to these sub-problems, after which their outcomes are aggregated into one prediction. We enhance the integration of IFROWANN in the decomposition scheme in two steps. First, we propose an adaptive weight setting for the binary classifier, addressing the varying characteristics of the sub-problems. We call this modified classifier IFROWANN-WIR. Second, we develop a new dynamic aggregation method called WV–FROST that combines the predictions of the binary classifiers with the global class affinity before making a final decision. In a meticulous experimental study, we show that our complete proposal outperforms the state-of-the-art on a wide range of multi-class imbalanced datasets.
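    The one-versus-one scheme the abstract builds on is easy to sketch: train one binary rule per pair of classes and aggregate the pairwise votes. The sketch below uses a plain 1-NN rule as the binary base classifier and simple majority voting; it stands in for, and is much simpler than, IFROWANN-WIR with WV–FROST aggregation.

    ```python
    import math
    from collections import defaultdict
    from itertools import combinations

    def ovo_predict(train, x):
        """One-versus-one decomposition: solve a binary sub-problem per
        class pair, then aggregate the pairwise votes into one prediction."""
        classes = sorted({y for _, y in train})
        votes = defaultdict(float)
        for a, b in combinations(classes, 2):
            # Restrict the training set to the two classes of this sub-problem.
            sub = [(p, y) for p, y in train if y in (a, b)]
            # Binary base classifier: a plain 1-NN here (a stand-in for
            # IFROWANN-WIR, which the paper uses instead).
            winner = min(sub, key=lambda p: math.dist(p[0], x))[1]
            votes[winner] += 1.0
        return max(votes, key=votes.get)

    train = [((0.0, 0.0), "a"), ((5.0, 0.0), "b"), ((0.0, 5.0), "c")]
    print(ovo_predict(train, (0.4, 0.3)))  # nearest prototype favours class "a"
    ```

    The paper's contribution lives in the two places this sketch keeps trivial: the per-pair weighting of the binary classifier and the dynamic, affinity-aware aggregation of the votes.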

    EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data

    Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a challenge to standard machine learning techniques. We propose a new hybrid method specifically tailored to handle class imbalance, called EPRENNID. It performs an evolutionary prototype reduction focused on providing diverse solutions to prevent the method from overfitting the training set. It also allows an explicit reduction of the underrepresented class, which the most common preprocessing solutions for class imbalance usually protect. As part of the experimental study, we show that the proposed prototype reduction method outperforms state-of-the-art preprocessing techniques. The preprocessing step yields multiple prototype sets that are later used in an ensemble, performing a weighted voting scheme with the nearest neighbor classifier. EPRENNID is experimentally shown to significantly outperform previous proposals.
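    The final stage the abstract describes, a weighted vote over nearest neighbor predictions from several prototype sets, can be sketched as below. The prototype sets and weights are hard-coded toy values here; in EPRENNID they would come from the evolutionary prototype reduction step.

    ```python
    import math
    from collections import defaultdict

    def knn_label(prototypes, x):
        """1-NN prediction against a single prototype set."""
        return min(prototypes, key=lambda p: math.dist(p[0], x))[1]

    def ensemble_predict(prototype_sets, weights, x):
        """Weighted vote over the 1-NN predictions of several prototype
        sets, as in the ensemble stage the abstract describes."""
        votes = defaultdict(float)
        for protos, w in zip(prototype_sets, weights):
            votes[knn_label(protos, x)] += w
        return max(votes, key=votes.get)

    # Two toy prototype sets for a minority ("min") / majority ("maj") problem.
    set1 = [((0.0, 0.0), "min"), ((5.0, 5.0), "maj")]
    set2 = [((1.0, 0.0), "min"), ((4.0, 5.0), "maj")]
    print(ensemble_predict([set1, set2], [0.6, 0.4], (0.5, 0.5)))
    ```

    Keeping several diverse prototype sets, rather than a single reduced set, is what lets the ensemble avoid overfitting the training data.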

    Dealing with Imbalanced and Weakly Labeled Data in Machine Learning using Fuzzy Set and Rough Set Methods

    This thesis focuses on classification. The goal is to predict the class label of elements (that is, assign them to a category) based on a previously provided dataset of known observations. Traditionally, a number of features are measured for all observations, such that each can be described by a feature vector (collecting the values for all features) and an associated outcome, if the latter is known. In the classic iris dataset, for example, each observation corresponds to an iris plant and is described by its values for four features representing biological properties of the flower. The associated class label is the specific family of irises the sample belongs to, and the prediction task is to categorize a plant into the correct family based on its feature values. A classification algorithm does so based on its training set of labelled instances, that is, a provided set of iris flowers for which both the feature values and class labels are known. One of the most intuitive classifiers is the nearest neighbour algorithm. To classify a new element, this method locates the most similar training instance (the nearest neighbour) and assigns the target to the class to which this neighbour belongs. Other methods build an explicit classification model from the training set, for example in the format of a decision tree.
    Doctoral thesis, University of Granada (Programa Oficial de Doctorado en Tecnologías de la Información y la Comunicación). This doctorate was supported by the Special Research Fund (Bijzonder Onderzoeksfonds) of Ghent University. The research stays at the University of Granada (Spain) were funded by the Research Foundation Flanders (Fonds voor Wetenschappelijk Onderzoek Vlaanderen). The experiments in this thesis were partly run on the Hercules computing infrastructure of the University of Granada.
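    The nearest neighbour rule described in the abstract fits in a few lines. In this sketch the four measurements per flower are illustrative values in the style of the iris data, not the actual dataset.

    ```python
    import math

    # Toy training set in the style of the iris data: four measurements
    # (sepal length/width, petal length/width) plus a family label.
    # The numbers are illustrative, not taken from the real dataset.
    training = [
        ((5.1, 3.5, 1.4, 0.2), "setosa"),
        ((7.0, 3.2, 4.7, 1.4), "versicolor"),
        ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ]

    def nearest_neighbour(train, x):
        """Assign x the class of its most similar (closest) training instance."""
        return min(train, key=lambda p: math.dist(p[0], x))[1]

    print(nearest_neighbour(training, (5.0, 3.4, 1.5, 0.2)))
    ```

    The same feature-vector representation underlies the model-building classifiers the abstract mentions, such as decision trees; only the way the training set is turned into predictions differs.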