109 research outputs found

    Rails Quality Data Modelling via Machine Learning-Based Paradigms

    Three-way Imbalanced Learning based on Fuzzy Twin SVM

    Three-way decision (3WD) is a powerful granular-computing tool for handling uncertain data and is commonly used in information systems, decision-making, and medical care. It has been studied extensively within traditional rough set models, but it has rarely been combined with machine learning. In this paper, three-way decision is connected with the SVM, a standard binary classification model in machine learning, to address imbalanced classification, a setting in which SVM performance still needs improvement. A new three-way fuzzy membership function and a new fuzzy twin support vector machine with three-way membership (TWFTSVM) are proposed. The new membership function is defined to increase the certainty of uncertain data in both input space and feature space, and it assigns higher fuzzy membership to minority samples than to majority samples. To evaluate the proposed model, comparative experiments are run on forty-seven datasets with varying imbalance ratios. In addition, datasets with different imbalance ratios are derived from the same dataset to further assess the model's performance. The results show that the proposed model significantly outperforms other traditional SVM-based methods.

    A review on classification of imbalanced data for wireless sensor networks

    © The Author(s) 2020. Classification of imbalanced data has been widely explored over the last and the present decade and remains important, because data are central to modern applications and the problem becomes critical when data are distributed across several classes. The term imbalance refers to an uneven distribution of data across classes, which severely degrades the performance of traditional classifiers: they become biased toward the class with the larger amount of data. Data generated by wireless sensor networks exhibit several kinds of imbalance. This review article analyzes the imbalance issue for wireless sensor networks and other application domains, to help the community understand the WHAT, WHY, and WHEN of imbalance in data and its remedies.

    Least squares minimum class variance support vector machines

    In this paper, we propose a Support Vector Machine (SVM)-type algorithm that is statistically faster than other common algorithms in the SVM family. The new algorithm uses distributional information about each class and therefore combines the benefit of using the class variance in the optimization with the least squares approach, which gives an analytic solution to the minimization problem and is therefore computationally efficient. We demonstrate an important property of the algorithm that allows us to address the inversion of a singular matrix in the solution. We also demonstrate through real data experiments that we improve on computational time without losing any accuracy compared with previously proposed algorithms.

    Multiclass Posterior Probability Twin SVM for Motor Imagery EEG Classification

    Motor imagery electroencephalography is widely used in brain-computer interface systems. Due to the inherent characteristics of electroencephalography signals, accurate, real-time multiclass classification remains challenging. To address this problem, this paper proposes a multiclass posterior probability solution for twin SVM based on ranking continuous output and pairwise coupling. First, a two-class posterior probability model is constructed to approximate the posterior probability using ranking continuous output techniques and Platt's estimating method. Second, multiclass probabilistic outputs for twin SVM are obtained by combining every pair of class probabilities according to the method of pairwise coupling. Finally, the proposed method is compared with multiclass SVM and twin SVM via voting, and with multiclass posterior probability SVM using different coupling approaches. The efficacy of the proposed method in terms of classification accuracy and time complexity is demonstrated on both UCI benchmark datasets and real-world EEG data from BCI Competition IV Dataset 2a.

    A Structural SVM Based Approach for Binary Classification under Class Imbalance

    Class imbalance situations, where one class is rare compared to the other, arise frequently in machine learning applications. It is well known that the usual misclassification error is not suitable in such settings. A wide range of performance measures, such as AM and QM, have been proposed for this problem. However, due to computational difficulties, few learning techniques have been developed to directly optimize the AM or QM metric. To fill this gap, we present a general structural SVM framework for directly optimizing AM and QM. We define loss functions oriented to AM and QM, respectively, and adopt the cutting plane algorithm to solve the outer optimization. For the inner problem of finding the most violated constraint, we propose two efficient algorithms, one for the AM problem and one for the QM problem. Empirical studies on various imbalanced datasets support the effectiveness of the proposed approach.
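
    The abstract above does not define AM. The sketch below assumes AM means the arithmetic mean of the true positive rate and the true negative rate, which is an assumption and not a definition taken from the listing. The function name am_score and the toy labels are hypothetical. It only illustrates why plain accuracy misleads under imbalance; it is not the paper's structural SVM method.

```python
def am_score(y_true, y_pred, positive=1):
    """Arithmetic mean of TPR and TNR (assumed meaning of 'AM')."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    tpr = tp / (tp + fn)  # recall on the positive (rare) class
    tnr = tn / (tn + fp)  # recall on the negative (common) class
    return (tpr + tnr) / 2


def accuracy(y_true, y_pred):
    """Fraction of correctly predicted labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy data: 95 negatives, 5 positives, and a classifier that always says 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
trivial_accuracy = accuracy(y_true, y_pred)  # high despite missing every positive
trivial_am = am_score(y_true, y_pred)        # no better than chance
```

    On this toy data the always-negative classifier gets high accuracy while its AM stays at the chance level, which is the mismatch the abstract refers to.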

    Clasificador de máquinas de vectores de soporte para problemas desbalanceados con selección automática de parámetros [Support vector machine classifier for imbalanced problems with automatic parameter selection]

    Most classification methods assume that the classes under study contain the same number of samples (that is, that they are balanced). However, this assumption can lead to biased performance, since most real-world applications and databases are not balanced, which causes these methods to ignore the minority class (the class with the fewest samples). This work proposes a novel classifier, called the enhanced twin support vector machine (ETWSVM), which represents the input samples in a high-dimensional, possibly infinite-dimensional, feature space while constructing a decision boundary under the twin support vector machine (TWSVM) philosophy. We also use a method based on centered kernel alignment (CKA) to learn the kernel function, in order to counteract the inherent problems of imbalance and improve the separability of the data. In addition, we adopt the One-versus-Rest and One-versus-One strategies to extend the ETWSVM formulation to multiclass classification tasks. In the results obtained on synthetic and real databases, our proposal outperforms state-of-the-art methods in terms of performance (accuracy, geometric mean, F-measure) and training time. We then analyze the sensitivity of the free parameters for different imbalance ratios and degrees of class overlap, and suggest an automatic variant of the ETWSVMN that achieves a suitable trade-off between classification performance and training time.
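
    As a rough illustration of the centered kernel alignment (CKA) score mentioned above, the sketch below computes the standard alignment between two centered Gram matrices, the Frobenius inner product divided by the product of the Frobenius norms. It does not reproduce the kernel-learning procedure of the ETWSVM work. The helper names (center, frobenius, cka) and the toy data are hypothetical, and the label kernel y yᵀ is an assumed choice of target.

```python
import math


def center(K):
    """Double-center a square Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def frobenius(A, B):
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def cka(K, L):
    """Centered kernel alignment between Gram matrices K and L."""
    Kc, Lc = center(K), center(L)
    denom = math.sqrt(frobenius(Kc, Kc) * frobenius(Lc, Lc))
    if denom == 0:
        raise ValueError("a centered kernel is all zeros")
    return frobenius(Kc, Lc) / denom


# Toy 1-D data with two well-separated groups (hypothetical values).
xs = [0.0, 1.0, 10.0, 11.0]
ys = [-1, -1, 1, 1]
K_linear = [[a * b for b in xs] for a in xs]  # linear kernel on the inputs
K_target = [[a * b for b in ys] for a in ys]  # assumed target kernel y y^T
alignment = cka(K_linear, K_target)
```

    A score near 1 suggests that the kernel's similarity structure agrees with the labels; a kernel-learning method could tune kernel parameters to increase this score.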
