6,280 research outputs found

Coupling different methods for overcoming the class imbalance problem

Author: Fantozzi Carlo
N. Lazzarini
Nanni Loris
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

Many classification problems must deal with imbalanced datasets where one class, the majority class, outnumbers the other classes. Standard classification methods do not provide accurate predictions in this setting since classification is generally biased towards the majority class. The minority classes are often the ones of interest (e.g., when they are associated with pathological conditions in patients), so methods for handling imbalanced datasets are critical. Using several different datasets, this paper evaluates the performance of state-of-the-art classification methods for handling the imbalance problem in both binary and multi-class datasets. Different strategies are considered, including the one-class and dimension reduction approaches, as well as their fusions. Moreover, some ensembles of classifiers are tested, in addition to stand-alone classifiers, to assess the effectiveness of ensembles in the presence of imbalance. Finally, a novel ensemble of ensembles is designed specifically to tackle the problem of class imbalance: the proposed ensemble does not need to be tuned separately for each dataset and outperforms all the other tested approaches. To validate our classifiers we resort to the KEEL-dataset repository, whose data partitions (training/test) are publicly available and have already been used in the open literature: as a consequence, it is possible to report a fair comparison among different approaches in the literature. Our best approach (MATLAB code and datasets not easily accessible elsewhere) will be available at https://www.dei.unipd.it/node/2357
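
The bias towards the majority class mentioned in this abstract can be illustrated with a toy example. The sketch below is not taken from the paper: the 95/5 class split and the always-predict-majority classifier are assumptions chosen only to show why plain accuracy is misleading on imbalanced data.

```python
# Toy illustration (assumed data, not from the paper): a classifier that
# always predicts the majority class looks accurate but never finds the
# minority class.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of true `positive` samples that were predicted as `positive`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in relevant) / len(relevant)

# Hypothetical dataset: 95 majority samples, 5 minority samples.
y_true = ["majority"] * 95 + ["minority"] * 5
# Degenerate classifier: always predicts the majority class.
y_pred = ["majority"] * 100

overall_accuracy = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, "minority")
```

Here `overall_accuracy` is high while `minority_recall` is zero, which is why studies on imbalanced data usually report per-class or rank-based metrics rather than accuracy alone.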

Crossref

Newcastle University E-Prints

Archivio istituzionale della ricerca - Università di Padova

HAR-MI method for multi-class imbalanced datasets

Author: Abdullah Dahlan
Hartono H.
Ongko Erianto
Risyani Yeni
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/04/2020

Research on multi-class imbalance is hampered by poor data diversity and a large number of classifiers. The Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) method is a hybrid ensemble method developed from the Hybrid Approach Redefinition (HAR) method. This study compares its results with those of the Dynamic Ensemble Selection-Multiclass Imbalance (DES-MI) method in handling multi-class imbalance. In the HAR-MI method, the preprocessing stage uses the random balance ensembles method and dynamic ensemble selection to produce a candidate ensemble, and the processing stage uses different contribution sampling and dynamic ensemble selection to produce a candidate ensemble. The experiments use multi-class imbalanced datasets from the KEEL Repository. The results show that the HAR-MI method can overcome multi-class imbalance with better data diversity, a smaller number of classifiers, and better classifier performance than the DES-MI method. These results were tested with a Wilcoxon signed-rank test, which showed the superiority of the HAR-MI method over the DES-MI method.

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System

On the relevance of preprocessing in predictive maintenance for dynamic systems

Author: A Chuang
A Graves
A Savitzky
AJ Smola
AP Bradley
B Schölkopf
BS Yang
BW Silverman
C Cernuda
C Phua
C Wang
Carlos Cernuda
CE Shannon
D Cabrera
D Freedman
D Li
D Lin
D Wolpert
D Wu
DB Rubin
DL Wilson
E Lughofer
F Fleuret
F Serdio
G Brown
G Qiu
G Weiss
GEAPA Batista
GEP Box
H Peng
H Yang
H Zou
HB Mann
HJ Weaver
I Daubechies
I Guyon
I Jolliffe
I Tomek
J Gerretzen
J Ville
JB Tenenbaum
Jorma Laurikkala
K Greff
K Tschumitschew
K Varmuza
KV Branden
L Breiman
L Maaten
L Tan
L Zhang
M Bartlett
M Frigo
M Hubert
M Jung
M Li
MA Oliveira
MR Smith
N Friedman
N Kwak
NE Huang
NV Chawla
O Troyanskaya
P Duhamel
P Mahalanobis
P Welch
PE Hart
R Battiti
R Kohavi
R Nikzad-Langerodi
R Nunkesser
R Tibshirani
RC Sharpley
RD Maesschalck
RM Sakia
RN Bracewell
S García
S Gelper
S Hochreiter
S Kadambe
S Oba
S Roweis
SA Dudani
SE Said
SG Mallat
Sudipto Guha
T Benkedjouh
T Hastie
T Hofmann
T Jo
T Loutas
TY Wu
V Vapnik
W Pedrycz
Y Saeys
Publication date: 01/01/2018

Real-time, data-driven monitoring of dynamic systems for predictive maintenance is usually a highly complex process. To a greater or lesser degree, any data-driven approach is sensitive to data preprocessing, understood as any data treatment applied before the monitoring model, and this treatment is sometimes crucial for the final development of the monitoring technique employed. The aim of this work is to quantify exhaustively the sensitivity of data-driven predictive maintenance models in dynamic systems. We consider a couple of predictive maintenance scenarios, each defined by publicly available data. For each scenario, we consider its properties and apply several techniques for each of the successive preprocessing steps, e.g. data cleaning, missing values treatment, outlier detection, feature selection, or imbalance compensation. The pretreatment configurations, i.e. sequential combinations of techniques from different preprocessing steps, are considered together with different monitoring approaches, in order to determine the relevance of data preprocessing for predictive maintenance in dynamic systems.
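
The idea of a pretreatment configuration, one technique chosen for each successive preprocessing step, can be sketched as a Cartesian product over the steps. The step names below follow the abstract; the technique names are hypothetical placeholders and are not the techniques evaluated in the work.

```python
from itertools import product

# Step names follow the abstract; the technique names per step are
# hypothetical placeholders, not the ones evaluated in the work.
PREPROCESSING_STEPS = {
    "data_cleaning": ["none", "deduplicate"],
    "missing_values": ["mean_imputation", "knn_imputation"],
    "outlier_detection": ["none", "mahalanobis"],
    "feature_selection": ["none", "mutual_information"],
    "imbalance_compensation": ["none", "smote"],
}

def pretreatment_configurations(steps):
    """Return every sequential combination: one technique per step, in step order."""
    names = list(steps)
    return [
        dict(zip(names, combo))
        for combo in product(*(steps[name] for name in names))
    ]

configs = pretreatment_configurations(PREPROCESSING_STEPS)
```

With two candidate techniques for each of five steps, this yields 2^5 = 32 configurations, each of which would then be paired with every monitoring approach under study. The count grows multiplicatively with each added technique, which is what makes an exhaustive evaluation expensive.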

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

BCAM's Institutional Repository Data

Predicting progression of mild cognitive impairment to dementia using neuropsychological data: a supervised learning approach using time windows

Author: A Espinosa
A Mendonça de
Alexandre de Mendonça
American Psychiatric Association
Ana Rodrigues
AV Carreiro
B Zhou
BC Dickerson
C Cabral
C Hinrichs
C Nunes
C Salvatore
CR Jack
D Silva
DE Barnes
Dina Silva
E Frank
E Moradi
F Noorbakhsh
F Portet
Isabel Santana
J Demsar
J Maroco
JC Morris
K Langa
L Nanni
L Tay
Luís Lemos
M Ewers
M Huang
M Kruczyk
MA Hall
Manuela Guerreiro
MH Tabert
MS Albert
MW Bondi
NM Samtani
NV Chawla
OM Doyle
P Battista
RM Chapman
S Adaszewski
S Ayton
S Belleville
S Palmqvist
Sandra Cardoso
Sara C. Madeira
SF Eskildsen
SJ Lee
Telma Pereira
Y Cui
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2013

Background: Predicting progression from a stage of Mild Cognitive Impairment (MCI) to dementia is a major pursuit in current research. It is broadly accepted that cognition declines along a continuum between MCI and dementia. As such, cohorts of MCI patients are usually heterogeneous, containing patients at different stages of the neurodegenerative process. This hampers the prognostic task. Nevertheless, when learning prognostic models, most studies use the entire cohort of MCI patients regardless of their disease stages. In this paper, we propose a Time Windows approach to predict conversion to dementia, learning with patients stratified using time windows, thus fine-tuning the prognosis regarding the time to conversion. Methods: In the proposed Time Windows approach, we grouped patients based on the clinical information of whether they converted (converter MCI) or remained MCI (stable MCI) within a specific time window. We tested time windows of 2, 3, 4 and 5 years. We developed a prognostic model for each time window using clinical and neuropsychological data and compared this approach with the one commonly used in the literature, named the First Last approach, where all patients are used to learn the models. This makes it possible to move from the traditional question "Will an MCI patient convert to dementia sometime in the future?" to the question "Will an MCI patient convert to dementia within a specific time window?". Results: The proposed Time Windows approach outperformed the First Last approach. The results showed that we can predict conversion to dementia as early as 5 years before the event with an AUC of 0.88 in the cross-validation set and 0.76 in an independent validation set. Conclusions: Prognostic models using time windows perform better when predicting progression from MCI to dementia than the prognostic approach commonly used in the literature. Furthermore, the proposed Time Windows approach is more relevant from a clinical point of view, predicting conversion within a temporal interval rather than sometime in the future and allowing clinicians to adjust treatments and clinical appointments in a timely manner. FCT under the Neuroclinomics2 project [PTDC/EEI-SII/1937/2014, SFRH/BD/95846/2013]; INESC-ID plurianual [UID/CEC/50021/2013]; LASIGE Research Unit [UID/CEC/00408/2013
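
The grouping step of the Time Windows approach can be sketched as a labelling function applied once per window. This is an illustration under stated assumptions, not the authors' implementation: the function name, the use of years as the unit, and the rule that patients without enough follow-up are excluded from a window are all assumptions made for the sketch.

```python
from typing import Optional

def label_for_window(years_to_conversion: Optional[float],
                     years_of_follow_up: float,
                     window: float) -> Optional[str]:
    """Label one patient for one time window (hypothetical helper).

    years_to_conversion: time from baseline to the dementia diagnosis,
        or None if no conversion was observed.
    years_of_follow_up: time from baseline to the last assessment.
    Returns "converter MCI", "stable MCI", or None when the patient
    cannot be labelled for this window (assumed exclusion rule).
    """
    if years_to_conversion is not None and years_to_conversion <= window:
        return "converter MCI"
    if years_of_follow_up >= window:
        # Still MCI at the end of the window, even if conversion came later.
        return "stable MCI"
    # Follow-up too short to know the status at the end of the window.
    return None

# One label per window tested in the abstract (2, 3, 4 and 5 years) for a
# hypothetical patient who converted 3.5 years after baseline.
labels = {w: label_for_window(3.5, 3.5, w) for w in (2, 3, 4, 5)}
```

For this hypothetical patient, the 2-year and 3-year windows give "stable MCI" and the 4-year and 5-year windows give "converter MCI", so the same person contributes to different classes depending on the window, which is the stratification the abstract describes.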

Crossref

Directory of Open Access Journals

Estudo Geral

Sapientia

On the role of pre and post-processing in environmental data mining

Author: Athanasiadis Ioannis
Comas Joaquim
Gibert Karina
Holmes Geoffrey
Izquierdo Joaquin
Sanchez-Marre Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2008

The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data usually contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions based on KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.

Research Commons@Waikato

SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary

Author: Chawla Nitesh V.
Fernández Hilario Alberto Luis
García López Salvador
Herrera Triguero Francisco
Publication venue: 'AI Access Foundation'
Publication date: 01/01/2018

The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data. This is due to the simplicity of its design, as well as its robustness when applied to different types of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several different domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has significantly contributed to new supervised learning paradigms, including multilabel classification, incremental learning, semi-supervised learning, and multi-instance learning, among others. It is a standard benchmark for learning from imbalanced data. It is also featured in a number of different software packages, from open source to commercial. In this paper, marking the fifteen-year anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current state of affairs with SMOTE and its applications, and identify the next set of challenges to extend SMOTE for Big Data problems. This work has been partially supported by the Spanish Ministry of Science and Technology under projects TIN2014-57251-P, TIN2015-68454-R and TIN2017-89517-P; the Project 887 BigDaP-TOOLS - Ayudas Fundación BBVA a Equipos de Investigación Científica 2016; and the National Science Foundation (NSF) Grant IIS-1447795
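
The core idea behind SMOTE, creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration, not the reference implementation: the function name `smote_sketch`, the brute-force neighbour search, and the fixed seed are assumptions made for the example.

```python
import random

def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by SMOTE-style interpolation.

    minority: list of numeric feature vectors from the minority class.
    k: number of nearest minority neighbours to choose from.
    Simplified sketch: brute-force neighbour search, numeric features only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority class: the four corners of the unit square.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority_points, n_synthetic=10, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class. For real use, a maintained library implementation is preferable to this sketch.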

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Granada