22 research outputs found

Overlap-based undersampling method for classification of imbalanced medical datasets.

Author: A Kalantari
B Krawczyk
C Bunkhumpornpat
DL Wilson
G Haixiang
H Han
J Jiang
L Zhang
M Bach
M Havaei
NV Chawla
P Vuttipittayamongkol
S Fotouhi
S Shilaskar
SH Bae
UR Acharya
W Han
WC Lin
X Wan
X Yuan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/05/2020

Early diagnosis of life-threatening diseases such as cancers and heart disease is crucial for effective treatment. Supervised machine learning has proved to be a very useful tool for this purpose. Historical patient data, including clinical and demographic information, is used to train learning algorithms, producing predictive models that provide initial diagnoses. However, in the medical domain, the positive class is commonly under-represented in a dataset. In such a scenario, a typical learning algorithm tends to be biased towards the negative class, which is the majority class, and to misclassify positive cases. This is known as the class imbalance problem. In this paper, a framework for predictive diagnostics of diseases with imbalanced records is presented. To reduce the classification bias, we propose the use of an overlap-based undersampling method to improve the visibility of minority class samples in the region where the two classes overlap. This is achieved by detecting and removing negative class instances from the overlapping region, which improves class separability in the data space. Experimental results show high accuracy on the positive class, which is highly desirable in the medical domain, while good trade-offs between sensitivity and specificity were obtained. Results also show that the method often outperformed other state-of-the-art and well-established techniques.
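The idea of removing negative instances from the overlapping region can be illustrated with a minimal sketch. It is not the method from the paper: it assumes a simple neighbourhood criterion, under which a negative instance counts as overlapping when at least one of its k nearest neighbours is positive. The function names `knn_indices` and `overlap_undersample` and the parameter `k` are hypothetical and used for illustration only.

```python
import math


def knn_indices(points, i, k):
    """Return the indices of the k nearest neighbours of points[i] (Euclidean distance)."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def overlap_undersample(X, y, k=3, positive=1):
    """Drop negative samples that have a positive sample among their k nearest neighbours.

    Assumed overlap criterion (illustrative only): a negative sample lies in the
    overlapping region if any of its k nearest neighbours is positive.
    Positive samples are always kept.
    """
    keep = []
    for i, label in enumerate(y):
        if label == positive:
            keep.append(i)
            continue
        neighbours = knn_indices(X, i, k)
        if not any(y[j] == positive for j in neighbours):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: four negatives far from the positives, one negative next to them.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5.2, 5), (5, 5.3)]
y = [0, 0, 0, 0, 1, 1, 0]
X_res, y_res = overlap_undersample(X, y, k=2)
```

In this toy example only the negative sample sitting next to the two positive samples is removed, so the positive cluster is no longer crowded by the majority class. A real implementation would need a more careful definition of the overlapping region than this single-neighbour rule.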

Crossref

Open Access Institutional Repository at Robert Gordon University

Improving Risk Predictions by Preprocessing Imbalanced Credit Data

Author: B. Tian
C. Bunkhumpornpat
C. Phua
D.L. Wilson
G.E.A.P.A. Batista
I. Brown
J. Demšar
J. Laurikkala
K. Kennedy
L.C. Thomas
N. Japkowicz
N.M. Kiefer
N.V. Chawla
P.E. Hart
S.J. Yen
V. Vinciotti
Y.M. Huang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Imbalanced credit data sets are databases in which the class of defaulters is heavily under-represented in comparison to the class of non-defaulters. This is a very common situation in real-life credit scoring applications, but it has so far received little attention. This paper investigates whether data resampling can be used to improve the performance of learners built from imbalanced credit data sets, and whether the effectiveness of resampling is related to the type of classifier. Experimental results demonstrate that learning with the resampled sets consistently outperforms the use of the original imbalanced credit data, independently of the classifier used.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I

An insight into imbalanced Big Data classification: outcomes and challenges

Author: A Fernández
A Thusoo
B Krawczyk
C Bunkhumpornpat
CP Chen
D Lyubimov
E Elsebakhi
E Ramentol
F Hu
G Haixiang
GEAPA Batista
GM Weiss
H He
H Yu
I Triguero
J Alcalá-Fdez
J Dean
J Huang
J Li
JA Sáez
JM Tomczak
K Kambatla
L Rokach
M Galar
M Wasikowski
NV Chawla
PC Zikopoulos
R Baeza-Yates
R Barandela
R Blagus
RC Prati
S Alshomrani
S Barua
S Elhag
S Kamal
S Owen
S Río
S-H Park
T Jo
T White
V García
V López
X Meng
X Wu
Y Guo
Y Sun
Y-S Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Big Data applications have been emerging in recent years, and researchers from many disciplines are aware of the considerable advantages of extracting knowledge from this type of problem. However, traditional learning approaches cannot be applied directly because of scalability issues. To overcome this, the MapReduce framework has arisen as a "de facto" solution. Essentially, it carries out a "divide-and-conquer" distributed procedure in a fault-tolerant way suited to commodity hardware. As this is still a recent discipline, little research has been conducted on imbalanced classification for Big Data. The reasons behind this are mainly the difficulties in adapting standard techniques to the MapReduce programming style. Additionally, intrinsic problems of imbalanced data, namely lack of data and small disjuncts, are accentuated when the data is partitioned to fit the MapReduce programming style. This paper rests on three main pillars. First, to present the first outcomes for imbalanced classification in Big Data problems, introducing the current state of research in this area. Second, to analyze the behavior of standard pre-processing techniques in this particular framework. Finally, taking into account the experimental results obtained throughout this work, we discuss the challenges and future directions for the topic. This work has been partially supported by the Spanish Ministry of Science and Technology under Projects TIN2014-57251-P and TIN2015-68454-R, the Andalusian Research Plan P11-TIC-7765, the Foundation BBVA Project 75/2016 BigDaPTOOLS, and the National Science Foundation (NSF) Grant IIS-1447795.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Repositorio Institucional Universidad de Granada

SMOTE for high-dimensional class-imbalanced data

Author: A Fallahi
A Hinneburg
B Wallace
C Bunkhumpornpat
C Cortes
C Drummond
C Sotiriou
CM Bishop
DA Cieslak
E Fix
H Han
H He
J Pittman
J Wang
J Xiao
J Zhu
JV Hulse
K Beyer
KD MacIsaac
L Breiman
Lara Lusa
LD Miller
MA Shipp
N Iizuka
NV Chawla
P Radivojac
Q Gu
R Batuwita
R Blagus
R Development Core Team
R Johnson
R Tibshirani
RM Simon
Rok Blagus
S Daskalaki
S Doyle
S Dudoit
S Ramaswamy
SE Ertekin
T Fawcett
TP Speed
Y Guo
Y Liu
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Newton and Epicurus

Author: C. Bunkhumpornpat
H. Hassanzadeh
M. Galar
M. Liakata
N.V. Chawla
Publication venue: Editoriale Programma:P Tta I Nievo 3 Bis, 35121 Padua Italy:011 39 049 8753110
Publication date: 01/01/1974

Archivio Ricerca Ca'Foscari

Crossref

A Pruning-Based Approach for Searching Precise and Generalized Region for Synthetic Minority Over-Sampling

Author: A. Bradley
C. Bunkhumpornpat
C.J. Rijsbergen van
G.M. Weiss
H. Han
H. He
J.R. Quinlan
M. Hall
N.V. Chawla
P.H. Ramsey
Q. Yang
X. Fan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Crossref

Collaborative and Reproducible Research: Goals, Challenges, and Strategies

Author: Apurva Bedagkar-Gala
Bennet A. Landman
C Bunkhumpornpat
George Shih
Marc Kohli
Paul Nagy
SG Langer
Steve G. Langer
Zhan Ye
Publication venue: 'Springer Science and Business Media LLC'

Crossref

RBM-SMOTE: Restricted Boltzmann Machines for Synthetic Minority Oversampling Technique

Author: C Bunkhumpornpat
C Seiffert
D Tao
E Ramentol
G Hinton
GE Hinton
H Han
H He
I Tomek
NV Chawla
S Chen
S García
S Hido
Y Tang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A Robust Classifier for Imbalanced Datasets

Author: C. Bunkhumpornpat
D.A. Cieslak
F. Provost
H. Han
J. Hulse Van
L. Breiman
N.V. Chawla
P. Geurts
R.E. Schapire
S. Hido
T. Jo
T.K. Ho
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Crossref

A Preprocessing Approach for Class-Imbalanced Data Using SMOTE and Belief Function Theory

Author: AP Bradley
AP Dempster
B Krawczyk
C Bunkhumpornpat
E Ramentol
G Haixiang
G Shafer
H Han
H He
J Demšar
JA Sáez
K Napierała
N Chawla
P Smets
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/11/2020

Dealing with imbalanced datasets at the preprocessing level is an efficient strategy used by many methods to re-balance the data and improve classification performance. Specifically, SMOTE is a popular oversampling technique that modifies the training data by adding artificial minority samples. However, SMOTE may create instances in noisy and overlapping areas, far from safe regions. To tackle this issue, we propose SMOTE-BFT, in which we use belief function theory to remove generated minority instances that are not in safe regions. After applying SMOTE, each generated minority instance is represented by an evidential membership structure, which provides detailed information about class memberships. Rules based on belief function theory are then enforced to detect and remove generated instances that are in noisy and overlapping regions. Experiments on noisy artificial datasets show that our proposal significantly outperforms other popular oversampling methods.
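The two-stage pipeline described here (oversample with SMOTE, then discard generated instances outside safe regions) can be sketched as follows. This is an illustration under stated assumptions, not SMOTE-BFT itself: the evidential membership structure and the belief-function rules are replaced by a plain k-nearest-neighbour majority vote, and the names `smote` and `filter_unsafe`, together with their parameters, are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (basic SMOTE idea).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def filter_unsafe(synthetic, X, y, k=3, minority_label=1):
    """Keep synthetic points whose k nearest ORIGINAL samples are mostly minority.

    Stand-in for the belief-function rules: a simple majority vote is assumed
    here purely for illustration.
    """
    kept = []
    for s in synthetic:
        nearest = sorted(range(len(X)), key=lambda j: math.dist(s, X[j]))[:k]
        votes = sum(1 for j in nearest if y[j] == minority_label)
        if votes * 2 > k:
            kept.append(s)
    return kept


# Toy data: three minority points near the origin, three majority points far away.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
X = minority + majority
y = [1, 1, 1, 0, 0, 0]
safe = filter_unsafe(smote(minority, n_new=5), X, y)
```

Because every synthetic point is interpolated between two minority samples, the toy example keeps all of them; a point generated inside the majority cluster would be rejected by the vote. The method in the abstract makes this keep-or-remove decision from richer class-membership information than a neighbour count.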

Crossref

HAL-Artois

Hal-Diderot