Multi-class and feature selection extensions of Roughly Balanced Bagging for imbalanced data

Author: A Fernandez
B Krawczyk
C Chen
D Wilson
E Tang
G Pio
GM Weiss
H He
J Błaszczyński
J Jelonek
J Seaz
Jerzy Stefanowski
K Napierala
L Breiman
M Galar
Mateusz Lango
N Chawla
P Branco
R Blagus
S Hido
S Rio
S Wang
T Ho
T Jo
V Lopez
W Lin
Y Sun
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Assessing impacts of data volume and data set balance in using deep learning approach to human activity recognition

Author: Chen Haipeng
Hong Xuemin
Lu Hai
Peng Ao
Shi Haibin
Tang Biyu
Wu Dihong
Xiong Fuhai
Zheng Huiru
Zheng Lingxiang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2017

Crossref

Ulster University's Research Portal

BAGGING BASED ENSEMBLE ANALYSIS IN HANDLING UNBALANCED DATA ON CLASSIFICATION MODELING

Author: Ananto Niel
Koapaha Hartini Pop
Publication venue: 'Universitas Klabat'
Publication date: 30/09/2021

The purpose of this study is to identify, based on a literature review, the algorithm behind each bagging-based method for handling class imbalance. The study uses bagging-based ensemble methods: UnderBagging, OverBagging, UnderOverBagging, SMOTEBagging, Roughly Balanced Bagging, and Bagging Ensemble Variation. The data are taken from the UCI Repository and comprise 16 datasets, eight of which have low class imbalance while the rest have high class imbalance. All are two-class problems: the smaller class is treated as the minority class and the remainder as the majority class. The results show that the bagging-based methods perform better than classical methods such as the classification tree.
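As a rough illustration of the UnderBagging idea named in this abstract, the sketch below trains each ensemble member on a class-balanced bootstrap sample and combines members by majority vote. It is not the study's implementation: the function names, the one-dimensional threshold "stump" used as base learner, and the toy data are all assumptions made for this example.

```python
import random
from collections import Counter

def balanced_bootstrap(xs, ys, rng):
    """Sample every class with replacement down to the size of the smallest class."""
    by_class = {}
    for i, label in enumerate(ys):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.choices(idx, k=n_min))
    return [xs[i] for i in chosen], [ys[i] for i in chosen]

def fit_stump(xs, ys):
    """Hypothetical base learner: the single threshold rule with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for pos_above in (True, False):
            preds = [int((x >= t) == pos_above) for x in xs]
            errors = sum(p != y for p, y in zip(preds, ys))
            if best is None or errors < best[0]:
                best = (errors, t, pos_above)
    _, t, pos_above = best
    return lambda x: int((x >= t) == pos_above)

def under_bagging(xs, ys, n_estimators=11, seed=0):
    """Fit one stump per balanced bootstrap sample; predict by majority vote."""
    rng = random.Random(seed)
    models = [fit_stump(*balanced_bootstrap(xs, ys, rng)) for _ in range(n_estimators)]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Toy two-class data (assumed): 90 majority points (label 0), 10 minority points (label 1).
xs = [i / 10 for i in range(90)] + [10 + i / 10 for i in range(10)]
ys = [0] * 90 + [1] * 10
predict = under_bagging(xs, ys)
```

Because every bootstrap sample contains the two classes in equal numbers, no single member is dominated by the majority class, which is the property the bagging-based methods listed above share in different forms.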

UNKLAB Ejournal System (Univ. Klabat)

Evaluation Measures for Models Assessment over Imbalanced Data Sets

Author: Alitouche Taklit Akrouf
Bekkar Mohamed
Djemaa Hassiba Kheliouane
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 01/10/2013

Imbalanced data learning is one of the challenging problems in data mining; within it, finding the right model assessment measures is a primary research issue. Skewed class distributions cause common evaluation measures to be misread and lead to biased classification. This article presents a set of alternatives for assessing imbalanced data learning, using combined measures (G-means, likelihood ratios, Discriminant power, F-Measure, Balanced Accuracy, Youden index, Matthews correlation coefficient) and graphical performance assessment (ROC curve, Area Under Curve, Partial AUC, Weighted AUC, Cumulative Gains Curve and lift chart, Area Under Lift AUL), which aim to provide a more credible evaluation. We analyze the application of these measures to the evaluation of churn prediction models, a well-known application of imbalanced data.

Keywords: imbalanced data, Model assessment, accuracy, G-means, likelihood ratios, F-Measure, Youden index, Matthews correlation coefficient, ROC, AUC, P-AUC, W-AUC, Lift, AU
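Several of the combined measures named in this abstract can be computed directly from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name, the returned keys, and the convention of reporting an undefined ratio as 0.0 are assumptions of this example, not taken from the article.

```python
import math

def _safe_div(num, den):
    # Assumed convention: a ratio with a zero denominator is reported as 0.0.
    return num / den if den else 0.0

def imbalance_metrics(tp, fn, fp, tn):
    """Compute accuracy and several imbalance-aware measures from confusion-matrix counts."""
    sensitivity = _safe_div(tp, tp + fn)   # true positive rate (recall)
    specificity = _safe_div(tn, tn + fp)   # true negative rate
    precision = _safe_div(tp, tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _safe_div(tp + tn, tp + fn + fp + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "youden_index": sensitivity + specificity - 1,
        "f_measure": _safe_div(2 * precision * sensitivity, precision + sensitivity),
        "mcc": _safe_div(tp * tn - fp * fn, mcc_den),
    }

# A classifier that always predicts the majority (negative) class on a 10/90 split.
always_negative = imbalance_metrics(tp=0, fn=10, fp=0, tn=90)

# A classifier that finds most positives at the cost of some false alarms.
useful = imbalance_metrics(tp=8, fn=2, fp=10, tn=80)
```

In this toy comparison the always-negative classifier has the higher plain accuracy (0.90 against 0.88) while its G-mean and Youden index are zero, which is the kind of misreading of common measures that the abstract describes.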

International Institute for Science, Technology and Education (IISTE): E-Journals

Decision Tree-Based Balanced Bagging Model for Imbalanced-Class Datasets (Model Balanced Bagging Berbasis Decision Tree Pada Dataset Imbalanced Class)

Author: Aditya Ahmad Zein
Yoga Pristyanto
Publication venue: 'Sekolah Tinggi Manajemen Informatika & Komputer Atma Luhur'
Publication date: 01/03/2023

Classification algorithms are very widely used in step with human needs, but previous research has often run into obstacles when applying them. One problem encountered very frequently is the imbalanced dataset. This study therefore proposes an ensemble method to address it; one well-known ensemble method is bagging. Balanced bagging is implemented to improve the capability of the bagging algorithm. The study compares three classification models on five datasets with different imbalance ratios (IR). The models are evaluated on balanced accuracy, geometric mean, and area under the curve (AUC). The first model is a Decision Tree without bagging, the second is a Decision Tree with bagging, and the third is a Decision Tree with balanced bagging. Applying bagging and balanced bagging to the Decision Tree classifier improves balanced accuracy, geometric mean, and AUC. In general, the Decision Tree + Balanced Bagging model performs best on all the datasets used.

Directory of Open Access Journals

PROPAGATION OF MISCLASSIFIED INSTANCES TO HANDLE NONSTATIONARY IMBALANCED DATA STREAM

Author: MEENAKSHI A. THALOR
S. T. PATIL
Publication venue: Taylor's University
Publication date: 01/04/2018

Learning on data streams that are both nonstationary and imbalanced is an interesting and complicated problem in data mining, because a change in class distribution may itself produce class imbalance. Many real-time problems, such as intrusion detection, credit card fraud detection, and weather forecasting, suffer from concept drift as well as class imbalance because they change over time. This paper presents an effective learning approach for nonstationary imbalanced data streams that emphasizes misclassified examples, with a focus on two-class problems. At the end of the paper, the proposed algorithm is compared with existing similar approaches using various evaluation metrics.

Directory of Open Access Journals

A big data MapReduce framework for fault diagnosis in cloud-based manufacturing

Author: Ajay Kumar
Alok Choudhary
Lakshman S. Thakur
Ravi Shankar
Publication date: 04/03/2016

This research develops a MapReduce framework for automatic pattern recognition in fault diagnosis that addresses the data imbalance problem in cloud-based manufacturing (CBM). Fault diagnosis in a CBM system contributes significantly to reducing product testing cost and enhancing manufacturing quality. One of the major challenges facing big data analytics in cloud-based manufacturing is the handling of datasets that are highly imbalanced in nature, since machine learning techniques give poor classification results when applied to such datasets. The framework proposed in this research uses a hybrid approach to deal with big datasets for smarter decisions. Furthermore, we compare the performance of a radial basis function based Support Vector Machine classifier with standard techniques. Our findings suggest that the most important task in cloud-based manufacturing is to predict the effect of data errors on quality arising from highly imbalanced, unstructured datasets. The proposed framework is an original contribution to the literature: it has been used for fault detection by managing the data imbalance problem appropriately and relating it to the firm's profit function. The experimental results are validated using a case study of steel plate manufacturing fault diagnosis, with key performance metrics such as accuracy, specificity, and sensitivity. A comparative study shows that the methods used in the proposed framework outperform the traditional ones.

Loughborough University Institutional Repository