A Cost-Sensitive Ensemble Method for Class-Imbalanced Datasets

Author: Dapeng Wang
Yong Zhang
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013

In imbalanced learning, resampling methods modify an imbalanced dataset to form a balanced one. Many base classifiers perform better on balanced datasets than on imbalanced ones. This paper proposes a cost-sensitive ensemble method, based on a cost-sensitive support vector machine (SVM) and query-by-committee (QBC), for imbalanced data classification. The proposed method first divides the majority-class dataset into several sub-datasets according to the proportion of imbalanced samples and trains subclassifiers using the AdaBoost method. It then generates candidate training samples with the QBC active learning method and uses a cost-sensitive SVM to learn from those samples. Experimental results on 5 class-imbalanced datasets show that the proposed method achieves a higher area under the ROC curve (AUC), F-measure, and G-mean than many existing class-imbalanced learning methods.
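The first step described in the abstract, dividing the majority class according to the imbalance proportion, might look roughly like the sketch below. It is an illustration under assumptions, not the authors' implementation: the function name `split_majority`, the round-robin split, and rounding the imbalance ratio down are all hypothetical choices, and the AdaBoost, QBC, and cost-sensitive SVM stages are omitted.

```python
import random


def split_majority(majority, minority, seed=0):
    """Partition the majority class into chunks of roughly minority size,
    pairing each chunk with the full minority class (hypothetical helper)."""
    if not minority:
        raise ValueError("minority class must not be empty")
    # Assumption: the number of sub-datasets is the imbalance ratio,
    # rounded down, with at least one sub-dataset.
    n_subsets = max(1, len(majority) // len(minority))
    shuffled = list(majority)
    random.Random(seed).shuffle(shuffled)
    # Deal samples round-robin so chunk sizes differ by at most one.
    chunks = [shuffled[i::n_subsets] for i in range(n_subsets)]
    return [(chunk, list(minority)) for chunk in chunks]


# Toy data: 20 majority samples and 5 minority samples (ratio 4:1).
majority = [("neg", i) for i in range(20)]
minority = [("pos", i) for i in range(5)]
subsets = split_majority(majority, minority)
```

Each resulting pair is a roughly balanced training set on which one subclassifier could then be trained.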

Crossref

Directory of Open Access Journals

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

Author: Ahmed Sajid
Farid Dewan Md.
Jani Md. Rafsan
Mahbub Asif
Rayhan Farshid
Shatabda Swakkhar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/12/2017

Class-imbalanced classification is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced in nature. Existing learning algorithms maximise classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, in real-life applications the minority class instances usually represent the concept of greater interest. Recently, several techniques based on sampling methods (under-sampling the majority class and over-sampling the minority class), cost-sensitive learning methods, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clustering-based under-sampling approach with a boosting (AdaBoost) algorithm, called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to the RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms. We evaluated the performance of the CUSBoost algorithm against state-of-the-art ensemble learning methods such as AdaBoost, RUSBoost, and SMOTEBoost on 13 imbalanced binary and multi-class datasets with various imbalance ratios. The experimental results show that CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets. Comment: CSITSS-201
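The under-sampling idea named in the abstract, drawing majority samples cluster by cluster rather than uniformly at random, can be sketched as follows. This is a simplified illustration rather than the CUSBoost algorithm itself: cluster assignments are assumed to be precomputed (for example by k-means), the function name `cluster_undersample` and the default `fraction=0.5` are hypothetical, and the boosting stage is left out.

```python
import random
from collections import defaultdict


def cluster_undersample(majority, cluster_ids, fraction=0.5, seed=0):
    """Keep a random `fraction` of the majority samples from every cluster
    (hypothetical helper; cluster_ids are assumed to be given)."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for sample, cid in zip(majority, cluster_ids):
        by_cluster[cid].append(sample)
    kept = []
    for cid in sorted(by_cluster):
        members = by_cluster[cid]
        # Keep at least one sample so that no cluster vanishes entirely.
        n_keep = max(1, round(fraction * len(members)))
        kept.extend(rng.sample(members, n_keep))
    return kept


# Toy data: 12 majority samples spread over three clusters of size 6, 4, 2.
majority = list(range(12))
cluster_ids = [0] * 6 + [1] * 4 + [2] * 2
reduced = cluster_undersample(majority, cluster_ids)
```

Sampling within each cluster keeps every region of the majority class represented, which plain random under-sampling does not guarantee.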

arXiv.org e-Print Archive

Crossref

Financial predictions using cost sensitive neural networks for multi-class learning

Author: Rozaki Eleni
Publication venue: 'Trans Tech Publications, Ltd.'
Publication date: 01/04/2016

Interest in the localisation of wireless sensor networks has grown in recent years, and a variety of machine-learning methods have been proposed to improve the optimisation of the complex behaviour of wireless networks. Network administrators have found that traditional classification algorithms may be limited with imbalanced datasets. In fact, the problem of imbalanced data learning has received particular interest. The purpose of this study was to examine design modifications to neural networks in order to address the problem of cost optimisation decisions and financial predictions. The goal was to compare four learning-based techniques using a cost-sensitive neural network ensemble for multiclass imbalanced data learning. The problem is formulated as a combinatorial cost optimisation in terms of minimising the cost using meta-learning classification rules for Naïve Bayes, J48, Multilayer Perceptron, and Radial Basis Function models. With these models, optimisation faults and cost evaluations for network training are considered.

Online Research @ Cardiff

WTEN: An advanced coupled tensor factorization strategy for learning from imbalanced data

Author: AK Menon
FM Harper
G Wu
H He
JP Bradford
NV Chawla
R Akbani
T Fawcett
T Jo
TG Kolda
XY Liu
Y Koren
ZH Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

© Springer International Publishing AG 2016. Learning from imbalanced and sparse data in multi-mode and high-dimensional tensor formats efficiently is a significant problem in data mining research. On one hand, Coupled Tensor Factorization (CTF) has become one of the most popular methods for joint analysis of heterogeneous sparse data generated from different sources. On the other hand, techniques such as sampling, cost-sensitive learning, etc. have been applied to many supervised learning models to handle imbalanced data. This research focuses on studying the effectiveness of combining advantages of both CTF and imbalanced data learning techniques for missing entry prediction, especially for entries with rare class labels. Importantly, we have also investigated the implication of joint analysis of the main tensor and extra information. One of our major goals is to design a robust weighting strategy for CTF to be able to not only effectively recover missing entries but also perform well when the entries are associated with imbalanced labels. Experiments on both real and synthetic datasets show that our approach outperforms existing CTF algorithms on imbalanced data.

Crossref

OPUS - University of Technology Sydney