Search CORE

5 research outputs found

Coupling different methods for overcoming the class imbalance problem

Author: Fantozzi Carlo
N. Lazzarini
Nanni Loris
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015
Field of study

Many classification problems must deal with imbalanced datasets where one class \u2013 the majority class \u2013 outnumbers the other classes. Standard classification methods do not provide accurate predictions in this setting since classification is generally biased towards the majority class. The minority classes are oftentimes the ones of interest (e.g., when they are associated with pathological conditions in patients), so methods for handling imbalanced datasets are critical. Using several different datasets, this paper evaluates the performance of state-of-the-art classification methods for handling the imbalance problem in both binary and multi-class datasets. Different strategies are considered, including the one-class and dimension reduction approaches, as well as their fusions. Moreover, some ensembles of classifiers are tested, in addition to stand-alone classifiers, to assess the effectiveness of ensembles in the presence of imbalance. Finally, a novel ensemble of ensembles is designed specifically to tackle the problem of class imbalance: the proposed ensemble does not need to be tuned separately for each dataset and outperforms all the other tested approaches. To validate our classifiers we resort to the KEEL-dataset repository, whose data partitions (training/test) are publicly available and have already been used in the open literature: as a consequence, it is possible to report a fair comparison among different approaches in the literature. Our best approach (MATLAB code and datasets not easily accessible elsewhere) will be available at https://www.dei.unipd.it/node/2357

Newcastle University E-Prints

Archivio istituzionale della ricerca - Università di Padova

A reduced labeled samples (RLS) framework for classification of imbalanced concept-drifting streaming data.

Author: Arabmakki Elaheh
Publication venue: ThinkIR: The University of Louisville\u27s Institutional Repository
Publication date: 01/12/2016
Field of study

Stream processing frameworks are designed to process the streaming data that arrives in time. An example of such data is stream of emails that a user receives every day. Most of the real world data streams are also imbalanced as is in the stream of emails, which contains few spam emails compared to a lot of legitimate emails. The classification of the imbalanced data stream is challenging due to the several reasons: First of all, data streams are huge and they can not be stored in the memory for one time processing. Second, if the data is imbalanced, the accuracy of the majority class mostly dominates the results. Third, data streams are changing over time, and that causes degradation in the model performance. Hence the model should get updated when such changes are detected. Finally, the true labels of the all samples are not available immediately after classification, and only a fraction of the data is possible to get labeled in real world applications. That is because the labeling is expensive and time consuming. In this thesis, a framework for modeling the streaming data when the classes of the data samples are imbalanced is proposed. This framework is called Reduced Labeled Samples (RLS). RLS is a chunk based learning framework that builds a model using partially labeled data stream, when the characteristics of the data change. In RLS, a fraction of the samples are labeled and are used in modeling, and the performance is not significantly different from that of the 100% labeling. RLS maintains an ensemble of classifiers to boost the performance. RLS uses the information from labeled data in a supervised fashion, and also is extended to use the information from unlabeled data in a semi supervised fashion. RLS addresses both binary and multi class partially labeled data stream and the results show the basis of RLS is effective even in the context of multi class classification problems. Overall, the RLS is shown to be an effective framework for processing imbalanced and partially labeled data streams

University of Louisville

Kernel methods for the incorporation of prior-knowledge into support vector machines

Author: VEILLARD ANTOINE PAUL MITSUMASA
Publication venue
Publication date: 16/08/2012
Field of study

Ph.DDOCTOR OF PHILOSOPH

ScholarBank@NUS

Dynamic class imbalance learning for incremental LPSVM

Author: Zhu Lei
Publication venue
Publication date: 01/01/2012
Field of study

Abstract Linear Proximal Support Vector Machines (LPSVM), like decision trees, classic SVM, etc. are originally not equipped to handle drifting data streams that exhibit high and varying degrees of class imbalance. For online classification of data streams with imbalanced class distribution, we propose an incremental LPSVM ter-med DCIL-IncLPSVM that has robust learning performance under class imbalance. In doing so, we simplify a weighted LPSVM, which is computationally not renewable, as several core matrices multiplying two simple weight coefficients. When data addition and/or retirement occurs, the proposed DCIL-IncLPSVM accommodates current class imbalance by a simple matrix and coefficient updating, meanwhile ensures no discriminative information lost throughout the learning process. Experiments on benchmark datasets indicate that the proposed DCIL-IncLPSVM outperforms batch SVM and LPSVM in terms of F-measure, relative sensitivity and G-mean metrics. Moreover, our application to online face membership authentication shows that the proposed DCIL-IncLPSVM remains effective in the presence of highly dynamic class imbalance, which usually poses serious problems to classic incremental SVM (IncSVM) and incremental LPSVM (IncLPSVM)

Unitec Research Bank

Dynamic class imbalance learning for incremental LPSVM

Author: Ban Tao
Chen Gang
Inoue Daisuke
Pang Shaoning
Sarrafzadeh Hossein
Zhu Lei
Publication venue: National Institute of Information and Communications Technology (Tokyo, Japan)
Publication date: 01/02/2013
Field of study

Linear Proximal Support Vector Machines (LPSVMs), like decision trees, classic SVM, etc. are originally not equipped to handle drifting data streams that exhibit high and varying degrees of class imbalance. For online classification of data streams with imbalanced class distribution, we propose a dynamic class imbalance learning (DCIL) approach to incremental LPSVM (IncLPSVM) modeling. In doing so, we simplify a computationally non-renewable weighted LPSVM to several core matrices multiplying two simple weight coefficients. When data addition and/or retirement occurs, the proposed DCIL-IncLPSVM1 accommodates newly presented class imbalance by a simple matrix and coefficient updating, meanwhile ensures no discriminative information lost throughout the learning process. Experiments on benchmark datasets indicate that the proposed DCIL-IncLPSVM outperforms classic IncSVM and IncLPSVM in terms of F-measure and G-mean metrics. Moreover, our application to online face membership authentication shows that the proposed DCIL-IncLPSVM remains effective in the presence of highly dynamic class imbalance, which usually poses serious problems to previous approaches

Unitec Research Bank