4 research outputs found
Cost-Sensitive Learning-based Methods for Imbalanced Classification Problems with Applications
Analysis and predictive modeling of massive datasets is a significant problem that arises in many practical applications. The task becomes more challenging when the data are imperfect or uncertain. Real data are frequently affected by outliers, uncertain labels, and an uneven distribution of classes (imbalanced data). Such imperfections introduce bias and make predictive modeling harder still. In the present work, we introduce a cost-sensitive learning (CSL) method to deal with the classification of imperfect data. Most traditional classification approaches perform poorly when the data are imperfect. We propose the use of CSL with the Support Vector Machine, a well-known data mining algorithm. The results reveal that the proposed algorithm produces more accurate classifiers and is more robust with respect to imperfect data. Furthermore, we explore which performance measures are best suited to imperfect data, and we address real problems in quality control and business analytics.
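The abstract does not spell out the formulation, so the following is only a minimal sketch of one common way to make a linear Support Vector Machine cost-sensitive: each class gets its own misclassification cost, which scales that class's hinge loss. The function names, the `cost` dictionary, and the hyperparameters (`lam`, `lr`, `epochs`) are assumptions made for illustration and are not taken from the work listed above.

```python
import random


def weighted_hinge_loss(w, b, X, y, cost):
    """Mean hinge loss where each example is scaled by the cost of its class.

    `cost` maps a label (+1 or -1) to a misclassification cost (assumed name).
    """
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        total += cost[yi] * max(0.0, 1.0 - margin)
    return total / len(X)


def train_cost_sensitive_svm(X, y, cost, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Fit a linear SVM by stochastic subgradient descent on the weighted hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # The update is scaled by the cost of the example's class, so
                # errors on the expensive (usually minority) class move the
                # decision boundary further.
                step = lr * cost[yi] * yi
                w = [wj + step * xj for wj, xj in zip(w, xi)]
                b += step
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane x falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

For example, `train_cost_sensitive_svm(X, y, cost={-1: 1.0, 1: 5.0})` would treat an error on the positive class as five times as costly as an error on the negative class. The 1:5 ratio is arbitrary here; in practice it would be chosen from the application's real misclassification costs or from the class imbalance.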
Cost-sensitive online classification
Ministry of Education, Singapore under its Academic Research Funding Tier
Cost-sensitive online classification
Ministry of Education, Singapore under its Academic Research Funding Tier
Online rare events detection
Rare events detection is regarded as an imbalanced classification problem, which attempts to detect events with high impact but low probability. It has many applications, such as network intrusion detection and credit fraud detection. In this paper we propose a novel online algorithm for rare events detection. Unlike traditional accuracy-oriented approaches, our approach employs a number of hypothesis tests to perform the cost/benefit analysis. It can handle online data of unbounded volume by setting a proper moving-window size and a forgetting factor. A comprehensive theoretical proof of our algorithm is given. Experiments on publicly available real-world datasets show significant improvements over the most relevant algorithms.
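The abstract mentions a moving window and a forgetting factor but gives no details, so the sketch below only illustrates how those two devices keep memory bounded on an unbounded stream. It is not the paper's algorithm and omits the hypothesis tests entirely; the class name `OnlineRateTracker` and its parameters are hypothetical.

```python
from collections import deque


class OnlineRateTracker:
    """Track the rate of a rare event over a stream using constant memory."""

    def __init__(self, window=100, forgetting=0.95):
        # A deque with maxlen drops the oldest observation automatically,
        # which gives a moving window of fixed size.
        self.recent = deque(maxlen=window)
        self.forgetting = forgetting
        self.weighted_sum = 0.0   # discounted count of events
        self.weight_total = 0.0   # discounted count of observations

    def update(self, is_event):
        x = 1.0 if is_event else 0.0
        self.recent.append(x)
        # Older observations are down-weighted geometrically on each step.
        self.weighted_sum = self.forgetting * self.weighted_sum + x
        self.weight_total = self.forgetting * self.weight_total + 1.0

    def window_rate(self):
        """Event rate over the last `window` observations."""
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def discounted_rate(self):
        """Event rate with exponentially decaying weight on the past."""
        if self.weight_total == 0.0:
            return 0.0
        return self.weighted_sum / self.weight_total
```

A smaller window or forgetting factor reacts faster to a change in the event rate but gives a noisier estimate; how to set them "properly" is the kind of question the paper's analysis would address.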