Classification on imbalanced data sets, taking advantage of errors to improve performance

Author: C Lemnaru
CS Hilas
H He
J Sun
N Esfandiari
N Tomasev
V García
VS Sheng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Classification methods usually perform poorly when applied to imbalanced data sets. To overcome this problem, several algorithms have been proposed in the last decade. Most of them generate synthetic instances in order to balance the data sets, regardless of the classification algorithm. These methods work reasonably well in most cases; however, they tend to cause over-fitting. In this paper, we propose a method to address the imbalance problem. Our approach, which is very simple to implement, works in two phases. The first phase detects instances that are difficult for classification methods to predict correctly. These instances are then categorized as “noisy” or “secure”, where the former refers to instances most of whose nearest neighbors belong to the opposite class. The second phase generates a number of synthetic instances for each instance that is difficult to predict correctly. After applying our method to data sets, the AUC of classifiers improves dramatically. We compare our method with other state-of-the-art methods, using more than 10 data sets.
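The two phases described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how hard-to-predict instances are detected, so the sketch takes their indices as an input (the hypothetical `hard` argument), and it does not say how synthetic instances are generated, so the sketch assumes SMOTE-style interpolation toward same-class neighbors. The function names and the choice of `k` are likewise hypothetical.

```python
import math
import random


def nearest_neighbors(points, i, k):
    """Return indices of the k points closest to points[i], excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def categorize(points, labels, hard, k=3):
    """Mark each hard instance 'noisy' if most of its k nearest neighbors
    belong to the opposite class, otherwise 'secure' (as in the abstract)."""
    result = {}
    for i in hard:
        neighbors = nearest_neighbors(points, i, k)
        opposite = sum(1 for j in neighbors if labels[j] != labels[i])
        result[i] = "noisy" if opposite > k / 2 else "secure"
    return result


def synthesize(points, labels, i, n, k=3, rng=None):
    """Generate n synthetic instances for hard instance i.

    ASSUMPTION: SMOTE-style interpolation between the instance and one of
    its k nearest same-class neighbors; the abstract does not specify the
    generation scheme.
    """
    rng = rng or random.Random(0)
    same = [j for j in range(len(points)) if j != i and labels[j] == labels[i]]
    same.sort(key=lambda j: math.dist(points[i], points[j]))
    same = same[:k]
    if not same:
        return []  # no same-class neighbor to interpolate toward
    synthetic = []
    for _ in range(n):
        j = rng.choice(same)
        t = rng.random()  # position along the segment from i to j
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(points[i], points[j]))
        )
    return synthetic
```

Calling `categorize(points, labels, hard)` returns a dictionary mapping each hard instance index to its category, and `synthesize(points, labels, i, n)` returns `n` new points that carry the label of instance `i`. Whether noisy instances should be oversampled at all is a design choice the abstract leaves open.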

Crossref

Red Mexicana de Repositorios Institucionales

Repositorio Institucional de la Universidad Autónoma del Estado de México

Fraud Risk Modelling: Requirements Elicitation in the Case of Telecom Services

Author: CS Hilas
D Ionita
G Macia-Fernandez
H Farvaresh
M Yelland
P Burge
PA Estévez
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Crossref

Online Research Database In Technology

Profiling high leverage points for detecting anomalous users in telecom data networks

Author: A Aghasaryan
A Barrat
AP Reynolds
CS Hilas
JA Hartigan
João Gama
MA Azad
Muhammad Ajmal Azad
P Arora
Shazia Tabassum
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Concept drift for big data

Author: CS Hilas
D Cohn
DA Ross
G Cauwenberghs
HM Gomes
JL Lobo
NC Oza
PD Yoo
S Aminikhanghahi
S Hochreiter
TS Sethi
V Losing
W Gerstner
W Lee
W Zang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

The term “concept drift” refers to a change in the statistical distribution of the data. In machine learning and predictive analysis, a fundamental assumption exists that the data are generated independently from an underlying stationary distribution. In this chapter we discuss the concept drifts that are inherent in the context of big data. We discuss the different forms of concept drift that are evident in streaming data and outline different techniques for handling them. Handling concept drift is important for big data, where data flow continuously, causing existing learned models to lose their predictive accuracy. This chapter will serve as a reference for academicians and industry practitioners who are interested in the niche area of handling concept drift for big data applications.

Crossref

Research Online @ ECU