
Learning Active Learning from Data

Authors: Fua Pascal, Konyushkova Ksenia, Sznitman Raphael
Publication date: 01/01/2017

In this paper, we propose a novel data-driven approach to active learning (AL). The key idea is to train a regressor that predicts the expected error reduction for a candidate sample in a particular learning state. By formulating query selection as a regression problem, we are not restricted to existing AL heuristics; instead, we learn strategies from the outcomes of previous AL experiments. We show that a strategy can be learnt either from simple synthetic 2D datasets or from a subset of domain-specific data. Our method yields strategies that work well on real data from a wide range of domains.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Bern Open Repository and Information System (BORIS)
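The abstract's core idea (simulate AL steps, record the error reduction each labelled candidate produced, fit a regressor on those outcomes, then query the candidate with the largest predicted reduction) can be illustrated with a small stdlib-only sketch. This is not the authors' implementation: the nearest-centroid classifier, the least-squares linear regressor, the two features (distance margin and labelled-set size) and all function names are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import random

# Hypothetical toy setup: a nearest-centroid classifier on 2D points stands in
# for the base learner, and a least-squares linear model stands in for the
# error-reduction regressor.

def make_blobs(n, rng):
    """Two Gaussian blobs; labels alternate 0, 1, 0, 1, ... so any prefix of
    length >= 2 contains both classes."""
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        xs.append((rng.gauss(1.0 if y else -1.0, 1.0), rng.gauss(0.0, 1.0)))
        ys.append(y)
    return xs, ys

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def fit_centroids(xs, ys):
    """Mean point of each class."""
    cents = {}
    for c in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == c]
        cents[c] = (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
    return cents

def error_rate(cents, xs, ys):
    """Fraction of points whose nearest centroid carries the wrong label."""
    wrong = sum(min(cents, key=lambda c: sq_dist(x, cents[c])) != y for x, y in zip(xs, ys))
    return wrong / len(xs)

def features(cents, n_labeled, x):
    """Hypothetical learning-state + candidate features: bias, distance margin
    (small near the decision boundary) and current labelled-set size."""
    margin = abs(sq_dist(x, cents[0]) ** 0.5 - sq_dist(x, cents[1]) ** 0.5)
    return [1.0, margin, float(n_labeled)]

def simulate_outcomes(rng, episodes=30, candidates=10):
    """Record (features, observed error reduction) pairs from simulated AL steps."""
    test_x, test_y = make_blobs(200, rng)
    X, t = [], []
    for _ in range(episodes):
        pool_x, pool_y = make_blobs(60, rng)
        n_lab = rng.randint(2, 10)  # labelled prefix always holds both classes
        lab_x, lab_y = pool_x[:n_lab], pool_y[:n_lab]
        cents = fit_centroids(lab_x, lab_y)
        before = error_rate(cents, test_x, test_y)
        for i in rng.sample(range(n_lab, len(pool_x)), candidates):
            new = fit_centroids(lab_x + [pool_x[i]], lab_y + [pool_y[i]])
            X.append(features(cents, n_lab, pool_x[i]))
            t.append(before - error_rate(new, test_x, test_y))  # regression target
    return X, t

def fit_linear(X, t, ridge=1e-6):
    """Ridge-regularised least squares via the normal equations and Gaussian
    elimination with partial pivoting."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [sum(r[i] * ti for r, ti in zip(X, t)) for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w

def select_query(w, cents, n_labeled, pool):
    """Index of the pool point with the largest predicted error reduction."""
    def score(x):
        return sum(wi * fi for wi, fi in zip(w, features(cents, n_labeled, x)))
    return max(range(len(pool)), key=lambda i: score(pool[i]))

if __name__ == "__main__":
    rng = random.Random(0)
    w = fit_linear(*simulate_outcomes(rng))
    pool_x, pool_y = make_blobs(40, rng)
    cents = fit_centroids(pool_x[:2], pool_y[:2])
    print("query pool index:", 2 + select_query(w, cents, 2, pool_x[2:]))
```

Because the regressor is trained on outcomes rather than hand-written rules, swapping in richer features or a non-linear regressor only changes `features` and `fit_linear`; the query-selection step stays the same.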

Handling class imbalance in credit card fraud using resampling methods

Authors: Hordri Nur Farhana, Mohd. Azmi Nurulhuda Firdaus, Shamsuddin Siti Mariyam, Yuhaniz Siti Sophiayati
Publication venue: The Science and Information Organization
Publication date: 01/01/2018

Credit-card-based online payments have grown rapidly, compelling financial organisations to implement and continuously improve their fraud detection systems. However, credit card fraud datasets are heavily imbalanced, and different types of misclassification error may have different costs, so it is essential to control these errors and trade them off to a certain degree. Classification techniques are promising solutions for separating fraudulent from non-fraudulent transactions. Unfortunately, under some conditions, classification techniques perform poorly when the minority and majority classes differ greatly in size. Hence, in this study, three resampling methods, Random Under Sampling, Random Over Sampling and Synthetic Minority Oversampling Technique (SMOTE), were applied to the credit card dataset to overcome the rarity of fraud events. The three resampled datasets were then classified using classification techniques. Performance was measured by sensitivity, specificity, accuracy, precision, area under the curve (AUC) and error rate. The findings showed that resampling the dataset made the models more practicable, improved their performance and yielded statistically better results.

Universiti Teknologi Malaysia Institutional Repository
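The three resampling methods named in the abstract can be sketched in plain Python to show what each one does to a binary dataset. This is an illustrative sketch, not the study's code: it assumes the fraud class is labelled `1`, that rows are numeric feature lists, and that SMOTE uses `k=5` neighbours; the SMOTE variant here is simplified (it picks a random minority row per synthetic sample). Libraries such as imbalanced-learn provide maintained versions (`RandomUnderSampler`, `RandomOverSampler`, `SMOTE`).

```python
import random

def split_indices(y, minority):
    """Row indices of the minority class and of everything else."""
    min_idx = [i for i, lab in enumerate(y) if lab == minority]
    maj_idx = [i for i, lab in enumerate(y) if lab != minority]
    return min_idx, maj_idx

def random_under_sample(X, y, rng, minority=1):
    """RUS: keep every minority row, draw as many majority rows without replacement."""
    min_idx, maj_idx = split_indices(y, minority)
    keep = min_idx + rng.sample(maj_idx, len(min_idx))
    return [X[i] for i in keep], [y[i] for i in keep]

def random_over_sample(X, y, rng, minority=1):
    """ROS: duplicate randomly chosen minority rows until both classes are equal in size."""
    min_idx, maj_idx = split_indices(y, minority)
    extra = [rng.choice(min_idx) for _ in range(len(maj_idx) - len(min_idx))]
    keep = maj_idx + min_idx + extra
    return [X[i] for i in keep], [y[i] for i in keep]

def smote(X, y, rng, minority=1, k=5):
    """Simplified SMOTE: each synthetic row lies on the segment between a random
    minority row and one of its k nearest minority neighbours."""
    mins = [X[i] for i in split_indices(y, minority)[0]]
    n_new = len(y) - 2 * len(mins)  # majority count minus minority count
    synth = []
    for _ in range(n_new):
        i = rng.randrange(len(mins))
        a = mins[i]
        # k nearest minority neighbours of a (squared Euclidean distance), excluding a itself
        neigh = sorted((j for j in range(len(mins)) if j != i),
                       key=lambda j: sum((p - q) ** 2 for p, q in zip(a, mins[j])))[:k]
        b = mins[rng.choice(neigh)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synth.append([p + gap * (q - p) for p, q in zip(a, b)])
    return X + synth, y + [minority] * len(synth)
```

Undersampling discards majority rows, oversampling repeats minority rows, and SMOTE creates new interpolated minority rows; in each case only the training split would normally be resampled, so the evaluation metrics are still computed on the original class distribution.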

Online automated machine learning for class imbalanced data streams

Authors: Wang Shuo, Wang Zhaoyang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 07/04/2023

University of Birmingham Research Portal