Small data oversampling: improving small data prediction accuracy using the geometric SMOTE algorithm

Lechleitner, Maria

Publication date: 27 May 2020

Abstract

Dissertation presented as partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.

In the age of Big Data, many machine learning tasks in numerous industries are still constrained by small datasets. The limited availability of data often results in unsatisfactory prediction performance of supervised learning algorithms and, consequently, in poor decision making. This research aims to mitigate the small dataset problem by generating artificial data in the pre-processing phase of the data analysis process. The oversampling technique Geometric SMOTE is applied to generate new training instances and enhance crisp data structures. Experimental results show a significant improvement in prediction accuracy compared with using the original small datasets, as well as over other oversampling techniques such as Random Oversampling, SMOTE and Borderline SMOTE. These findings show that artificial data creation is a promising approach to overcoming the problem of small data in classification tasks.
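The abstract names Geometric SMOTE as the generator of new training instances without describing how a sample is produced. Below is a minimal, simplified sketch of the geometric generation step, assuming the commonly described form of the algorithm: a point is drawn uniformly inside a hypersphere centred on a minority sample, with radius equal to the distance to a selected neighbour, and the sphere is then truncated and deformed towards that neighbour. It is not the dissertation's code. The function names (`geometric_sample`, `oversample_minority`) and the parameter names are chosen for illustration only, not taken from any library API, and only a minority-neighbour selection strategy is shown.

```python
import math
import random


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def geometric_sample(center, surface, truncation_factor=0.0,
                     deformation_factor=0.0, rng=None):
    """Hypothetical helper: draw one synthetic point inside the hypersphere
    centred on `center` whose radius is the distance to `surface`."""
    rng = rng or random.Random()
    diff = [s - c for c, s in zip(center, surface)]
    radius = _norm(diff)
    if radius == 0.0:
        return list(center)  # degenerate case: the two points coincide
    dim = len(center)
    # Uniform point in the unit hyperball: random direction, length u**(1/dim).
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    while _norm(direction) == 0.0:  # practically impossible; avoids division by zero
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = rng.random() ** (1.0 / dim) / _norm(direction)
    point = [scale * d for d in direction]
    # Unit vector pointing from the center towards the surface point.
    e_par = [d / radius for d in diff]
    proj = _dot(point, e_par)
    # Truncation: reflect the point across the hyperplane orthogonal to e_par
    # when it falls in the part of the sphere cut off by truncation_factor.
    if (truncation_factor > 0 and proj < truncation_factor - 1) or \
       (truncation_factor < 0 and proj > truncation_factor + 1):
        point = [p - 2 * proj * e for p, e in zip(point, e_par)]
        proj = -proj
    # Deformation: shrink the component perpendicular to e_par.
    parallel = [proj * e for e in e_par]
    perpendicular = [p - q for p, q in zip(point, parallel)]
    point = [q + (1 - deformation_factor) * r
             for q, r in zip(parallel, perpendicular)]
    # Scale by the radius and translate back to the feature space.
    return [c + radius * p for c, p in zip(center, point)]


def oversample_minority(samples, n_new, k=3, seed=0, **factors):
    """Hypothetical wrapper: pick a random minority sample, pick one of its
    k nearest minority neighbours, and generate one point per iteration."""
    rng = random.Random(seed)
    new = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        center = samples[i]
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: _norm([a - b for a, b in zip(s, center)]))
        surface = rng.choice(others[:k])
        new.append(geometric_sample(center, surface, rng=rng, **factors))
    return new


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
synthetic = oversample_minority(minority, n_new=5,
                                truncation_factor=0.5, deformation_factor=0.3)
```

In this sketch, setting `truncation_factor=1` and `deformation_factor=1` confines every sample to the segment between the two points, the region SMOTE interpolates over. Smaller values widen the generation area towards the full hypersphere, which illustrates how the geometric variant can place samples outside the line segments that SMOTE uses.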

Available Versions

Repositório da Universidade Nova de Lisboa

oai:run.unl.pt:10362/99077

Last updated on 23/11/2020

New University of Lisbon's Repository

oai:run.unl.pt:10362/99077

Last updated on 17/07/2020