Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics

In the age of Big Data, many machine learning tasks in numerous industries are still restricted due to the use of
small datasets. The limited availability of data often results in unsatisfactory prediction performance of
supervised learning algorithms and, consequently, poor decision making. The current research work aims to
mitigate the small dataset problem by artificial data generation in the pre-processing phase of the data analysis
process. The oversampling technique Geometric SMOTE is applied to generate new training instances and
enhance crisp data structures. Experimental results show a significant improvement in prediction accuracy
compared with both the original small datasets and other oversampling techniques such as Random
Oversampling, SMOTE and Borderline SMOTE. These findings show that artificial data creation is a promising
approach to overcoming the problem of small data in classification tasks.