Credit Risk Prediction using Ensemble and Linear Machine Learning Models

Abstract

Article, Faculty of Natural and Agricultural Sciences (Unit for Data Science and Computing (UDSC)), Potchefstroom Campus

Predicting the likelihood of loan default remains a critical challenge in credit risk modeling, where data imbalance, high dimensionality, and nonlinear interactions often limit the effectiveness of traditional scoring techniques. This paper presents a machine learning pipeline for credit risk prediction using financial datasets. We evaluate six main classifiers (Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, Random Forest, XGBoost, and LightGBM), along with a variant of two of these classifiers for further comparison. Models are benchmarked using accuracy, precision, recall, and the Kolmogorov–Smirnov statistic, which is widely used in financial risk scoring. Our results indicate that ensemble methods combined with hybrid resampling techniques can consistently offer significant improvements in default risk separation without requiring dimensionality reduction methods, complex deep neural architectures, or other black-box models. This makes them suitable for both regulated credit scoring environments and modern machine learning-driven financial applications.
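The Kolmogorov–Smirnov statistic named among the benchmark metrics is commonly computed in credit scoring as the largest vertical gap between the empirical score distributions of defaulters and non-defaulters. The abstract does not give the authors' implementation, so the following is only a minimal sketch of that common definition. The function name `ks_statistic`, the label convention (1 = default, 0 = non-default), and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def ks_statistic(y_true, y_score):
    """Two-sample KS statistic between the scores of the two classes.

    Assumed convention (not from the paper): y_true is 1 for default and
    0 for non-default; y_score is the model's predicted default score.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)

    # Sorted scores of each class, used to build empirical CDFs.
    pos = np.sort(y_score[y_true == 1])
    neg = np.sort(y_score[y_true == 0])
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes must be present")

    # Evaluate both empirical CDFs at every distinct observed score.
    thresholds = np.unique(y_score)
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / pos.size
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / neg.size

    # KS is the maximum absolute gap between the two CDFs.
    return float(np.max(np.abs(cdf_pos - cdf_neg)))


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    scores = [0.05, 0.10, 0.30, 0.35, 0.40, 0.70, 0.80, 0.95]
    print(ks_statistic(labels, scores))
```

A value near 0 means the model's scores barely distinguish the two classes, while a value near 1 means the score distributions are almost fully separated. Unlike accuracy, the statistic does not depend on choosing a single decision threshold.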

NWU Institutional Repository (North-West University)

Last time updated on 21/01/2026
