
    Learning With Imbalanced Data in Smart Manufacturing: A Comparative Analysis

    The Internet of Things (IoT) paradigm is revolutionising manufacturing, turning it into what is known as Smart Manufacturing or Industry 4.0. A central pillar of smart manufacturing is harnessing IoT data and leveraging machine learning (ML) to automate fault prediction, thus cutting maintenance time and cost and improving product quality. However, in real industries faults are vastly outnumbered by instances of good performance (faultless samples), and this bias is reflected in the data captured by IoT devices. Imbalanced data limits the success of ML in predicting faults and thus presents a significant hindrance to the progress of smart manufacturing. Although various techniques have been proposed to tackle this challenge in general, this work is the first to present a framework for evaluating the effectiveness of these remedies in the context of manufacturing. We present a comprehensive comparative analysis in which we apply our proposed framework to benchmark the performance of different combinations of algorithm components using a real-world manufacturing dataset. We draw key insights into the effectiveness of each component and the interrelationships between the dataset, the application context, and the design of the ML algorithm.
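    The abstract does not list the exact components or metrics used in the framework. As a minimal sketch of why any such comparison needs imbalance-aware evaluation, assuming faults are labelled 1 and faultless samples 0, the Python below reports accuracy alongside fault recall, precision, F1 and G-mean, and includes plain random oversampling as one generic remedy. The function names `imbalance_metrics` and `random_oversample` and the 95:5 split are hypothetical illustrations, not taken from the paper.

```python
import math
import random
from collections import Counter


def imbalance_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics with the rare class (faults) treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = len(pairs) - tp - fn - fp
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0         # share of faults caught
    specificity = tn / (tn + fp) if tn + fp else 0.0    # share of faultless samples kept
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = math.sqrt(recall * specificity)            # balances both class recalls
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "f1": f1, "g_mean": g_mean}


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all match the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    y_true = [0] * 95 + [1] * 5            # hypothetical 5% fault rate
    never_faulty = [0] * 100               # a model that always predicts "no fault"
    print(imbalance_metrics(y_true, never_faulty))
    # accuracy is 0.95, while recall, f1 and g_mean are all 0.0
    X_bal, y_bal = random_oversample([[i] for i in range(100)], y_true)
    print(Counter(y_bal))                  # both classes now have 95 rows
```

    A predictor that never flags a fault looks strong on accuracy yet catches nothing, which is why remedies should be ranked on minority-sensitive metrics. Libraries such as scikit-learn and imbalanced-learn provide maintained versions of these metrics and resamplers.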

    Explainable credit scoring through generative adversarial networks

    Credit scoring plays a vital role in mitigating financial risks that could threaten the sustainability of financial institutions. Accurate, automated credit scoring makes it possible to control financial risk using state-of-the-art, data-driven analytics. The primary aim of this thesis is to understand and improve financial credit scoring models. It identifies and investigates the key issues that arise when developing credit scoring models with state-of-the-art machine learning (ML) techniques, and the ML-based models proposed in this thesis address these challenges. Existing credit scoring models can thus be improved by novel computer science techniques in the following problem areas. First, the interpretability of credit scoring, framed as eXplainable Artificial Intelligence (XAI), is examined using non-parametric tree-based ML models combined with SHapley Additive exPlanations (SHAP). This experiment also assesses the suitability of tree-based ensemble models for imbalanced credit scoring datasets by comparing their performance under different degrees of class imbalance. To achieve both explainability and high predictive performance in credit scoring, we propose NATE, a Non-pArameTric approach for Explainable credit scoring. NATE is explainable and comprehensible: it allows the key factors of credit scoring to be analysed through SHAP values both locally and globally, while retaining robust predictive power for creditworthiness. Second, the issue of class imbalance is investigated. Class imbalance occurs when the number of observations differs greatly between the classes of a dataset. In real-world credit scoring datasets, this imbalance biases classification performance for creditworthiness.
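    NATE's explanations rest on SHAP values, which are Shapley values from cooperative game theory. As an illustrative sketch (not NATE itself), the Python below computes exact Shapley values by enumerating feature coalitions for a hypothetical three-feature scoring rule `toy_score`. Features outside a coalition are replaced by a single hypothetical baseline record, a simplification of SHAP's expectation over background data. The attributions sum to f(x) minus f(baseline), the additivity property that lets SHAP values explain an individual (local) prediction.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at record `x` by enumerating all coalitions.

    phi_i = sum over S not containing i of |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S)),
    where v(S) evaluates the model with features outside S set to `baseline`
    (a single-reference simplification of SHAP's background expectation).
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical tree-like credit score: higher means more creditworthy."""
    income, debt_ratio, late_payments = z
    score = 0.5
    if income > 50:
        score += 0.2
    if debt_ratio > 0.4:
        score -= 0.15
        if late_payments > 2:  # nested split: late payments only matter with high debt
            score -= 0.1
    return score


if __name__ == "__main__":
    applicant = [60, 0.5, 3]
    reference = [40, 0.3, 0]  # hypothetical "typical applicant" baseline
    phi = shapley_values(toy_score, applicant, reference)
    print([round(p, 4) for p in phi])  # [0.2, -0.2, -0.05]
    print(round(sum(phi), 4), round(toy_score(applicant) - toy_score(reference), 4))
```

    The interaction penalty of -0.1 is split equally between debt ratio and late payments, since neither triggers it alone. Enumeration needs on the order of 2^n model calls per feature, so it only suits tiny examples; for tree ensembles, the TreeSHAP algorithm behind the shap library's `TreeExplainer` computes such values in polynomial time. Averaging absolute values over many records gives a global feature ranking.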
To overcome the limitations of traditional resampling methods for class imbalance, we propose NOTE, Non-parametric Oversampling Techniques for Explainable credit scoring. By pairing a conditional Wasserstein Generative Adversarial Network (cWGAN)-based oversampling technique with a Non-parametric Stacked Autoencoder (NSA), NOTE acts as a generative model that oversamples the minority class while reflecting the complex, non-linear patterns in the dataset. NOTE can therefore both classify and explain the credit scoring model, with unbiased performance on a balanced credit scoring dataset. Third, incomplete data is also a common issue in credit scoring datasets. Missing values typically distort the analysis and prediction of credit scores and lead to misclassification of creditworthiness. To address missing values and overcome the limitations of conventional imputation methods, we propose DITE, Denoising Imputation TEchniques for missingness in credit scoring. By pairing extended Generative Adversarial Imputation Networks (GAIN) with randomised Singular Value Decomposition (rSVD), DITE replaces missing values with plausible estimates by reducing noise and capturing complex missingness patterns in the dataset. To evaluate the robustness and effectiveness of the proposed models on the three key issues, namely model explainability, class imbalance, and missingness, their performance is compared against benchmarks from the literature on publicly available real-world financial credit scoring datasets. Our experimental results demonstrate the robustness and effectiveness of the novel concepts used in the models, which outperform the benchmarks.
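DITE extends GAIN, whose generator and discriminator are trained on masked inputs. The Python below is a minimal sketch of that data plumbing, roughly following the original GAIN formulation: a mask m marks observed entries, missing entries are filled with small uniform noise, observed values are kept when combining with the generator's output, and a hint vector partially reveals m to the discriminator. The neural networks, DITE's extensions and its rSVD denoising step are not shown. The parameters `hint_rate` and `noise_scale`, the per-entry Bernoulli draw for the hint, and the stand-in generator output are all assumptions for illustration.

```python
import math
import random


def gain_inputs(x, hint_rate=0.9, noise_scale=0.01, seed=0):
    """GAIN-style inputs for one record whose features are assumed scaled to [0, 1].

    `x` uses None for missing entries. Returns the mask m (1 = observed),
    the generator input x_tilde, and the hint vector h for the discriminator.
    """
    rng = random.Random(seed)
    m = [0.0 if v is None else 1.0 for v in x]
    # Observed values pass through; missing ones become small uniform noise.
    x_tilde = [v if v is not None else rng.uniform(0.0, noise_scale) for v in x]
    # b = 1 reveals the true mask entry; b = 0 shows an uninformative 0.5 instead.
    b = [1.0 if rng.random() < hint_rate else 0.0 for _ in x]
    h = [bi * mi + 0.5 * (1.0 - bi) for bi, mi in zip(b, m)]
    return m, x_tilde, h


def combine(x_tilde, m, x_bar):
    """Imputed record: keep observed values, take generator output x_bar where missing."""
    return [mi * xt + (1.0 - mi) * xb for xt, mi, xb in zip(x_tilde, m, x_bar)]


def reconstruction_loss(x_tilde, m, x_bar):
    """Mean squared error on observed entries only (supervised part of the generator loss)."""
    n_obs = max(sum(m), 1.0)
    return sum(mi * (xt - xb) ** 2 for xt, mi, xb in zip(x_tilde, m, x_bar)) / n_obs


def discriminator_loss(m, d_prob, eps=1e-8):
    """Binary cross-entropy of the discriminator's per-entry guesses of 'observed'."""
    terms = (mi * math.log(p + eps) + (1.0 - mi) * math.log(1.0 - p + eps)
             for mi, p in zip(m, d_prob))
    return -sum(terms) / len(m)


if __name__ == "__main__":
    record = [0.2, None, 0.7, None]
    m, x_tilde, h = gain_inputs(record)
    x_bar = [0.9, 0.4, 0.9, 0.6]  # hypothetical stand-in for a trained generator's output
    print(combine(x_tilde, m, x_bar))  # [0.2, 0.4, 0.7, 0.6]
    print(reconstruction_loss(x_tilde, m, x_bar), discriminator_loss(m, [0.5] * 4))
```

A discriminator that outputs 0.5 everywhere scores a cross-entropy of ln 2, which is the chance level the generator tries to push it towards on missing entries.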
Furthermore, the proposed NATE, NOTE and DITE models also lead to better explainability, suitability and stability, and to superior performance on complex, non-linear credit scoring datasets. Finally, this thesis demonstrates that existing credit scoring models can be improved by novel computer science techniques applied to real-world credit scoring problems.