
    IMPROVED SUPPORT VECTOR MACHINE PERFORMANCE USING PARTICLE SWARM OPTIMIZATION IN CREDIT RISK CLASSIFICATION

    In classification using the Support Vector Machine (SVM), each kernel has parameters that affect classification accuracy. This study examines improving SVM performance by selecting parameters with Particle Swarm Optimization (PSO) for credit risk classification, and compares the results with SVM using randomly selected parameters. Classification performance is evaluated by applying SVM to the German Credit benchmark data set and to a private credit data set issued by a local bank in North Sumatra. Although it requires a longer execution time to reach optimal accuracy, the SVM+PSO combination is effective and more systematic than trial-and-error techniques for finding SVM parameter values, and therefore produces better accuracy. In general, the test results show that the RBF kernel produces higher accuracy and F1-scores than the linear and polynomial kernels. SVM classification optimized with PSO yields better accuracy than SVM without optimization, that is, with randomly chosen parameters. Credit data classification accuracy increased to 92.31%.
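
    A minimal sketch of the idea, assuming scikit-learn and a hand-rolled global-best PSO (the dataset, search bounds, and swarm coefficients below are illustrative, not the authors' settings): particles move through (log10 C, log10 gamma) space and the cross-validated accuracy of an RBF-kernel SVM serves as the fitness function.

        # Hedged sketch: PSO search over SVM (RBF) hyperparameters C and gamma.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X, y = make_classification(n_samples=600, n_features=20, weights=[0.7, 0.3], random_state=0)

        def fitness(position):
            # position = [log10(C), log10(gamma)]
            C, gamma = 10.0 ** position
            return cross_val_score(SVC(kernel="rbf", C=C, gamma=gamma), X, y, cv=5).mean()

        # Minimal global-best PSO over the 2-D log-scaled search space.
        n_particles, n_iters, dim = 15, 20, 2
        lo, hi = np.array([-2.0, -4.0]), np.array([3.0, 1.0])   # bounds for [log10(C), log10(gamma)]
        pos = rng.uniform(lo, hi, size=(n_particles, dim))
        vel = np.zeros_like(pos)
        pbest_pos, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
        gbest_pos = pbest_pos[pbest_val.argmax()].copy()

        w, c1, c2 = 0.7, 1.5, 1.5                               # inertia and acceleration weights
        for _ in range(n_iters):
            r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
            vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
            pos = np.clip(pos + vel, lo, hi)
            vals = np.array([fitness(p) for p in pos])
            improved = vals > pbest_val
            pbest_pos[improved], pbest_val[improved] = pos[improved], vals[improved]
            gbest_pos = pbest_pos[pbest_val.argmax()].copy()

        print("best CV accuracy:", pbest_val.max(), "at C, gamma =", 10.0 ** gbest_pos)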

    Default Prediction of Internet Finance Users Based on Imbalance-XGBoost

    Fast and accurate identification of financial fraud is a challenge in Internet finance. Based on the imbalanced distribution that characterizes Internet financial data, this paper integrates machine learning methods with Internet financial data to propose a loan default prediction model, and demonstrates its effectiveness and generalizability through empirical research. We introduce a processing method (the link processing method) for imbalanced data that builds on the traditional early-warning model. Experiments on the financial dataset of the Lending Club platform show that our model is superior to XGBoost, NGBoost, AdaBoost, and GBDT in predicting default risk.
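
    The paper's link processing method is not reproduced here; the sketch below only illustrates the general setup, assuming the xgboost and scikit-learn packages, with scale_pos_weight standing in as a common imbalance treatment on a synthetic default-prediction dataset.

        # Hedged sketch: XGBoost on an imbalanced binary default-prediction task.
        from sklearn.datasets import make_classification
        from sklearn.metrics import f1_score, roc_auc_score
        from sklearn.model_selection import train_test_split
        from xgboost import XGBClassifier

        X, y = make_classification(n_samples=5000, n_features=30, weights=[0.95, 0.05], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, test_size=0.2, random_state=0)

        imbalance_ratio = (y_tr == 0).sum() / (y_tr == 1).sum()   # negatives per positive
        clf = XGBClassifier(
            n_estimators=300, max_depth=4, learning_rate=0.1,
            scale_pos_weight=imbalance_ratio,                     # reweight the minority (default) class
            eval_metric="auc",
        )
        clf.fit(X_tr, y_tr)

        proba = clf.predict_proba(X_te)[:, 1]
        print("AUC:", roc_auc_score(y_te, proba))
        print("F1 :", f1_score(y_te, clf.predict(X_te)))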

    AUTOENCODER BASED GENERATOR FOR CREDIT INFORMATION RECOVERY OF RURAL BANKS

    By using machine learning algorithms, banks and other lending institutions can construct intelligent risk control models for loan businesses, which helps overcome the disadvantages of traditional evaluation methods, such as low efficiency and excessive reliance on the subjective judgment of auditors. In practice, however, the evaluation process inevitably encounters records with missing credit features. Filling in these missing features is therefore crucial for training such machine learning algorithms, especially when they are applied to rural banks with little credit data. In this work, we propose an autoencoder-based algorithm that uses the correlation between data to restore missing items in the features. We selected several open-source datasets (German Credit Data, Give Me Some Credit on the Kaggle platform, etc.) as training and test data to verify the algorithm. The comparison results show that our model outperforms the others, although the performance of the autoencoder-based feature restorer decreases significantly when the feature missing ratio exceeds 70%.
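
    A rough sketch of the approach, assuming TensorFlow/Keras (architecture, masking rate, and data are illustrative assumptions, not the paper's design): a denoising autoencoder is trained to reconstruct feature vectors from randomly masked copies, and at inference time its reconstruction fills only the missing entries.

        # Hedged sketch: autoencoder-based restoration of masked (missing) features.
        import numpy as np
        import tensorflow as tf
        from sklearn.datasets import make_classification
        from sklearn.preprocessing import MinMaxScaler

        rng = np.random.default_rng(0)
        X, _ = make_classification(n_samples=2000, n_features=20, random_state=0)
        X = MinMaxScaler().fit_transform(X)

        # Corrupt a random 30% of entries to simulate missing credit features.
        mask = rng.random(X.shape) < 0.3
        X_corrupt = np.where(mask, 0.0, X)

        autoencoder = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(20,)),
            tf.keras.layers.Dense(12, activation="relu"),
            tf.keras.layers.Dense(6, activation="relu"),    # bottleneck
            tf.keras.layers.Dense(12, activation="relu"),
            tf.keras.layers.Dense(20, activation="sigmoid"),
        ])
        autoencoder.compile(optimizer="adam", loss="mse")
        autoencoder.fit(X_corrupt, X, epochs=30, batch_size=64, verbose=0)

        # Impute: keep observed values, take reconstructions only where data was missing.
        X_recon = autoencoder.predict(X_corrupt, verbose=0)
        X_imputed = np.where(mask, X_recon, X)
        print("MSE on masked entries:", float(np.mean((X_recon[mask] - X[mask]) ** 2)))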

    The Application of Repeated SMOTE for Multi Class Classification on Imbalanced Data

    One of the problems often faced by classification algorithms is imbalanced data. A recommended improvement method at the data level is to balance the number of observations across classes by enlarging the minority class sample (oversampling); one such method is the Synthetic Minority Oversampling Technique (SMOTE). SMOTE is commonly used to balance data consisting of two classes. In this research, SMOTE was used to balance multi-class data by applying it repeatedly. This iterative process is needed when more than one class is imbalanced, because a single SMOTE pass only suits binary classification or the case where just one class is imbalanced. To evaluate repeated SMOTE, the resampled datasets were classified using a neural network, k-NN, Naïve Bayes, and Random Forest, and performance was measured in terms of accuracy, sensitivity, and specificity. The experiment used the Glass Identification dataset, which has six classes, and the SMOTE process was repeated five times. The best performance was achieved by the Random Forest classifier with accuracy = 86.27%, sensitivity = 86.18%, and specificity = 95.82%. The results show that repeated SMOTE can increase classification performance.
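
    One possible reading of the repeated-SMOTE procedure, assuming imbalanced-learn and scikit-learn (the wine dataset stands in for Glass Identification, and the exact repetition scheme is an assumption): SMOTE is applied once per minority class, each pass raising that class to the majority count.

        # Hedged sketch: applying SMOTE one minority class at a time.
        from collections import Counter
        from imblearn.over_sampling import SMOTE
        from sklearn.datasets import load_wine          # stand-in multi-class dataset
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        X, y = load_wine(return_X_y=True)

        counts = Counter(y)
        majority = max(counts.values())
        for cls, n in sorted(counts.items(), key=lambda kv: kv[1]):
            if n == majority:
                continue
            # One SMOTE pass that only oversamples the current class up to the majority size.
            X, y = SMOTE(sampling_strategy={cls: majority}, random_state=0).fit_resample(X, y)

        print("class counts after repeated SMOTE:", Counter(y))
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())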

    Comparative Analysis of the XGBoost and Random Forest Ensemble Learning Algorithms for Credit Decision Classification

    Extending credit always carries risks such as non-performing loans, so creditors (banks) are required to be more objective and accurate in evaluating every credit application. This research was conducted to find which algorithm is most accurate in making a credit decision, by comparing the XGBoost algorithm and the Random Forest algorithm. Both algorithms were applied to datasets of 10,000 and 100,000 records with 19 variables relevant to credit card decision making. The research process involved data pre-processing, data splitting, training, parameter tuning with Random Search, testing, and model evaluation with a confusion matrix. The experimental results show that both algorithms produce quite competitive models: XGBoost reached 1.0 on all evaluation metrics for both the 10,000-record and the 100,000-record data, while Random Forest achieved an accuracy of 0.998 on the 10,000-record data and 0.999 on the 100,000-record data. However, Random Forest only reached an F1-score of 0.700 on the 10,000-record data. Based on these results, both algorithms perform very well and accurately in classifying decisions on credit card data, but Random Forest is less accurate when used on small, imbalanced data.
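
    A sketch of the comparison workflow described above, assuming scikit-learn and xgboost (the synthetic data, parameter grids, and split are illustrative assumptions): both models are tuned with random search and then evaluated on a held-out test set.

        # Hedged sketch: XGBoost vs. Random Forest with random-search tuning.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import classification_report
        from sklearn.model_selection import RandomizedSearchCV, train_test_split
        from xgboost import XGBClassifier

        X, y = make_classification(n_samples=10_000, n_features=19, weights=[0.9, 0.1], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, test_size=0.2, random_state=0)

        searches = {
            "XGBoost": RandomizedSearchCV(
                XGBClassifier(eval_metric="logloss"),
                {"n_estimators": [100, 300, 500], "max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1, 0.3]},
                n_iter=10, cv=3, scoring="f1", random_state=0),
            "Random Forest": RandomizedSearchCV(
                RandomForestClassifier(random_state=0),
                {"n_estimators": [100, 300, 500], "max_depth": [None, 10, 20], "min_samples_leaf": [1, 2, 5]},
                n_iter=10, cv=3, scoring="f1", random_state=0),
        }
        for name, search in searches.items():
            search.fit(X_tr, y_tr)
            print(name, "best params:", search.best_params_)
            print(classification_report(y_te, search.best_estimator_.predict(X_te), digits=3))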

    Feature selection in credit risk modeling: an international evidence

    This paper aims to discover a suitable combination of contemporary feature selection techniques and robust prediction classifiers. To examine the impact of the feature selection method on classifier performance, we use two Chinese and three other real-world credit scoring datasets. The feature selection methods considered are the least absolute shrinkage and selection operator (LASSO) and multivariate adaptive regression splines (MARS), while the examined classifiers are classification and regression trees (CART), logistic regression (LR), artificial neural networks (ANN), and support vector machines (SVM). Empirical findings confirm that the LASSO feature selection method, followed by the robust classifier SVM, demonstrates remarkable improvement and outperforms the other competitive classifiers. ANN also offers improved accuracy with feature selection methods, whereas LR can only improve classification efficiency through feature selection via LASSO. CART shows no indication of improvement in any combination. The proposed credit scoring modeling strategy may be used to develop policy, progressive ideas, and operational guidelines for effective credit risk management at lending and other financial institutions. The findings have practical value, as to date there is no consensus about the combination of feature selection method and prediction classifier.
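
    A minimal sketch of the best-performing combination (LASSO-style selection followed by SVM), assuming scikit-learn and a synthetic dataset in place of the real credit scoring data; the L1-penalized logistic regression used as the selector here is a stand-in, not necessarily the authors' exact LASSO implementation.

        # Hedged sketch: L1-based feature selection feeding an RBF SVM, as a pipeline.
        from sklearn.datasets import make_classification
        from sklearn.feature_selection import SelectFromModel
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=2000, n_features=40, n_informative=10, random_state=0)

        # Features whose L1 coefficients shrink to zero are dropped before the SVM sees the data.
        pipeline = make_pipeline(
            StandardScaler(),
            SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1)),
            SVC(kernel="rbf", C=1.0, gamma="scale"),
        )
        print("CV accuracy:", cross_val_score(pipeline, X, y, cv=5).mean())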

    Intrusion detection by machine learning = Behatolás detektálás gépi tanulás által

    Since the early days of information technology, there have been many stakeholders who used technological capabilities for their own benefit, be it legal operations or illegal access to computational assets and sensitive information. Every year, businesses invest large amounts of effort into upgrading their IT infrastructure, yet, even today, they are unprepared to protect their most valuable assets: data and knowledge. This lack of protection was the main reason for the creation of this dissertation. During this study, intrusion detection, a field of information security, is evaluated through the use of several machine learning models performing signature and hybrid detection. This is a challenging field, mainly due to the high velocity and imbalanced nature of network traffic. To construct machine learning models capable of intrusion detection, the applied methodologies were CRISP-DM, a process model designed to help data scientists plan, create, and integrate machine learning models into a business information infrastructure, and design science research, which answers research questions with information technology artefacts. The two methodologies have a lot in common, which is further elaborated in the study. The goals of this dissertation were two-fold: first, to create an intrusion detector that could provide a high level of intrusion detection performance measured using accuracy and recall, and second, to identify potential techniques that can increase intrusion detection performance. Of the designed models, a hybrid autoencoder + stacking neural network model achieved detection performance comparable to the best models in the related literature, with good detection of minority classes. To achieve this result, the techniques identified were synthetic sampling, advanced hyperparameter optimization, model ensembles, and autoencoder networks. In addition, the dissertation set up a soft hierarchy among the different detection techniques in terms of performance and provides a brief outlook on potential future practical applications of network intrusion detection models as well.
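
    A loose sketch of the hybrid autoencoder + stacking idea, assuming TensorFlow/Keras and scikit-learn (dataset, layer sizes, and base learners are assumptions, not the dissertation's configuration): an autoencoder learns a compact representation of the traffic features, and a stacking ensemble with a neural-network meta-learner classifies the encoded data.

        # Hedged sketch: autoencoder features + stacking ensemble for intrusion-style classification.
        import tensorflow as tf
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier, StackingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier
        from sklearn.preprocessing import StandardScaler

        X, y = make_classification(n_samples=4000, n_features=40, weights=[0.9, 0.1], random_state=0)
        X = StandardScaler().fit_transform(X)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, test_size=0.2, random_state=0)

        # Unsupervised autoencoder; its bottleneck output becomes the feature space.
        inp = tf.keras.Input(shape=(40,))
        h = tf.keras.layers.Dense(24, activation="relu")(inp)
        code = tf.keras.layers.Dense(10, activation="relu")(h)
        out = tf.keras.layers.Dense(40)(tf.keras.layers.Dense(24, activation="relu")(code))
        autoencoder = tf.keras.Model(inp, out)
        autoencoder.compile(optimizer="adam", loss="mse")
        autoencoder.fit(X_tr, X_tr, epochs=20, batch_size=64, verbose=0)
        encoder = tf.keras.Model(inp, code)

        # Stacking ensemble trained on the encoded features, with an MLP meta-learner.
        stack = StackingClassifier(
            estimators=[("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                        ("lr", LogisticRegression(max_iter=1000))],
            final_estimator=MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
        )
        stack.fit(encoder.predict(X_tr, verbose=0), y_tr)
        print("test accuracy:", stack.score(encoder.predict(X_te, verbose=0), y_te))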

    Large Area Land Cover Mapping Using Deep Neural Networks and Landsat Time-Series Observations

    This dissertation focuses on the analysis and implementation of deep learning methodologies in remote sensing to enhance land cover classification accuracy, which has important applications in many areas of environmental planning and natural resources management. The first manuscript conducted a land cover analysis on 26 Landsat scenes in the United States by considering six classifier variants. An extensive grid search was conducted to optimize classifier parameters using only the spectral components of each pixel. Results showed no gain from deep networks over conventional classifiers when using only spectral components, possibly due to the small reference sample size and richness of features. The effects of changing training data size, class distribution, and scene heterogeneity were also studied, and all were found to have a significant effect on classifier accuracy. The second manuscript reviewed 103 research papers on the application of deep learning methodologies in remote sensing, with emphasis on per-pixel classification of mono-temporal data utilizing spectral and spatial data dimensions. A meta-analysis quantified deep network architecture improvement over selected convolutional classifiers. The effects of network size, learning methodology, input data dimensionality, and training data size were also studied, with deep models providing enhanced performance over conventional ones when using spectral and spatial data. The analysis found that the input dataset was a major limitation and that available datasets have already been utilized to their maximum capacity. The third manuscript described the steps to build the full environment for dataset generation based on Landsat time-series data, using the spectral, spatial, and temporal information available for each pixel. A large dataset containing one sample block from each of 84 ecoregions in the conterminous United States (CONUS) was created and then processed by a hybrid convolutional + recurrent deep network, and the network structure was optimized with thousands of simulations. The developed model achieved an overall accuracy of 98% on the test dataset. The model was also evaluated for its overall and per-class performance under different conditions, including individual blocks, individual or combined Landsat sensors, and different sequence lengths. The analysis found that although the deep model's performance on each block is superior to the other candidates, the per-block performance still varies considerably from block to block, which suggests extending the work by fine-tuning the model for local areas. The analysis also found that including more time stamps or combining observations from different Landsat sensors in the model input significantly enhances model performance.
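
    A compact sketch of a hybrid convolutional + recurrent classifier for per-pixel time series, assuming TensorFlow/Keras: 1-D convolutions capture local temporal patterns across the spectral bands and an LSTM summarizes the sequence. Sequence length, band count, class count, and layer sizes are illustrative assumptions, and the data here are random placeholders rather than Landsat observations.

        # Hedged sketch: hybrid Conv1D + LSTM model over per-pixel time series (steps x bands).
        import numpy as np
        import tensorflow as tf

        n_pixels, n_steps, n_bands, n_classes = 1000, 24, 6, 8   # e.g. 24 observations, 6 bands
        X = np.random.rand(n_pixels, n_steps, n_bands).astype("float32")
        y = np.random.randint(0, n_classes, size=n_pixels)       # placeholder labels

        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(n_steps, n_bands)),
            tf.keras.layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),  # local temporal patterns
            tf.keras.layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),
            tf.keras.layers.LSTM(64),                             # longer-range temporal dependencies
            tf.keras.layers.Dense(n_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
        model.fit(X, y, epochs=3, batch_size=64, verbose=0)       # placeholder training on random data
        print(model.evaluate(X, y, verbose=0))                    # [loss, accuracy] on the placeholder data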

    Study of Banking Customers Credit Scoring Indicators Using Artificial Intelligence and Delphi Method

    Given the importance of lending in the banking industry, it is essential to use the indicators that affect credit when making lending decisions. The purpose of the present study is to identify and prioritize the features that are effective in assessing customer creditworthiness, using the viewpoints of bank experts in Kerman, and to compare them with the indicators found in models extracted by meta-heuristic and artificial intelligence methods. The aim is to find out whether the human view, which arises from knowledge and experience, matches the view of artificial intelligence, which treats the problem as black-box modeling. The required data were collected by questionnaire and with the Quantum Binary Particle Swarm Optimization algorithm, and analyzed using the Delphi method. The results show that the selected indicators have an 80% overlap between the two methods. Given these results and the high accuracy of artificial intelligence techniques, it is suggested that banks and other financial and credit institutions assign a higher weight to these indicators when extending credit to customers.
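
    A simplified sketch of binary PSO feature selection (a plain, non-quantum variant of the algorithm mentioned above), assuming scikit-learn and synthetic data: each particle encodes a subset of indicators as a bit vector and is scored by the cross-validated accuracy of a simple classifier.

        # Hedged sketch: binary PSO selecting a subset of candidate credit indicators.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X, y = make_classification(n_samples=1000, n_features=25, n_informative=8, random_state=0)
        n_particles, n_iters, dim = 12, 15, X.shape[1]

        def score(mask):
            # Fitness: CV accuracy of a simple classifier on the selected indicator subset.
            if mask.sum() == 0:
                return 0.0
            return cross_val_score(LogisticRegression(max_iter=1000), X[:, mask.astype(bool)], y, cv=3).mean()

        pos = rng.integers(0, 2, size=(n_particles, dim)).astype(float)   # bit = indicator selected or not
        vel = np.zeros_like(pos)
        pbest, pbest_val = pos.copy(), np.array([score(p) for p in pos])
        gbest = pbest[pbest_val.argmax()].copy()

        for _ in range(n_iters):
            r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = (rng.random(pos.shape) < 1 / (1 + np.exp(-vel))).astype(float)  # sigmoid transfer -> bits
            vals = np.array([score(p) for p in pos])
            better = vals > pbest_val
            pbest[better], pbest_val[better] = pos[better], vals[better]
            gbest = pbest[pbest_val.argmax()].copy()

        print("selected indicators:", np.flatnonzero(gbest), "CV accuracy:", pbest_val.max())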