
    Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation

    Feature selection (FS) has become an indispensable task in dealing with today's highly complex pattern recognition problems with massive numbers of features. In this study, we propose a new wrapper approach for FS based on binary simultaneous perturbation stochastic approximation (BSPSA). This pseudo-gradient descent stochastic algorithm starts with an initial feature vector and moves toward the optimal feature vector via successive iterations. In each iteration, the current feature vector's individual components are perturbed simultaneously by random offsets from a qualified probability distribution. We present computational experiments on datasets with numbers of features ranging from a few dozen to thousands using three widely used classifiers as wrappers: nearest neighbor, decision tree, and linear support vector machine. We compare our methodology against the full set of features as well as a binary genetic algorithm and sequential FS methods, using cross-validated classification error rate and AUC as the performance criteria. Our results indicate that features selected by BSPSA compare favorably to alternative methods in general, and that BSPSA can yield superior feature sets for datasets with tens of thousands of features by examining an extremely small fraction of the solution space. We are not aware of any other wrapper FS methods that are computationally feasible with good convergence properties for such large datasets.
    Comment: This is the Istanbul Sehir University Technical Report #SHR-ISE-2016.01. A short version of this report has been accepted for publication at Pattern Recognition Letters.
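The iteration described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Bernoulli ±1 perturbation, the fixed gains `a` and `c`, the rounding rule at 0.5, and the names `bspsa_sketch` and `toy_loss` are all assumptions made for illustration. In a real wrapper setting, the loss would be a cross-validated classifier error rather than the toy function used here.

```python
import random


def bspsa_sketch(loss, n_features, iterations=300, a=0.02, c=0.25, seed=0):
    """Toy binary SPSA for feature selection (illustrative assumptions only).

    loss: callable taking a 0/1 mask (list of ints) and returning a number.
    Returns the best mask evaluated and its loss.
    """
    rng = random.Random(seed)

    def clip(x):
        return min(1.0, max(0.0, x))

    def to_mask(weights):
        # Round continuous weights in [0, 1] to a binary feature mask.
        return [1 if x >= 0.5 else 0 for x in weights]

    w = [0.5] * n_features  # initial feature vector (assumed starting point)
    best_mask = to_mask(w)
    best_loss = loss(best_mask)

    for _ in range(iterations):
        # Perturb every component simultaneously with a random +/-1 offset.
        delta = [rng.choice((-1, 1)) for _ in range(n_features)]
        mask_plus = to_mask([clip(x + c * d) for x, d in zip(w, delta)])
        mask_minus = to_mask([clip(x - c * d) for x, d in zip(w, delta)])
        y_plus, y_minus = loss(mask_plus), loss(mask_minus)

        # Keep track of the best mask seen so far.
        for mask, y in ((mask_plus, y_plus), (mask_minus, y_minus)):
            if y < best_loss:
                best_mask, best_loss = mask, y

        # Pseudo-gradient from only two loss evaluations, then a descent step.
        scale = (y_plus - y_minus) / (2.0 * c)
        w = [clip(x - a * scale / d) for x, d in zip(w, delta)]

    return best_mask, best_loss


# Hypothetical toy objective: number of mismatches against a known "ideal" mask.
TARGET = [1, 0, 1, 1, 0, 0]


def toy_loss(mask):
    return sum(m != t for m, t in zip(mask, TARGET))


mask, value = bspsa_sketch(toy_loss, len(TARGET))
print(mask, value)
```

Each iteration costs two loss evaluations regardless of the number of features, which is the property that makes this family of methods attractive when a single evaluation means training a classifier.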

    Credit risk modeling: A comparative analysis of artificial and deep neural networks

    Credit risk assessment plays a major role in helping banks and financial institutions prevent counterparty risk failure. A robust risk management system must be able to detect risks early, yet many bank systems today lack this key capability, which leads to further losses (MGI, 2017). In search of an improved methodology for detecting such credit risk earlier, a comparative analysis was conducted between a Deep Neural Network (DNN) and machine learning techniques such as Support Vector Machines (SVM), K-Nearest Neighbours (KNN) and Artificial Neural Network (ANN). The Deep Neural Network used in this study consists of six layers of neurons. Sampling techniques such as SMOTE, SVM-SMOTE, RUS, and All-KNN were also applied to balance the imbalanced dataset. Using supervised learning techniques, the proposed DNN model achieved an accuracy of 82.18% with a ROC score of 0.706 using the RUS sampling technique. The All-KNN sampling technique achieved the maximum true positives in two different models. Using the proposed approach, banks and credit check institutions can help prevent major losses occurring due to counterparty risk failure.
    Keywords: credit risk, deep neural network, artificial neural network, support vector machines, sampling technique
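To make the RUS (random undersampling) step mentioned in this abstract concrete, here is a minimal hand-rolled sketch. It is not the study's code and does not use any resampling library. The function name `random_undersample` and the toy data are hypothetical, and it assumes the simplest variant in which every class is reduced to the size of the smallest class.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class has as many rows as the rarest class."""
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Target size: the size of the smallest (minority) class.
    n_min = min(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement; the minority class is kept whole.
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy imbalanced data: 8 "good" loans (0) and 2 "bad" loans (1).
rows = list(range(10))
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_undersample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Undersampling discards majority-class data, so it trades information for balance. Oversampling methods such as SMOTE make the opposite trade by synthesizing minority examples.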

    The Classification Performance of Multiple Methods and Datasets: Cases from the Loan Credit Scoring Domain

    Decisions to extend credit to potential customers are complex, risky and even potentially catastrophic for the credit granting institution and the broader economy, as underscored by credit failures in the late 2000s. Thus, the ability to accurately assess the likelihood of default is an important issue. In this paper the authors contrast the classification accuracy of multiple computational intelligence methods using five datasets obtained from five different decision contexts in the real world. The methods considered are: logistic regression (LR), neural network (NN), radial basis function neural network (RBFNN), support vector machine (SVM), k-nearest neighbor (kNN), and decision tree (DT). The datasets have various characteristics with respect to the number of cases, the number and type of attributes, the extent of missing values, and the ratio of bad loans to good loans. Using areas under ROC charts as well as the classification accuracy rates for overall, bad loans, and good loans, the performances of the six methods across the five datasets, and of the five datasets across the methods, are examined to find whether there are significant differences between the methods and datasets. Our results reveal some interesting findings which may be useful to practitioners. Even though no method consistently outperformed any other method using the above metrics on all datasets, this study provides some guidelines as to the most appropriate methods for each specific dataset. In addition, the study finds that customer financial attributes are much more relevant than the personal, social, or employment attributes for predictive accuracy.
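The area under the ROC curve used as a metric in this abstract can be computed directly from its pairwise interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below assumes binary labels coded as 1 (positive) and 0 (negative). The function name `roc_auc` and the toy scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    # Count a full win when the positive outranks the negative, half for a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: two bad loans (1) and two good loans (0) with model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 3 of the 4 pairs are ordered correctly: 0.75
```

This pairwise form is quadratic in the number of cases, which is fine for illustration. Library implementations typically use a sort-based computation instead.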