4,992 research outputs found

    CREDIT SCORING USING LOGISTIC REGRESSION

    This report presents an approach to predicting customers' credit scores using the Logistic Regression machine learning algorithm. The research objective of this project is to perform a comparative study of feature selection versus feature extraction on the same dataset, with Logistic Regression as the classifier. For feature selection, we used Stepwise Logistic Regression. For feature extraction, we used Singular Value Decomposition (SVD) and Weighted SVD. To test the accuracy obtained with feature selection and feature extraction, we used a public credit dataset with 11 features and 150,000 records. After performing feature reduction, the Logistic Regression algorithm was used for classification. In our results, we observed that Stepwise Logistic Regression gave a 14% increase in accuracy compared to SVD and a 10% increase compared to Weighted SVD. Thus, we conclude that Stepwise Logistic Regression performed significantly better than both SVD and Weighted SVD. The benefit of using feature selection was that it helped us identify the important features, which improved the prediction accuracy of the classifier.
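
    For readers who want to reproduce the comparison in spirit, the sketch below contrasts a forward sequential selector (a stand-in for classical stepwise logistic regression) with truncated SVD as a preprocessing step for logistic regression, using scikit-learn. The file name, target column, and number of retained features are placeholder assumptions rather than details taken from the report, and Weighted SVD is omitted because it has no off-the-shelf scikit-learn equivalent.

        # Minimal sketch (not the report's code): feature selection vs. feature
        # extraction ahead of Logistic Regression, compared by cross-validated accuracy.
        import pandas as pd
        from sklearn.decomposition import TruncatedSVD
        from sklearn.feature_selection import SequentialFeatureSelector
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        df = pd.read_csv("credit.csv").dropna()              # placeholder path
        X, y = df.drop(columns=["default"]), df["default"]   # placeholder target name

        logreg = LogisticRegression(max_iter=1000)

        # Feature selection: forward sequential selection as a stand-in for
        # stepwise logistic regression.
        selection = make_pipeline(
            StandardScaler(),
            SequentialFeatureSelector(logreg, n_features_to_select=6),
            LogisticRegression(max_iter=1000),
        )

        # Feature extraction: project onto a handful of SVD components instead.
        extraction = make_pipeline(
            StandardScaler(),
            TruncatedSVD(n_components=6),
            LogisticRegression(max_iter=1000),
        )

        for name, model in [("stepwise-style selection", selection),
                            ("SVD extraction", extraction)]:
            acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
            print(f"{name}: {acc:.3f}")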

    Soft computing techniques applied to finance

    Soft computing is progressively gaining presence in the financial world. The number of real and potential applications is very large and, accordingly, so is the number of applied research papers in the literature. The aim of this paper is both to present relevant application areas and to serve as an introduction to the subject. It provides arguments that justify the growing interest in these techniques within the financial community and introduces domains of application such as stock and currency market prediction, trading, portfolio management, credit scoring, and financial distress prediction.

    The role of textual data in finance: methodological issues and empirical evidence

    This thesis investigates the role of textual data in the financial field. Textual data fall into the broader category of alternative data. These types of data, such as reviews, blog posts, and tweets, are constantly growing, which reinforces their importance in several domains. The thesis explores different applications of textual data in finance to highlight how this type of data can be used and how its use can add value to financial analysis. The first application concerns the use of a lexicon-based approach in a credit scoring model. The second application proposes a causality detection between financial and sentiment data using an information-theoretic measure, transfer entropy. The last application concerns the use of sentiment analysis in a network model, called BGVAR, to analyze the financial impact of the COVID-19 pandemic. Overall, this thesis shows that combining textual data with traditional financial data can lead to more insightful knowledge and, therefore, to a more in-depth analysis, allowing for a broader understanding of economic events and financial relationships among economic entities of any kind.
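
    As a rough illustration of the second application, the snippet below estimates lag-1 transfer entropy between a binarized sentiment series and a return series. It is a toy plug-in estimator run on synthetic placeholder data, not the estimator used in the thesis.

        # Minimal sketch of lag-1 transfer entropy on binary-discretized series.
        import numpy as np
        from collections import Counter

        def transfer_entropy(x, y):
            """TE(x -> y): information x_t adds about y_{t+1} beyond y_t (in bits)."""
            x, y = np.asarray(x), np.asarray(y)
            triples = list(zip(y[1:], y[:-1], x[:-1]))      # (y_{t+1}, y_t, x_t)
            n = len(triples)
            c_xyz = Counter(triples)
            c_yz  = Counter((yt1, yt) for yt1, yt, _ in triples)
            c_zx  = Counter((yt, xt) for _, yt, xt in triples)
            c_z   = Counter(yt for _, yt, _ in triples)
            te = 0.0
            for (yt1, yt, xt), c in c_xyz.items():
                p_joint = c / n                             # p(y_{t+1}, y_t, x_t)
                p_cond_full = c / c_zx[(yt, xt)]            # p(y_{t+1} | y_t, x_t)
                p_cond_self = c_yz[(yt1, yt)] / c_z[yt]     # p(y_{t+1} | y_t)
                te += p_joint * np.log2(p_cond_full / p_cond_self)
            return te

        rng = np.random.default_rng(0)
        sentiment = (rng.standard_normal(1000) > 0).astype(int)        # placeholder series
        returns = np.roll(sentiment, 1) ^ (rng.random(1000) < 0.3)     # noisy lagged copy
        print("TE(sentiment -> returns):", transfer_entropy(sentiment, returns))
        print("TE(returns -> sentiment):", transfer_entropy(returns, sentiment))

    On this synthetic example the first direction should dominate, since the returns are built as a noisy lagged copy of the sentiment series.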

    Issues in predictive modeling of individual customer behavior : applications in targeted marketing and consumer credit scoring


    A Semi-Supervised Feature Engineering Method for Effective Outlier Detection in Mixed Attribute Data Sets

    Outlier detection is one of the crucial tasks in data mining, as it can lead to the discovery of valuable and meaningful information within the data. An outlier is a data point that is notably dissimilar from the other data points in the data set. As such, methods for outlier detection play an important role in identifying and removing outliers, thereby increasing the performance and accuracy of prediction systems. Outlier detection is used in many areas such as financial fraud detection, disease prediction, and network intrusion detection. Traditional outlier detection methods are founded on the use of different distance measures to estimate the similarity between points and are confined to data sets that are purely continuous or purely categorical. These methods, though effective, fall short in elucidating the relationship between outliers and the known clusters/classes in the data set. We refer to this relationship as the context for any reported outlier. Alternative outlier detection methods establish the context of a reported outlier using underlying contextual beliefs about the data. Contextual beliefs are the established relationships between the attributes of the data set. Various studies have recently been conducted that explore contextual beliefs to determine outlier behavior. However, these methods do not scale to situations where the data points and their respective contexts are sparse, so the outliers they report tend to lose meaning. Another limitation of these methods is that they assume all features are equally important and neither consider nor determine feature subspaces for identifying outliers. Furthermore, determining subspaces is computationally expensive, as the number of possible subspaces grows with dimensionality, making an exhaustive search through all possible subspaces impractical. In this thesis, we propose a Hybrid Bayesian Network approach that captures the underlying contextual beliefs to detect meaningful outliers in mixed attribute data sets. Hybrid Bayesian Networks use their probability distributions to encode the information in the data, and outliers are the points that violate this information. To deal with sparse contexts, we use an angle-based similarity method, which is then combined with the joint probability distributions of the Hybrid Bayesian Network in a robust manner. With regard to subspace selection, we employ a feature engineering method consisting of two-stage feature selection using the Maximal Information Coefficient and Markov blankets of Hybrid Bayesian Networks to select highly correlated feature subspaces. The proposed method was tested on a real-world medical record data set. The results indicate that the algorithm was able to identify meaningful outliers successfully. Moreover, we compare the performance of our algorithm with existing baseline outlier detection algorithms. We also present a detailed analysis of the outliers reported by our method and demonstrate its efficiency when handling data points with sparse contexts.
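
    The sketch below illustrates two of the ingredients in a heavily simplified form: a mutual-information filter (standing in for the Maximal Information Coefficient stage) picks a class-relevant subspace, and an ABOD-style angle-variance score flags candidate outliers in it. It runs on a numeric placeholder data set and is not the Hybrid Bayesian Network method proposed in the thesis.

        # Simplified two-stage illustration: MI-based subspace selection, then an
        # angle-based outlier score computed within the selected subspace.
        import numpy as np
        from sklearn.datasets import load_breast_cancer     # numeric placeholder data
        from sklearn.feature_selection import mutual_info_classif
        from sklearn.preprocessing import StandardScaler

        X, y = load_breast_cancer(return_X_y=True)
        X = StandardScaler().fit_transform(X)

        # Stage 1: keep the k features most informative about the known classes,
        # so reported outliers can be read in the context of those classes.
        k = 5
        top = np.argsort(mutual_info_classif(X, y, random_state=0))[-k:]
        Xs = X[:, top]

        # Stage 2: ABOD-style score; low variance of angles to other points
        # suggests the point sits outside the bulk of the data.
        def angle_variance(i, X, n_pairs=200, rng=np.random.default_rng(0)):
            others = np.delete(np.arange(len(X)), i)
            a = X[rng.choice(others, n_pairs)] - X[i]
            b = X[rng.choice(others, n_pairs)] - X[i]
            cos = np.einsum("ij,ij->i", a, b) / (
                np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12)
            return cos.var()

        scores = np.array([angle_variance(i, Xs) for i in range(len(Xs))])
        print("Candidate outliers (lowest angle variance):", np.argsort(scores)[:10])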

    A Review of Algorithms for Credit Risk Analysis

    The interest collected from borrowers is used to pay back the principal borrowed from the depositary bank. In financial risk management, credit risk assessment is becoming a significant area. Many credit risk analysis methods are used for the credit risk assessment of client data sets. The assessment of credit risk data sets, which leads to the decision to cancel a customer's loan or to dismiss the customer's request, is a challenging task involving a profound examination of the information set or client information. In this paper, we survey diverse automatic credit risk analysis methods used for credit risk assessment. The data mining approach, as the approach most often used for credit risk analysis, is described with a focus on various algorithms, such as neural networks.
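
    By way of illustration of the data-mining approach such reviews cover, the sketch below trains a small neural network to score synthetic, imbalanced "loan application" data as good or bad credit risks. The data and settings are invented for the example and are not drawn from the paper.

        # Minimal sketch of a neural-network credit risk classifier on synthetic data.
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        # Imbalanced classes, roughly mimicking rare defaults among applicants.
        X, y = make_classification(n_samples=5000, n_features=12,
                                   weights=[0.9, 0.1], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        clf = make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
        )
        clf.fit(X_tr, y_tr)
        print("Holdout accuracy:", clf.score(X_te, y_te))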