707 research outputs found

    A combined data mining approach using rough set theory and case-based reasoning in medical datasets

    Get PDF
    Case-based reasoning (CBR) is the process of solving new cases by retrieving the most relevant ones from an existing knowledge-base. Since, irrelevant or redundant features not only remarkably increase memory requirements but also the time complexity of the case retrieval, reducing the number of dimensions is an issue worth considering. This paper uses rough set theory (RST) in order to reduce the number of dimensions in a CBR classifier with the aim of increasing accuracy and efficiency. CBR exploits a distance based co-occurrence of categorical data to measure similarity of cases. This distance is based on the proportional distribution of different categorical values of features. The weight used for a feature is the average of co-occurrence values of the features. The combination of RST and CBR has been applied to real categorical datasets of Wisconsin Breast Cancer, Lymphography, and Primary cancer. The 5-fold cross validation method is used to evaluate the performance of the proposed approach. The results show that this combined approach lowers computational costs and improves performance metrics including accuracy and interpretability compared to other approaches developed in the literature

    A literature review on the application of evolutionary computing to credit scoring

    Get PDF
    The last years have seen the development of many credit scoring models for assessing the creditworthiness of loan applicants. Traditional credit scoring methodology has involved the use of statistical and mathematical programming techniques such as discriminant analysis, linear and logistic regression, linear and quadratic programming, or decision trees. However, the importance of credit grant decisions for financial institutions has caused growing interest in using a variety of computational intelligence techniques. This paper concentrates on evolutionary computing, which is viewed as one of the most promising paradigms of computational intelligence. Taking into account the synergistic relationship between the communities of Economics and Computer Science, the aim of this paper is to summarize the most recent developments in the application of evolutionary algorithms to credit scoring by means of a thorough review of scientific articles published during the period 2000–2012.This work has partially been supported by the Spanish Ministry of Education and Science under grant TIN2009-14205 and the Generalitat Valenciana under grant PROMETEO/2010/028

    Credit risk evaluation modeling using evolutionary linear SVM classifiers and sliding window approach

    Get PDF
    AbstractThis paper presents a study on credit risk evaluation modeling using linear Support Vector Machines (SVM) classifiers, combined with evolutionary parameter selection using Genetic Algorithms and Particle Swarm Optimization, and sliding window approach. Discriminant analysis was applied for evaluation of financial instances and dynamic formation of bankruptcy classes. The possibilities of feature selection application were also researched by applying correlation-based feature subset evaluator. The research demonstrates a possibility to develop and apply an intelligent classifier based on original discriminant analysis method evaluation and shows that it might perform bankruptcy identification better than original model

    The Impacts of Machine Learning in Financial Crisis Prediction

    Get PDF
    The most complicated and expected issue to be handled in corporate firms, small-scale businesses, and investors’ even governments are financial crisis prediction. To this effect, it was of interest to us to investigate the current impact of the newly employed technique that is machine learning (ML) to handle this menace in all spheres of business both private and public. The study uses systematic literature assessment to study the impact of ML in financial crisis prediction. From the selected works of literature, we have been able to establish the important role play by this method in the prediction of bankruptcy and creditworthiness that was not handled appropriately by others method. Also, machine learning helps in data handling, data privacy, and confidentiality. This study presents a leading approach to achieving financial growth and plasticity in corporate organizations. We, therefore, recommend a real-time study to investigate the impact of ML in FCP. &nbsp

    Ensemble classification of incomplete data – a non-imputation approach with an application in ovarian tumour diagnosis support

    Get PDF
    Wydział Matematyki i InformatykiW niniejszej pracy doktorskiej zająłem się problemem klasyfikacji danych niekompletnych. Motywacja do podjęcia badań ma swoje źródło w medycynie, gdzie bardzo często występuje zjawisko braku danych. Najpopularniejszą metodą radzenia sobie z tym problemem jest imputacja danych, będąca uzupełnieniem brakujących wartości na podstawie statystycznych zależności między cechami. W moich badaniach przyjąłem inną strategię rozwiązania tego problemu. Wykorzystując opracowane wcześniej klasyfikatory można przekształcić je do formy, która zwraca przedział możliwych predykcji. Następnie, poprzez zastosowanie operatorów agregacji oraz metod progowania, można dokonać finalnej klasyfikacji. W niniejszej pracy pokazuję jak dokonać ww. przekształcenia klasyfikatorów oraz jak wykorzystać strategie agregacji danych przedziałowych do klasyfikacji. Opracowane przeze mnie metody podnoszą jakość klasyfikacji danych niekompletnych w problemie wspomagania diagnostyki guzów jajnika. Dodatkowa analiza wyników na zewnętrznych zbiorach danych z repozytorium uczenia maszynowego Uniwersytetu Kalifornijskiego w Irvine (UCI) wskazuje, że przedstawione metody są komplementarne z imputacją.In this doctoral dissertation I focus on the problem of classification of incomplete data. The motivation for the research comes from medicine, where missing data phenomena are commonly encountered. The most popular method of dealing with data missingness is imputation; that is, inserting missing data on the basis of statistical relationships among features. In my research I choose a different strategy for dealing with this issue. Classifiers of a type previously developed can be transformed to a form which returns an interval of possible predictions. In the next step, with the use of aggregation operators and thresholding methods, one can make a final classification. I show how to make such transformations of classifiers and how to use aggregation strategies for interval data classification. These methods improve the quality of the process of classification of incomplete data in the problem of ovarian tumour diagnosis. Additional analysis carried out on external datasets from the University of California, Irvine (UCI) Machine Learning Repository shows that the aforementioned methods are complementary to imputation

    Decision Support Systems for Risk Assessment in Credit Operations Against Collateral

    Get PDF
    With the global economic crisis, which reached its peak in the second half of 2008, and before a market shaken by economic instability, financial institutions have taken steps to protect the banks’ default risks, which had an impact directly in the form of analysis in credit institutions to individuals and to corporate entities. To mitigate the risk of banks in credit operations, most banks use a graded scale of customer risk, which determines the provision that banks must do according to the default risk levels in each credit transaction. The credit analysis involves the ability to make a credit decision inside a scenario of uncertainty and constant changes and incomplete transformations. This ability depends on the capacity to logically analyze situations, often complex and reach a clear conclusion, practical and practicable to implement. Credit Scoring models are used to predict the probability of a customer proposing to credit to become in default at any given time, based on his personal and financial information that may influence the ability of the client to pay the debt. This estimated probability, called the score, is an estimate of the risk of default of a customer in a given period. This increased concern has been in no small part caused by the weaknesses of existing risk management techniques that have been revealed by the recent financial crisis and the growing demand for consumer credit.The constant change affects several banking sections because it prevents the ability to investigate the data that is produced and stored in computers that are too often dependent on manual techniques. Among the many alternatives used in the world to balance this risk, the provision of guarantees stands out of guarantees in the formalization of credit agreements. In theory, the collateral does not ensure the credit return, as it is not computed as payment of the obligation within the project. There is also the fact that it will only be successful if triggered, which involves the legal area of the banking institution. The truth is, collateral is a mitigating element of credit risk. Collaterals are divided into two types, an individual guarantee (sponsor) and the asset guarantee (fiduciary). Both aim to increase security in credit operations, as an payment alternative to the holder of credit provided to the lender, if possible, unable to meet its obligations on time. For the creditor, it generates liquidity security from the receiving operation. The measurement of credit recoverability is a system that evaluates the efficiency of the collateral invested return mechanism. In an attempt to identify the sufficiency of collateral in credit operations, this thesis presents an assessment of smart classifiers that uses contextual information to assess whether collaterals provide for the recovery of credit granted in the decision-making process before the credit transaction become insolvent. The results observed when compared with other approaches in the literature and the comparative analysis of the most relevant artificial intelligence solutions, considering the classifiers that use guarantees as a parameter to calculate the risk contribute to the advance of the state of the art advance, increasing the commitment to the financial institutions.Com a crise econômica global, que atingiu seu auge no segundo semestre de 2008, e diante de um mercado abalado pela instabilidade econômica, as instituições financeiras tomaram medidas para proteger os riscos de inadimplência dos bancos, medidas que impactavam diretamente na forma de análise nas instituições de crédito para pessoas físicas e jurídicas. Para mitigar o risco dos bancos nas operações de crédito, a maioria destas instituições utiliza uma escala graduada de risco do cliente, que determina a provisão que os bancos devem fazer de acordo com os níveis de risco padrão em cada transação de crédito. A análise de crédito envolve a capacidade de tomar uma decisão de crédito dentro de um cenário de incerteza e mudanças constantes e transformações incompletas. Essa aptidão depende da capacidade de analisar situações lógicas, geralmente complexas e de chegar a uma conclusão clara, prática e praticável de implementar. Os modelos de Credit Score são usados para prever a probabilidade de um cliente propor crédito e tornar-se inadimplente a qualquer momento, com base em suas informações pessoais e financeiras que podem influenciar a capacidade do cliente de pagar a dívida. Essa probabilidade estimada, denominada pontuação, é uma estimativa do risco de inadimplência de um cliente em um determinado período. A mudança constante afeta várias seções bancárias, pois impede a capacidade de investigar os dados que são produzidos e armazenados em computadores que frequentemente dependem de técnicas manuais. Entre as inúmeras alternativas utilizadas no mundo para equilibrar esse risco, destacase o aporte de garantias na formalização dos contratos de crédito. Em tese, a garantia não “garante” o retorno do crédito, já que não é computada como pagamento da obrigação dentro do projeto. Tem-se ainda, o fato de que esta só terá algum êxito se acionada, o que envolve a área jurídica da instituição bancária. A verdade é que, a garantia é um elemento mitigador do risco de crédito. As garantias são divididas em dois tipos, uma garantia individual (patrocinadora) e a garantia do ativo (fiduciário). Ambos visam aumentar a segurança nas operações de crédito, como uma alternativa de pagamento ao titular do crédito fornecido ao credor, se possível, não puder cumprir suas obrigações no prazo. Para o credor, gera segurança de liquidez a partir da operação de recebimento. A mensuração da recuperabilidade do crédito é uma sistemática que avalia a eficiência do mecanismo de retorno do capital investido em garantias. Para tentar identificar a suficiência das garantias nas operações de crédito, esta tese apresenta uma avaliação dos classificadores inteligentes que utiliza informações contextuais para avaliar se as garantias permitem prever a recuperação de crédito concedido no processo de tomada de decisão antes que a operação de crédito entre em default. Os resultados observados quando comparados com outras abordagens existentes na literatura e a análise comparativa das soluções de inteligência artificial mais relevantes, mostram que os classificadores que usam garantias como parâmetro para calcular o risco contribuem para o avanço do estado da arte, aumentando o comprometimento com as instituições financeiras

    Forecasting Financial Distress With Machine Learning – A Review

    Get PDF
    Purpose – Evaluate the various academic researches with multiple views on credit risk and artificial intelligence (AI) and their evolution.Theoretical framework – The study is divided as follows: Section 1 introduces the article. Section 2 deals with credit risk and its relationship with computational models and techniques. Section 3 presents the methodology. Section 4 addresses a discussion of the results and challenges on the topic. Finally, section 5 presents the conclusions.Design/methodology/approach – A systematic review of the literature was carried out without defining the time period and using the Web of Science and Scopus database.Findings – The application of computational technology in the scope of credit risk analysis has drawn attention in a unique way. It was found that the demand for identification and introduction of new variables, classifiers and more assertive methods is constant. The effort to improve the interpretation of data and models is intense.Research, Practical & Social implications – It contributes to the verification of the theory, providing information in relation to the most used methods and techniques, it brings a wide analysis to deepen the knowledge of the factors and variables on the theme. It categorizes the lines of research and provides a summary of the literature, which serves as a reference, in addition to suggesting future research.Originality/value – Research in the area of Artificial Intelligence and Machine Learning is recent and requires attention and investigation, thus, this study contributes to the opening of new views in order to deepen the work on this topic

    BENCHMARKING CLASSIFIERS - HOW WELL DOES A GOWA-VARIANT OF THE SIMILARITY CLASSIFIER DO IN COMPARISON WITH SELECTED CLASSIFIERS?

    Get PDF
    Digital data is ubiquitous in nearly all modern businesses. Organizations have more data available, in various formats, than ever before. Machine learning algorithms and predictive analytics utilize the knowledge contained in that data, in order to help the business related decision-making. This study explores predictive analytics by comparing different classification methods – the main interest being in the Generalize Ordered Weighted Average (GOWA)-variant of the similarity classifier. The target for this research is to find out how what is the GOWA-variant of the similarity classifier and how well it performs compared to other selected classifiers. This study also tries to investigate whether the GOWA-variant of the similarity classifier is a sufficient method to be used in the busi-ness related decision-making. Four different classical classifiers were selected as reference classifiers on the basis of their common usage in machine learning research, and on their availability in the Sta-tistics and Machine Learning Toolbox in MATLAB. Three different data sets from UCI Machine Learning repository were used for benchmarking the classifiers. The benchmarking process uses fitness function instead of pure classification accuracy to determine the performance of the classifiers. Fitness function combines several measurement criteria into a one common value. With one data set, the GOWA-variant of the similarity classifier per-formed the best. One of the data sets contains credit card client data. It was more complex than the other two data sets and contains clearly business related data. The GOWA-variant performed also well with this data set. Therefore it can be claimed that the GOWA-variant of the similarity classifi-er is a viable option to be used also for solving business related problems
    • …
    corecore