    Transfer Learning with Label Adaptation for Counterparty Rating Prediction

    Credit rating is one of the core tools for risk management within financial firms. Ratings are usually provided by specialized agencies that perform an overall study and diagnosis of a given firm’s financial health. Dealing with unrated entities is a common problem, as several risk models rely on complete ratings, and agencies cannot realistically rate every existing company. To address this, credit rating prediction has been widely studied in academia. However, research on this topic tends to build separate models for each rating agency because of differences in both rating scales and composition. This work uses transfer learning, via label adaptation, to increase the number of samples available for feature selection, and appends the adapted labels as an additional feature to improve the predictive power and stability of previously proposed methods. Accuracy on exact label prediction improved from 0.30 with traditional models to 0.33 in the transfer learning setting. Furthermore, when accuracy is measured with a tolerance of 3 grade notches, it increased by almost 0.10, from 0.87 to 0.96. Overall, transfer learning displayed better out-of-sample generalization.
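The abstract combines two ideas that are easy to make concrete: translating a rating from one agency's scale to another (label adaptation) and scoring predictions with a notch tolerance. The sketch below is a minimal illustration under stated assumptions: the two letter-grade scales, the position-based mapping in the hypothetical `adapt_label`, and the helper names `with_adapted_label` and `notch_accuracy` are all illustrative choices, not the paper's actual method.

```python
# Illustrative letter-grade scales, best to worst. These are assumptions for the
# sketch; the paper's actual agencies, scales and mapping are not specified here.
SCALE_A = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-",
           "BBB+", "BBB", "BBB-", "BB+", "BB", "BB-", "B+", "B", "B-"]
SCALE_B = ["Aaa", "Aa1", "Aa2", "Aa3", "A1", "A2", "A3",
           "Baa1", "Baa2", "Baa3", "Ba1", "Ba2", "Ba3", "B1", "B2", "B3"]


def adapt_label(label, source, target):
    """Map a rating from the source scale to the target scale by relative notch position."""
    idx = source.index(label)
    # Rescale the ordinal position so scales of different lengths still line up.
    pos = round(idx * (len(target) - 1) / (len(source) - 1))
    return target[pos]


def with_adapted_label(features, label, source, target):
    """Append the adapted label, encoded as a notch index, as one extra feature."""
    return features + [target.index(adapt_label(label, source, target))]


def notch_accuracy(y_true, y_pred, scale, tolerance=0):
    """Fraction of predictions within `tolerance` notches of the true rating."""
    hits = sum(
        abs(scale.index(t) - scale.index(p)) <= tolerance
        for t, p in zip(y_true, y_pred)
    )
    return hits / len(y_true)


y_true = ["A", "BBB", "BB+", "AA"]
y_pred = ["A", "BBB-", "B+", "A+"]
print(adapt_label("BBB", SCALE_A, SCALE_B))                  # Baa2
print(notch_accuracy(y_true, y_pred, SCALE_A))               # 0.25
print(notch_accuracy(y_true, y_pred, SCALE_A, tolerance=3))  # 1.0
```

With a tolerance of 0 the metric reduces to exact accuracy; widening the tolerance counts near-misses on the ordinal scale, which is why the 3-notch figures in the abstract are much higher than the exact-match ones.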

    Information gain directed genetic algorithm wrapper feature selection for credit rating

    Financial credit scoring is one of the most crucial processes in the finance industry for assessing the creditworthiness of individuals and enterprises. Various statistics-based machine learning techniques have been employed for this task. The “curse of dimensionality” is still a significant challenge for machine learning techniques. Some research has been carried out on feature selection (FS) using a genetic algorithm (GA) as a wrapper to improve the performance of credit scoring models. However, the challenge lies in finding an overall best method for credit scoring problems and in speeding up the time-consuming process of feature selection. In this study, the credit scoring problem is investigated through feature selection to improve classification performance. This work proposes a novel approach to feature selection in credit scoring applications, called the Information Gain Directed Feature Selection algorithm (IGDFS), which ranks features by information gain and propagates the top m features through the GA wrapper (GAW) algorithm, using three classical machine learning algorithms, k-nearest neighbors (KNN), Naïve Bayes, and Support Vector Machine (SVM), for credit scoring. The first stage, information-gain-guided feature selection, can help reduce the computational complexity of the GA wrapper, and the information gain of the features selected by IGDFS can indicate their importance to decision making.
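The two stages of IGDFS described above can be sketched as follows: rank features by information gain, keep the top m, then search subsets of those m with a GA whose fitness comes from a classifier. This is a minimal sketch assuming discrete-valued features; the GA operators (truncation selection, one-point crossover, bit-flip mutation), the default parameters, and the helper names are illustrative assumptions, not the paper's configuration. The fitness function is left to the caller, where the paper would plug in KNN, Naïve Bayes or SVM accuracy; the usage below substitutes a toy fitness.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG of a discrete feature column with respect to the class labels."""
    n = len(labels)
    cond = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond


def top_m_features(X, y, m):
    """Stage 1: rank features by information gain and keep the top m indices."""
    scores = [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]


def ga_wrapper(candidates, fitness, pop_size=10, generations=20, p_mut=0.1, seed=0):
    """Stage 2: a plain GA over bitmasks of the candidate features.

    `fitness(subset)` is supplied by the caller, e.g. cross-validated accuracy
    of a classifier trained only on that feature subset.
    """
    rng = random.Random(seed)
    k = len(candidates)
    pop = [[rng.randint(0, 1) for _ in range(k)] for _ in range(pop_size)]

    def decode(mask):
        return [f for f, bit in zip(candidates, mask) if bit]

    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(decode(m)), reverse=True)
        parents = scored[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, k - 1) if k > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return decode(max(pop, key=lambda m: fitness(decode(m))))


X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
y = [0, 0, 1, 1]
candidates = top_m_features(X, y, m=2)
# Toy fitness standing in for classifier accuracy: reward feature 0, penalise size.
best = ga_wrapper(candidates, lambda s: (0 in s) - 0.01 * len(s))
print(candidates, best)
```

The point of the first stage is visible in the shapes: the GA searches over 2^m masks of the pre-ranked candidates rather than over every feature, which is where the claimed reduction in wrapper cost comes from.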

    Data mining in computational finance

    Computational finance is a relatively new discipline whose birth can be traced back to the early 1950s. Its major objective is to develop and study practical models focusing on techniques that apply directly to financial analyses. The large number of decisions and computationally intensive problems involved in this discipline make data mining and machine learning models integral to improving, automating, and expanding current processes. One of the objectives of this research is to present a state-of-the-art review of the data mining and machine learning techniques applied in the core areas of computational finance. Next, a detailed analysis of public and private finance datasets is performed in an attempt to find interesting facts in the data and draw conclusions regarding the usefulness of features within the datasets. Credit risk evaluation is one of the crucial modern concerns in this field. Credit scoring is essentially a classification problem in which models are built using information about past applicants to categorise new applicants as ‘creditworthy’ or ‘non-creditworthy’. We appraise the performance of a few classical machine learning algorithms for the problem of credit scoring. Typically, credit scoring databases are large and characterised by redundant and irrelevant features, making the classification task more computationally demanding. Feature selection is the process of selecting an optimal subset of relevant features. We propose an improved information-gain directed wrapper feature selection method using genetic algorithms and successfully evaluate its effectiveness against baseline and generic wrapper methods on three benchmark datasets. One of the tasks of financial analysts is to estimate a company’s worth. In the final piece of work, this study predicts the earnings growth rate of companies using three machine learning techniques. We employed the technique of lagged features, which allowed varying amounts of recent history to be brought into the prediction task and transformed the time series forecasting problem into a supervised learning problem. This work was applied to a private time series dataset.
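The lagged-features transformation mentioned above turns a single time series into a table of inputs and targets that any supervised learner can consume. A minimal sketch, assuming a univariate series held in a Python list; the function name `make_lagged` and the choice of lags are illustrative, and the study's actual features and dataset are not described in the abstract.

```python
def make_lagged(series, n_lags):
    """Turn a univariate series into (X, y) pairs for supervised learning.

    Each row of X holds the previous `n_lags` values; y is the value that follows.
    The first `n_lags` points have no complete history, so they only appear as inputs.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y


growth = [1, 2, 3, 4, 5]
X, y = make_lagged(growth, 2)
print(X)  # [[1, 2], [2, 3], [3, 4]]
print(y)  # [3, 4, 5]

# Varying the amount of recent history is just a matter of changing n_lags.
for n in (1, 2, 3):
    Xn, yn = make_lagged(growth, n)
    print(n, len(Xn))
```

Each extra lag adds one column of history but removes one usable training row, which is the trade-off behind trying "varying amounts of recent history".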

    Image similarity in medical images

    Recent experiments have indicated a strong influence of the substrate grain orientation on self-ordering in anodic porous alumina. Anodic porous alumina with straight pore channels grown in a stable, self-ordered manner forms on (001)-oriented Al grains, while a disordered porous pattern with tilted pore channels growing in an unstable manner forms on (101)-oriented Al grains. In this work, a numerical simulation of the pore growth process is carried out to understand this phenomenon. The rate-determining step of oxide growth is assumed to be the Cabrera-Mott barrier at the oxide/electrolyte (o/e) interface, while the substrate is assumed to determine the ratio β between the ionization and oxidation reactions at the metal/oxide (m/o) interface. By numerically solving for the electric field inside a growing porous alumina layer during anodization, the migration rates of the ions, and hence the evolution of the o/e and m/o interfaces, are computed. The simulated results show that pore growth is more stable when β is higher. A higher β corresponds to more Al being ionized and migrating away from the m/o interface rather than being oxidized, and hence a higher retained O:Al ratio in the oxide. The experimentally measured oxygen content in the self-ordered porous alumina on (001) Al is indeed about 3% higher than that in the disordered alumina on (101) Al, in agreement with the theoretical prediction. The results therefore suggest that ionization is relatively easier on the (001) Al substrate than on (101) Al, and this leads to the more stable growth of pore channels on (001) Al.

    Extended 2D CNN approach for the diagnosis of Alzheimer's disease using structural magnetic resonance imaging

    Advisors: Leticia Rittner, Roberto de Alencar Lotufo. Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.

    Abstract: Alzheimer's disease (AD) is a type of dementia that affects millions of people around the world. To date, there is no cure for Alzheimer's, and its early diagnosis has been a challenging task. Current techniques for Alzheimer's disease diagnosis have explored the structural information in T1-weighted Magnetic Resonance Imaging (MRI). Among these techniques, the deep convolutional neural network (CNN) is the most promising and has been successfully used on medical images for a variety of applications due to its ability to perform feature extraction. Before the great success of deep learning and CNNs, works that aimed to classify the different stages of AD explored classic machine learning approaches and meticulous feature engineering, mostly for binary classification tasks. Recently, some authors have combined deep learning techniques with small subsets of the Alzheimer's Disease Neuroimaging Initiative (ADNI) public dataset to predict early-stage AD, exploring 3D CNN approaches usually combined with 3D convolutional autoencoder architectures. Others have also investigated 3D CNN approaches, with or without a pre-processing step for feature extraction. However, the majority of these papers focus on binary classification only, with no results for the three-class classification of Alzheimer's disease, Mild Cognitive Impairment (MCI), and Normal Control (NC). Our primary goal was to explore 2D CNN approaches to tackle the 3-class classification using T1-weighted MRI. As a secondary goal, we filled some gaps we found in the literature by investigating the use of 2D CNN architectures for our problem, since most works explored either traditional machine learning or 3D CNN approaches. Our extended-2D CNN exploits the volumetric information in the MRI data while maintaining the low computational cost associated with a 2D approach when compared to 3D CNNs. In addition, our results outperform other strategies for the 3-class classification, and we compare the performance of our model with traditional machine learning and 3D CNN methods. We also investigated the role of different techniques widely used in CNN applications, for instance, data pre-processing, data augmentation, transfer learning, and domain adaptation to a Brazilian dataset.

    Master's program (Mestrado) in Computer Engineering (Engenharia de Computação); degree: Master in Electrical Engineering (Mestra em Engenharia Elétrica). 168468/2017-4 CNP
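The abstract does not spell out how the extended-2D CNN brings volumetric context into a 2D network. One common way to do this, sometimes called a 2.5D input, is to stack neighboring slices of the volume as the channels of a single 2D image. The sketch below shows only that input-preparation step and is an assumption, not the dissertation's method; the names `slice_stack` and `volume_to_inputs`, the clamping at the volume edges, and the `k`/`step` parameters are all hypothetical.

```python
def slice_stack(volume, z, k=1):
    """Return 2k+1 neighboring slices centred on index z, to be used as channels.

    volume: list of 2D slices (each a list of rows). Indices past either end are
    clamped to the first/last slice so every stack has the same depth.
    """
    depth = len(volume)
    return [volume[min(max(i, 0), depth - 1)] for i in range(z - k, z + k + 1)]


def volume_to_inputs(volume, k=1, step=1):
    """Build one multi-channel 2D input for every `step`-th slice position."""
    return [slice_stack(volume, z, k) for z in range(0, len(volume), step)]


# Toy 5-slice "volume" whose 1x1 slices just hold their own index.
volume = [[[z]] for z in range(5)]
print(slice_stack(volume, 0))  # [[[0]], [[0]], [[1]]]
print(slice_stack(volume, 2))  # [[[1]], [[2]], [[3]]]
print(len(volume_to_inputs(volume, k=1, step=2)))  # 3
```

Each stacked input still goes through 2D convolutions, so the cost stays close to a 2D model, while the extra channels give the network some through-plane context; a 3D CNN would instead convolve along all three axes.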
