5 research outputs found

    A Method Non-Deterministic and Computationally Viable for Detecting Outliers in Large Datasets

    This paper presents an outlier detection method based on a Variable Precision Rough Set Model (VPRSM), which generalizes the standard set inclusion relation underlying the Rough Sets Basic Model (RSBM). The main contribution of this research is an improvement in detection quality, because the generalization allows classification when there is some degree of uncertainty. A computationally viable algorithm for large volumes of data, derived from the proposed method, is also introduced. Experiments performed in a real scenario, together with a comparison of the results against the RSBM-based method, demonstrate the efficiency of both the method and the algorithm in diverse contexts involving large volumes of data. This work has been supported by grant TIN2016-78103-C2-2-R and by University of Alicante projects GRE14-02 and Smart University.
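    The abstract does not spell out the variable precision machinery, but the idea of relaxing standard set inclusion can be sketched minimally as follows. This is an illustration in the spirit of variable precision rough sets, not the paper's detection algorithm; the sets, the beta value, and the function names are assumptions.

# Minimal sketch of the variable-precision inclusion degree that relaxes
# standard set inclusion (beta = 0 recovers ordinary inclusion).
# Illustrative only; not the paper's outlier detection algorithm.

def misclassification_degree(block: set, concept: set) -> float:
    """Fraction of `block` that falls outside `concept`:
    c(X, Y) = 1 - |X ∩ Y| / |X| for non-empty X."""
    if not block:
        return 0.0
    return 1.0 - len(block & concept) / len(block)

def beta_lower_approximation(blocks, concept, beta=0.2):
    """Union of equivalence classes included in `concept`
    up to an admissible error rate `beta` (0 <= beta < 0.5)."""
    lower = set()
    for block in blocks:
        if misclassification_degree(block, concept) <= beta:
            lower |= block
    return lower

# Toy usage: with beta = 0.2, a block where 1 of 6 elements lies outside
# the concept is still accepted into the lower approximation.
blocks = [{1, 2, 3, 4, 5, 6}, {7, 8, 9}]
concept = {1, 2, 3, 4, 5, 7}
print(beta_lower_approximation(blocks, concept, beta=0.2))  # -> {1, 2, 3, 4, 5, 6}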

    Credit scoring using three-way decisions with probabilistic rough sets

    Credit scoring is a crucial task within risk management for any company in the financial sector. On the one hand, it is in the self-interest of banks to avoid approving credits to customers who are likely to default. On the other hand, regulators require strict risk management systems from banks to protect their customers, and require “too big to fail” institutions to avoid bankruptcy with its negative impacts on the economy as a whole. However, credit scoring is also expensive and time-consuming, so any method that could further increase its efficiency, such as three-way decisions, is worth trying. We propose a two-step approach based on three-way decisions. Credit applications that can be approved or rejected right away are decided in a first step. For the remaining applications, additional information is gathered in a second step; hence, these decisions are more expensive than the ones in the first step. In our paper, we present a methodology for applying three-way decisions with probabilistic rough sets to credit scoring, together with an extensive case study of more than 7000 credit applications from Chilean micro-enterprises.
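    Since the abstract only outlines the approach, here is a minimal, hypothetical sketch of the first screening step with three-way decisions. The thresholds alpha/beta, the probability values, and all names below are illustrative assumptions, not the paper's calibrated model.

# Hypothetical sketch of the two-step, three-way decision idea: applications
# with a sufficiently low or high estimated default probability are decided
# immediately; the rest are deferred for costlier information gathering.
from enum import Enum

class Decision(Enum):
    APPROVE = "approve"     # positive region: default probability low enough
    REJECT = "reject"       # negative region: default probability too high
    DEFER = "gather_info"   # boundary region: collect additional information

def three_way_decision(p_default: float, alpha: float = 0.6, beta: float = 0.2) -> Decision:
    """Classic three-way rule with thresholds beta < alpha on Pr(default | application)."""
    if p_default >= alpha:
        return Decision.REJECT
    if p_default <= beta:
        return Decision.APPROVE
    return Decision.DEFER

# Step 1: cheap screening; Step 2: only deferred applications pay for extra data.
applications = {"A-001": 0.05, "A-002": 0.45, "A-003": 0.80}
print({app: three_way_decision(p).value for app, p in applications.items()})
# {'A-001': 'approve', 'A-002': 'gather_info', 'A-003': 'reject'}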

    Design of a collection management strategy using Big Data Analytics in direct selling (catalog sales) companies

    This MBA thesis presents the design of a collection management strategy based on Big Data Analytics for direct selling (catalog sales) companies. The methodology used to achieve this objective consists of a systematic literature review and a qualitative analysis, aimed at identifying, describing, deepening, and finally communicating the strategy. The research method was applied in four core activities: (1) planning the review protocol; (2) identifying and classifying the literature related to the object of study; (3) describing the literature on the evolution of the object of study; and (4) presenting the results. The resulting collection management strategy is organized into three categories framed by the credit life cycle: (1) credit granting, (2) monitoring of credit-use behavior, and (3) credit recovery. The work is also a good example of how to apply such strategies in companies oriented toward developing their commercial channel, which supports growth while protecting the company's financial structure: customer profiles are segmented and customized strategies are generated according to risk, minimizing the company's probability of loss. A sketch of this segmentation idea follows.
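    The thesis is qualitative and does not include an implementation; purely as a hypothetical illustration of risk-based segmentation across the three credit-cycle categories above, a sketch might look like the following. All stage names, segment boundaries, and actions are invented for illustration.

# Purely hypothetical illustration of risk-based segmentation across the three
# credit-cycle categories described above (granting, usage monitoring, recovery).
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    risk_score: float     # 0 (lowest risk) .. 1 (highest risk)
    stage: str            # "granting", "monitoring", or "recovery"

def risk_segment(score: float) -> str:
    """Map a continuous risk score to a coarse segment."""
    if score < 0.3:
        return "low"
    if score < 0.7:
        return "medium"
    return "high"

ACTIONS = {  # customized strategy per (credit-cycle stage, risk segment)
    ("granting", "low"): "approve with standard limit",
    ("granting", "medium"): "approve with reduced limit",
    ("granting", "high"): "require guarantee or decline",
    ("monitoring", "low"): "no action",
    ("monitoring", "medium"): "early reminder before due date",
    ("monitoring", "high"): "limit freeze and proactive contact",
    ("recovery", "low"): "automated reminder",
    ("recovery", "medium"): "negotiated payment plan",
    ("recovery", "high"): "escalate to specialized collection",
}

def collection_strategy(c: Customer) -> str:
    return ACTIONS[(c.stage, risk_segment(c.risk_score))]

print(collection_strategy(Customer("C-42", risk_score=0.75, stage="recovery")))
# -> escalate to specialized collection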

    Data Science for Finance: Targeted Learning from (Big) Data to Economic Stability and Financial Risk Management

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Statistics and Econometrics.
    The modelling, measurement, and management of systemic financial stability remains a critical issue in most countries. Policymakers, regulators, and managers depend on complex models for financial stability and risk management, and those models must be robust, realistic, and consistent with all relevant available data. This requires extensive data disclosure of the highest quality standards. However, stressed situations, financial crises, and pandemics are the source of many new risks and new requirements, such as new data sources and different models. This dissertation aims to show the data quality challenges of high-risk situations such as pandemics or economic crises, and to propose new machine learning models for predictive and longitudinal time series problems.
    In the first study (Chapter Two), we analyzed and compared the quality of the official datasets available for COVID-19, as a best practice for a recent high-risk situation with dramatic effects on financial stability. We used comparative statistical analysis to evaluate the accuracy of data collection by one national organization (Chinese Center for Disease Control and Prevention) and two international organizations (World Health Organization; European Centre for Disease Prevention and Control), based on the value of systematic measurement errors. We combined Excel files, text mining techniques, and manual data entry to extract the COVID-19 data from official reports and to generate an accurate profile for comparisons. The findings show noticeable and increasing measurement errors in the three datasets as the pandemic outbreak expanded and more countries contributed data to the official repositories, raising data comparability concerns and pointing to the need for better coordination and harmonized statistical methods. The study offers a combined COVID-19 dataset and dashboard with minimal systematic measurement errors, and valuable insights into the potential problems of using databanks without carefully examining the metadata and additional documentation that describe the overall context of the data.
    In the second study (Chapter Three), we discussed credit risk, the most significant source of risk in banking, which is one of the most important sectors among financial institutions. We proposed a new machine learning approach for online credit scoring that is sufficiently conservative and robust for unstable, high-risk situations. This chapter addresses credit scoring in risk management and presents a novel method for default prediction of high-risk branches or customers. The study uses the Kruskal-Wallis non-parametric statistic to form a conservative credit-scoring model and examines its impact on modeling performance to the benefit of the credit provider. The findings show that the new credit-scoring methodology achieves a reasonable coefficient of determination and a very low false-negative rate, and it is computationally less expensive while highly accurate, with an improvement of around 18% in Recall/Sensitivity. Given the recent perspective of continued credit/behavior scoring, the study suggests applying this credit score to non-traditional data sources, allowing online loan providers to study and reveal changes in client behavior over time and to select reliable unbanked customers based on their application data. This is the first study to develop an online non-parametric credit scoring system that can automatically reselect effective features for continued credit evaluation and weight them by their level of contribution, with good diagnostic ability.
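    The dissertation's code is not reproduced in the abstract; the following is a rough, hypothetical sketch of how a Kruskal-Wallis test can screen features for a conservative credit-scoring model. The column names, significance level, and downstream classifier are assumptions, not the thesis's implementation.

# Hypothetical sketch: Kruskal-Wallis-based feature screening for a
# conservative credit-scoring model. Column names and thresholds are
# illustrative assumptions.
import numpy as np
import pandas as pd
from scipy.stats import kruskal
from sklearn.linear_model import LogisticRegression

def select_features(df: pd.DataFrame, target: str = "default", alpha: float = 0.01):
    """Keep features whose distributions differ significantly between
    defaulters and non-defaulters according to the Kruskal-Wallis H test."""
    selected = []
    for col in df.columns.drop(target):
        groups = [g[col].dropna().values for _, g in df.groupby(target)]
        if len(groups) < 2:
            continue
        stat, p_value = kruskal(*groups)
        if p_value < alpha:
            selected.append(col)
    return selected

def fit_conservative_scorer(df: pd.DataFrame, target: str = "default", cutoff: float = 0.3):
    """Fit a simple scorer on the screened features; a low probability
    cutoff keeps the false-negative rate (missed defaulters) small."""
    features = select_features(df, target)
    model = LogisticRegression(max_iter=1000)
    model.fit(df[features], df[target])
    def predict_default(new_rows: pd.DataFrame) -> np.ndarray:
        return (model.predict_proba(new_rows[features])[:, 1] >= cutoff).astype(int)
    return features, model, predict_default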
    In the third study (Chapter Four), we focus on the financial stability challenges faced by insurance companies and pension schemes when managing systematic (undiversifiable) mortality and longevity risk. For this purpose, we first developed a new ensemble learning strategy for panel time-series forecasting and studied its application to tracking excess respiratory-disease mortality during the COVID-19 pandemic. Layered learning is an ensemble-related approach that addresses a predictive task with several predictive models when a direct mapping from inputs to outputs is not accurate. We adopt a layered learning approach within an ensemble learning strategy to solve predictive tasks with improved predictive performance, combining multiple learning processes into a single ensemble model. In the proposed strategy, an appropriate holdout is specified individually for each model, and the models in the ensemble are selected by a proposed selection approach and combined dynamically based on their predictive performance. This yields a high-performance ensemble model that automatically copes with the different kinds of time series of each panel member. For the experimental section, we studied more than twelve thousand observations in a portfolio of 61 time series (countries) of reported respiratory-disease deaths at monthly sampling frequency, to measure the improvement in predictive performance. We then compared each country's forecasts of respiratory-disease deaths generated by our model with the corresponding COVID-19 deaths in 2020. The results of this large set of experiments show that the accuracy of the ensemble model improves noticeably when different holdouts are used for the different contributing time-series methods, based on the proposed model selection method. The improved time-series models provide proper forecasts of respiratory-disease deaths for each country, exhibiting a high correlation (0.94) with COVID-19 deaths in 2020.
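    Again as a hypothetical sketch rather than the dissertation's method, the per-series-holdout and dynamic model selection idea can be illustrated with a toy ensemble; the candidate forecasters, holdout rule, and inverse-RMSE weighting below are assumptions.

# Hypothetical sketch of a per-series ensemble with individual holdouts and
# performance-based model selection, in the spirit of the layered-learning
# strategy described above.
import numpy as np

def naive_forecast(train, horizon):
    """Repeat the last observed value."""
    return np.repeat(train[-1], horizon)

def drift_forecast(train, horizon):
    """Extrapolate the average historical change."""
    slope = (train[-1] - train[0]) / max(len(train) - 1, 1)
    return train[-1] + slope * np.arange(1, horizon + 1)

def mean_forecast(train, horizon):
    """Repeat the historical mean."""
    return np.repeat(train.mean(), horizon)

CANDIDATES = {"naive": naive_forecast, "drift": drift_forecast, "mean": mean_forecast}

def ensemble_forecast(series, horizon, holdout=None, keep=2):
    """Score each candidate on a series-specific holdout, keep the best
    `keep` models, and combine their forecasts weighted by 1/RMSE."""
    series = np.asarray(series, dtype=float)
    holdout = holdout or max(horizon, len(series) // 5)        # per-series holdout
    train, valid = series[:-holdout], series[-holdout:]
    scores = {}
    for name, fn in CANDIDATES.items():
        pred = fn(train, holdout)
        scores[name] = np.sqrt(np.mean((pred - valid) ** 2))   # RMSE on holdout
    best = sorted(scores, key=scores.get)[:keep]               # dynamic model selection
    weights = np.array([1.0 / (scores[m] + 1e-9) for m in best])
    weights /= weights.sum()
    forecasts = np.vstack([CANDIDATES[m](series, horizon) for m in best])
    return weights @ forecasts                                 # weighted combination

# Usage on a panel of countries: {country: monthly_deaths_array}
# panel_forecasts = {c: ensemble_forecast(y, horizon=12) for c, y in panel.items()}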
    In the fourth study (Chapter Five), we used the new ensemble learning approach for time-series modeling discussed in the previous chapter, accompanied by K-means clustering, to forecast life tables in COVID-19 times. Stochastic mortality modeling plays a critical role in public pension design, population and public health projections, and the design, pricing, and risk management of life insurance contracts and longevity-linked securities. There is no general method for forecasting mortality rates that is applicable to all situations, especially unusual years such as the COVID-19 pandemic. In this chapter, we investigate the feasibility of using an ensemble of traditional and machine learning time-series methods to empower forecasts of age-specific mortality rates for groups of countries that share common longevity trends. We use Generalized Age-Period-Cohort stochastic mortality models to capture age and period effects, apply K-means clustering to the time series to group countries following common longevity trends, and use ensemble learning to forecast life expectancy and annuity prices by age and sex. To calibrate the models, we use data for 14 European countries from 1960 to 2018. The results show that the ensemble method delivers the most robust results overall, with minimum RMSE, in the presence of structural changes in the shape of the time series at the time of COVID-19.
    In the dissertation's conclusions (Chapter Six), we provide more detailed insights into the overall contributions of this dissertation to financial stability and risk management through data science, along with opportunities, limitations, and avenues for future research on the application of data science in finance and economics.