
    Data Science for Finance: Targeted Learning from (Big) Data to Economic Stability and Financial Risk Management

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Statistics and Econometrics.
    The modelling, measurement, and management of systemic financial stability remains a critical issue in most countries. Policymakers, regulators, and managers depend on complex models for financial stability and risk management. These models must be robust, realistic, and consistent with all relevant available data. This requires extensive data disclosure meeting the highest quality standards. However, stressed situations, financial crises, and pandemics give rise to many new risks and new requirements, such as new data sources and different models. This dissertation aims to expose the data quality challenges of high-risk situations such as pandemics or economic crises, and to develop new machine learning models for predictive and longitudinal time series modelling. In the first study (Chapter Two) we analyzed and compared the quality of official datasets available for COVID-19, as a best practice for a recent high-risk situation with dramatic effects on financial stability. We used comparative statistical analysis to evaluate the accuracy of data collection by a national (Chinese Center for Disease Control and Prevention) and two international (World Health Organization; European Centre for Disease Prevention and Control) organizations, based on the value of systematic measurement errors. We combined Excel files, text mining techniques, and manual data entry to extract the COVID-19 data from official reports and to generate an accurate profile for comparisons. The findings show noticeable and increasing measurement errors in the three datasets as the pandemic outbreak expanded and more countries contributed data to the official repositories, raising data comparability concerns and pointing to the need for better coordination and harmonized statistical methods. The study offers a combined COVID-19 dataset and dashboard with minimal systematic measurement errors, and valuable insights into the potential problems of using databanks without carefully examining the metadata and additional documentation that describe the overall context of the data. In the second study (Chapter Three) we discussed credit risk, the most significant source of risk in banking, itself one of the most important sectors of the financial industry. We proposed a new machine learning approach for online credit scoring that is sufficiently conservative and robust for unstable and high-risk situations. This Chapter addresses the case of credit scoring in risk management and presents a novel method for default prediction of high-risk branches or customers. The study uses the Kruskal-Wallis non-parametric statistic to form a conservative credit-scoring model and studies its impact on modelling performance, to the benefit of the credit provider. The findings show that the new credit scoring methodology achieves a reasonable coefficient of determination and a very low false-negative rate. It is computationally less expensive and highly accurate, with around an 18% improvement in Recall/Sensitivity. Given the recent shift towards continued credit/behavior scoring, our study suggests applying this credit score to non-traditional data sources, allowing online loan providers to study and reveal changes in client behavior over time and to select reliable unbanked customers based on their application data.
    This is the first study to develop an online non-parametric credit scoring system that can automatically reselect effective features for continued credit evaluation and weight them by their level of contribution, with good diagnostic ability. In the third study (Chapter Four) we focused on the financial stability challenges faced by insurance companies and pension schemes when managing systematic (undiversifiable) mortality and longevity risk. For this purpose, we first developed a new ensemble learning strategy for panel time-series forecasting and studied its application to tracking respiratory disease excess mortality during the COVID-19 pandemic. Layered learning is an approach related to ensemble learning that addresses a given predictive task with different predictive models when a direct mapping from inputs to outputs is not accurate. We adopt a layered learning approach within an ensemble learning strategy to solve predictive tasks with improved performance, combining multiple learning processes into a single ensemble model. In the proposed strategy, the appropriate holdout for each model is specified individually. Additionally, the models in the ensemble are selected by a proposed selection approach and combined dynamically based on their predictive performance. This provides a high-performance ensemble model that automatically copes with the different kinds of time series of each panel member. For the experimental section, we studied more than twelve thousand observations in a portfolio of 61 time series (countries) of reported respiratory disease deaths with monthly sampling frequency to show the improvement in predictive performance. We then compared each country's forecasts of respiratory disease deaths generated by our model with the corresponding COVID-19 deaths in 2020. The results of this large set of experiments show that the accuracy of the ensemble model improves noticeably when different holdouts are used for the different contributing time series methods, selected by the proposed model selection method. These improved time series models provide proper forecasts of respiratory disease deaths for each country, exhibiting a high correlation (0.94) with COVID-19 deaths in 2020. In the fourth study (Chapter Five) we used the new ensemble learning approach for time series modeling discussed in the previous Chapter, accompanied by K-means clustering, to forecast life tables in COVID-19 times. Stochastic mortality modeling plays a critical role in public pension design, population and public health projections, and in the design, pricing, and risk management of life insurance contracts and longevity-linked securities. There is no general method for forecasting mortality rates that is applicable to all situations, especially unusual years such as those of the COVID-19 pandemic. In this Chapter, we investigate the feasibility of using an ensemble of traditional and machine learning time series methods to empower forecasts of age-specific mortality rates for groups of countries that share common longevity trends. We use Generalized Age-Period-Cohort stochastic mortality models to capture age and period effects, apply K-means clustering to the time series to group countries following common longevity trends, and use ensemble learning to forecast life expectancy and annuity prices by age and sex. To calibrate the models, we use data for 14 European countries from 1960 to 2018.
    The results show that the ensemble method delivers the most robust results overall, with minimum RMSE, in the presence of structural changes in the shape of the time series at the time of COVID-19. In the dissertation's conclusions (Chapter Six), we provide more detailed insights into the overall contributions of this dissertation to financial stability and risk management through data science, along with opportunities, limitations, and avenues for future research on the application of data science in finance and the economy.
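
    The conservative feature-screening step from Chapter Three can be illustrated with a minimal sketch, assuming a simple tabular setting; the function name, significance threshold, and synthetic data below are hypothetical and do not reproduce the dissertation's actual pipeline. Each candidate feature is scored by the Kruskal-Wallis H statistic, measuring how strongly its distribution differs between defaulting and non-defaulting clients:

```python
# Hypothetical sketch: Kruskal-Wallis feature screening for credit scoring.
import numpy as np
from scipy.stats import kruskal

def kruskal_feature_weights(X, y, alpha=0.05):
    """Score each feature by how strongly it separates defaulters (y=1)
    from non-defaulters (y=0), using the Kruskal-Wallis H statistic."""
    weights = {}
    for j in range(X.shape[1]):
        h_stat, p_value = kruskal(X[y == 0, j], X[y == 1, j])
        # Keep only features whose group distributions differ significantly.
        weights[j] = h_stat if p_value < alpha else 0.0
    total = sum(weights.values()) or 1.0
    return {j: w / total for j, w in weights.items()}  # contribution weights

# Synthetic application data: feature 0 and 2 actually drive default.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=500) > 1).astype(int)
print(kruskal_feature_weights(X, y))
```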

    Risk prediction of product-harm events using rough sets and multiple classifier fusion: an experimental study of listed companies in China

    With the increasing frequency and destructiveness of product-harm events, research on enterprise crisis management has become essential, yet little of the literature thoroughly explores risk prediction methods for product-harm events. In this study, an initial index system for risk prediction was built based on an analysis of the key drivers of a product-harm event's evolution; ultimately, nine risk-forecasting indexes were obtained using rough set attribute reduction. With the four indexes of cumulative abnormal returns as the input, fuzzy clustering was used to classify the risk level of a product-harm event into four grades. In order to control the uncertainty and instability of single classifiers in risk prediction, multiple classifier fusion was introduced and combined with self-organising data mining (SODM). Further, an SODM-based multiple classifier fusion (SB-MCF) model was presented for risk prediction related to product-harm events. The experimental results based on 165 Chinese listed companies indicated that the SB-MCF model improved average predictive accuracy while simultaneously reducing the degree of variation. The statistical analysis demonstrated that the SB-MCF model significantly outperformed six widely used single classification models (e.g. neural networks, support vector machines, and case-based reasoning) and six other commonly used multiple classifier fusion methods (e.g. majority voting, the Bayesian method, and genetic algorithms).
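
    As a hedged illustration of classifier fusion (the SB-MCF model itself is not reproduced here), the sketch below assembles a majority-voting fusion, one of the baseline methods the study compares against, from stand-ins for three of the named single classifiers; kNN is used as a rough proxy for case-based reasoning, and the synthetic data stands in for the nine risk-forecasting indexes:

```python
# Baseline majority-voting fusion over three single classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier  # proxy for case-based reasoning
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic data: 9 features standing in for the risk-forecasting indexes.
X, y = make_classification(n_samples=300, n_features=9, random_state=0)

fusion = VotingClassifier(
    estimators=[
        ("nn", MLPClassifier(max_iter=1000, random_state=0)),
        ("svm", SVC(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # majority vote over the base classifiers
)
fusion.fit(X, y)
print(fusion.score(X, y))
```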

    The Application of Artificial Intelligence in Project Management Research: A Review

    The field of artificial intelligence is currently experiencing relentless growth, with innumerable models emerging in the research and development phases across various fields, including science, finance, and engineering. In this work, the authors review a large number of learning techniques aimed at project management. The analysis is largely focused on hybrid systems, which present computational models of blended learning techniques. At present, these models are at a very early stage, and major development efforts are required within the scientific community. In addition, we provide a classification of all the areas within project management and the learning techniques used in each, presenting a brief study of the different artificial intelligence techniques used today and the areas of project management in which agents are being applied. This work should serve as a starting point for researchers who wish to work in the exciting world of artificial intelligence in relation to project leadership and management.

    Optimized hybrid ensemble learning approaches applied to very short-term load forecasting

    The significance of accurate short-term load forecasting (STLF) for the efficient and secure operation of modern power systems is paramount. This task is intricate due to the cyclicity, non-stationarity, seasonality, and nonlinearity of power consumption time series data. The rise of data accessibility in the power industry has paved the way for machine learning (ML) models, which show the potential to enhance STLF accuracy. This paper presents a novel hybrid ML model combining Gradient Boosting Regressor (GBR), Extreme Gradient Boosting (XGBoost), k-Nearest Neighbors (kNN), and Support Vector Regression (SVR), examined both standalone and integrated, coupled with signal decomposition techniques like STL, EMD, EEMD, CEEMDAN, and EWT. Through Automated Machine Learning (AutoML), these models are integrated and their hyperparameters optimized, predicting each load signal component using data from two sources, the National Electric System Operator (ONS) and the Independent System Operator of New England (ISO-NE), boosting prediction capacity. For the 2019 ONS dataset, combining EWT and XGBoost yielded the best results for very short-term load forecasting (VSTLF), with an RMSE of 1,931.8 MW, MAE of 1,564.9 MW, and MAPE of 2.54%. These findings highlight the necessity of tailoring the approach to each VSTLF problem, emphasizing the adaptability and strength of ML models combined with signal decomposition techniques.
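
    A minimal sketch of the decompose-then-forecast pattern described above, assuming synthetic hourly load data rather than the ONS/ISO-NE datasets, and using STL plus XGBoost as one representative pairing (the paper's AutoML tuning and its EWT variant are not reproduced): each signal component is predicted separately from its lagged values, and the component forecasts are summed.

```python
# Decompose the load series, forecast each component, sum the forecasts.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL
from xgboost import XGBRegressor

def lagged_matrix(series, n_lags=24):
    """Build a supervised learning matrix from the last n_lags values."""
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    return X, series[n_lags:]

# Synthetic hourly load with a daily cycle (illustrative, not ONS data).
t = pd.date_range("2019-01-01", periods=24 * 60, freq="h")
load = pd.Series(1000 + 200 * np.sin(2 * np.pi * np.arange(len(t)) / 24)
                 + np.random.default_rng(0).normal(0, 20, len(t)), index=t)

components = STL(load, period=24).fit()
forecast = 0.0
for comp in (components.trend, components.seasonal, components.resid):
    vals = comp.to_numpy()
    X, y = lagged_matrix(vals)
    model = XGBRegressor(n_estimators=200).fit(X, y)
    forecast += model.predict(vals[-24:].reshape(1, -1))[0]  # one step ahead
print(f"next-hour load forecast: {forecast:.1f} MW")
```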

    Comparative analysis of the frequentist and Bayesian approaches to stress testing

    Stress testing is necessary for banks, as it is required by the Basel Accords for loss prediction and for regulatory and economic capital computations. It has become increasingly important, especially after the 2008 global financial crisis. Credit models are essential in controlling credit risk, and the search for new ways to predict credit risk more accurately continues. This thesis concentrates on stress testing the probability of default (PD) using the Bayesian posterior distribution to incorporate estimation uncertainty and parameter instability. It also explores modelling the probability of default using Bayesian informative priors to enhance model predictive accuracy. A new Bayesian informative prior selection method is proposed to incorporate additional information into credit risk modelling and improve model performance. We employ cross-sectional logistic regressions to model the probability of default of mortgage loans using both the Bayesian approach with various priors and the frequentist approach. In the Bayesian informative prior selection method that we propose, we treat coefficients in the PD model as time series variables. We build ARIMA models to forecast the coefficient values in future time periods and use these ARIMA forecasts as Bayesian informative priors. We find that the Bayesian models using this prior selection method outperform both frequentist models and Bayesian models with other priors in terms of model predictive accuracy. We propose a new stress testing method to model both macroeconomic stress and coefficient uncertainty. Based on U.S. mortgage loan data, we model the probability of default at the account level using discrete time hazard analysis. We employ both the frequentist and Bayesian methods in parameter estimation and default rate (DR) stress testing. By applying the parameter posterior distribution obtained in the Bayesian approach to simulating the Bayesian estimated DR distribution, we reduce the estimation risk that comes from employing point estimates in stress testing. We find that the 99% value at risk (VaR) using the Bayesian posterior distribution approach is around 6.5 times the VaR at the same probability level using the frequentist approach with parameter mean estimates. We further simulate DR distributions based on models built on crisis and tranquil time periods to explore the impact that changes in model parameters between scenarios have on stress testing results. We apply the parameter posterior distribution obtained in the Bayesian approach to stress testing to reduce the estimation risk that results from using parameter point estimates. We compute the VaRs and required capital with both parameter instability between scenarios and estimation risk considered. The results are compared with those obtained when coefficient changes in stress testing models or coefficient uncertainty are neglected. We find that the required capital is considerably underestimated when neither parameter instability nor estimation risk is addressed.
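
    The estimation-risk point can be sketched with invented numbers: simulate default-rate distributions from posterior coefficient draws versus a single point estimate and compare the 99% VaR. The two-driver logistic PD model and the stand-in "posterior" below are hypothetical, not the thesis's fitted model; the point is only that propagating coefficient uncertainty fattens the tail of the DR distribution.

```python
# Compare 99% DR VaR with and without coefficient uncertainty.
import numpy as np

rng = np.random.default_rng(1)
n_accounts, n_draws = 10_000, 2_000
X = np.column_stack([np.ones(n_accounts), rng.normal(size=(n_accounts, 2))])
theta_hat = np.array([-4.0, 0.8, -0.5])                    # point estimate: intercept + 2 drivers
theta_post = theta_hat + rng.normal(0, 0.1, (n_draws, 3))  # stand-in posterior draws

def realized_dr(theta):
    pd_i = 1 / (1 + np.exp(-X @ theta))   # logistic PD per account
    return rng.binomial(1, pd_i).mean()   # simulated portfolio default rate

dr_point = np.array([realized_dr(theta_hat) for _ in range(n_draws)])
dr_bayes = np.array([realized_dr(th) for th in theta_post])
print("99% VaR, point estimate:", np.quantile(dr_point, 0.99))
print("99% VaR, posterior draws:", np.quantile(dr_bayes, 0.99))  # wider tail
```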

    Corporate Bankruptcy Prediction

    Bankruptcy prediction is one of the most important research areas in corporate finance. Bankruptcies are an indispensable element of the functioning of a market economy, yet at the same time they generate significant losses for stakeholders. Hence, this book was compiled to collect the results of research on the latest trends in predicting the bankruptcy of enterprises. It presents models developed for different countries using both traditional and more advanced methods. Problems connected with predicting bankruptcy during periods of prosperity and recession, the selection of appropriate explanatory variables, and the dynamization of models are presented. The reliability of financial data and the validity of the audit are also addressed. I hope that this book will inspire you to undertake new research in the field of forecasting the risk of bankruptcy.

    Flood Forecasting Using Machine Learning Methods

    This book is a printed edition of the Special Issue "Flood Forecasting Using Machine Learning Methods" that was published in Water.

    Explainable adaptation of time series forecasting

    A time series is a collection of data points captured over time, commonly found in many fields such as healthcare, manufacturing, and transportation. Accurately predicting the future behavior of a time series is crucial for decision-making, and several Machine Learning (ML) models have been applied to solve this task. However, changes in the time series, known as concept drift, can affect model generalization to future data, thus requiring online adaptive forecasting methods. This thesis aims to extend the State-of-the-Art (SoA) in the ML literature for time series forecasting by developing novel online adaptive methods. The first part focuses on online time series forecasting, including a framework for selecting time series variables and developing ensemble models that adapt to changes in time series data and model performance. Empirical results show the usefulness and competitiveness of the developed methods and their contribution to the explainability of both the model selection and ensemble pruning processes. In the second part, the thesis contributes to the literature on online ML model-based quality prediction for three Industry 4.0 applications: NC-milling, bolt installation in the automotive industry, and Surface Mount Technology (SMT) in electronics manufacturing. The thesis shows how process simulation can be used to generate additional knowledge and how such knowledge can be integrated efficiently into the ML process. It also presents two applications of explainable model-based quality prediction and their impact on smart industry practices.
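
    A toy sketch of the online-adaptation idea (not the thesis's actual framework): weight a pool of simple forecasters by their recent error, so the ensemble shifts toward whichever model handles the current concept best after a drift. The forecaster names, window sizes, and synthetic drifting series are illustrative.

```python
# Online ensemble with inverse-recent-error weighting under concept drift.
import numpy as np

forecasters = {
    "naive_last": lambda h: h[-1],              # repeat last observation
    "window_mean": lambda h: np.mean(h[-12:]),  # mean of a recent window
}

rng = np.random.default_rng(2)
series = np.concatenate([rng.normal(10, 1, 100), rng.normal(25, 1, 100)])  # drift at t=100

recent_err = {name: [] for name in forecasters}
for t in range(12, len(series) - 1):
    history = series[: t + 1]
    preds = {n: f(history) for n, f in forecasters.items()}
    # Inverse-error weights over a sliding window of the last 20 errors.
    weights = {n: 1 / (np.mean(recent_err[n][-20:]) + 1e-9) if recent_err[n] else 1.0
               for n in forecasters}
    total = sum(weights.values())
    ensemble = sum(weights[n] / total * preds[n] for n in forecasters)
    for n in forecasters:
        recent_err[n].append(abs(preds[n] - series[t + 1]))

print({n: round(weights[n] / total, 3) for n in forecasters})  # final weights after drift
```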

    Novel Computationally Intelligent Machine Learning Algorithms for Data Mining and Knowledge Discovery

    This thesis addresses three major issues in data mining: feature subset selection in high-dimensional domains, plausible reconstruction of incomplete data in cross-sectional applications, and forecasting of univariate time series. For the automated selection of an optimal subset of features in real time, we present an improved hybrid algorithm, SAGA. SAGA combines Simulated Annealing's ability to avoid becoming trapped in local minima with the very high convergence rate of the Genetic Algorithm crossover operator, the strong local search ability of greedy algorithms, and the high computational efficiency of generalized regression neural networks (GRNNs). For imputing missing values and forecasting univariate time series, we propose a homogeneous neural network ensemble. The proposed ensemble consists of a committee of GRNNs trained on different subsets of features generated by SAGA, with the predictions of the base classifiers combined by a fusion rule. This approach makes it possible to discover all important interrelations between the values of the target variable and the input features. The proposed ensemble scheme has two innovative features that make it stand out among ensemble learning algorithms: (1) the ensemble makeup is optimized automatically by SAGA; and (2) GRNNs are used as both the base classifiers and the top-level combiner classifier. Because of the GRNN, the proposed ensemble is a dynamic weighting scheme, in contrast to existing ensemble approaches that rely on simple voting or static weighting strategies. The basic idea of the dynamic weighting procedure is to give a higher reliability weight to those stored scenarios that are similar to the new ones. The simulation results demonstrate the validity of the proposed ensemble model.
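
    The dynamic-weighting idea behind the GRNN can be sketched directly, since a GRNN is essentially Nadaraya-Watson kernel regression: stored training cases closer to the query receive larger weights. A minimal sketch on synthetic data follows, where the bandwidth sigma is an illustrative choice rather than a value from the thesis.

```python
# Minimal GRNN (Nadaraya-Watson kernel regression) illustrating dynamic weighting.
import numpy as np

def grnn_predict(X_train, y_train, x_query, sigma=0.5):
    """Weight each stored case by a Gaussian kernel on its distance
    to the query, then return the weighted average of its target."""
    d2 = np.sum((X_train - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2 * sigma ** 2))      # similarity weights
    return np.sum(w * y_train) / np.sum(w)  # kernel-weighted mean

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)
print(grnn_predict(X, y, np.array([1.0])))  # close to sin(1.0) ~ 0.84
```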