A Pedagogical Approach with Integrating Online Platforms
Ashofteh, A. (2023). Teaching Note—Data Science Training for Finance and Risk Analysis: A Pedagogical Approach with Integrating Online Platforms. In C. P. Kitsos, T. A. Oliveira, F. Pierri, & M. Restaino (Eds.), Statistical Modelling and Risk Analysis: Selected contributions from ICRA9, Perugia, Italy, May 25-27, 2022 (Vol. 430, pp. 17-25). (Springer Proceedings in Mathematics & Statistics). Springer Nature. https://doi.org/10.1007/978-3-031-39864-3_2
This paper discusses a method of data science training that responds to the complex challenges of finance and risk analysis. There is growing recognition of the importance of creating and deploying financial models for risk management, incorporating new data and Big Data sources. Automating, analyzing, and optimizing a set of complex financial systems requires a wide range of skills and competencies that are rarely taught in typical finance and econometrics courses. Adopting these technologies for financial problems necessitates new skills and knowledge about processes, quality assurance frameworks, technologies, security needs, privacy, and legal issues. The paper describes a pedagogical approach that addresses the complexity of teaching the required soft and hard skills in an integrated manner, together with its advantages, disadvantages, and vulnerabilities.
Data Science for Finance: Targeted Learning from (Big) Data to Economic Stability and Financial Risk Management
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Statistics and Econometrics
The modelling, measurement, and management of systemic financial stability remains a critical issue in most countries. Policymakers, regulators, and managers depend on complex models for financial stability and risk management. The models are compelled to be robust, realistic, and consistent with all relevant available data. This requires great data disclosure, which is deemed to have the highest quality standards. However, stressed situations, financial crises, and pandemics are the source of many new risks with new requirements such as new data sources and different models.
This dissertation aims to show the data quality challenges of high-risk situations such as pandemics or economic crises, and it tries to theorize new machine learning models for predictive and longitudinal time series modelling.
In the first study (Chapter Two) we analyzed and compared the quality of official datasets available for COVID-19 as a best practice for a recent high-risk situation with dramatic effects on financial stability. We used comparative statistical analysis to evaluate the accuracy of data collection by a national (Chinese Center for Disease Control and Prevention) and two international (World Health Organization; European Centre for Disease Prevention and Control) organizations based on the value of systematic measurement errors. We combined Excel files, text mining techniques, and manual data entries to extract the COVID-19 data from official reports and to generate an accurate profile for comparisons. The findings show noticeable and increasing measurement errors in the three datasets as the pandemic outbreak expanded and more countries contributed data for the official repositories, raising data comparability concerns and pointing to the need for better coordination and harmonized statistical methods. The study offers a COVID-19 combined dataset and dashboard with minimum systematic measurement errors and valuable insights into the potential problems in using databanks without carefully examining the metadata and additional documentation that describe the overall context of data.
In the second study (Chapter Three) we discussed credit risk as the most significant source of risk in banking, one of the most important sectors among financial institutions. We proposed a new machine learning approach for online credit scoring that is sufficiently conservative and robust for unstable and high-risk situations. This Chapter addresses credit scoring in risk management and presents a novel method for the default prediction of high-risk branches or customers. This study uses the Kruskal-Wallis non-parametric statistic to form a conservative credit-scoring model and to study its impact on modeling performance for the benefit of the credit provider. The findings show that the new credit scoring methodology yields a reasonable coefficient of determination and a very low false-negative rate. It is computationally less expensive and highly accurate, with around 18% improvement in Recall/Sensitivity. Given the recent perspective of continued credit/behavior scoring, our study suggests using this credit score with non-traditional data sources for online loan providers, allowing them to study and reveal changes in client behavior over time and to choose reliable unbanked customers based on their application data. This is the first study that develops an online non-parametric credit scoring system, which can reselect effective features automatically for continued credit evaluation and weigh them by their level of contribution with a good diagnostic ability.
In the third study (Chapter Four) we focus on the financial stability challenges faced by insurance companies and pension schemes when managing systematic (undiversifiable) mortality and longevity risk. For this purpose, we first developed a new ensemble learning strategy for panel time-series forecasting and studied its applications to tracking respiratory disease excess mortality during the COVID-19 pandemic. The layered learning approach is a solution related to ensemble learning to address a given predictive task by different predictive models when direct mapping from inputs to outputs is not accurate. We adapt a layered learning approach to an ensemble learning strategy to solve the predictive tasks with improved predictive performance and to combine multiple learning processes into an ensemble model. In this proposed strategy, the appropriate holdout for each model is specified individually. Additionally, the models in the ensemble are selected by a proposed selection approach to be combined dynamically based on their predictive performance. It provides a high-performance ensemble model to automatically cope with the different kinds of time series for each panel member. For the experimental section, we studied more than twelve thousand observations in a portfolio of 61 time series (countries) of reported respiratory disease deaths with monthly sampling frequency to show the amount of improvement in predictive performance. We then compare each country’s forecasts of respiratory disease deaths generated by our model with the corresponding COVID-19 deaths in 2020. The results of this large set of experiments show that the accuracy of the ensemble model is improved noticeably by using different holdouts for different contributed time series methods based on the proposed model selection method.
These improved time series models provide proper forecasts of respiratory disease deaths for each country, exhibiting high correlation (0.94) with COVID-19 deaths in 2020.
In the fourth study (Chapter Five) we used the new ensemble learning approach for time series modeling, discussed in the previous Chapter, accompanied by K-means clustering, for forecasting life tables in COVID-19 times. Stochastic mortality modeling plays a critical role in public pension design, population and public health projections, and in the design, pricing, and risk management of life insurance contracts and longevity-linked securities. There is no general method to forecast the mortality rate applicable to all situations, especially for unusual years such as the COVID-19 pandemic. In this Chapter, we investigate the feasibility of using an ensemble of traditional and machine learning time series methods to empower forecasts of age-specific mortality rates for groups of countries that share common longevity trends. We use Generalized Age-Period-Cohort stochastic mortality models to capture age and period effects, apply K-means clustering to time series to group countries following common longevity trends, and use ensemble learning to forecast life expectancy and annuity prices by age and sex. To calibrate models, we use data for 14 European countries from 1960 to 2018. The results show that the ensemble method presents the most robust results overall, with minimum RMSE in the presence of structural changes in the shape of time series at the time of COVID-19.
In this dissertation’s conclusions (Chapter Six), we provide more detailed insights about the overall contributions of this dissertation to financial stability and risk management by data science, as well as opportunities, limitations, and avenues for future research about the application of data science in finance and the economy.
Determining Source Rock and its Characteristics Using Organic Geo-Chemistry Derived from Parent Rock Evaluation, Separation, and Columnar and Gaseous Chromatography on Cretaceous Units in Central Iran at Khor-Biyabanak
Recent investigations of Cretaceous rock units in Central Iran have revealed source rocks and their characteristics (origin, amount, type, and maturation of organic material, hydrocarbon generation ability, and the sedimentary environment of the organic material). Evaluation of the Cretaceous rock units in the Khor-Biyabanak region, using source rock evaluation, separation of bitumen, columnar chromatography, and gaseous chromatography, has led to the determination of source rocks with weak to medium organic material from the base to the top of the section (lower-middle Cretaceous towards upper Cretaceous). Based on the values of TOC, HI, Tmax, S1, S2, PI, EOM, HCS, SAT/ARO, CPI, Pr/Ph, Pr/n-C17, and Ph/n-C18, together with the ratios, graphs, and derived peak values, the organic material present in the source rocks can be classified as Type II and Type III kerogen. The maturation of the organic material in the source rocks is high and corresponds approximately to the end of the oil generation window and the last stages of catagenesis. The organic material is of mixed marine and continental origin and is currently capable of generating wet and dry gas in the existing source rocks. The formation environment of the existing source rocks is regenerative and quasi-regenerative-marine.
Ensemble Methods for Consumer Price Inflation Forecasting
Bravo, J. M., & Ashofteh, A. (2023). Ensemble Methods for Consumer Price Inflation Forecasting. In CAPSI 2023 Proceedings (pp. 317-336). Article 25 (Atas da Conferência da Associação Portuguesa de Sistemas de Informação). Associação Portuguesa de Sistemas de Informação. https://doi.org/10.18803/capsi.v23.317-336
This research was funded by national funds through the FCT—Fundação para a Ciência e a Tecnologia, I.P., grants UIDB/04152/2020—Centro de Investigação em Gestão de Informação (MagIC) and UIDB/00315/2020—BRU-ISCTE-IUL. The authors express their gratitude to two anonymous referees for their careful review and insightful comments that helped to strengthen the quality of the paper.
Inflation forecasting is one of the central issues in micro and macroeconomics. Standard forecasting methods tend to follow a "winner-take-all" approach by which, for each time series, a single method believed to be the best is chosen from a pool of competing models. This paper investigates the predictive accuracy of a metalearning strategy called Arbitrated Dynamic Ensemble (ADE) in inflation forecasting using United States data. The findings show that: i) the SARIMA model exhibits the best average rank relative to ADE and competing state-of-the-art model combination and metalearning methods; ii) the ADE methodology presents a better average rank compared to widely used model combination approaches, including the original Arbitrating approach, Stacking, Simple averaging, Fixed Share, or weighted adaptive combination of experts; iii) the ADE approach benefits from combining the base-learners as opposed to selecting the best forecasting model or using all experts; iv) the method is sensitive to the aggregation (weighting) mechanism.
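The contrast between simple averaging and a performance-based weighting mechanism mentioned in this abstract can be illustrated with a minimal sketch. This is not the ADE method, which relies on metalearners; it only assumes that each base model has a recent mean absolute error and weights models by the inverse of that error. The function names and the toy numbers are hypothetical.

```python
def simple_average(forecasts):
    """Equal-weight combination of the base-model forecasts."""
    return sum(forecasts) / len(forecasts)


def inverse_error_weights(recent_errors, eps=1e-9):
    """Turn each model's recent mean absolute error into a normalised weight.

    Assumption for illustration: smaller recent error means larger weight.
    `eps` only guards against division by zero.
    """
    inverse = [1.0 / (err + eps) for err in recent_errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(forecasts, weights):
    """Weighted combination of the base-model forecasts."""
    return sum(f * w for f, w in zip(forecasts, weights))


# Toy inputs (hypothetical): three models' next-period inflation forecasts
# and their mean absolute errors over a recent window.
forecasts = [2.0, 3.0, 4.0]
recent_errors = [0.5, 1.0, 2.0]

weights = inverse_error_weights(recent_errors)
weighted = combine(forecasts, weights)
print(simple_average(forecasts), round(weighted, 4))
```

With these toy inputs the weighted forecast is pulled towards the model with the smallest recent error, which is one simple way an aggregation mechanism can change the combined result.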
A non-parametric-based computationally efficient approach for credit scoring
Ashofteh, A., & Bravo, J. M. (2019). A non-parametric-based computationally efficient approach for credit scoring. In Atas da Conferencia da Associacao Portuguesa de Sistemas de Informacao 2019: 19ª Conferencia da Associacao Portuguesa de Sistemas de Informacao, CAPSI 2019 - 19th Conference of the Portuguese Association for Information Systems, CAPSI 2019; Lisboa; Portugal; 11 October 2019 through 12 October 2019 (pp. 19). (Atas da Conferencia da Associacao Portuguesa de Sistemas de Informacao).
This research addresses credit scoring in risk management and presents a novel credit scoring method for default prediction. The study uses the Kruskal-Wallis non-parametric statistic to form a computationally efficient credit-scoring model based on an artificial neural network and studies the impact on modelling performance. The findings show that the new credit scoring methodology yields a reasonable coefficient of determination and a low false negative rate. It is computationally less expensive with high accuracy (AUC=0.99). Given the recent perspective of continued credit/behavior scoring, our study suggests using this credit score with non-traditional data sources such as mobile phone data to study and reveal changes in client behavior over time. This is the first study that develops a non-parametric credit scoring method that is able to reselect effective features for continued credit evaluation and weigh them by their level of contribution with a good diagnostic ability.
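As a rough illustration of how a Kruskal-Wallis statistic could serve to screen features for a scoring model, the sketch below computes the H statistic (without tie correction) for each feature, comparing defaulters with non-defaulters, and orders features by it. The abstract does not spell out the procedure, so this use as a ranking score is an assumption, and the helper names and toy data are hypothetical.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def rank_features(rows, labels):
    """Order feature names by H, comparing label 0 (repaid) with label 1 (default)."""
    scores = {}
    for name in rows[0]:
        good = [r[name] for r, y in zip(rows, labels) if y == 0]
        bad = [r[name] for r, y in zip(rows, labels) if y == 1]
        scores[name] = kruskal_wallis_h([good, bad])
    return sorted(scores, key=scores.get, reverse=True)


# Toy applicants (hypothetical): income separates the classes, age does not.
rows = [
    {"income": 50, "age": 30}, {"income": 60, "age": 45}, {"income": 70, "age": 25},
    {"income": 20, "age": 40}, {"income": 30, "age": 35}, {"income": 40, "age": 28},
]
labels = [0, 0, 0, 1, 1, 1]
print(rank_features(rows, labels))
```

A larger H indicates that the feature's rank distribution differs more between the two groups, so such a score needs no distributional assumptions about the feature.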
a New Scientific Paradigm of Information and Knowledge Development in National Statistical Systems
Ashofteh, A., & Bravo, J. M. (2021). Data Science Training for Official Statistics: a New Scientific Paradigm of Information and Knowledge Development in National Statistical Systems. Statistical Journal of the IAOS, 37(3), 771 – 789. https://doi.org/10.3233/SJI-210841
The ability to incorporate new and Big Data sources and to benefit from emerging technologies such as Web Technologies, Remote Data Collection methods, User Experience Platforms, and Trusted Smart Statistics will become increasingly important in producing and disseminating official statistics. The skills and competencies required to automate, analyse, and optimize such complex systems are often not part of the traditional skill set of most National Statistical Offices. The adoption of these technologies requires new knowledge, methodologies and the upgrading of the quality assurance framework, technology, security, privacy, and legal matters. However, there are methodological challenges and discussions among scholars about the diverse methodical confinement and the wide array of skills and competencies considered relevant for those working with big data at NSOs. This paper develops a Data Science Model for Official Statistics (DSMOS), graphically summarizing the role of data science in statistical business processes. The model combines data science, existing scientific paradigms, and trusted smart statistics, and develops around a restricted number of constructs. We considered a combination of statistical engineering, data engineering, data analysis, software engineering and soft skills such as statistical thinking, statistical literacy and specific knowledge of official statistics and dissemination of official statistics products as key requirements of data science in official statistics. We then analyse and discuss the educational requirements of the proposed model, clarifying their contribution, interactions, and current and future importance in official statistics.
The DSMOS was validated through a quantitative method, using a survey addressed to experts working at the European statistical systems. The empirical results show that the core competencies considered relevant for the DSMOS include acquisition and processing capabilities related to Statistics, high-frequency data, spatial data, Big Data, and microdata/nano-data, in addition to problem-solving skills, Spatio-temporal modelling, machine learning, programming with R and SAS software, Data visualisation using novel technologies, Data and statistical literacy, Ethics in Official Statistics, New data methodologies, New data quality tools, standards and frameworks for official statistics. Some disadvantages and vulnerabilities are also addressed in the paper.
A Systematic Review on Robot-Advisors in Fintech
Martins, M. N., & Ashofteh, A. (Accepted/In press). A Systematic Review on Robot-Advisors in Fintech. Paper presented at 23.ª Conferência da Associação Portuguesa de Sistemas de Informação, Beja, Portugal.
Technology has been the main driver for the financial sector. Fintech tools emerged to support the provision of financial services, especially Robot-Advisors (RAs), which allow the automation of the investment management process. Their main functions are creating an investment portfolio, allocating assets, and managing investment portfolios daily based on a machine learning algorithm. This paper presents a literature review to summarise the importance of RAs in the financial sector as well as the perception of investors. The review also presents the main characteristics of the algorithms behind the intelligence of RAs and the primary concerns. The Scopus and Web of Science databases revealed 114 research papers. It was found that investor acceptance of these technologies is affected by aspects of high volatility, which includes financial markets. The algorithm's mathematical models and system architecture might be improved so that this instrument can better suit the needs of investors.
Investigation on Sedimentary and Depositional Environment Usage of Cretaceous in South-East of Golpayegan Region
This investigation studies the facies, and their usage in interpreting the depositional environment, of Cretaceous rocks that crop out 35 km south-east of Golpayegan. The section, 811 m thick, is composed of compact, dark marly shale, shale dark sandstones, Orbitolina-bearing limestone, marly limestone and calcareous limestone of Early Cretaceous (Albian) age, and dark gray limestone and marly limestone of Late Cretaceous age. Field and microscopic (petrographic) studies led to the recognition of 7 carbonate and 2 clastic facies. Facies 1 (Bioclast lime mudstone) indicates medium open marine environments. Facies 2 (Bioclast wackestone), Facies 3 (Peloid wackestone), Facies 4 (Bioclast packstone), Facies 5 (Bioclast packstone), Facies 6 (Intraclast packstone) and Facies 7 (Lime mudstone) indicate shallow to medium (sub-tidal), inter-tidal and supra-tidal deposition. In some cases, such as Facies 8 (Lime mudstone) and Facies 9 (Sandstone), the facies indicate clastic conditions directly related to active tectonic periods. Meanwhile, in normal cases, Facies 4, 5 and 6 are representative of a carbonate basin in the shape of a homoclinal ramp.
A study on the quality of novel coronavirus (COVID-19) official datasets
Ashofteh, A., & Bravo, J. M. (2020). A study on the quality of novel coronavirus (COVID-19) official datasets. Statistical Journal of the IAOS, 36(2), 291-301. https://doi.org/10.3233/SJI-200674
Policy makers depend on complex epidemiological models that are compelled to be robust, realistic, defendable and consistent with all relevant available data disclosed by official authorities which is deemed to have the highest quality standards. This paper analyses and compares the quality of official datasets available for COVID-19. We used comparative statistical analysis to evaluate the accuracy of data collection by a national (Chinese Center for Disease Control and Prevention) and two international (World Health Organization; European Centre for Disease Prevention and Control) organisations based on the value of systematic measurement errors. We combined Excel files, text mining techniques and manual data entries to extract the COVID-19 data from official reports and to generate an accurate profile for comparisons. The findings show noticeable and increasing measurement errors in the three datasets as the pandemic outbreak expanded and more countries contributed data for the official repositories, raising data comparability concerns and pointing to the need for better coordination and harmonized statistical methods. The study offers a COVID-19 combined dataset and dashboard with minimum systematic measurement errors, and valuable insights into the potential problems in using databanks without carefully examining the metadata and additional documentation that describe the overall context of data.
- …