
    Statistical Model Selection and Prediction for Non-standard Data: Insights and Applications in Economics and Finance

    In an increasingly digital world, data has become abundant, and research on leveraging such vast amounts of data is on the rise. While extracting important information relevant for economic policies or financial risk is crucial, the often non-standard structure of such observational data poses many challenges for researchers. These challenges include highly correlated, time-dependent data, combinations of unstructured data, and even high-dimensional situations, where we have very few data points and many potentially relevant factors. In this thesis, I tackle the above challenges by developing interpretable statistical machine learning methods to reveal important effects of public policies, to better assess risks in financial applications, and to quantify market drivers. I study causal inference, statistical model selection, and prediction in different social and economic contexts in order to uncover statistical relationships and to identify important contributing factors. In the first part of my work, I analyze financial risk with cryptocurrencies and corporate bonds. For the former, I identify classes of assets and time periods where flexible machine learning methods, such as random forests employed within a statistical framework, significantly improve predictability of risk. This is vital given the highly volatile return structure of cryptocurrencies. For corporate bonds, I uncover drivers of the risk of default by developing a method that correctly handles the underlying, highly correlated time series data. In the second part, I focus on the evaluation of the causal effect of tuition fees on university student enrollment. I develop methods to deal with the many possible influencing factors given only a few observations by combining subsampling-based methods with regularization in a panel setup. I show that the short tuition fee period in Germany had a causal effect by disentangling this effect from other factors and policies.
    In the third part, I combine satellite images with many noisy, observational data sources to show the impact of crime on the housing market of New York City on a spatial grid. To overcome the endogeneity of crime for house prices, I develop a method that leverages satellite data, can be easily extended to other cities, and highlights the non-linearity of the effect of crime at a spatial level.

    National and subnational short-term forecasting of COVID-19 in Germany and Poland during early 2021

    We compare forecasts of weekly case and death numbers for COVID-19 in Germany and Poland based on 15 different modelling approaches. These forecasts cover the period from January to April 2021 and address numbers of cases and deaths one and two weeks into the future, along with the respective uncertainties. We find that combining different forecasts into one forecast can enable better predictions. However, case numbers over longer periods were challenging to predict. Additional data sources, such as information about different versions of the SARS-CoV-2 virus present in the population, might improve forecasts in the future.
