Statistical approaches for modeling network and public health data
The rapid technological advancement that characterized the past few decades has brought about an increasingly large amount and variety of data. This wealth of data naturally comes with further complexity, thus requiring increasingly sophisticated and efficient methodologies to extract valuable information from it. In this context, statistical models can serve as effective tools to obtain interpretable insight from the data while adequately quantifying and accounting for the underlying uncertainty. This thesis deals with the statistical modeling of two broad data categories that are prominent in modern times: network data and public health data. After an introductory Part I, the thesis comprises a total of eleven contributions, which can be divided into three further parts.
Part II, composed of four contributions, deals with the statistical analysis of network data. Networks can broadly be defined as groups of interconnected people or things. This thesis focuses mostly on social and economic networks, and on statistical models aimed at capturing and explaining the mechanisms leading to the formation of ties between actors within the network. We specifically concern ourselves with two broad model families, namely latent variable models and exponential random graph models. The first two contributions in this section introduce and compare several models from these classes, and showcase them by applying them to real-world network data. The following two contributions extend and apply these models to answer substantive questions in the social sciences. More specifically, the third contribution extends exponential random graph models to deal with the modeling of a massive dynamic bipartite network of patents and inventors to explore the drivers of innovation, while the fourth one uses latent distance models to map the network of popular Twitter users discussing the COVID-19 pandemic, with the goal of investigating polarization on the platform.
Part III, which also comprises four contributions, addresses statistical challenges related to the real-time monitoring and modeling of public health data. More specifically, the chapter tackles questions that emerged during the early stages of the COVID-19 pandemic, mainly by adapting and extending the class of generalized additive mixed models (GAMMs). The fifth contribution develops a statistical model using reported fatal infections data to predict how many of the registered infections will turn out to be lethal in the near future, thereby enabling effective monitoring of the current state of the pandemic. The sixth contribution instead focuses on all reported infections, and proposes a model to nowcast locally detected (but not yet centrally reported) cases by accounting for expected reporting delays, as well as to forecast infections at the regional level in the near future. The seventh contribution proposes a statistical tool to study the dynamics of the case-detection ratio over time, allowing for comparisons of infection figures between different pandemic phases. The chapter is concluded by the eighth contribution, which further demonstrates the effectiveness of GAMMs by applying them to three relevant pandemic-related issues, i.e. the interdependence among infections in different age groups among school children, the nowcasting of COVID-19 related hospitalizations, and the modeling of the weekly occupancy of intensive care units.
Finally, Part IV, composed of three contributions, focuses on the principled estimation of excess mortality, which can generally be defined as the number of deaths from all causes during a crisis beyond what would have been expected had the crisis not occurred. More specifically, the ninth contribution develops a point-estimation method by deploying a corrected version of classical life tables to calculate age-adjusted excess mortality, and applies it to obtain estimates for the first year of the COVID-19 pandemic (i.e. 2020) in Germany. The tenth contribution applies the same method to provide updated age-specific estimates for 2021. Finally, the eleventh contribution extends the method to incorporate uncertainty quantification, and deploys it at a broader scale to obtain estimates for 30 developed countries in the first two years of the COVID-19 crisis. The results are further compared with existing estimates published in other major scientific outlets, highlighting the importance of proper age adjustment to obtain unbiased figures.
Estimating excess mortality in high-income countries during the COVID-19 pandemic
Quantifying the number of deaths caused by the COVID-19 crisis has been an
ongoing challenge for scientists, and no gold standard to do so has yet been
established. We propose a robust approach to calculate age-adjusted yearly
excess mortality, and apply it to obtain estimates and uncertainty bounds for
28 countries with publicly available data. The results uncover remarkable
variation in pandemic outcomes across different countries. We further compare
our findings with existing estimates published in other major scientific
outlets, highlighting the importance of proper age adjustment to obtain
unbiased figures.
An update on excess mortality in the second year of the COVID-19 pandemic in Germany
In this short note, we apply the method of De Nicola et al. (2022) to the most recent available data, thereby providing up-to-date estimates of all-cause excess mortality in Germany for 2021. The analysis reveals a preliminary excess mortality of approximately 2.3% for the calendar year considered. The excess is mainly driven by significantly higher excess mortality in the 60-79 age group.
On assessing excess mortality in Germany during the COVID-19 pandemic
Coronavirus disease 2019 (COVID-19) is associated with a very high number of casualties in the general population. Assessing the exact magnitude of this number is a non-trivial problem, as relying only on officially reported COVID-19 associated fatalities risks incurring several kinds of bias. One of the ways to approach the issue is to compare overall mortality during the pandemic with expected mortality computed using the observed mortality figures of previous years. In this paper, we build on existing methodology and propose two ways to compute expected as well as excess mortality, namely at the weekly and at the yearly level. Particular focus is put on the role of age, which plays a central part in both COVID-19-associated and overall mortality. We illustrate our methods by making use of age-stratified mortality data from the years 2016 to 2020 in Germany to compute age group-specific excess mortality during the COVID-19 pandemic in 2020.
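The idea of comparing observed deaths with an age-adjusted expectation can be illustrated with a minimal Python sketch. It is a simplified stand-in, not the paper's method: expected deaths are taken as the mean age-specific death rate of the reference years multiplied by the current population of each age group. The function name, the two age groups, and all numbers are hypothetical toy values.

```python
from typing import Dict, List


def expected_deaths(
    ref_deaths: Dict[str, List[float]],
    ref_pop: Dict[str, List[float]],
    cur_pop: Dict[str, float],
) -> Dict[str, float]:
    """Expected deaths per age group: mean reference-year death rate
    times the current population of that group (simplifying assumption)."""
    expected = {}
    for age, pop in cur_pop.items():
        rates = [d / p for d, p in zip(ref_deaths[age], ref_pop[age])]
        expected[age] = sum(rates) / len(rates) * pop
    return expected


# Hypothetical toy numbers (two reference years), not real data.
ref_deaths = {"0-59": [100, 100], "60+": [900, 900]}
ref_pop = {"0-59": [10_000, 10_000], "60+": [10_000, 10_000]}
cur_pop = {"0-59": 10_000, "60+": 12_000}  # the older group has grown
observed = {"0-59": 105, "60+": 1_100}

expected = expected_deaths(ref_deaths, ref_pop, cur_pop)
excess_by_age = {age: observed[age] - expected[age] for age in observed}
adjusted_excess = sum(excess_by_age.values())

# Naive comparison: observed total minus mean yearly deaths in the reference
# period, ignoring changes in the age structure.
n_years = len(next(iter(ref_deaths.values())))
mean_ref_total = sum(sum(v) for v in ref_deaths.values()) / n_years
naive_excess = sum(observed.values()) - mean_ref_total

print(excess_by_age, adjusted_excess, naive_excess)
```

In this toy setup the naive figure is far larger than the age-adjusted one (roughly 205 versus 25), because part of the additional deaths simply reflects a larger elderly population.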
Nowcasting fatal COVID-19 infections on a regional level in Germany
We analyse the temporal and regional structure in mortality rates related to COVID‐19 infections, making use of the openly available data on registered cases in Germany published by the Robert Koch Institute on a daily basis. Estimates for the number of present‐day infections that will, at a later date, prove to be fatal are derived through a nowcasting model, which relates the day of death of each deceased patient to the corresponding day of registration of the infection. Our district‐level modelling approach for fatal infections disentangles spatial variation into a global pattern for Germany, district‐specific long‐term effects and short‐term dynamics, while also taking the age and gender structure of the regional population into account. This makes it possible to highlight areas with unexpectedly high disease activity. The analysis of death counts contributes to a better understanding of the spread of the disease while being, to some extent, less dependent on testing strategy and capacity in comparison to infection counts. The proposed approach and the presented results thus provide reliable insight into the state and the dynamics of the pandemic during the early phases of the infection wave in spring 2020 in Germany, when little was known about the disease and limited data were available.
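The core nowcasting idea, relating deaths observed so far to the delay between registration and death, can be sketched in a few lines of Python. This is a deliberately simple multiplicative correction based on an empirical delay distribution, not the regression-based model the abstract describes. The function names and the delay values are hypothetical.

```python
from typing import List


def delay_cdf(delays: List[int], max_delay: int) -> List[float]:
    """Empirical P(delay <= k) for k = 0..max_delay, where delay is the
    number of days between registration of an infection and death."""
    n = len(delays)
    return [sum(1 for d in delays if d <= k) / n for k in range(max_delay + 1)]


def nowcast_fatal(observed_so_far: float, days_since_registration: int,
                  cdf: List[float]) -> float:
    """Scale up the deaths observed so far by the share of eventual deaths
    expected to have occurred already."""
    k = min(days_since_registration, len(cdf) - 1)
    if cdf[k] == 0:
        raise ValueError("no deaths expected yet at this delay; cannot scale")
    return observed_so_far / cdf[k]


# Hypothetical historical delays (in days), not real data.
cdf = delay_cdf([5, 10, 10, 20], max_delay=20)

# Infections registered 10 days ago with 3 deaths recorded so far.
estimate = nowcast_fatal(3, 10, cdf)
print(estimate)
```

The correction is only as good as the assumed delay distribution, which in practice may vary over time and between regions.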
Dependence matters: Statistical models to identify the drivers of tie formation in economic networks
Networks are ubiquitous in economic research on organizations, trade, and
many other areas. However, while economic theory extensively considers
networks, no general framework for their empirical modeling has yet emerged. We
thus introduce two different statistical models for this purpose -- the
Exponential Random Graph Model (ERGM) and the Additive and Multiplicative
Effects network model (AME). Both model classes can account for network
interdependencies between observations, but differ in how they do so. The ERGM
allows one to explicitly specify and test the influence of particular network
structures, making it a natural choice if one is substantively interested in
estimating endogenous network effects. In contrast, AME captures these effects
by introducing actor-specific latent variables affecting their propensity to
form ties. This makes the latter a good choice if the researcher is interested
in capturing the effect of exogenous covariates on tie formation without having
a specific theory on the endogenous dependence structures at play. After
introducing the two model classes, we showcase them through real-world
applications to networks stemming from international arms trade and foreign
exchange activity. We further provide full replication materials to facilitate
the adoption of these methods in empirical economic research.
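To make the notion of explicitly specified network structures in an ERGM more concrete, the following Python sketch computes two common network statistics (edge and triangle counts) and the conditional log-odds of a single tie, which in an ERGM equals the coefficient vector times the change in the statistics when that tie is switched on. It assumes an undirected toy network and hypothetical coefficients, and it performs no estimation. The function names are illustrative and not taken from the paper's replication materials.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Adjacency = List[List[int]]


def network_stats(adj: Adjacency) -> Tuple[int, int]:
    """Return (number of edges, number of triangles) of an undirected graph."""
    n = len(adj)
    edges = sum(adj[i][j] for i, j in combinations(range(n), 2))
    triangles = sum(
        adj[i][j] * adj[j][k] * adj[i][k]
        for i, j, k in combinations(range(n), 3)
    )
    return edges, triangles


def tie_log_odds(adj: Adjacency, i: int, j: int, theta: Sequence[float]) -> float:
    """Conditional log-odds of tie (i, j) given the rest of the network:
    theta times the change in statistics from toggling the tie on."""
    with_tie = [row[:] for row in adj]
    without_tie = [row[:] for row in adj]
    with_tie[i][j] = with_tie[j][i] = 1
    without_tie[i][j] = without_tie[j][i] = 0
    s1 = network_stats(with_tie)
    s0 = network_stats(without_tie)
    return sum(t * (a - b) for t, a, b in zip(theta, s1, s0))


# Toy network: a path 0 - 1 - 2, so adding tie (0, 2) closes a triangle.
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
theta = (-1.0, 0.5)  # hypothetical edge and triangle coefficients

log_odds = tie_log_odds(adj, 0, 2, theta)
print(network_stats(adj), log_odds)
```

A positive triangle coefficient raises the log-odds of ties that close triangles, which is the kind of endogenous network effect the ERGM lets one specify and test.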