Equations defining probability tree models
Coloured probability tree models are statistical models coding conditional
independence between events depicted in a tree graph. They are more general
than the very important class of context-specific Bayesian networks. In this
paper, we study the algebraic properties of their ideal of model invariants.
The generators of this ideal can be easily read from the tree graph and have a
straightforward interpretation in terms of the underlying model: they are
differences of odds ratios coming from conditional probabilities. One of the
key findings in this analysis is that the tree is a convenient tool for
understanding the exact algebraic way in which the sum-to-one conditions on the
parameter space translate into the sum-to-one conditions on the joint
probabilities of the statistical model. This enables us to identify necessary
and sufficient graphical conditions for a staged tree model to be a toric
variety intersected with a probability simplex. Comment: 22 pages, 4 figures
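As a minimal illustration of an invariant that is a difference of odds ratios, the sketch below assumes the simplest possible staged tree. It has a binary root followed by two binary situations placed in the same stage, which encodes independence of two binary variables. The code builds the joint leaf probabilities from the edge parameters, checks that the sum-to-one conditions on the parameters carry over to the leaves, and checks that the cross-product difference p11*p22 - p12*p21 vanishes. The tree, the parameter names `theta0` and `theta`, and the function names are hypothetical choices for this example, not the paper's notation.

```python
from fractions import Fraction

def leaf_probabilities(theta0, theta):
    """Atomic probabilities of a hypothetical two-level binary staged tree.

    The root has edge probabilities (theta0, 1 - theta0). Both children of the
    root are assumed to lie in the same stage, so they share the conditional
    probabilities (theta, 1 - theta).
    """
    root = (theta0, 1 - theta0)
    stage = (theta, 1 - theta)  # shared by both situations in the stage
    # p[i][j] is the probability of the root-to-leaf path root -> i -> j
    return [[root[i] * stage[j] for j in range(2)] for i in range(2)]

def odds_ratio_invariant(p):
    """Cross-product difference p11*p22 - p12*p21.

    It is zero exactly when the conditional odds p11/p12 and p21/p22 agree
    (whenever those odds are defined).
    """
    return p[0][0] * p[1][1] - p[0][1] * p[1][0]

p = leaf_probabilities(Fraction(1, 3), Fraction(2, 5))
print(sum(sum(row) for row in p))  # 1: sum-to-one on parameters gives sum-to-one on leaves
print(odds_ratio_invariant(p))     # 0: the stage constraint forces this invariant to vanish
```

If the two situations were instead given different parameters theta and theta', the same expression would equal theta0 (1 - theta0)(theta - theta'), so for 0 < theta0 < 1 it vanishes only when the stage constraint holds.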
Equivalence Classes of Staged Trees
In this paper we give a complete characterization of the statistical
equivalence classes of chain event graphs (CEGs) and of staged trees. We show that all
graphical representations of the same model share a common polynomial
description. Then, simple transformations on that polynomial enable us to
traverse the corresponding class of graphs. We illustrate our results with an
analysis of the implicit dependence relationships within a previously
studied real-world dataset. Comment: 18 pages, 4 figures
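To make the idea of a shared polynomial description concrete, the sketch below computes the polynomial of a staged tree as the sum, over root-to-leaf paths, of the products of edge labels. It assumes this is the polynomial meant here, with situations in the same stage carrying identical labels. Two trees that order the variables differently, but encode the same independence model, then yield the same polynomial. The tree encoding, the labels `x0`, `x1`, `y0`, `y1` and the function name are hypothetical.

```python
from collections import Counter

def interpolating_polynomial(tree):
    """Sum over root-to-leaf paths of the product of edge labels.

    A tree is either None (a leaf) or a list of (label, subtree) pairs.
    Each monomial is stored as a sorted tuple of labels, so the polynomial
    is a Counter mapping monomials to integer coefficients.
    """
    if tree is None:
        return Counter({(): 1})  # the empty product contributes the constant 1
    poly = Counter()
    for label, subtree in tree:
        for mono, coeff in interpolating_polynomial(subtree).items():
            poly[tuple(sorted(mono + (label,)))] += coeff
    return poly

# Hypothetical labels: x0, x1 for a binary variable X and y0, y1 for Y.
# In each tree, all situations of the second variable form one stage and
# therefore reuse the same edge labels (this encodes X independent of Y).
x_first = [("x0", [("y0", None), ("y1", None)]),
           ("x1", [("y0", None), ("y1", None)])]
y_first = [("y0", [("x0", None), ("x1", None)]),
           ("y1", [("x0", None), ("x1", None)])]

print(interpolating_polynomial(x_first) == interpolating_polynomial(y_first))  # True
```

Both trees produce the expanded form of (x0 + x1)(y0 + y1). Moving between them is the kind of polynomial-preserving transformation that lets one traverse an equivalence class.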
Sensitivity analysis in multilinear probabilistic models
Sensitivity methods for the analysis of the outputs of discrete Bayesian networks have been extensively studied and implemented in different software packages. These methods usually focus on the study of sensitivity functions and on the impact of a parameter change on the Chan–Darwiche distance. Although this is not fully recognized, most of these results rely heavily on the multilinear structure of atomic probabilities in terms of the conditional probability parameters associated with this type of network. By defining a statistical model through the polynomial expression associated with its defining conditional probabilities, we develop a unifying approach to sensitivity methods that applies to a large suite of models, including extensions of Bayesian networks such as context-specific ones. Our algebraic approach enables us to prove that, for models whose defining polynomial is multilinear, both the Chan–Darwiche distance and any divergence in the family of ϕ-divergences are minimized for a certain class of multi-parameter contemporaneous variations when parameters are proportionally covaried.
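As a hedged illustration of multilinearity and proportional covariation, the sketch below uses a hypothetical two-node network X → Y with a three-state X. It changes the single parameter P(X = 0) from 0.2 to 0.4 and compares two ways of rebalancing the other parameters: proportional covariation, and a scheme in which one other parameter absorbs the whole change. For each, it computes the Chan–Darwiche distance CD(P, P') = ln max_w P'(w)/P(w) - ln min_w P'(w)/P(w), and the Kullback–Leibler divergence as one member of the ϕ-divergence family. This is only a single-parameter special case. The network, the numbers and all function names are assumptions rather than the paper's construction.

```python
import math

def joint(theta_x, p_y1_given_x):
    """Atomic probabilities of a toy network X -> Y.

    Each atom P(X=x, Y=y) is a product of one parameter from each conditional
    probability table, so the joint is multilinear in the parameters.
    """
    atoms = {}
    for x, tx in enumerate(theta_x):
        atoms[(x, 1)] = tx * p_y1_given_x[x]
        atoms[(x, 0)] = tx * (1 - p_y1_given_x[x])
    return atoms

def proportional_covariation(theta, i, new_value):
    """Set theta[i] to new_value and rescale the others, keeping their ratios."""
    scale = (1 - new_value) / (1 - theta[i])
    return [new_value if j == i else t * scale for j, t in enumerate(theta)]

def chan_darwiche(p, q):
    """CD(p, q) = ln max_w q(w)/p(w) - ln min_w q(w)/p(w); assumes all atoms positive."""
    ratios = [q[w] / p[w] for w in p]
    return math.log(max(ratios)) - math.log(min(ratios))

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), one member of the phi-divergence family."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

theta = [0.2, 0.3, 0.5]   # hypothetical P(X = 0), P(X = 1), P(X = 2)
p_y1 = [0.9, 0.5, 0.1]    # hypothetical P(Y = 1 | X = x)
base = joint(theta, p_y1)

prop = proportional_covariation(theta, 0, 0.4)  # approximately [0.4, 0.225, 0.375]
other = [0.4, 0.1, 0.5]                         # the whole change absorbed by theta[1]

for name, varied in [("proportional", prop), ("other", other)]:
    new = joint(varied, p_y1)
    print(name, round(chan_darwiche(base, new), 4), round(kl(base, new), 4))
```

With these toy numbers, proportional covariation gives a Chan–Darwiche distance of ln(8/3) ≈ 0.98, against ln 6 ≈ 1.79 for the alternative scheme. It also gives the smaller KL divergence.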
Statistical Model Selection and Prediction for Non-standard Data: Insights and Applications in Economics and Finance
In an increasingly digital world, data has become abundant, and research on leveraging such vast amounts of data is on the rise. While extracting information relevant
for economic policy or financial risk is crucial, the often non-standard structure of such observational data poses many challenges for researchers. These include highly correlated
time-dependent data, combinations of unstructured data, and even high-dimensional settings with very few data points and many potentially relevant factors.
In this thesis, I tackle the above challenges by developing interpretable statistical machine learning methods to reveal important effects of public policies, to better assess risks in
financial applications, and to quantify market drivers. I study causal inference, statistical model selection, and prediction in different social and economic contexts in order to uncover
statistical relationships and to identify important contributing factors.
In the first part of my work, I analyze financial risk in cryptocurrencies and corporate bonds. For the former, I identify classes of assets and time periods where flexible
machine learning methods, such as random forests employed within a statistical framework, significantly improve the predictability of risk. This is vital given the highly volatile
return structure of cryptocurrencies. For corporate bonds, I uncover drivers of default risk by developing a method that correctly handles the underlying highly correlated
time-series data. In the second part, I focus on evaluating the causal effect of tuition fees on university student enrollment. I develop methods to deal with the many
possible influencing factors given only a few observations by combining subsampling-based methods with regularization in a panel setup. I show that the short period
of tuition fees in Germany had a causal effect by disentangling this effect from other factors and policies. In the third part, I combine satellite images with many
noisy observational data sources to show the impact of crime on the housing market of New York City on a spatial grid. To address the endogeneity of crime with respect to house prices,
I develop a method that leverages satellite data, can be easily extended to other cities, and highlights the non-linear effect of crime at the spatial level.
Discovery of statistical equivalence classes using computer algebra
Discrete statistical models supported on labelled event trees can be
specified using so-called interpolating polynomials, which are generalizations
of generating functions. These admit a nested representation. A new algorithm
exploits the primary decomposition of monomial ideals associated with an
interpolating polynomial to quickly compute all nested representations of that
polynomial. In doing so, it determines an important subclass of all trees
representing the same statistical model. To illustrate this method, we analyze
the full polynomial equivalence class of a staged tree representing the best-fitting
model inferred from a real-world dataset. Comment: 26 pages, 9 figures
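For intuition about what a nested representation is, the sketch below enumerates nested forms of a small square-free polynomial by brute force. This is a plainly swapped-in technique: it is not the primary-decomposition algorithm described above, and it scales exponentially. It assumes a simplified reading in which a root floret is a set of at least two variables such that every term contains exactly one of them. Dividing the terms that contain a root variable by that variable then gives the polynomial below that edge, and the search recurses on it. The encoding, the labels and the function name are hypothetical.

```python
from itertools import combinations

def nested_representations(monomials):
    """Enumerate nested forms of a square-free polynomial (brute force).

    `monomials` is a frozenset of frozensets of variable names, one per term.
    Each returned form is a tuple of (variable, sub-form) pairs, where ()
    marks a leaf, i.e. the constant 1.
    """
    if monomials == frozenset([frozenset()]):
        return [()]  # the constant polynomial 1 is a leaf
    variables = sorted(set().union(*monomials))
    results = []
    for size in range(2, len(variables) + 1):
        for floret in combinations(variables, size):
            # Every term must contain exactly one variable of the candidate floret.
            if any(len(m & set(floret)) != 1 for m in monomials):
                continue
            branches = [[]]
            for v in floret:
                # Terms containing v, divided by v: the polynomial below edge v.
                sub = frozenset(m - {v} for m in monomials if v in m)
                branches = [b + [(v, rep)] for b in branches
                            for rep in nested_representations(sub)]
            results.extend(tuple(b) for b in branches)
    return results

# Hypothetical labels for the expanded product (x0 + x1)(y0 + y1).
terms = [("x0", "y0"), ("x0", "y1"), ("x1", "y0"), ("x1", "y1")]
poly = frozenset(frozenset(t) for t in terms)
print(len(nested_representations(poly)))  # 2: nest with X first or with Y first
```

Each nested form corresponds to one tree with that polynomial. Here the two forms are the X-first and Y-first trees of the independence model.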
Immigrant players in the national football team of Germany and the question of national identity
This paper is based on research on immigrant players in the national football team and the formation of national identity in Germany. Recent analyses reveal that the success of an immigrant player in the national team has been regarded as a useful factor in attracting public attention to the contribution of immigrants to the progress of the country. During matches, the discourse among fans shifts depending on the result of the game: after a loss, fans target immigrant players as scapegoats. This runs in parallel with increasingly strong criticism of these players in the media. In this paper, the case of Mesut Özil in the German national football team is analyzed. The case study offers evidence on whether the success of immigrant players has been an important factor in their inclusion in the national identity in Germany.
FAIRness of the mathematical research-data repository MathRepo
MathRepo, located at https://mathrepo.mis.mpg.de, is an online repository for mathematical research data, in particular for code, software, and teaching material. In this talk I will discuss its current content, the role software plays in mathematical research, and future improvements of the repository with regard to the FAIR principles.