
    Equations defining probability tree models

    Coloured probability tree models are statistical models coding conditional independence between events depicted in a tree graph. They are more general than the very important class of context-specific Bayesian networks. In this paper, we study the algebraic properties of their ideal of model invariants. The generators of this ideal can be easily read from the tree graph and have a straightforward interpretation in terms of the underlying model: they are differences of odds ratios coming from conditional probabilities. One of the key findings in this analysis is that the tree is a convenient tool for understanding the exact algebraic way in which the sum-to-one conditions on the parameter space translate into the sum-to-one conditions on the joint probabilities of the statistical model. This enables us to identify necessary and sufficient graphical conditions for a staged tree model to be a toric variety intersected with a probability simplex.
    Comment: 22 pages, 4 figures
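    To make the odds-ratio description concrete, here is a toy sketch (our own example, not taken from the paper) of the smallest non-trivial case: a tree over two binary variables X1 and X2. When the two vertices reached by X1 = 0 and X1 = 1 are in the same stage (same colour), they share edge probabilities, and the atoms satisfy p00/p01 = p10/p11; clearing denominators gives the binomial p00*p11 - p01*p10 = 0. The functions `tree_atoms` and `odds_ratio_invariant` and all numbers are hypothetical choices for illustration.

```python
from fractions import Fraction

def tree_atoms(s, t_children):
    """Atom probabilities p[(i, j)] = s[i] * t_children[i][j] of a two-level
    binary event tree: s holds the root-edge probabilities, t_children[i] the
    edge probabilities at the vertex reached by X1 = i."""
    return {(i, j): s[i] * t_children[i][j] for i in (0, 1) for j in (0, 1)}

def odds_ratio_invariant(p):
    """p00 * p11 - p01 * p10: the condition p00/p01 == p10/p11 (equal
    conditional odds of X2 given X1) with denominators cleared."""
    return p[(0, 0)] * p[(1, 1)] - p[(0, 1)] * p[(1, 0)]

s = (Fraction(1, 3), Fraction(2, 3))
t = (Fraction(1, 4), Fraction(3, 4))

# Both second-level vertices in the same stage: they share the parameters t.
staged = tree_atoms(s, (t, t))
# Different parameters at the two vertices: no stage constraint.
unstaged = tree_atoms(s, (t, (Fraction(1, 2), Fraction(1, 2))))

print(sum(staged.values()))            # 1
print(odds_ratio_invariant(staged))    # 0
print(odds_ratio_invariant(unstaged))  # -1/18
```

    Exact fractions are used so that the sum-to-one condition and the vanishing of the invariant show up as exact identities rather than floating-point approximations.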

    Equivalence Classes of Staged Trees

    In this paper we give a complete characterization of the statistical equivalence classes of CEGs and of staged trees. We are able to show that all graphical representations of the same model share a common polynomial description. Then, simple transformations on that polynomial enable us to traverse the corresponding class of graphs. We illustrate our results with an analysis of the implicit dependence relationships within a previously studied real-world dataset.
    Comment: 18 pages, 4 figures
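    A minimal illustration of what "sharing a common polynomial description" means, under our own simplifying assumptions: each tree is expanded into a polynomial by summing, over root-to-leaf paths, the product of edge labels. Two staged trees over the same two binary variables, taken in opposite orders, expand to the same polynomial. The tree encoding and the function name `interpolating_polynomial` are hypothetical, and this is not the characterization algorithm of the paper.

```python
from collections import Counter

def interpolating_polynomial(tree):
    """Expand an event tree into a polynomial.

    A tree is a list of (edge_label, subtree) pairs; a leaf is the empty list.
    Each root-to-leaf path contributes the product of its edge labels. A
    monomial is stored as a sorted tuple of labels mapped to its coefficient."""
    if not tree:
        return Counter({(): 1})  # the constant monomial 1
    poly = Counter()
    for label, subtree in tree:
        for monomial, coeff in interpolating_polynomial(subtree).items():
            poly[tuple(sorted((label,) + monomial))] += coeff
    return poly

# X1 first: both X2-vertices are in the same stage, with labels t0, t1.
x1_first = [("s0", [("t0", []), ("t1", [])]),
            ("s1", [("t0", []), ("t1", [])])]
# The same two variables in the opposite order.
x2_first = [("t0", [("s0", []), ("s1", [])]),
            ("t1", [("s0", []), ("s1", [])])]

print(interpolating_polynomial(x1_first) == interpolating_polynomial(x2_first))  # True
```

    If the two X2-vertices of `x1_first` carried different labels (that is, were in different stages), its polynomial would no longer match the reversed tree.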

    Sensitivity analysis in multilinear probabilistic models

    Sensitivity methods for the analysis of the outputs of discrete Bayesian networks have been extensively studied and implemented in different software packages. These methods usually focus on the study of sensitivity functions and on the impact of a parameter change on the Chan–Darwiche distance. Although not fully recognized, the majority of these results rely heavily on the multilinear structure of atomic probabilities in terms of the conditional probability parameters associated with this type of network. By defining a statistical model through the polynomial expression of its associated defining conditional probabilities, we develop here a unifying approach to sensitivity methods applicable to a large suite of models, including extensions of Bayesian networks such as context-specific ones. Our algebraic approach enables us to prove that, for models whose defining polynomial is multilinear, both the Chan–Darwiche distance and any divergence in the family of ϕ-divergences are minimized for a certain class of multi-parameter contemporaneous variations when parameters are proportionally covaried.
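    The sketch below illustrates, for a single-parameter variation only, why proportional covariation is favoured; it does not reproduce the paper's multi-parameter proof. It assumes a hypothetical two-node network X → Y (X binary, Y ternary), whose atomic probabilities p(x, y) = p(x) p(y | x) are multilinear, and uses the Chan–Darwiche distance CD(P, Q) = ln max_w Q(w)/P(w) - ln min_w Q(w)/P(w). The alternative scheme `take_from_last` and all numbers are our own choices.

```python
import math

def chan_darwiche(p, q):
    """Chan–Darwiche distance between strictly positive distributions p and q
    over the same atoms: ln max_w q(w)/p(w) - ln min_w q(w)/p(w)."""
    ratios = [q[w] / p[w] for w in p]
    return math.log(max(ratios)) - math.log(min(ratios))

def joint(px, py_given_x):
    """Atomic probabilities p(x, y) = p(x) * p(y | x), multilinear in the parameters."""
    return {(x, y): px[x] * row[y]
            for x, row in enumerate(py_given_x) for y in range(len(row))}

def proportional(theta, i, new):
    """Set theta[i] = new and rescale the other entries, keeping their ratios."""
    scale = (1 - new) / (1 - theta[i])
    return [new if j == i else t * scale for j, t in enumerate(theta)]

def take_from_last(theta, i, new):
    """Alternative scheme: the last entry absorbs the whole change (needs i != last)."""
    assert i != len(theta) - 1
    out = list(theta)
    out[-1] -= new - theta[i]
    out[i] = new
    return out

px = [0.5, 0.5]
py_given_x = [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
base = joint(px, py_given_x)

# Vary p(Y = 0 | X = 0) from 0.2 to 0.4 under each covariation scheme.
distances = {}
for scheme in (proportional, take_from_last):
    varied = [scheme(py_given_x[0], 0, 0.4), py_given_x[1]]
    distances[scheme.__name__] = chan_darwiche(base, joint(px, varied))
print(distances)
```

    Here the proportional scheme gives ln(8/3) ≈ 0.981, while taking all the mass from the last entry gives ln(10/3) ≈ 1.204: proportional covariation spreads the compensating change evenly, so the smallest probability ratio stays as close to 1 as possible.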

    Statistical Model Selection and Prediction for Non-standard Data: Insights and Applications in Economics and Finance

    In an increasingly digital world, data has become abundant, and research on leveraging such vast amounts of data is on the rise. While extracting information relevant to economic policy or financial risk is crucial, the often non-standard structure of such observational data poses many challenges for researchers. These include highly correlated, time-dependent data, combinations of unstructured data, and even high-dimensional settings with very few data points and many potentially relevant factors. In this thesis, I tackle these challenges by developing interpretable statistical machine learning methods to reveal important effects of public policies, to better assess risks in financial applications, and to quantify market drivers. I study causal inference, statistical model selection, and prediction in different social and economic contexts in order to uncover statistical relationships and to identify important contributing factors. In the first part of my work, I analyze financial risk with cryptocurrencies and corporate bonds. For the former, I identify classes of assets and time periods where flexible machine learning methods, such as random forests employed within a statistical framework, significantly improve the predictability of risk. This is vital given the highly volatile return structure of cryptocurrencies. For corporate bonds, I uncover drivers of default risk by developing a method that correctly handles the underlying highly correlated time-series data. In the second part, I focus on evaluating the causal effect of tuition fees on university student enrollment. I develop methods to deal with the many possible influencing factors given only a small number of observations by combining subsampling-based methods with regularization in a panel setup. I show that the short tuition-fee period in Germany had a causal effect by disentangling this effect from other factors and policies.
    In the third part, I combine satellite images with many noisy observational data sources to show the impact of crime on the housing market of New York City on a spatial grid. To overcome the endogeneity of crime with respect to house prices, I develop a method that leverages satellite data, can be easily extended to other cities, and highlights the non-linear effect of crime at the spatial level.

    Discovery of statistical equivalence classes using computer algebra

    Discrete statistical models supported on labelled event trees can be specified using so-called interpolating polynomials, which are generalizations of generating functions. These admit a nested representation. A new algorithm exploits the primary decomposition of monomial ideals associated with an interpolating polynomial to quickly compute all nested representations of that polynomial. It thereby determines an important subclass of all trees representing the same statistical model. To illustrate this method we analyze the full polynomial equivalence class of a staged tree representing the best fitting model inferred from a real-world dataset.
    Comment: 26 pages, 9 figures
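    The algorithm described above works through primary decomposition of monomial ideals; the brute-force recursion below is a deliberately naive stand-in, not that algorithm, which only illustrates what "all nested representations" of a polynomial means. It assumes square-free monomials and assumes the edge labels of each floret are known in advance (`florets`); the function and variable names are hypothetical. A floret can label the root only if every monomial contains exactly one of its labels; the polynomial is then factored by that label and the procedure recurses.

```python
from itertools import product

def nested_representations(poly, florets):
    """Enumerate nested representations (event trees) of a square-free polynomial.

    poly: frozenset of monomials, each a frozenset of edge labels.
    florets: list of frozensets, each holding the labels of the edges leaving
    one vertex. Yields trees as lists of (label, subtree) pairs; a leaf is []."""
    if poly == frozenset({frozenset()}):
        yield []  # only the constant monomial 1 is left: a leaf
        return
    for g in florets:
        # g can label the root only if every monomial uses exactly one of its labels
        if not all(len(m & g) == 1 for m in poly):
            continue
        quotients = {}
        for m in poly:
            (label,) = m & g
            quotients.setdefault(label, set()).add(m - {label})
        rest = [h for h in florets if h != g]
        labels = sorted(quotients)
        options = [list(nested_representations(frozenset(quotients[lab]), rest))
                   for lab in labels]
        # every combination of subtrees below the chosen root floret is a representation
        for subtrees in product(*options):
            yield list(zip(labels, subtrees))

def poly(*monomials):
    return frozenset(frozenset(m) for m in monomials)

S, T = frozenset({"s0", "s1"}), frozenset({"t0", "t1"})
product_poly = poly(("s0", "t0"), ("s0", "t1"), ("s1", "t0"), ("s1", "t1"))
asym_poly = poly(("s0", "t0"), ("s0", "t1"), ("s1",))

print(len(list(nested_representations(product_poly, [S, T]))))  # 2
print(len(list(nested_representations(asym_poly, [S, T]))))     # 1
```

    The product polynomial factors with either variable at the root, whereas the asymmetric one (X2 is only observed after X1 = 0) admits a single nesting. Exhaustive search like this grows quickly with the number of florets, which is why a dedicated algebraic method is useful.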

    Immigrant players in the national football team of Germany and the question of national identity

    This paper examines immigrant players in the national football team and the formation of national identity in Germany. Recent analyses reveal that the success of an immigrant player in the national team has been regarded as a useful way to draw public attention to the contribution of immigrants to the progress of the country. During matches, the fans' discourse depends on the result of the game: after a loss, they target immigrant players as scapegoats. This runs in parallel with increasingly strong criticism of these players in the media. In this paper, the case of Mesut Özil in the German national football team is analyzed. The case study offers evidence on whether the success of immigrant players has been an important factor in their inclusion in German national identity.

    FAIRness of the mathematical research-data repository MathRepo

    MathRepo, located at https://mathrepo.mis.mpg.de, is an online repository for mathematical research data, in particular for code, software, and teaching material. In this talk I will discuss its current content, the role software plays in mathematical research, and future improvements of the repository regarding the FAIR principles.