Parametric Modelling of Multivariate Count Data Using Probabilistic Graphical Models
Multivariate count data are defined as the numbers of items in different
categories obtained by sampling within a population whose individuals are
grouped into categories. The analysis of multivariate count data is a recurrent
and crucial issue in numerous modelling problems, particularly in the fields of
biology and ecology (where the data can represent, for example, children counts
associated with multitype branching processes), sociology and econometrics. We
focus on (I) identifying categories that appear simultaneously or, on the
contrary, are mutually exclusive, which is achieved by identifying
conditional independence relationships between the variables; (II) building
parsimonious parametric models consistent with these relationships; and (III)
characterising and testing the effects of covariates on the joint distribution
of the counts. To achieve these goals, we propose an approach based on
graphical probabilistic models, and more specifically partially directed
acyclic graphs.
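As a minimal illustration of the definition above, the sketch below turns a sample of categorised individuals into a vector of counts, one entry per category. The function name `count_vector`, the category labels and the sample are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter


def count_vector(sample, categories):
    """Return the number of sampled individuals in each category.

    `categories` fixes the order of the entries; categories that do not
    occur in the sample get a count of zero.
    """
    tally = Counter(sample)
    return [tally[c] for c in categories]


# Hypothetical example: offspring types of one parent in a
# multitype branching process (labels are assumptions).
categories = ["type_a", "type_b", "type_c"]
children = ["type_a", "type_c", "type_a", "type_a"]
counts = count_vector(children, categories)
```

Each sampled parent would yield one such vector, and the joint distribution of these vectors is the object the abstract proposes to model.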
Graphs for margins of Bayesian networks
Directed acyclic graph (DAG) models, also called Bayesian networks, impose
conditional independence constraints on a multivariate probability
distribution, and are widely used in probabilistic reasoning, machine learning
and causal inference. If latent variables are included in such a model, then
the set of possible marginal distributions over the remaining (observed)
variables is generally complex, and not represented by any DAG. Larger classes
of mixed graphical models, which use multiple edge types, have been introduced
to overcome this; however, these classes do not represent all the models which
can arise as margins of DAGs. In this paper we show that this is because
ordinary mixed graphs are fundamentally insufficiently rich to capture the
variety of marginal models.
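For orientation, the standard DAG factorization and the margin obtained by integrating out latent variables can be written as follows. The notation is an assumption made here, not taken from the abstract: $O$ denotes the observed vertices, $U$ the latent vertices, and $\mathrm{pa}(v)$ the parents of $v$ in the DAG.

```latex
% Joint distribution of a DAG model over observed (O) and latent (U) vertices
p(x_O, x_U) = \prod_{v \in O \cup U} p\bigl(x_v \mid x_{\mathrm{pa}(v)}\bigr)

% Margin over the observed variables
p(x_O) = \int \prod_{v \in O \cup U} p\bigl(x_v \mid x_{\mathrm{pa}(v)}\bigr) \, dx_U

% Smallest example: one latent u with two observed children a and b
p(x_a, x_b) = \int p(x_u) \, p(x_a \mid x_u) \, p(x_b \mid x_u) \, dx_u
```

The set of distributions expressible in the second form is the marginal model discussed in the abstract.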
We introduce a new class of hyper-graphs, called mDAGs, and a latent
projection operation to obtain an mDAG from the margin of a DAG. We show that
each distinct marginal of a DAG model is represented by at least one mDAG, and
provide graphical results towards characterizing when two such marginal models
are the same. Finally, we show that mDAGs correctly capture the marginal
structure of causally-interpreted DAGs under interventions on the observed
variables.
Sequences of regressions and their independences
Ordered sequences of univariate or multivariate regressions provide
statistical models for analysing data from randomized, possibly sequential
interventions, from cohort or multi-wave panel studies, but also from
cross-sectional or retrospective studies. Conditional independences are
captured by what we name regression graphs, provided the generated distribution
shares some properties with a joint Gaussian distribution. Regression graphs
extend purely directed, acyclic graphs by two types of undirected graph, one
type for components of joint responses and the other for components of the
context vector variable. We review the special features and the history of
regression graphs, derive criteria to read all implied independences of a
regression graph and prove criteria for Markov equivalence, that is, for judging
whether two different graphs imply the same set of independence statements.
Knowledge of Markov equivalence provides alternative interpretations of a given
sequence of regressions, is essential for machine learning strategies and
permits the use of the simple graphical criteria of regression graphs on graphs for
which the corresponding criteria are in general more complex. Under the known
conditions that a Markov equivalent directed acyclic graph exists for any given
regression graph, we give a polynomial time algorithm to find one such graph.
Comment: 43 pages with 17 figures. The manuscript is to appear as an invited
discussion paper in the journal TES
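To make the notion of Markov equivalence concrete, the sketch below checks it in the special case of purely directed acyclic graphs, using the classical criterion that two DAGs are Markov equivalent exactly when they have the same skeleton and the same v-structures. This is not the regression-graph criterion proved in the paper. The function names are hypothetical, and the code assumes both inputs are edge lists of DAGs on the same vertex set with sortable vertex labels.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected adjacencies of a directed graph given as (parent, child) pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """DAG-only check: same skeleton and same v-structures.

    Assumes both edge lists describe DAGs on the same vertex set;
    acyclicity is not verified here.
    """
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


chain = [("a", "b"), ("b", "c")]      # a -> b -> c
fork = [("b", "a"), ("b", "c")]       # a <- b -> c
collider = [("a", "b"), ("c", "b")]   # a -> b <- c
```

Here `markov_equivalent(chain, fork)` holds because both graphs imply only that a and c are independent given b, whereas the collider implies marginal independence of a and c and so falls in a different equivalence class.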