294 research outputs found
Parametric Modelling of Multivariate Count Data Using Probabilistic Graphical Models
Multivariate count data are defined as the number of items of different
categories issued from sampling within a population, which individuals are
grouped into categories. The analysis of multivariate count data is a recurrent
and crucial issue in numerous modelling problems, particularly in the fields of
biology and ecology (where the data can represent, for example, children counts
associated with multitype branching processes), sociology and econometrics. We
focus on I) Identifying categories that appear simultaneously, or on the
contrary that are mutually exclusive. This is achieved by identifying
conditional independence relationships between the variables; II)Building
parsimonious parametric models consistent with these relationships; III)
Characterising and testing the effects of covariates on the joint distribution
of the counts. To achieve these goals, we propose an approach based on
graphical probabilistic models, and more specifically partially directed
acyclic graphs
Sparse Linear Identifiable Multivariate Modeling
In this paper we consider sparse and identifiable linear latent variable
(factor) and linear Bayesian network models for parsimonious analysis of
multivariate data. We propose a computationally efficient method for joint
parameter and model inference, and model comparison. It consists of a fully
Bayesian hierarchy for sparse models using slab and spike priors (two-component
delta-function and continuous mixtures), non-Gaussian latent factors and a
stochastic search over the ordering of the variables. The framework, which we
call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and
bench-marked on artificial and real biological data sets. SLIM is closest in
spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in
inference, Bayesian network structure learning and model comparison.
Experimentally, SLIM performs equally well or better than LiNGAM with
comparable computational complexity. We attribute this mainly to the stochastic
search strategy used, and to parsimony (sparsity and identifiability), which is
an explicit part of the model. We propose two extensions to the basic i.i.d.
linear framework: non-linear dependence on observed variables, called SNIM
(Sparse Non-linear Identifiable Multivariate modeling) and allowing for
correlations between latent variables, called CSLIM (Correlated SLIM), for the
temporal and/or spatial data. The source code and scripts are available from
http://cogsys.imm.dtu.dk/slim/.Comment: 45 pages, 17 figure
Methods for Reconstructing Networks with Incomplete Information.
Network representations of complex systems are widespread and reconstructing unknown networks from data has been intensively researched in statistical and scientific communities more broadly. Two challenges in network reconstruction problems include having insufficient data to illuminate the full structure of the network and needing to combine information from different data sources. Addressing these challenges, this thesis contributes methodology for network reconstruction in three respects.
First, we consider sequentially choosing interventions to discover structure in directed networks focusing on learning a partial order over the nodes. This focus leads to a new model for intervention data under which nodal variables depend on the lengths of paths separating them from intervention targets rather than on parent sets. Taking a Bayesian approach, we present partial-order based priors and develop a novel Markov-Chain Monte Carlo (MCMC) method for computing posterior expectations over directed acyclic graphs. The utility of the MCMC approach comes from designing new proposals for the Metropolis algorithm that move locally among partial orders while independently sampling graphs from each partial order. The resulting Markov Chains mix rapidly and are ergodic. We also adapt an existing strategy for active structure learning, develop an efficient Monte Carlo procedure for estimating the resulting decision function, and evaluate the proposed methods numerically using simulations and benchmark datasets.
We next study penalized likelihood methods using incomplete order information as arising from intervention data. To make the notion of incomplete information precise, we introduce and formally define incomplete partial orders which subsumes the important special case of a known total ordering of the nodes. This special case lies along an information lattice and we study the reconstruction performance of penalized likelihood methods at different points along this lattice.
Finally, we present a method for ranking a network's potential edges using time-course data. The novelty is our development of a nonparametric gradient-matching procedure and a related summary statistic for measuring the strength of relationships among components in dynamic systems. Simulation studies demonstrate that given sufficient signal moving using this procedure to move from linear to additive approximations leads to improved rankings of potential edges.PhDStatisticsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113316/1/jbhender_1.pd
OCDaf: Ordered Causal Discovery with Autoregressive Flows
We propose OCDaf, a novel order-based method for learning causal graphs from
observational data. We establish the identifiability of causal graphs within
multivariate heteroscedastic noise models, a generalization of additive noise
models that allow for non-constant noise variances. Drawing upon the structural
similarities between these models and affine autoregressive normalizing flows,
we introduce a continuous search algorithm to find causal structures. Our
experiments demonstrate state-of-the-art performance across the Sachs and
SynTReN benchmarks in Structural Hamming Distance (SHD) and Structural
Intervention Distance (SID). Furthermore, we validate our identifiability
theory across various parametric and nonparametric synthetic datasets and
showcase superior performance compared to existing baselines
- …