Sparse Linear Identifiable Multivariate Modeling
In this paper we consider sparse and identifiable linear latent variable
(factor) and linear Bayesian network models for parsimonious analysis of
multivariate data. We propose a computationally efficient method for joint
parameter and model inference, and model comparison. It consists of a fully
Bayesian hierarchy for sparse models using slab and spike priors (two-component
delta-function and continuous mixtures), non-Gaussian latent factors and a
stochastic search over the ordering of the variables. The framework, which we
call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and
benchmarked on artificial and real biological data sets. SLIM is closest in
spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in
inference, Bayesian network structure learning and model comparison.
Experimentally, SLIM performs as well as or better than LiNGAM with
comparable computational complexity. We attribute this mainly to the stochastic
search strategy used, and to parsimony (sparsity and identifiability), which is
an explicit part of the model. We propose two extensions to the basic i.i.d.
linear framework: non-linear dependence on observed variables, called SNIM
(Sparse Non-linear Identifiable Multivariate modeling) and allowing for
correlations between latent variables, called CSLIM (Correlated SLIM), for
temporal and/or spatial data. The source code and scripts are available from
http://cogsys.imm.dtu.dk/slim/.
Comment: 45 pages, 17 figures
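The two-component delta-function prior mentioned in the abstract above can be illustrated with a minimal sketch. It is not taken from the SLIM source code: the function name `sample_spike_and_slab`, the Gaussian slab, and the default values of `inclusion_prob` and `slab_scale` are all assumptions made for illustration.

```python
import random


def sample_spike_and_slab(n_rows, n_cols, inclusion_prob=0.2,
                          slab_scale=1.0, seed=0):
    """Draw a sparse weight matrix from a delta-function spike-and-slab prior.

    Each entry is exactly zero (the spike) with probability
    1 - inclusion_prob, and otherwise drawn from a zero-mean Gaussian
    slab. The Gaussian slab is an illustrative assumption.
    """
    rng = random.Random(seed)
    weights = []
    for _ in range(n_rows):
        row = []
        for _ in range(n_cols):
            if rng.random() < inclusion_prob:
                row.append(rng.gauss(0.0, slab_scale))  # slab component
            else:
                row.append(0.0)                         # spike at zero
        weights.append(row)
    return weights


# Hypothetical sizes: 5 observed variables, 3 latent factors.
weights = sample_spike_and_slab(5, 3)
```

Exact zeros are what make the resulting factor loading or network matrix sparse; a continuous mixture would replace the point mass at zero with a narrow distribution around zero.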
Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors
Meaningful quantification of data and structural uncertainties in conceptual rainfall-runoff modeling is a major scientific and engineering challenge. This paper focuses on the total predictive uncertainty and its decomposition into input and structural components under different inference scenarios. Several Bayesian inference schemes are investigated, differing in the treatment of rainfall and structural uncertainties, and in the precision of the priors describing rainfall uncertainty. Compared with traditional lumped additive error approaches, the quantification of the total predictive uncertainty in the runoff is improved when rainfall and/or structural errors are characterized explicitly. However, the decomposition of the total uncertainty into individual sources is more challenging. In particular, poor identifiability may arise when the inference scheme represents rainfall and structural errors using separate probabilistic models. The inference becomes ill-posed unless sufficiently precise prior knowledge of data uncertainty is supplied; this ill-posedness can often be detected from the behavior of the Monte Carlo sampling algorithm. Moreover, the priors on the data quality must also be sufficiently accurate if the inference is to be reliable and support meaningful uncertainty decomposition. Our findings highlight the inherent limitations of inferring inaccurate hydrologic models using rainfall-runoff data with large unknown errors. Bayesian total error analysis can overcome these problems using independent prior information. The need for deriving independent descriptions of the uncertainties in the input and output data is clearly demonstrated.
Benjamin Renard, Dmitri Kavetski, George Kuczera, Mark Thyer, and Stewart W. Frank
The correlation space of Gaussian latent tree models and model selection without fitting
We provide a complete description of possible covariance matrices consistent
with a Gaussian latent tree model for any tree. We then present techniques for
utilising these constraints to assess whether observed data is compatible with
that Gaussian latent tree model. Our method does not require us first to fit
such a tree. We demonstrate the usefulness of the inverse-Wishart distribution
for performing preliminary assessments of tree-compatibility using
semialgebraic constraints. Using results from Drton et al. (2008) we then
provide the appropriate moments required for test statistics for assessing
adherence to these equality constraints. These are shown to be effective even
for small sample sizes and can be easily adjusted to test either the entire
model or only certain macrostructures hypothesized within the tree. We
illustrate our exploratory tetrad analysis using a linguistic application and
our confirmatory tetrad analysis using a biological application.
Comment: 15 pages
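The equality constraints behind tetrad analysis can be made concrete with a small sketch. It assumes the simplest latent tree, a star with one unit-variance latent variable and four observed leaves, where every off-diagonal covariance is a product of two loadings, so all three tetrad differences vanish. The loadings, noise variances, and function names below are hypothetical and are not taken from the paper.

```python
def star_tree_covariance(loadings, noise_vars):
    """Covariance implied by x_i = loadings[i] * h + e_i with Var(h) = 1."""
    p = len(loadings)
    return [
        [loadings[i] * loadings[j] + (noise_vars[i] if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]


def tetrads(cov, i, j, k, l):
    """Return the three tetrad differences for the quadruple (i, j, k, l)."""
    return (
        cov[i][j] * cov[k][l] - cov[i][k] * cov[j][l],
        cov[i][j] * cov[k][l] - cov[i][l] * cov[j][k],
        cov[i][k] * cov[j][l] - cov[i][l] * cov[j][k],
    )


# Hypothetical parameters for four observed leaves.
loadings = [0.9, 0.7, 0.5, 0.3]
noise_vars = [0.19, 0.51, 0.75, 0.91]

cov = star_tree_covariance(loadings, noise_vars)
tetrad_values = tetrads(cov, 0, 1, 2, 3)  # all zero up to rounding error
```

With sample data the tetrads are computed from the sample covariance matrix and are only approximately zero, which is why a test statistic with known moments is needed to judge whether the deviation is compatible with the tree.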