Graphical continuous Lyapunov models
The linear Lyapunov equation of a covariance matrix parametrizes the
equilibrium covariance matrix of a stochastic process. This parametrization can
be interpreted as a new graphical model class, and we show how the model class
behaves under marginalization and introduce a method for structure learning via
ℓ1-penalized loss minimization. Our proposed method is demonstrated to
outperform alternative structure learning algorithms in a simulation study, and
we illustrate its application to protein phosphorylation network
reconstruction.
Comment: 10 pages, 5 figures
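As a minimal sketch of the parametrization, assume the usual Ornstein-Uhlenbeck setting dX = B X dt + D dW with a stable drift matrix B. The equilibrium covariance Σ then solves the Lyapunov equation B Σ + Σ Bᵀ + C = 0 with C = D Dᵀ. The helper names `lyapunov_covariance` and `solve_linear` are hypothetical, and reading a nonzero off-diagonal entry B[i][j] as a directed edge j → i is an assumed graph convention, not a quote from the paper. Because the equation is linear in Σ, the sketch writes it as a p² × p² system and solves it with plain Gaussian elimination. The penalized structure-learning step is not attempted.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy [a | b] so the caller's data are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular; is B stable?")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def lyapunov_covariance(B: Matrix, C: Matrix) -> Matrix:
    """Return Sigma solving B @ Sigma + Sigma @ B^T + C = 0 (hypothetical helper)."""
    p = len(B)

    def idx(i: int, j: int) -> int:
        # Position of Sigma[i][j] in the row-major vector of unknowns.
        return i * p + j

    M = [[0.0] * (p * p) for _ in range(p * p)]
    rhs = [0.0] * (p * p)
    for i in range(p):
        for j in range(p):
            r = idx(i, j)
            for k in range(p):
                M[r][idx(k, j)] += B[i][k]  # contributes (B Sigma)[i][j]
                M[r][idx(i, k)] += B[j][k]  # contributes (Sigma B^T)[i][j]
            rhs[r] = -C[i][j]
    x = solve_linear(M, rhs)
    return [[x[idx(i, j)] for j in range(p)] for i in range(p)]


if __name__ == "__main__":
    B = [[-1.0, 0.0], [0.5, -1.0]]  # stable drift, one nonzero off-diagonal entry
    C = [[2.0, 0.0], [0.0, 2.0]]    # C = D D^T with D = sqrt(2) I
    print(lyapunov_covariance(B, C))
```

For this B and C = 2I, solving the four scalar equations by hand gives Σ = [[1, 0.25], [0.25, 1.125]]. The single nonzero entry B[1][0] is what induces the nonzero covariance between the two coordinates. A dense p² × p² solve costs O(p⁶), so this only suits small illustrations; Bartels-Stewart-type solvers scale far better.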
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity,
however, has received relatively little attention, especially in the setting where
the sample size is fixed and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last regime applies to exascale data
dimensions. We illustrate this high dimensional framework for the problem of
correlation mining, where the quantity of interest is the matrix of pairwise and
partial correlations among the variables. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
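Correlation screening, the simplest correlation-mining task, flags every pair of variables whose sample correlation exceeds a threshold ρ in absolute value. The sketch below is an illustration of the purely high dimensional regime, not the paper's framework: the helper names are hypothetical, and the null calculation assumes Gaussian data. It pairs the screen with the textbook null density of the sample correlation under independence, f(r) = (1 - r²)^((n-4)/2) / B(1/2, (n-2)/2), to estimate how many pairs pass the screen purely by chance.

```python
import math
from itertools import combinations
from typing import List, Tuple


def sample_correlation(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_correlations(columns: List[List[float]], rho: float) -> List[Tuple[int, int, float]]:
    """Return (i, j, r) for every variable pair with |sample correlation| > rho."""
    hits = []
    for i, j in combinations(range(len(columns)), 2):
        r = sample_correlation(columns[i], columns[j])
        if abs(r) > rho:
            hits.append((i, j, r))
    return hits


def null_tail_prob(rho: float, n: int, steps: int = 2000) -> float:
    """P(|r| > rho) for n i.i.d. Gaussian pairs with true correlation 0.

    Integrates the null density over [rho, 1] with Simpson's rule (steps must be
    even) and doubles the result for the two tails. Needs n >= 4 so the
    integrand stays bounded at r = 1.
    """
    if n < 4:
        raise ValueError("n must be at least 4")
    # log of the Beta function B(1/2, (n-2)/2)
    log_beta = math.lgamma(0.5) + math.lgamma((n - 2) / 2) - math.lgamma((n - 1) / 2)

    def density(r: float) -> float:
        return (1.0 - r * r) ** ((n - 4) / 2) / math.exp(log_beta)

    h = (1.0 - rho) / steps
    total = density(rho) + density(1.0)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(rho + k * h)
    return 2.0 * total * h / 3.0


def expected_false_discoveries(p: int, n: int, rho: float) -> float:
    """Expected number of pairs passing the screen when all p variables are independent."""
    return math.comb(p, 2) * null_tail_prob(rho, n)
```

The expected number of chance discoveries is C(p, 2) · P(|r| > ρ), so with n fixed it grows roughly like p², which is why a fixed sample size eventually cannot support reliable screening as p grows. For n = 4 the null density is uniform on [-1, 1], so P(|r| > 0.5) = 0.5 and ten independent variables yield 45 × 0.5 = 22.5 chance pairs in expectation.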
mgm: Estimating Time-Varying Mixed Graphical Models in High-Dimensional Data
We present the R-package mgm for the estimation of k-order Mixed Graphical
Models (MGMs) and mixed Vector Autoregressive (mVAR) models in high-dimensional
data. These are useful extensions of graphical models restricted to a single variable
type, since data sets consisting of mixed types of variables (continuous,
count, categorical) are ubiquitous. In addition, we allow the
stationarity assumption of both models to be relaxed by introducing time-varying versions
of MGMs and mVAR models based on a kernel weighting approach. Time-varying models
offer a rich description of temporally evolving systems and allow one to identify
external influences on the model structure, such as the impact of interventions.
We provide the background of all implemented methods and provide fully
reproducible examples that illustrate how to use the package.
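The kernel weighting idea behind time-varying models can be illustrated on the simplest autoregressive case. The sketch below is an assumption-laden stand-in, not mgm's implementation (mgm is an R package whose API is not reproduced here). It weights every time point with a normalized Gaussian kernel centred at an estimation point `t0`, where the hypothetical `bandwidth` parameter controls how local the estimate is, and then fits a univariate AR(1) coefficient by weighted least squares. The function names are hypothetical.

```python
import math
from typing import List


def gaussian_kernel_weights(n: int, t0: float, bandwidth: float) -> List[float]:
    """Normalized Gaussian kernel weights over time points 0..n-1, centred at t0."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    raw = [math.exp(-0.5 * ((t - t0) / bandwidth) ** 2) for t in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def local_ar1_coefficient(x: List[float], t0: float, bandwidth: float) -> float:
    """Kernel-weighted least-squares estimate of phi in x[t] = phi * x[t-1] + noise, local to t0."""
    w = gaussian_kernel_weights(len(x), t0, bandwidth)
    # Each (x[t-1], x[t]) pair enters the least-squares fit with weight w[t].
    num = sum(w[t] * x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(w[t] * x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den
```

Evaluating `local_ar1_coefficient` over a grid of `t0` values traces how the lag-1 dependence evolves over time. A small bandwidth tracks abrupt changes but relies on fewer effective observations, so its estimates are noisier; a large bandwidth smooths toward the stationary estimate.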
Selection and Estimation for Mixed Graphical Models
We consider the problem of estimating the parameters in a pairwise graphical
model in which the distribution of each node, conditioned on the others, may
have a different parametric form. In particular, we assume that each node's
conditional distribution is in the exponential family. We identify restrictions
on the parameter space required for the existence of a well-defined joint
density, and establish the consistency of the neighbourhood selection approach
for graph reconstruction in high dimensions when the true underlying graph is
sparse. Motivated by our theoretical results, we investigate the selection of
edges between nodes whose conditional distributions take different parametric
forms, and show that efficiency can be gained if edge estimates obtained from
the regressions of particular nodes are used to reconstruct the graph. These
results are illustrated with examples of Gaussian, Bernoulli, Poisson and
exponential distributions. Our theoretical findings are corroborated by
evidence from simulation studies.
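Neighbourhood selection estimates each node's neighbours from a separate regression of that node on all the others. For mixed data this is a regression matching the node's exponential-family conditional, typically with a sparsity penalty. As a result, each candidate edge receives two estimates that must be reconciled. The sketch below shows only this reconciliation step. The "and" and "or" rules are the usual conventions. The "prefer" rule is a hypothetical rendering of the idea that edge estimates from particular nodes' regressions can be more efficient, with the caller supplying the type ordering; it does not claim to reproduce which distributions the paper recommends trusting.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def combine_neighbourhoods(
    nbhd: Dict[int, Set[int]],
    rule: str,
    node_type: Optional[Dict[int, str]] = None,
    preference: Optional[List[str]] = None,
) -> Set[Edge]:
    """Turn per-node neighbourhood estimates into an undirected edge set.

    nbhd[i] holds the nodes with nonzero coefficients in node i's regression.
    "and":    keep (i, j) only if each node selects the other.
    "or":     keep (i, j) if either node selects the other.
    "prefer": (hypothetical) for nodes of different types, trust only the
              regression of the node whose type comes first in `preference`;
              same-type pairs fall back to "and".
    """
    edges: Set[Edge] = set()
    for i, j in combinations(sorted(nbhd), 2):
        i_sees_j, j_sees_i = j in nbhd[i], i in nbhd[j]
        if rule == "and":
            keep = i_sees_j and j_sees_i
        elif rule == "or":
            keep = i_sees_j or j_sees_i
        elif rule == "prefer":
            if node_type is None or preference is None:
                raise ValueError("'prefer' needs node_type and preference")
            ri = preference.index(node_type[i])
            rj = preference.index(node_type[j])
            if ri < rj:
                keep = i_sees_j  # node i's regression is trusted
            elif rj < ri:
                keep = j_sees_i  # node j's regression is trusted
            else:
                keep = i_sees_j and j_sees_i
        else:
            raise ValueError(f"unknown rule: {rule}")
        if keep:
            edges.add((i, j))
    return edges
```

For example, take nodes 0 (gaussian), 1 (bernoulli) and 2 (poisson) with estimated neighbourhoods {0: {1}, 1: set(), 2: {0}}. The "and" rule keeps no edge, and "or" keeps (0, 1) and (0, 2). The "prefer" rule with the ordering gaussian, poisson, bernoulli keeps only (0, 1), because node 0's regression is trusted for both of its edges and node 2's regression, trusted for (1, 2), does not select node 1.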