Penalized EM algorithm and copula skeptic graphical models for inferring networks for mixed variables
In this article, we consider the problem of reconstructing networks for
continuous, binary, count and discrete ordinal variables by estimating a sparse
precision matrix in Gaussian copula graphical models. We propose two
approaches: penalized extended rank likelihood with Monte Carlo
Expectation-Maximization algorithm (copula EM glasso) and copula skeptic with
pair-wise copula estimation for copula Gaussian graphical models. The proposed
approaches help to infer networks arising from nonnormal and mixed variables.
We demonstrate the performance of our methods through simulation studies and
analysis of breast cancer genomic and clinical data and maize genetics data.
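The rank-based idea behind a copula skeptic estimator can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes continuous data without ties, uses Kendall's tau with the standard sin(pi/2 * tau) transform to estimate the latent Gaussian correlation matrix, and leaves out the sparse precision-matrix step (the resulting matrix would normally be handed to a graphical lasso solver). The function names are hypothetical.

```python
import numpy as np


def kendall_tau_matrix(X):
    """Pairwise Kendall's tau (tau-a) for the columns of X; assumes no ties."""
    n, p = X.shape
    tau = np.eye(p)
    # Sign of every pairwise difference, computed once per column.
    signs = [np.sign(X[:, j][:, None] - X[:, j][None, :]) for j in range(p)]
    for j in range(p):
        for k in range(j + 1, p):
            # Concordant minus discordant ordered pairs, over n(n-1) ordered pairs.
            t = (signs[j] * signs[k]).sum() / (n * (n - 1))
            tau[j, k] = tau[k, j] = t
    return tau


def skeptic_correlation(X):
    """Rank-based estimate of the latent Gaussian correlation matrix."""
    return np.sin(0.5 * np.pi * kendall_tau_matrix(X))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    latent = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=300)
    observed = np.exp(latent)  # non-normal margins, same copula
    print(skeptic_correlation(observed))
```

Because the estimate depends only on ranks, a strictly monotone transformation of each variable leaves it unchanged, which is what makes the approach usable for nonnormal margins.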
Generalized Network Psychometrics: Combining Network and Latent Variable Models
We introduce the network model as a formal psychometric model,
conceptualizing the covariance between psychometric indicators as resulting
from pairwise interactions between observable variables in a network structure.
This contrasts with standard psychometric models, in which the covariance
between test items arises from the influence of one or more common latent
variables. Here, we present two generalizations of the network model that
encompass latent variable structures, establishing network modeling as part of
the more general framework of Structural Equation Modeling (SEM). In the first
generalization, we model the covariance structure of latent variables as a
network. We term this framework Latent Network Modeling (LNM) and show that,
with LNM, a unique structure of conditional independence relationships between
latent variables can be obtained in an explorative manner. In the second
generalization, the residual variance-covariance structure of indicators is
modeled as a network. We term this generalization Residual Network Modeling
(RNM) and show that, within this framework, identifiable models can be obtained
in which local independence is structurally violated. These generalizations
allow for a general modeling framework that can be used to fit, and compare,
SEM models, network models, and the RNM and LNM generalizations. This
methodology has been implemented in the free-to-use software package lvnet,
which contains confirmatory model testing as well as two exploratory search
algorithms: stepwise search algorithms for low-dimensional datasets and
penalized maximum likelihood estimation for larger datasets. We show in
simulation studies that these search algorithms perform adequately in
identifying the structure of the relevant residual or latent networks. We
further demonstrate the utility of these generalizations in an empirical
example on a personality inventory dataset.
Comment: Published in Psychometrik
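For Gaussian data, the conditional independence relationships that define a network of this kind can be read off the inverse covariance matrix. The sketch below is a generic illustration of that relationship, not the lvnet API: it assumes an invertible covariance matrix, and the function name is hypothetical.

```python
import numpy as np


def partial_correlations(cov):
    """Partial correlation of each pair of variables given all the others."""
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    # Standardize the precision matrix and flip the sign of the off-diagonals.
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


if __name__ == "__main__":
    # Chain structure 1 - 2 - 3: variables 1 and 3 interact only through 2.
    precision = np.array([[2.0, -1.0, 0.0],
                          [-1.0, 2.0, -1.0],
                          [0.0, -1.0, 2.0]])
    print(partial_correlations(np.linalg.inv(precision)))
```

A zero entry in the resulting matrix corresponds to a missing edge: the two variables are conditionally independent given the rest, even if their marginal covariance is nonzero.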
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity,
however, has received less attention, especially in the setting where
the sample size is fixed and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last regime applies to exa-scale data
dimension. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
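The sample-starved regime described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's framework: the variables are independent standard normals, so every nonzero sample correlation is spurious, and the sample size and dimensions are arbitrary choices. The function name is hypothetical.

```python
import numpy as np


def max_spurious_correlation(n_samples, n_vars, rng):
    """Largest absolute off-diagonal sample correlation among independent variables."""
    X = rng.standard_normal((n_samples, n_vars))
    corr = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(corr, 0.0)  # ignore the trivial self-correlations
    return np.abs(corr).max()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Sample size stays fixed while the number of variables grows.
    for p in (10, 100, 1000):
        print(p, max_spurious_correlation(20, p, rng))
```

With the sample size held fixed, the number of variable pairs grows quadratically in the dimension, so the largest spurious correlation tends to increase. This is why a correlation threshold that is adequate for a few variables can admit many false discoveries in high dimension.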