Statistical Properties of Convex Clustering
In this manuscript, we study the statistical properties of convex clustering.
We establish that convex clustering is closely related to single linkage
hierarchical clustering and k-means clustering. In addition, we derive the
range of tuning parameter for convex clustering that yields a non-trivial
solution. We also provide an unbiased estimate of the degrees of freedom, and
provide a finite sample bound for the prediction error for convex clustering.
We compare convex clustering to some traditional clustering methods in
simulation studies. Comment: 20 pages, 5 figures
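For orientation, the criterion usually meant by convex clustering is a squared-error fit term plus a penalty on pairwise differences between the rows of a centroid matrix. The sketch below evaluates that criterion with an l2 penalty. The abstract does not state the objective, so the form used here is the standard one from the convex clustering literature, and the names `convex_clustering_objective`, `U`, and `lam` are illustrative assumptions.

```python
import numpy as np

def convex_clustering_objective(X, U, lam):
    """Evaluate 0.5 * ||X - U||_F^2 + lam * sum_{i<j} ||U_i - U_j||_2.

    X   : (n, p) data matrix, one observation per row.
    U   : (n, p) matrix of per-observation centroids (hypothetical name).
    lam : nonnegative tuning parameter (hypothetical name).
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    # Fit term: half the squared Frobenius distance between data and centroids.
    fit = 0.5 * np.sum((X - U) ** 2)
    # Fusion penalty: l2 distance between every pair of centroid rows.
    i, j = np.triu_indices(U.shape[0], k=1)
    penalty = np.sum(np.linalg.norm(U[i] - U[j], axis=1))
    return fit + lam * penalty
```

Setting `U` equal to `X` gives zero fit error and the largest penalty, while setting every row of `U` to the column means of `X` gives zero penalty; the second choice is the trivial single-cluster solution.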
Sure Screening for Gaussian Graphical Models
We propose graphical sure screening, or GRASS, a very simple and
computationally-efficient screening procedure for recovering the structure of a
Gaussian graphical model in the high-dimensional setting. The GRASS estimate of
the conditional dependence graph is obtained by thresholding the elements of
the sample covariance matrix. The proposed approach possesses the sure
screening property: with very high probability, the GRASS estimated edge set
contains the true edge set. Furthermore, with high probability, the size of the
estimated edge set is controlled. We provide a choice of threshold for GRASS
that can control the expected false positive rate. We illustrate the
performance of GRASS in a simulation study and on a gene expression data set,
and show that in practice it performs quite competitively with more complex and
computationally-demanding techniques for graph estimation.
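The thresholding step described in this abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function name `grass_edges` and the threshold argument `gamma` are hypothetical, the columns are standardized first as an assumption of the sketch, and the abstract's rule for choosing the threshold is not reproduced.

```python
import numpy as np

def grass_edges(X, gamma):
    """Return edges (j, k), j < k, whose absolute sample covariance exceeds gamma.

    X     : (n, p) data matrix, one observation per row.
    gamma : threshold (hypothetical name; its choice is left to the user here).
    """
    X = np.asarray(X, dtype=float)
    # Standardize columns (an assumption of this sketch) so entries are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Sample covariance of the standardized data, shape (p, p).
    S = np.cov(Z, rowvar=False)
    # Inspect each unordered pair once via the strict upper triangle.
    rows, cols = np.triu_indices(S.shape[0], k=1)
    keep = np.abs(S[rows, cols]) > gamma
    return {(int(j), int(k)) for j, k in zip(rows[keep], cols[keep])}
```

The work is one covariance computation followed by an elementwise comparison, which is why screening of this kind is cheap compared with methods that solve an optimization problem per node or per graph.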
Convex Modeling of Interactions with Strong Heredity
We consider the task of fitting a regression model involving interactions
among a potentially large set of covariates, in which we wish to enforce strong
heredity. We propose FAMILY, a very general framework for this task. Our
proposal is a generalization of several existing methods, such as VANISH
[Radchenko and James, 2010], hierNet [Bien et al., 2013], the all-pairs lasso,
and the lasso using only main effects. It can be formulated as the solution to
a convex optimization problem, which we solve using an efficient alternating
directions method of multipliers (ADMM) algorithm. This algorithm has
guaranteed convergence to the global optimum, can be easily specialized to any
convex penalty function of interest, and allows for a straightforward extension
to the setting of generalized linear models. We derive an unbiased estimator of
the degrees of freedom of FAMILY, and explore its performance in a simulation
study and on an HIV sequence data set. Comment: Final version accepted for publication in JCG
Selection and Estimation for Mixed Graphical Models
We consider the problem of estimating the parameters in a pairwise graphical
model in which the distribution of each node, conditioned on the others, may
have a different parametric form. In particular, we assume that each node's
conditional distribution is in the exponential family. We identify restrictions
on the parameter space required for the existence of a well-defined joint
density, and establish the consistency of the neighbourhood selection approach
for graph reconstruction in high dimensions when the true underlying graph is
sparse. Motivated by our theoretical results, we investigate the selection of
edges between nodes whose conditional distributions take different parametric
forms, and show that efficiency can be gained if edge estimates obtained from
the regressions of particular nodes are used to reconstruct the graph. These
results are illustrated with examples of Gaussian, Bernoulli, Poisson and
exponential distributions. Our theoretical findings are corroborated by
evidence from simulation studies.