Block-diagonal covariance selection for high-dimensional Gaussian graphical models
Gaussian graphical models are widely utilized to infer and visualize networks
of dependencies between continuous variables. However, inferring the graph is
difficult when the sample size is small compared to the number of variables. To
reduce the number of parameters to estimate in the model, we propose a
non-asymptotic model selection procedure supported by strong theoretical
guarantees based on an oracle inequality and a minimax lower bound. The
covariance matrix of the model is approximated by a block-diagonal matrix. The
structure of this matrix is detected by thresholding the sample covariance
matrix, where the threshold is selected using the slope heuristic. Based on the
block-diagonal structure of the covariance matrix, the estimation problem is
divided into several independent problems: subsequently, the network of
dependencies between variables is inferred using the graphical lasso algorithm
in each block. The performance of the procedure is illustrated on simulated
data. An application to a real gene expression dataset with a limited sample
size is also presented: the dimension reduction allows attention to be
objectively focused on interactions among smaller subsets of genes, leading to
a more parsimonious and interpretable modular network.
Comment: Accepted in JAS
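The block-detection step described in the abstract above can be illustrated with a minimal sketch. It assumes a fixed, user-supplied threshold (the paper selects the threshold with the slope heuristic, which is not reproduced here), and the function names `sample_covariance` and `detect_blocks` are hypothetical. Variables are grouped into blocks as the connected components of the graph that links two variables whenever the absolute value of their sample covariance exceeds the threshold.

```python
from itertools import combinations


def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of observations (one per row)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            s = sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            cov[i][j] = cov[j][i] = s
    return cov


def detect_blocks(cov, threshold):
    """Return the blocks of variables induced by thresholding `cov`.

    Two variables i and j are linked when |cov[i][j]| > threshold; the
    blocks are the connected components of that graph (union-find).
    The threshold is an assumed input, not the slope-heuristic choice.
    """
    p = len(cov)
    parent = list(range(p))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(p), 2):
        if abs(cov[i][j]) > threshold:
            parent[find(i)] = find(j)

    blocks = {}
    for i in range(p):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())


# Toy covariance matrix with two clearly separated groups of variables.
toy_cov = [
    [1.00, 0.80, 0.05, 0.00],
    [0.80, 1.00, 0.02, 0.01],
    [0.05, 0.02, 1.00, 0.60],
    [0.00, 0.01, 0.60, 1.00],
]
blocks = detect_blocks(toy_cov, threshold=0.1)
```

Each detected block could then be passed separately to a graphical lasso solver, so that the network is inferred within blocks only, as the abstract describes.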
Minimum Density Hyperplanes
Associating distinct groups of objects (clusters) with contiguous regions of
high probability density (high-density clusters) is central to many
statistical and machine learning approaches to the classification of unlabelled
data. We propose a novel hyperplane classifier for clustering and
semi-supervised classification which is motivated by this objective. The
proposed minimum density hyperplane minimises the integral of the empirical
probability density function along it, thereby avoiding intersection with high
density clusters. We show that the minimum density and the maximum margin
hyperplanes are asymptotically equivalent, thus linking this approach to
maximum margin clustering and semi-supervised support vector classifiers. We
propose a projection pursuit formulation of the associated optimisation problem
which allows us to find minimum density hyperplanes efficiently in practice,
and evaluate its performance on a range of benchmark datasets. The proposed
approach is found to be very competitive with state-of-the-art methods for
clustering and semi-supervised classification.
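The minimum density criterion in the abstract above can be sketched for a single, fixed direction. This is an illustration under stated assumptions, not the authors' method: it assumes an isotropic Gaussian kernel, for which the integral of the kernel density estimate along the hyperplane {x : v·x = b} reduces to a one-dimensional kernel density estimate of the projected data evaluated at b. The direction `v`, the bandwidth `h`, the grid search over b, and the function names are all hypothetical; the paper optimises over the direction as well through projection pursuit, which is not shown here.

```python
import math


def project(points, v):
    """Project each point onto the unit vector in the direction of v."""
    norm = math.sqrt(sum(c * c for c in v))
    return [sum(x * c for x, c in zip(p, v)) / norm for p in points]


def projected_density(points, v, b, h):
    """1D Gaussian KDE of the projected data, evaluated at offset b."""
    proj = project(points, v)
    coef = 1.0 / (len(proj) * h * math.sqrt(2.0 * math.pi))
    return coef * sum(math.exp(-0.5 * ((b - t) / h) ** 2) for t in proj)


def min_density_offset(points, v, h, grid_size=201):
    """Grid search for the offset b with the lowest projected density.

    Candidates are restricted to the range of the projections, which is a
    simplification made for this sketch.
    """
    proj = project(points, v)
    lo, hi = min(proj), max(proj)
    candidates = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    return min(candidates, key=lambda b: projected_density(points, v, b, h))


# Two small, well-separated groups of 2D points (toy data).
toy_points = [
    (-2.2, 0.1), (-2.0, -0.3), (-1.8, 0.2),
    (1.8, -0.1), (2.0, 0.3), (2.2, -0.2),
]
b_star = min_density_offset(toy_points, v=(1.0, 0.0), h=0.5)
```

On this toy data the selected hyperplane falls in the low-density gap between the two groups rather than cutting through either of them.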