5,935 research outputs found
Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation
Across a variety of scientific disciplines, sparse inverse covariance
estimation is a popular tool for capturing the underlying dependency
relationships in multivariate data. Unfortunately, most estimators are not
scalable enough to handle the sizes of modern high-dimensional data sets (often
on the order of terabytes), and assume Gaussian samples. To address these
deficiencies, we introduce HP-CONCORD, a highly scalable optimization method
for estimating a sparse inverse covariance matrix based on a regularized
pseudolikelihood framework, without assuming Gaussianity. Our parallel proximal
gradient method uses a novel communication-avoiding linear algebra algorithm
and runs across a multi-node cluster with up to 1k nodes (24k cores), achieving
parallel scalability on problems with up to ~819 billion parameters (1.28
million dimensions); even on a single node, HP-CONCORD demonstrates
scalability, outperforming a state-of-the-art method. We also use HP-CONCORD to
estimate the underlying dependency structure of the brain from fMRI data, and
use the result to identify functional regions automatically. The results show
good agreement with a clustering from the neuroscience literature.Comment: Main paper: 15 pages, appendix: 24 page
The State-of-the-Art of Set Visualization
Sets comprise a generic data model that has been used in a variety of data analysis problems. Such problems involve analysing and visualizing set relations between multiple sets defined over the same collection of elements. However, visualizing sets is a non-trivial problem due to the large number of possible relations between them. We provide a systematic overview of state-of-the-art techniques for visualizing different kinds of set relations. We classify these techniques into six main categories according to the visual representations they use and the tasks they support. We compare the categories to provide guidance for choosing an appropriate technique for a given problem. Finally, we identify challenges in this area that need further research and propose possible directions to address these challenges. Further resources on set visualization are available at http://www.setviz.net
- …