17,806 research outputs found
Optimal Principal Component Analysis in Distributed and Streaming Models
We study the Principal Component Analysis (PCA) problem in the distributed
and streaming models of computation. Given a matrix a
rank parameter , and an accuracy parameter , we
want to output an orthonormal matrix for which where is the best rank- approximation to .
This paper provides improved algorithms for distributed PCA and streaming
PCA.Comment: STOC2016 full versio
Training Gaussian Mixture Models at Scale via Coresets
How can we train a statistical mixture model on a massive data set? In this
work we show how to construct coresets for mixtures of Gaussians. A coreset is
a weighted subset of the data, which guarantees that models fitting the coreset
also provide a good fit for the original data set. We show that, perhaps
surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension
and the number of mixture components, while being independent of the data set
size. Hence, one can harness computationally intensive algorithms to compute a
good approximation on a significantly smaller data set. More importantly, such
coresets can be efficiently constructed both in distributed and streaming
settings and do not impose restrictions on the data generating process. Our
results rely on a novel reduction of statistical estimation to problems in
computational geometry and new combinatorial complexity results for mixtures of
Gaussians. Empirical evaluation on several real-world datasets suggests that
our coreset-based approach enables significant reduction in training-time with
negligible approximation error
Outlier detection techniques for wireless sensor networks: A survey
In the field of wireless sensor networks, those measurements that significantly deviate from the normal pattern of sensed data are considered as outliers. The potential sources of outliers include noise and errors, events, and malicious attacks on the network. Traditional outlier detection techniques are not directly applicable to wireless sensor networks due to the nature of sensor data and specific requirements and limitations of the wireless sensor networks. This survey provides a comprehensive overview of existing outlier detection techniques specifically developed for the wireless sensor networks. Additionally, it presents a technique-based taxonomy and a comparative table to be used as a guideline to select a technique suitable for the application at hand based on characteristics such as data type, outlier type, outlier identity, and outlier degree
- …