Embed and Conquer: Scalable Embeddings for Kernel k-Means on MapReduce

Author: Elgohary Ahmed
Farahat Ahmed K.
Kamel Mohamed S.
Karray Fakhri
Publication date: 29/01/2014

The kernel k-means is an effective method for data clustering which extends the commonly used k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is, however, computationally very complex, as it requires the complete data matrix to be calculated and stored. Further, the kernelized nature of the kernel k-means algorithm hinders the parallelization of its computations on modern infrastructures for distributed computing. In this paper, we define a family of kernel-based low-dimensional embeddings that allows for scaling kernel k-means on MapReduce via an efficient and unified parallelization strategy. Afterwards, we propose two methods for low-dimensional embedding that adhere to our definition of the embedding family. Exploiting the proposed parallelization strategy, we present two scalable MapReduce algorithms for kernel k-means. We demonstrate the effectiveness and efficiency of the proposed algorithms through an empirical evaluation on benchmark data sets.

Comment: Appears in Proceedings of the SIAM International Conference on Data Mining (SDM), 201

arXiv.org e-Print Archive

CiteSeerX
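
The embedding idea in the abstract above can be illustrated with a small sketch. This is not either of the two embedding methods the paper proposes (the abstract does not describe them); it assumes a much simpler stand-in in which each point is represented by its Gaussian kernel values against a few landmark points, after which ordinary k-means runs on the embedded vectors. The function names, the landmark choice, the `gamma` value, and the toy data are all hypothetical.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def embed(point, landmarks, gamma=0.5):
    """Map step: one point -> its kernel values against the landmarks.

    Only the point and the small shared landmark set are needed, so this
    can run independently on every block of the data.
    """
    return [rbf(point, lm, gamma) for lm in landmarks]


def sqdist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def farthest_first(vectors, k):
    """Deterministic initial centers: start at vectors[0], then repeatedly
    add the vector farthest from all centers chosen so far."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(sqdist(v, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(vectors, k, iters=10):
    """Plain Lloyd's k-means on the embedded vectors; returns one label per vector."""
    centers = farthest_first(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center in the embedded space.
        labels = [min(range(k), key=lambda c: sqdist(v, centers[c])) for v in vectors]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (assumed for illustration): two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.1, -0.2),
          (5.0, 5.0), (5.1, 4.8), (4.9, 5.2), (5.2, 5.1)]
landmarks = points[::2]  # every other point serves as a landmark
embedded = [embed(p, landmarks) for p in points]
labels = kmeans(embedded, k=2)
```

Because the embedding of a point depends only on that point and the landmark set, the costly kernel evaluations never require the full similarity matrix, which is the property that makes a map-style parallelization natural.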

Critical Networks Exhibit Maximal Information Diversity in Structure-Dynamics Relationships

Author: Antti Larjo
B. Luque
Ilya Shmulevich
M. Aldana
M. Li
Matti Nykter
N. H. Packard
Nathan D. Price
Olli Yli-Harja
R. V. Solé
S. A. Kauffman
Stuart A. Kauffman
Tommi Aho
Publication venue: American Physical Society (APS)
Publication date: 23/01/2008

Network structure strongly constrains the range of dynamic behaviors available to a complex system. These system dynamics can be classified based on their response to perturbations over time into two distinct regimes, ordered or chaotic, separated by a critical phase transition. Numerous studies have shown that the most complex dynamics arise near the critical regime. Here we use an information theoretic approach to study structure-dynamics relationships within a unified framework and show that these relationships are most diverse in the critical regime.

arXiv.org e-Print Archive

Crossref

Training Gaussian Mixture Models at Scale via Coresets

Author: Faulkner Matthew
Feldman Dan
Krause Andreas
Lucic Mario
Publication date: 15/01/2018

How can we train a statistical mixture model on a massive data set? In this work we show how to construct coresets for mixtures of Gaussians. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension and the number of mixture components, while being independent of the data set size. Hence, one can harness computationally intensive algorithms to compute a good approximation on a significantly smaller data set. More importantly, such coresets can be efficiently constructed in both distributed and streaming settings and do not impose restrictions on the data generating process. Our results rely on a novel reduction of statistical estimation to problems in computational geometry and new combinatorial complexity results for mixtures of Gaussians. Empirical evaluation on several real-world datasets suggests that our coreset-based approach enables significant reduction in training time with negligible approximation error.

arXiv.org e-Print Archive

Repository for Publications and Research Data

Caltech Authors
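
The notion of a coreset as a weighted subset, described in the abstract above, can be made concrete with a minimal sketch. It is not the construction from the paper, which the abstract does not spell out; it assumes a generic importance-sampling scheme in which points far from the data mean are sampled more often and each sampled point is weighted by the inverse of its sampling probability. The function names, the half-uniform, half-distance mixing rule, and the synthetic data are hypothetical.

```python
import random


def build_coreset(data, m, seed=0):
    """Sample m weighted points from data (a list of equal-length tuples).

    Sampling probability q_i mixes a uniform term with a term proportional
    to the squared distance from the data mean. Each sampled point gets
    weight 1 / (m * q_i), so weighted sums over the coreset are unbiased
    estimates of the corresponding sums over the full data set.
    """
    rng = random.Random(seed)
    n = len(data)
    dim = len(data[0])
    mean = [sum(p[j] for p in data) / n for j in range(dim)]
    d = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in data]
    total = sum(d)
    if total == 0:
        # All points coincide: fall back to uniform sampling.
        q = [1.0 / n] * n
    else:
        q = [0.5 / n + 0.5 * di / total for di in d]
    idx = rng.choices(range(n), weights=q, k=m)
    return [(data[i], 1.0 / (m * q[i])) for i in idx]


def weighted_cost(weighted_points, center):
    """Weighted sum of squared distances to a center, a simple stand-in
    for the loss a model-fitting routine would minimise."""
    return sum(w * sum((a - b) ** 2 for a, b in zip(p, center))
               for p, w in weighted_points)


# Synthetic data (assumed for illustration): 1000 points from two 2-D Gaussians.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
data += [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(500)]

coreset = build_coreset(data, m=50)
full = [(p, 1.0) for p in data]
# The two costs can be compared for any candidate center, for example:
cost_full = weighted_cost(full, (3.0, 3.0))
cost_core = weighted_cost(coreset, (3.0, 3.0))
```

A fitting routine that accepts per-point weights could then be run on the 50 weighted points instead of the 1000 originals; how closely the result matches depends on the sampling scheme and the sample size, which is where the guarantees discussed in the abstract come in.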