38 research outputs found

    Learning Big (Image) Data via Coresets for Dictionaries

    Signal and image processing have seen an explosion of interest in the last few years in a new form of signal/image characterization via the concept of sparsity with respect to a dictionary. An active field of research is dictionary learning: the representation of a given large set of vectors (e.g. signals or images) as linear combinations of only a few vectors (patterns). To further reduce the size of the representation, the combinations are usually required to be sparse, i.e., each signal is a linear combination of only a small number of patterns. This paper suggests a new computational approach to the problem of dictionary learning, based on what is known in computational geometry as coresets. A coreset for dictionary learning is a small, smart, non-uniform sample from the input signals such that the quality of any given dictionary with respect to the input can be approximated via the coreset. In particular, the optimal dictionary for the input can be approximated by learning the coreset. Since the coreset is small, the learning is faster. Moreover, using merge-and-reduce, the coreset can be constructed for streaming signals that do not fit in memory and can also be computed in parallel. We apply our coresets for dictionary learning of images using the K-SVD algorithm and bound their size and approximation error analytically. Our simulations demonstrate gain factors of up to 60 in computational time with the same, and even better, performance. We also demonstrate our ability to perform computations on larger patches and high-definition images, where the traditional approach breaks down.
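The merge-and-reduce idea mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's construction: the names `reduce_weighted` and `stream_coreset` are hypothetical, and the reduce step here is plain weight-proportional sampling standing in for the paper's non-uniform coreset sampling, which the abstract does not specify.

```python
import random


def reduce_weighted(points, size, rng):
    """Placeholder reduce step (an assumption, not the paper's sampler).

    Samples `size` weighted points with probability proportional to weight
    and gives each the same weight so the total weight is preserved.
    """
    if len(points) <= size:
        return list(points)
    total = sum(w for _, w in points)
    picks = rng.choices(points, weights=[w for _, w in points], k=size)
    return [(x, total / size) for x, _ in picks]


def stream_coreset(stream, chunk_size, coreset_size, seed=0):
    """Merge-and-reduce over a stream, keeping O(log n) small summaries."""
    rng = random.Random(seed)
    levels = []  # levels[i] summarizes 2**i chunks, or is None if empty

    def push(summary):
        # Like binary addition: merge equal-level summaries and carry upward.
        level = 0
        while level < len(levels) and levels[level] is not None:
            merged = levels[level] + summary
            summary = reduce_weighted(merged, coreset_size, rng)
            levels[level] = None
            level += 1
        if level == len(levels):
            levels.append(None)
        levels[level] = summary

    chunk = []
    for x in stream:
        chunk.append((x, 1.0))
        if len(chunk) == chunk_size:
            push(reduce_weighted(chunk, coreset_size, rng))
            chunk = []
    if chunk:
        push(reduce_weighted(chunk, coreset_size, rng))

    # Combine whatever summaries remain into one final weighted sample.
    remaining = [p for lvl in levels if lvl is not None for p in lvl]
    return reduce_weighted(remaining, coreset_size, rng)


coreset = stream_coreset(range(1000), chunk_size=50, coreset_size=20)
```

Only one chunk and a logarithmic number of summaries are held in memory at any time, which is why the scheme suits data that does not fit in memory; independent chunks could also be reduced in parallel before merging.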

    Training Gaussian Mixture Models at Scale via Coresets

    How can we train a statistical mixture model on a massive data set? In this work we show how to construct coresets for mixtures of Gaussians. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension and the number of mixture components, while being independent of the data set size. Hence, one can harness computationally intensive algorithms to compute a good approximation on a significantly smaller data set. More importantly, such coresets can be efficiently constructed both in distributed and streaming settings and do not impose restrictions on the data generating process. Our results rely on a novel reduction of statistical estimation to problems in computational geometry and new combinatorial complexity results for mixtures of Gaussians. Empirical evaluation on several real-world datasets suggests that our coreset-based approach enables significant reduction in training time with negligible approximation error.

    Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming

    Sketching algorithms have recently proven to be a powerful approach both for designing low-space streaming algorithms as well as fast polynomial time approximation schemes (PTAS). In this work, we develop new techniques to extend the applicability of sketching-based approaches to the sparse dictionary learning and the Euclidean $k$-means clustering problems. In particular, we initiate the study of the challenging setting where the dictionary/clustering assignment for each of the $n$ input points must be output, which has surprisingly received little attention in prior work. On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem, which generalizes to the first PTAS for the sparse dictionary learning problem. On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering. In particular, given a design matrix $\mathbf A\in\mathbb R^{n\times d}$ in a turnstile stream, we show an $\tilde O(nr/\epsilon^2 + dk/\epsilon)$ space upper bound for $r$-sparse dictionary learning of size $k$, an $\tilde O(n/\epsilon^2 + dk/\epsilon)$ space upper bound for $k$-means clustering, as well as an $\tilde O(n)$ space upper bound for $k$-means clustering on random order row insertion streams with a natural "bounded sensitivity" assumption. On the lower bounds side, we obtain a general $\tilde\Omega(n/\epsilon + dk/\epsilon)$ lower bound for $k$-means clustering, as well as an $\tilde\Omega(n/\epsilon^2)$ lower bound for algorithms which can estimate the cost of a single fixed set of candidate centers. Comment: To appear in NeurIPS 202
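For readers unfamiliar with the quantities this abstract refers to, the standard $k$-means cost of a fixed set of candidate centers, together with the per-point assignment, can be computed as in the sketch below. The function name `kmeans_cost` is hypothetical, and the code is the textbook exact computation over all $n$ points, not the paper's sketching algorithm.

```python
def kmeans_cost(points, centers):
    """Return (cost, assignments) for fixed candidate centers.

    cost is the sum over points of the squared Euclidean distance to the
    nearest center; assignments[i] is the index of that nearest center.
    """
    cost = 0.0
    assignments = []
    for p in points:
        # Squared distance from p to every candidate center.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        best = min(range(len(centers)), key=dists.__getitem__)
        assignments.append(best)
        cost += dists[best]
    return cost, assignments


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
cost, assignments = kmeans_cost(points, centers)
```

The assignment list has one entry per input point, which gives some intuition for why the space bounds quoted above carry a term that grows with $n$ once assignments must be output.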

    DESIGN OF COMPACT AND DISCRIMINATIVE DICTIONARIES

    The objective of this research work is to design compact and discriminative dictionaries for effective classification. The motivation stems from the fact that dictionaries inherently contain redundant dictionary atoms. This is because the aim of dictionary learning is reconstruction, not classification. In this thesis, we propose methods to obtain a minimum number of discriminative dictionary atoms for effective classification and reduced computational time. First, we propose a classification scheme where an example is assigned to a class based on the weight assigned to both maximum projection and minimum reconstruction error. Here, the input data is learned by K-SVD dictionary learning, which alternates between sparse coding and dictionary update. For sparse coding, orthogonal matching pursuit (OMP) is used, and for dictionary update, singular value decomposition is used. Although this classification scheme is effective, there is still scope to improve dictionary learning by removing redundant atoms, because our goal is not reconstruction. In order to remove such redundant atoms, we propose two approaches based on information theory to obtain compact discriminative dictionaries. In the first approach, we remove redundant atoms from the dictionary while maintaining discriminative information. Specifically, we propose a constrained optimization problem which minimizes the mutual information between the optimized dictionary and the initial dictionary while maximizing the mutual information between the class labels and the optimized dictionary. This helps to determine the information loss between the dictionary before and after optimization. To compute information loss, we use Jensen-Shannon divergence with adaptive weights to compare class distributions of each dictionary atom. The advantage of Jensen-Shannon divergence is its computational efficiency compared with calculating information loss from mutual information.
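As a rough illustration of the weighted Jensen-Shannon divergence mentioned in this abstract, the sketch below uses the standard generalized definition, JS(P, Q) = H(w_p P + w_q Q) - w_p H(P) - w_q H(Q). The function names are hypothetical, and the weights are plain inputs here because the abstract does not say how the thesis adapts them.

```python
import math


def entropy(dist):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in dist if x > 0)


def weighted_js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Generalized Jensen-Shannon divergence between two distributions.

    p and q are class distributions over the same classes (e.g. for two
    dictionary atoms); w_p and w_q are nonnegative weights summing to 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    if abs(w_p + w_q - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    mixture = [w_p * a + w_q * b for a, b in zip(p, q)]
    return entropy(mixture) - w_p * entropy(p) - w_q * entropy(q)


# Two atoms whose class distributions are similar give a small divergence.
similar = weighted_js_divergence([0.7, 0.3], [0.6, 0.4])
# Atoms used by entirely different classes give the maximum, ln 2.
disjoint = weighted_js_divergence([1.0, 0.0], [0.0, 1.0])
```

Each evaluation needs only the two class distributions and their mixture, which is consistent with the abstract's point that this is cheaper than computing mutual information directly.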