Search CORE

15,318 research outputs found

Scaling Large Learning Problems with Hard Parallel Mixtures

Author: Bengio Samy
Bengio Yoshua
Collobert Ronan
Publication venue
Publication date: 10/03/2006
Field of study

Infoscience - École polytechnique fédérale de Lausanne

Scaling Large Learning Problems with Hard Parallel Mixtures

Author: M. Haruno
R. A. Cole
R. A. Jacobs
R. A. Jacobs
R. Collobert
S. E. Fahlman
V. Tresp
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

Author: Gross Sam
Ranzato Marc'Aurelio
Szlam Arthur
Publication venue
Publication date: 20/04/2017
Field of study

Training convolutional networks (CNN's) that fit on a single GPU with minibatch stochastic gradient descent has become effective in practice. However, there is still no effective method for training large CNN's that do not fit in the memory of a few GPU cards, or for parallelizing CNN training. In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect on large scale hashtag (multilabel) prediction tasks. Mixture of experts models are not new (Jacobs et. al. 1991, Collobert et. al. 2003), but in the past, researchers have had to devise sophisticated methods to deal with data fragmentation. We show empirically that modern weakly supervised data sets are large enough to support naive partitioning schemes where each data point is assigned to a single expert. Because the experts are independent, training them in parallel is easy, and evaluation is cheap for the size of the model. Furthermore, we show that we can use a single decoding layer for all the experts, allowing a unified feature embedding space. We demonstrate that it is feasible (and in fact relatively painless) to train far larger models than could be practically trained with standard CNN architectures, and that the extra capacity can be well used on current datasets.Comment: Appearing in CVPR 201

arXiv.org e-Print Archive

Crossref

Training Gaussian Mixture Models at Scale via Coresets

Author: Faulkner Matthew
Feldman Dan
Krause Andreas
Lucic Mario
Publication venue
Publication date: 15/01/2018
Field of study

How can we train a statistical mixture model on a massive data set? In this work we show how to construct coresets for mixtures of Gaussians. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension and the number of mixture components, while being independent of the data set size. Hence, one can harness computationally intensive algorithms to compute a good approximation on a significantly smaller data set. More importantly, such coresets can be efficiently constructed both in distributed and streaming settings and do not impose restrictions on the data generating process. Our results rely on a novel reduction of statistical estimation to problems in computational geometry and new combinatorial complexity results for mixtures of Gaussians. Empirical evaluation on several real-world datasets suggests that our coreset-based approach enables significant reduction in training-time with negligible approximation error

arXiv.org e-Print Archive

Repository for Publications and Research Data

Caltech Authors

Blind Source Separation with Compressively Sensed Linear Mixtures

Author: Kleinsteuber Martin
Shen Hao
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/10/2011
Field of study

This work studies the problem of simultaneously separating and reconstructing signals from compressively sensed linear mixtures. We assume that all source signals share a common sparse representation basis. The approach combines classical Compressive Sensing (CS) theory with a linear mixing model. It allows the mixtures to be sampled independently of each other. If samples are acquired in the time domain, this means that the sensors need not be synchronized. Since Blind Source Separation (BSS) from a linear mixture is only possible up to permutation and scaling, factoring out these ambiguities leads to a minimization problem on the so-called oblique manifold. We develop a geometric conjugate subgradient method that scales to large systems for solving the problem. Numerical results demonstrate the promising performance of the proposed algorithm compared to several state of the art methods.Comment: 9 pages, 2 figure

arXiv.org e-Print Archive

Crossref

Joint Tensor Factorization and Outlying Slab Suppression with Applications

Author: Bro Rasmus
Fu Xiao
Huang Kejun
Ma Wing-Kin
Sidiropoulos Nicholas D.
Publication venue
Publication date: 01/01/2015
Field of study

We consider factoring low-rank tensors in the presence of outlying slabs. This problem is important in practice, because data collected in many real-world applications, such as speech, fluorescence, and some social network data, fit this paradigm. Prior work tackles this problem by iteratively selecting a fixed number of slabs and fitting, a procedure which may not converge. We formulate this problem from a group-sparsity promoting point of view, and propose an alternating optimization framework to handle the corresponding

\ell_p

(

0<p\leq 1

) minimization-based low-rank tensor factorization problem. The proposed algorithm features a similar per-iteration complexity as the plain trilinear alternating least squares (TALS) algorithm. Convergence of the proposed algorithm is also easy to analyze under the framework of alternating optimization and its variants. In addition, regularization and constraints can be easily incorporated to make use of \emph{a priori} information on the latent loading factors. Simulations and real data experiments on blind speech separation, fluorescence data analysis, and social network mining are used to showcase the effectiveness of the proposed algorithm

arXiv.org e-Print Archive

Copenhagen University Research Information System

Smoothed Analysis of Tensor Decompositions

Author: Anandkumar A.
Anandkumar A.
Anandkumar A.
Arora S.
Belkin M.
Dasgupta S.
Goyal N.
Harshman R.
Horn R.
Hyvärinen A.
Lindsay B.
Stegeman A.
Wedin P.
Publication venue
Publication date: 01/01/2014
Field of study

Low rank tensor decompositions are a powerful tool for learning generative models, and uniqueness results give them a significant advantage over matrix decomposition methods. However, tensors pose significant algorithmic challenges and tensors analogs of much of the matrix algebra toolkit are unlikely to exist because of hardness results. Efficient decomposition in the overcomplete case (where rank exceeds dimension) is particularly challenging. We introduce a smoothed analysis model for studying these questions and develop an efficient algorithm for tensor decomposition in the highly overcomplete case (rank polynomial in the dimension). In this setting, we show that our algorithm is robust to inverse polynomial error -- a crucial property for applications in learning since we are only allowed a polynomial number of samples. While algorithms are known for exact tensor decomposition in some overcomplete settings, our main contribution is in analyzing their stability in the framework of smoothed analysis. Our main technical contribution is to show that tensor products of perturbed vectors are linearly independent in a robust sense (i.e. the associated matrix has singular values that are at least an inverse polynomial). This key result paves the way for applying tensor methods to learning problems in the smoothed setting. In particular, we use it to obtain results for learning multi-view models and mixtures of axis-aligned Gaussians where there are many more "components" than dimensions. The assumption here is that the model is not adversarially chosen, formalized by a perturbation of model parameters. We believe this an appealing way to analyze realistic instances of learning problems, since this framework allows us to overcome many of the usual limitations of using tensor methods.Comment: 32 pages (including appendix

arXiv.org e-Print Archive

CiteSeerX

Crossref