951 research outputs found

Training Gaussian Mixture Models at Scale via Coresets

Authors: Faulkner Matthew, Feldman Dan, Krause Andreas, Lucic Mario
Publication date: 15/01/2018

How can we train a statistical mixture model on a massive data set? In this work we show how to construct coresets for mixtures of Gaussians. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension and the number of mixture components, while being independent of the data set size. Hence, one can harness computationally intensive algorithms to compute a good approximation on a significantly smaller data set. More importantly, such coresets can be efficiently constructed both in distributed and streaming settings and do not impose restrictions on the data generating process. Our results rely on a novel reduction of statistical estimation to problems in computational geometry and new combinatorial complexity results for mixtures of Gaussians. Empirical evaluation on several real-world data sets suggests that our coreset-based approach enables a significant reduction in training time with negligible approximation error.
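The idea of a weighted subset standing in for the full data can be illustrated with a minimal sketch. It is an illustration only and not the construction from the paper: it uses plain uniform sampling with weight n/m per sampled point, which carries no coreset guarantee, and it evaluates one fixed, hand-chosen 1-D mixture rather than fitting a model. All function names, the synthetic data, and the mixture parameters are assumptions made for the example.

```python
import math
import random


def mixture_nll(x, components):
    """Negative log-likelihood of one point under a 1-D Gaussian mixture.

    `components` is a list of (mixing_weight, mean, std_dev) tuples.
    """
    density = sum(
        w * math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        for w, mu, sigma in components
    )
    return -math.log(density)


def uniform_weighted_subset(data, m, rng):
    """Sample m points without replacement; each gets weight n/m.

    Assumption: uniform sampling is a stand-in here, not the paper's method.
    """
    indices = rng.sample(range(len(data)), m)
    weight = len(data) / m
    return [(data[i], weight) for i in indices]


def full_cost(data, components):
    """Total negative log-likelihood over the whole data set."""
    return sum(mixture_nll(x, components) for x in data)


def weighted_cost(subset, components):
    """Weighted negative log-likelihood over a weighted subset."""
    return sum(w * mixture_nll(x, components) for x, w in subset)


rng = random.Random(0)

# Synthetic data (hypothetical): two well-separated 1-D clusters.
data = [
    rng.gauss(-2.0, 1.0) if rng.random() < 0.5 else rng.gauss(3.0, 1.0)
    for _ in range(20000)
]

# A fixed candidate model (hypothetical parameters) to evaluate on both sets.
components = [(0.5, -2.0, 1.0), (0.5, 3.0, 1.0)]

subset = uniform_weighted_subset(data, 2000, rng)
exact = full_cost(data, components)
approx = weighted_cost(subset, components)
rel_error = abs(approx - exact) / exact

print(f"full cost:     {exact:.1f}")
print(f"weighted cost: {approx:.1f}")
print(f"relative gap:  {rel_error:.4f}")
```

The weighted cost is computed from a tenth of the points, which is the practical appeal described in the abstract. A genuine coreset would additionally bound the gap for every candidate model at once, something uniform sampling does not provide.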

arXiv.org e-Print Archive

Repository for Publications and Research Data

Caltech Authors

Dimension reduction problems in the modelling of hydrogel thin films

Author: Lucic Danka
Publication venue: Trieste
Publication date: 26/09/2018

Sissa Digital Library

Acoustic Analysis of Montenegrin English L2 Vowels: Production and Perception

Author: Lucic Ivana
Publication venue: The Repository at St. Cloud State
Publication date: 22/04/2015

This study provides an acoustic analysis of Montenegrin vowels in order to compare them with existing measurements of General American English (GAE) vowels. A production analysis of Montenegrin (MTN) learners of English shows which vowels are the most problematic in their L2 pronunciation. In addition, a two-way perception study was conducted with the participants: American native English speakers listened to 11 GAE vowels produced by Montenegrin speakers of English and tried to indicate which vowels they heard, while Montenegrin speakers of English did the same after listening to native GAE speakers. The study shows that some vowels are easy for Montenegrin speakers to produce and perceive. However, certain vowels (e.g., the ones that are present in English but not in Montenegrin) cause problems for participants in both production and perception. This research helps determine the causes of miscomprehension between native speakers of GAE and Montenegrin EFL learners. These findings can help learners and teachers of ESL/EFL, supporting better-quality instruction for Montenegrin learners through more information on the problematic differences between the vowel systems of Montenegrin and English.

St. Cloud State University