Training Gaussian Mixture Models at Scale via Coresets
How can we train a statistical mixture model on a massive data set? In this
work we show how to construct coresets for mixtures of Gaussians. A coreset is
a weighted subset of the data, which guarantees that models fitting the coreset
also provide a good fit for the original data set. We show that, perhaps
surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension
and the number of mixture components, while being independent of the data set
size. Hence, one can harness computationally intensive algorithms to compute a
good approximation on a significantly smaller data set. More importantly, such
coresets can be efficiently constructed both in distributed and streaming
settings and do not impose restrictions on the data generating process. Our
results rely on a novel reduction of statistical estimation to problems in
computational geometry and new combinatorial complexity results for mixtures of
Gaussians. Empirical evaluation on several real-world datasets suggests that
our coreset-based approach enables a significant reduction in training time
with negligible approximation error.
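
To make the idea of a weighted subset concrete, the minimal Python sketch below compares the log-likelihood of a fixed Gaussian mixture on a full synthetic data set with the weighted log-likelihood on a small subset. It is not the construction from the paper: it uses plain uniform subsampling with weights n/m, which carries no coreset guarantee, and every name and value in it (gmm_log_likelihood, the synthetic data, the mixture parameters) is an assumption made for illustration.

```python
import numpy as np

def gmm_log_likelihood(X, weights, pis, mus, sigma2s):
    """Weighted log-likelihood of the rows of X under a spherical GMM.

    Hypothetical helper for illustration: pis are mixing proportions,
    mus the component means, sigma2s the per-component variances.
    """
    n, d = X.shape
    # Squared distance from every point to every component mean: shape (n, k)
    sq = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
    log_comp = (np.log(pis)[None, :]
                - 0.5 * d * np.log(2 * np.pi * sigma2s)[None, :]
                - 0.5 * sq / sigma2s[None, :])
    # Numerically stable log-sum-exp over components
    m = log_comp.max(axis=1, keepdims=True)
    log_px = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return float((weights * log_px).sum())

rng = np.random.default_rng(0)
n, d, k, m = 20000, 2, 3, 500

# Synthetic data drawn from three well-separated unit-variance Gaussians
mus = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
labels = rng.integers(0, k, size=n)
X = mus[labels] + rng.normal(size=(n, d))

# Uniform subsample; each kept point stands in for n/m original points.
# (Assumption: a real coreset would use a more careful, non-uniform scheme.)
idx = rng.choice(n, size=m, replace=False)
subset = X[idx]
subset_weights = np.full(m, n / m)

pis = np.full(k, 1.0 / k)
sigma2s = np.ones(k)

full_ll = gmm_log_likelihood(X, np.ones(n), pis, mus, sigma2s)
subset_ll = gmm_log_likelihood(subset, subset_weights, pis, mus, sigma2s)
rel_err = abs(full_ll - subset_ll) / abs(full_ll)
print(f"full: {full_ll:.1f}  weighted subset: {subset_ll:.1f}  rel. error: {rel_err:.4f}")
```

On well-behaved data like this, the weighted subset tracks the full-data likelihood closely for this one model. A coreset in the sense of the abstract is a stronger object: the approximation has to hold for all candidate mixtures at once, which uniform sampling does not ensure in general.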
Acoustic Analysis of Montenegrin English L2 Vowels: Production and Perception
This study provides an acoustic analysis of Montenegrin vowels in order to compare them with existing measurements of General American English (GAE) vowels. A production analysis of Montenegrin (MTN) learners of English identifies the vowels that are most problematic in their L2 pronunciation. In addition, a two-way perception study was conducted with the participants: American native English speakers listened to 11 GAE vowels produced by Montenegrin speakers of English and indicated which vowels they heard, while Montenegrin speakers of English did the same after listening to native GAE speakers. The study shows that some vowels are easy for Montenegrin speakers to produce and perceive. However, certain vowels (e.g., those present in English but not in Montenegrin) cause problems for participants in both the production and the perception analyses. This research helps determine the causes of miscomprehension between native speakers of GAE and Montenegrin EFL learners. The findings can help learners and teachers of ESL/EFL, and can support better-quality instruction for Montenegrin learners, by providing more information on the problematic differences between the vowel systems of Montenegrin and English.