Training Gaussian Mixture Models at Scale via Coresets
How can we train a statistical mixture model on a massive data set? In this
work we show how to construct coresets for mixtures of Gaussians. A coreset is
a weighted subset of the data, which guarantees that models fitting the coreset
also provide a good fit for the original data set. We show that, perhaps
surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension
and the number of mixture components, while being independent of the data set
size. Hence, one can harness computationally intensive algorithms to compute a
good approximation on a significantly smaller data set. More importantly, such
coresets can be efficiently constructed both in distributed and streaming
settings and do not impose restrictions on the data generating process. Our
results rely on a novel reduction of statistical estimation to problems in
computational geometry and new combinatorial complexity results for mixtures of
Gaussians. Empirical evaluation on several real-world datasets suggests that
our coreset-based approach enables a significant reduction in training time with
negligible approximation error.
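The core idea can be illustrated with a toy sketch (an illustration only, not the paper's actual sensitivity-based construction): sample points with a probability that mixes a uniform term and a distance-based term, then reweight each sampled point so that weighted statistics on the small coreset approximate statistics of the full data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a massive data set: two Gaussian clusters in 2-D.
X = np.vstack([
    rng.normal(-5.0, 1.0, size=(5000, 2)),
    rng.normal(+5.0, 1.0, size=(5000, 2)),
])
n = len(X)

# Crude sensitivity proxy: mix a uniform term with distance from the overall
# mean, so outlying points (which influence a fit most) are sampled more often.
d = np.linalg.norm(X - X.mean(axis=0), axis=1)
p = 0.5 / n + 0.5 * d / d.sum()

# Draw the coreset; weight 1 / (m * p_i) makes weighted sums unbiased.
m = 500
idx = rng.choice(n, size=m, replace=True, p=p)
C, w = X[idx], 1.0 / (m * p[idx])

# Weighted coreset statistics should approximate full-data statistics,
# even though the coreset is 20x smaller than the data.
full_mean = X.mean(axis=0)
core_mean = (w[:, None] * C).sum(axis=0) / w.sum()
```

Any fitting procedure that accepts per-point weights (e.g. weighted EM) can then be run on `(C, w)` instead of `X`.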
Real-time data exploitation supported by model- and event-driven architecture to enhance situation awareness, application to crisis management
An effective crisis response requires up-to-date information, so the crisis cell must reach out to new, external data sources. However, new data bring new issues: their volume, veracity, variety, and velocity cannot be managed by humans alone, especially under high stress and time pressure. This paper proposes (i) a framework to enhance situation awareness while managing the 5 Vs of Big Data, (ii) general principles to be followed, and (iii) a new architecture implementing the proposed framework. The latter merges event-driven and model-driven architectures. It has been tested on a realistic flood scenario set up by official French services.
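The merged event-driven and model-driven idea can be sketched minimally (all names here are hypothetical, not taken from the paper's architecture): incoming events are dispatched through a bus, and handlers keep a shared situation model current without a human in the loop.

```python
from collections import defaultdict

class EventBus:
    """Minimal publish/subscribe bus (hypothetical sketch, not the paper's system)."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.handlers[topic]:
            handler(event)

# Model-driven element: handlers update a shared situation model,
# which the crisis cell reads instead of raw data streams.
situation = {"water_level_cm": 0}

def on_sensor_reading(event):
    situation["water_level_cm"] = max(situation["water_level_cm"],
                                      event["level_cm"])

bus = EventBus()
bus.subscribe("flood/sensor", on_sensor_reading)
bus.publish("flood/sensor", {"level_cm": 120})
```

New sources can then be integrated by subscribing additional handlers, without changing how the situation model is consumed.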
Intelligent Reference Curation for Visual Place Recognition via Bayesian Selective Fusion
A key challenge in visual place recognition (VPR) is recognizing places
despite drastic visual appearance changes due to factors such as time of day,
season, weather or lighting conditions. Numerous approaches based on
deep-learnt image descriptors, sequence matching, domain translation, and
probabilistic localization have had success in addressing this challenge, but
most rely on the availability of carefully curated representative reference
images of the possible places. In this paper, we propose a novel approach,
dubbed Bayesian Selective Fusion, for actively selecting and fusing informative
reference images to determine the best place match for a given query image. The
selective element of our approach avoids the counterproductive fusion of every
reference image and enables the dynamic selection of informative reference
images in environments with changing visual conditions (such as indoors with
flickering lights, outdoors during sunshowers or over the day-night cycle). The
probabilistic element of our approach provides a means of fusing multiple
reference images that accounts for their varying uncertainty via a novel
training-free likelihood function for VPR. On difficult query images from two
benchmark datasets, we demonstrate that our approach matches and exceeds the
performance of several alternative fusion approaches along with
state-of-the-art techniques that are provided with prior (unfair) knowledge of
the best reference images. Our approach is well suited for long-term robot
autonomy where dynamic visual environments are commonplace since it is
training-free, descriptor-agnostic, and complements existing techniques such as
sequence matching.
Comment: 8 pages, 10 figures; accepted in IEEE Robotics and Automation Letters.
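The two elements of the approach can be sketched on toy numbers (the scores and thresholds below are assumptions for illustration; real scores would come from an image descriptor, and this is not the paper's exact likelihood function): first select references whose score distributions are peaked, then fuse the selected ones probabilistically in log space.

```python
import numpy as np

# Hypothetical similarity scores S[r, q]: rows are reference images,
# columns are candidate places for one query image.
S = np.array([
    [0.2, 0.1, 0.9, 0.3],  # informative reference, peaks at place 2
    [0.1, 0.2, 0.8, 0.2],  # informative reference, peaks at place 2
    [0.5, 0.5, 0.5, 0.5],  # uninformative reference, flat scores
    [0.9, 0.2, 0.7, 0.1],  # peaked but misleading reference
])

# Selective step: keep references with peaked score distributions
# (high contrast), instead of fusing every reference indiscriminately.
contrast = S.max(axis=1) - S.mean(axis=1)
selected = contrast > 0.3

# Probabilistic step: treat softmax-normalized scores as per-reference
# likelihoods and fuse the selected ones in log space (naive-Bayes style).
logp = S[selected] - np.log(np.exp(S[selected]).sum(axis=1, keepdims=True))
posterior = logp.sum(axis=0)
best = int(posterior.argmax())  # place 2 wins despite one misleading reference
```

The flat reference contributes nothing here because selection drops it, while fusion lets the two informative references outvote the single misleading one.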
Coresets for Visual Summarization with Applications to Loop Closure
In continuously operating robotic systems, efficient representation of the previously seen camera feed is crucial. Using a highly efficient compression coreset method, we formulate a new method for hierarchical retrieval of frames from large video streams collected online by a moving robot. We demonstrate how to utilize the resulting structure for efficient loop closure via a novel sampling approach that is adaptive to the structure of the video. The same structure also allows us to create a highly effective search tool for large-scale videos, which we demonstrate in this paper. We show the efficiency of the proposed approaches for retrieval and loop closure on standard datasets and on a large-scale video from a mobile camera.
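The idea of sampling that adapts to the structure of the video can be sketched with a deliberately simple keyframe selector (an illustration only, not the paper's coreset construction): static stretches of video collapse to few frames, while fast-changing segments keep more.

```python
import numpy as np

def select_keyframes(frames, threshold=1.0):
    """Keep a frame only when its descriptor differs enough from the
    last kept frame; a crude stand-in for coreset-style compression."""
    kept = [0]
    for i in range(1, len(frames)):
        if np.linalg.norm(frames[i] - frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept

# Simulated frame descriptors: a long static scene, then rapid change.
static = np.zeros((50, 8))
moving = np.cumsum(np.ones((10, 8)), axis=0)  # descriptor drifts each frame
frames = np.vstack([static, moving])

keys = select_keyframes(frames, threshold=1.0)
```

Here 50 static frames compress to a single keyframe, while every frame of the moving segment is retained; candidate loop closures would then be checked against the retained frames only.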
Coresets for Time Series Clustering
We study the problem of constructing coresets for clustering problems with time series data. This problem has gained importance across many fields including biology, medicine, and economics due to the proliferation of sensors for real-time measurement and the rapid drop in storage costs. In particular, we consider the setting where the time series data on N entities is generated from a Gaussian mixture model with autocorrelations over k clusters in R^d. Our main contribution is an algorithm to construct coresets for the maximum likelihood objective for this mixture model. Our algorithm is efficient, and, under a mild assumption on the covariance matrices of the Gaussians, the size of the coreset is independent of the number of entities N and the number of observations for each entity, and depends only polynomially on k, d, and 1/ε, where ε is the error parameter. We empirically assess the performance of our coresets with synthetic data.
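What "independent of N" buys can be illustrated with a simplified sketch (an assumption-laden toy, not the paper's algorithm: uniform sampling over entities, an isotropic mixture, and no autocorrelation structure): a fixed-size weighted subset of entities approximates the full maximum-likelihood objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# N entities, each a length-T series drawn around one of k = 2 cluster means.
N, T, k = 2000, 20, 2
labels = rng.integers(0, k, size=N)
centers = np.array([-3.0, 3.0])
X = centers[labels][:, None] + rng.normal(0.0, 1.0, size=(N, T))

def entity_loglik(x, mu=centers, sigma=1.0, pi=(0.5, 0.5)):
    """Log-likelihood of one series under a 2-component isotropic mixture."""
    comp = np.array([
        np.log(p) - 0.5 * np.sum((x - m) ** 2) / sigma**2
        - 0.5 * len(x) * np.log(2.0 * np.pi * sigma**2)
        for p, m in zip(pi, mu)
    ])
    top = comp.max()
    return top + np.log(np.exp(comp - top).sum())  # stable log-sum-exp

full = sum(entity_loglik(x) for x in X)

# Uniform coreset over entities: size m chosen independently of N;
# each sampled entity carries weight N / m in the objective.
m = 200
idx = rng.choice(N, size=m, replace=False)
core = (N / m) * sum(entity_loglik(X[i]) for i in idx)

rel_err = abs(core - full) / abs(full)
```

The weighted coreset objective tracks the full objective closely while touching only a tenth of the entities; the paper's contribution is a sampling scheme with such guarantees that also handles autocorrelations and general covariances.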