Efficient Data Representation by Selecting Prototypes with Importance Weights
Prototypical examples that best summarize and compactly represent an
underlying complex data distribution communicate meaningful insights to humans
in domains where simple explanations are hard to extract. In this paper we
present algorithms with strong theoretical guarantees to mine such data sets
and select prototypes, a.k.a. representatives, that optimally describe them.
Our work notably generalizes the recent work by Kim et al. (2016): in addition
to selecting prototypes, we also associate non-negative weights that indicate
their importance. This extension provides a single coherent framework under
which both prototypes and criticisms (i.e. outliers) can be found. Furthermore,
our framework works for any symmetric positive definite kernel, thus addressing
one of the key open questions laid out in Kim et al. (2016). By establishing
that our objective function enjoys the key property of weak submodularity, we
present a fast ProtoDash algorithm and derive approximation guarantees for it.
We demonstrate the efficacy of our method on diverse domains such as retail,
digit recognition (MNIST), and 40 publicly available health questionnaires
obtained from the Center for Disease Control (CDC) website maintained by the
US Dept. of Health. We validate the results quantitatively as well as
qualitatively based on expert feedback and recently published scientific
studies on public health, thus showcasing the power of our technique in
providing actionability (for retail), utility (for MNIST), and insight (on CDC
datasets), which arguably are the hallmarks of an effective data mining method.
Comment: Accepted for publication in the International Conference on Data
Mining (ICDM) 201
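The greedy, weighted selection described in the abstract can be illustrated with a short sketch. The kernel choice, the set function f(w) = mu^T w - 0.5 w^T K w, and the use of L-BFGS-B for the non-negative weight refit are illustrative assumptions in the spirit of the description above, not the authors' released implementation.

```python
# Minimal sketch: greedily grow a prototype set and fit non-negative
# importance weights by maximizing a kernel-based set function.
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, B, gamma=1.0):
    """Symmetric positive definite RBF kernel between rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def select_prototypes(X, m, gamma=1.0):
    """Greedily pick m prototypes from X and return (indices, weights)."""
    K = rbf_kernel(X, X, gamma)          # kernel among candidates
    mu = K.mean(axis=0)                  # mean similarity of each candidate to the data
    S, w = [], np.zeros(0)
    for _ in range(m):
        # gradient of f(w) = mu^T w - 0.5 w^T K w at the current sparse w
        grad = mu - (K[:, S] @ w if S else 0.0)
        grad[S] = -np.inf                # never re-select a chosen prototype
        S.append(int(np.argmax(grad)))
        K_SS, mu_S = K[np.ix_(S, S)], mu[S]
        # refit non-negative weights on the current support
        obj = lambda v: 0.5 * v @ K_SS @ v - mu_S @ v
        res = minimize(obj, x0=np.append(w, 0.0),
                       jac=lambda v: K_SS @ v - mu_S,
                       bounds=[(0, None)] * len(S), method="L-BFGS-B")
        w = res.x
    return np.array(S), w

# Example: summarize a toy 2-D dataset with 5 weighted prototypes.
X = np.random.default_rng(0).normal(size=(500, 2))
idx, weights = select_prototypes(X, m=5, gamma=0.5)
```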
Activity recognition from videos with parallel hypergraph matching on GPUs
In this paper, we propose a method for activity recognition from videos based
on sparse local features and hypergraph matching. We exploit special
properties of the temporal domain of the data to derive a sequential and fast
graph matching algorithm for GPUs.
Graphs and hypergraphs are frequently used to recognize complex and often
non-rigid patterns in computer vision, either through graph matching or
point-set matching with graphs. Most formulations resort to the minimization
of a difficult discrete energy function mixing geometric or structural terms
with data-attached terms involving appearance features. Traditional methods
solve this minimization problem approximately, for instance with spectral
techniques.
In this work, instead of solving the problem approximately, the exact
solution for the optimal assignment is calculated in parallel on GPUs. The
graphical structure is simplified and regularized, which allows us to derive
an efficient recursive minimization algorithm. The algorithm distributes
subproblems over the calculation units of a GPU, which solves them in
parallel, allowing the system to run faster than real time on medium-end GPUs.
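As an illustration of the recursive exact minimization idea, the sketch below solves a deliberately simplified, chain-structured matching exactly with a Viterbi-style dynamic program; the independent candidate evaluations inside each step are the kind of subproblems a GPU implementation would distribute over its calculation units. The cost terms, shapes, and names are assumptions for illustration, not the paper's hypergraph energy.

```python
# Exact assignment of model nodes to scene candidates on a chain structure,
# minimizing appearance (unary) plus pairwise geometric costs.
import numpy as np

def chain_matching(model_feat, scene_feat, model_pts, scene_pts, lam=1.0):
    """model_feat: (M, d) features of model nodes; scene_feat: (N, d) candidates.
    model_pts, scene_pts: (M, 2) / (N, 2) coordinates.
    Returns the exact cost-minimizing scene index for each model node."""
    M, N = len(model_feat), len(scene_feat)
    # unary (data-attached) costs: appearance distance, shape (M, N)
    unary = np.linalg.norm(model_feat[:, None] - scene_feat[None], axis=-1)

    def pair(i):
        # geometric cost between consecutive model nodes: preserve edge length
        d_model = np.linalg.norm(model_pts[i] - model_pts[i - 1])
        d_scene = np.linalg.norm(scene_pts[:, None] - scene_pts[None], axis=-1)
        return lam * np.abs(d_scene - d_model)      # (N_prev, N_cur)

    cost = unary[0].copy()                          # best cost ending at each candidate
    back = np.zeros((M, N), dtype=int)
    for i in range(1, M):
        # every (prev, cur) candidate pair is evaluated independently here;
        # this is the parallelizable subproblem
        total = cost[:, None] + pair(i) + unary[i][None]
        back[i] = np.argmin(total, axis=0)
        cost = np.min(total, axis=0)
    # backtrack the globally optimal assignment
    assign = np.zeros(M, dtype=int)
    assign[-1] = int(np.argmin(cost))
    for i in range(M - 1, 0, -1):
        assign[i - 1] = back[i, assign[i]]
    return assign
```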
The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification
We present the Bayesian Case Model (BCM), a general framework for Bayesian
case-based reasoning (CBR) and prototype classification and clustering. BCM
brings the intuitive power of CBR to a Bayesian generative framework. The BCM
learns prototypes, the "quintessential" observations that best represent
clusters in a dataset, by performing joint inference on cluster labels,
prototypes and important features. Simultaneously, BCM pursues sparsity by
learning subspaces, the sets of features that play important roles in the
characterization of the prototypes. The prototype and subspace representation
provides quantitative benefits in interpretability while preserving
classification accuracy. Human subject experiments verify statistically
significant improvements to participants' understanding when using explanations
produced by BCM, compared to those given by prior art.
Comment: Published in Neural Information Processing Systems (NIPS) 2014
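A minimal forward sketch of this kind of generative story is given below: each cluster is summarized by a prototype and a binary subspace of important features, and observations copy the prototype's values on that subspace while drawing the remaining features from a background distribution. The discrete-feature setup, the stand-in prototype pool, and the hyperparameters are illustrative assumptions; BCM's actual contribution is joint posterior inference over prototypes, subspaces, and cluster labels, which is not shown here.

```python
# Sample toy data from a prototype-and-subspace generative story.
import numpy as np

rng = np.random.default_rng(0)
S, D, V, n = 3, 8, 5, 200        # clusters, features, vocabulary size, observations
q, alpha, copy_prob = 0.4, 0.5, 0.9

pool = rng.integers(0, V, size=(S, D))       # stand-in "prototype" observations
omega = rng.random((S, D)) < q               # per-cluster subspaces of important features

X = np.zeros((n, D), dtype=int)
for i in range(n):
    pi = rng.dirichlet(alpha * np.ones(S))   # per-observation mixture over clusters
    for j in range(D):
        s = rng.choice(S, p=pi)              # cluster responsible for feature j
        if omega[s, j] and rng.random() < copy_prob:
            X[i, j] = pool[s, j]             # copy the prototype's value on the subspace
        else:
            X[i, j] = rng.integers(0, V)     # background / unimportant feature
```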
Prototype selection for parameter estimation in complex models
Parameter estimation in astrophysics often requires the use of complex
physical models. In this paper we study the problem of estimating the
parameters that describe star formation history (SFH) in galaxies. Here,
high-dimensional spectral data from galaxies are appropriately modeled as
linear combinations of physical components, called simple stellar populations
(SSPs), plus some nonlinear distortions. Theoretical data for each SSP is
produced for a fixed parameter vector via computer modeling. Though the
parameters that define each SSP are continuous, optimizing the signal model
over a large set of SSPs on a fine parameter grid is computationally infeasible
and inefficient. The goal of this study is to estimate the set of parameters
that describes the SFH of each galaxy. These target parameters, such as the
average ages and chemical compositions of the galaxy's stellar populations, are
derived from the SSP parameters and the component weights in the signal model.
Here, we introduce a principled approach of choosing a small basis of SSP
prototypes for SFH parameter estimation. The basic idea is to quantize the
vector space and effective support of the model components. In addition to
greater computational efficiency, we achieve better estimates of the SFH target
parameters. In simulations, our proposed quantization method obtains a
substantial improvement in estimating the target parameters over the common
method of employing a parameter grid. Sparse coding techniques are not
appropriate for this problem without proper constraints, while constrained
sparse coding methods perform poorly for parameter estimation because their
objective is signal reconstruction, not estimation of the target parameters.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS500 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
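A toy sketch of the quantization idea follows: pick a small, well-spread set of SSP prototype spectra from a large template library, fit a galaxy spectrum as a non-negative combination of those prototypes, and derive a target parameter from the fitted weights. The farthest-point selection rule, the synthetic library, and the weight-averaged age are illustrative assumptions, not the estimator studied in the paper.

```python
# Quantize an SSP template library to a few prototypes, then fit a galaxy
# spectrum on the reduced basis with non-negative least squares.
import numpy as np
from scipy.optimize import nnls

def farthest_point_prototypes(templates, m):
    """Pick m well-spread prototype rows from the SSP template library."""
    idx = [0]
    d = np.linalg.norm(templates - templates[0], axis=1)
    for _ in range(m - 1):
        idx.append(int(np.argmax(d)))
        d = np.minimum(d, np.linalg.norm(templates - templates[idx[-1]], axis=1))
    return np.array(idx)

# toy library: 1000 SSP spectra on a fine age grid, 300 wavelength bins
rng = np.random.default_rng(1)
ages = np.linspace(0.1, 13.0, 1000)                      # Gyr, fine parameter grid
library = np.abs(rng.normal(size=(1000, 300))) + ages[:, None] * 0.01

proto_idx = farthest_point_prototypes(library, m=20)     # small quantized basis
basis = library[proto_idx]

# fit one galaxy spectrum as a non-negative combination of the prototypes
galaxy = 0.6 * library[120] + 0.4 * library[870] + 0.01 * rng.normal(size=300)
weights, _ = nnls(basis.T, galaxy)                       # min ||B^T w - y||, w >= 0
mean_age = np.sum(weights * ages[proto_idx]) / np.sum(weights)   # derived target parameter
```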
Sub-Nyquist Sampling: Bridging Theory and Practice
Sampling theory encompasses all aspects related to the conversion of
continuous-time signals to discrete streams of numbers. The famous
Shannon-Nyquist theorem has become a landmark in the development of digital
signal processing. In modern applications, an increasing number of functions
is being pushed forward to sophisticated software algorithms, leaving only
delicate, finely tuned tasks for the circuit level.
In this paper, we review sampling strategies that target reduction of the
ADC rate below Nyquist. Our survey covers classic works from the early 1950s
through recent publications from the past several years.
The prime focus is on bridging theory and practice, that is, pinpointing the
potential of sub-Nyquist strategies to emerge from the math to the hardware. In
that spirit, we integrate contemporary theoretical viewpoints, which study
signal modeling in a union of subspaces, together with a taste of practical
aspects, namely how the avant-garde modalities boil down to concrete signal
processing systems. Our hope is that this presentation style will attract the
interest of both researchers and engineers, promoting the sub-Nyquist premise
toward practical applications and encouraging further research into this
exciting new frontier.
Comment: 48 pages, 18 figures, to appear in IEEE Signal Processing Magazine
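To make the sub-Nyquist premise concrete for one simple signal model, the sketch below measures a frequency-sparse signal with far fewer samples than its Nyquist grid and recovers it with a greedy sparse solver. Random time sampling and orthogonal matching pursuit are stand-ins chosen for brevity, not the hardware-oriented modalities the survey actually covers.

```python
# Sub-Nyquist measurement of a frequency-sparse signal and greedy recovery.
import numpy as np

N, K, M = 1024, 3, 96                          # Nyquist grid, active tones, measurements
rng = np.random.default_rng(2)
freqs = rng.choice(N // 2, size=K, replace=False)
t = np.arange(N) / N
x = sum(np.cos(2 * np.pi * f * t) for f in freqs)     # K-tone signal on the Nyquist grid

keep = np.sort(rng.choice(N, size=M, replace=False))  # sub-Nyquist: M << N random samples
y = x[keep]

# dictionary of cosine atoms restricted to the sampled instants
A = np.cos(2 * np.pi * np.outer(t[keep], np.arange(N // 2)))
A /= np.linalg.norm(A, axis=0) + 1e-12

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse coefficient vector."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return support, coef

support, coef = omp(A, y, K)
print(sorted(support), sorted(freqs))          # recovered tone indices vs. ground truth
```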