Search CORE

3,147 research outputs found

The geometry of kernelized spectral clustering

Author: Schiebinger Geoffrey
Wainwright Martin J.
Yu Bin
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2014
Field of study

Clustering of data sets is a standard problem in many areas of science and engineering. The method of spectral clustering is based on embedding the data set using a kernel function, and using the top eigenvectors of the normalized Laplacian to recover the connected components. We study the performance of spectral clustering in recovering the latent labels of i.i.d. samples from a finite mixture of nonparametric distributions. The difficulty of this label recovery problem depends on the overlap between mixture components and how easily a mixture component is divided into two nonoverlapping components. When the overlap is small compared to the indivisibility of the mixture components, the principal eigenspace of the population-level normalized Laplacian operator is approximately spanned by the square-root kernelized component densities. In the finite sample setting, and under the same assumption, embedded samples from different components are approximately orthogonal with high probability when the sample size is large. As a corollary we control the fraction of samples mislabeled by spectral clustering under finite mixtures with nonparametric components.Comment: Published at http://dx.doi.org/10.1214/14-AOS1283 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Crossref

eScholarship - University of California

Latent Fisher Discriminant Analysis

Author: Chen Gang
Publication venue
Publication date: 20/09/2013
Field of study

Linear Discriminant Analysis (LDA) is a well-known method for dimensionality reduction and classification. Previous studies have also extended the binary-class case into multi-classes. However, many applications, such as object detection and keyframe extraction cannot provide consistent instance-label pairs, while LDA requires labels on instance level for training. Thus it cannot be directly applied for semi-supervised classification problem. In this paper, we overcome this limitation and propose a latent variable Fisher discriminant analysis model. We relax the instance-level labeling into bag-level, is a kind of semi-supervised (video-level labels of event type are required for semantic frame extraction) and incorporates a data-driven prior over the latent variables. Hence, our method combines the latent variable inference and dimension reduction in an unified bayesian framework. We test our method on MUSK and Corel data sets and yield competitive results compared to the baseline approach. We also demonstrate its capacity on the challenging TRECVID MED11 dataset for semantic keyframe extraction and conduct a human-factors ranking-based experimental evaluation, which clearly demonstrates our proposed method consistently extracts more semantically meaningful keyframes than challenging baselines.Comment: 12 page

arXiv.org e-Print Archive

CiteSeerX

Bi-Objective Nonnegative Matrix Factorization: Linear Versus Kernel-Based Models

Author: Honeine Paul
Zhu Fei
Publication venue
Publication date: 22/01/2015
Field of study

Nonnegative matrix factorization (NMF) is a powerful class of feature extraction techniques that has been successfully applied in many fields, namely in signal and image processing. Current NMF techniques have been limited to a single-objective problem in either its linear or nonlinear kernel-based formulation. In this paper, we propose to revisit the NMF as a multi-objective problem, in particular a bi-objective one, where the objective functions defined in both input and feature spaces are taken into account. By taking the advantage of the sum-weighted method from the literature of multi-objective optimization, the proposed bi-objective NMF determines a set of nondominated, Pareto optimal, solutions instead of a single optimal decomposition. Moreover, the corresponding Pareto front is studied and approximated. Experimental results on unmixing real hyperspectral images confirm the efficiency of the proposed bi-objective NMF compared with the state-of-the-art methods

arXiv.org e-Print Archive

HAL - Normandie Université

Optimal Transport for Kernel Gaussian Mixture Models

Author: Deasy Joseph O
Elkin Rena
Oh Jung Hun
Simhal Anish Kumar
Tannenbaum Allen
Zhu Jiening
Publication venue
Publication date: 28/10/2023
Field of study

The Wasserstein distance from optimal mass transport (OMT) is a powerful mathematical tool with numerous applications that provides a natural measure of the distance between two probability distributions. Several methods to incorporate OMT into widely used probabilistic models, such as Gaussian or Gaussian mixture, have been developed to enhance the capability of modeling complex multimodal densities of real datasets. However, very few studies have explored the OMT problems in a reproducing kernel Hilbert space (RKHS), wherein the kernel trick is utilized to avoid the need to explicitly map input data into a high-dimensional feature space. In the current study, we propose a Wasserstein-type metric to compute the distance between two Gaussian mixtures in a RKHS via the kernel trick, i.e., kernel Gaussian mixture models.Comment: 17 pages, 5 figures, 2 table

arXiv.org e-Print Archive