Efficient Data Representation by Selecting Prototypes with Importance Weights
Prototypical examples that best summarize and compactly represent an
underlying complex data distribution communicate meaningful insights to humans
in domains where simple explanations are hard to extract. In this paper we
present algorithms with strong theoretical guarantees to mine these data sets
and select prototypes, a.k.a. representatives, that optimally describe them. Our
work notably generalizes the recent work by Kim et al. (2016): in addition
to selecting prototypes, we also associate non-negative weights that are
indicative of their importance. This extension provides a single coherent
framework under which both prototypes and criticisms (i.e. outliers) can be
found. Furthermore, our framework works with any symmetric positive definite
kernel, thus addressing one of the key open questions laid out in Kim et al.
(2016). By establishing that our objective function enjoys the key property of
weak submodularity, we present a fast ProtoDash algorithm and also
derive approximation guarantees for it. We demonstrate the efficacy of
our method on diverse domains such as retail, digit recognition (MNIST) and
40 publicly available health questionnaires obtained from the Centers for
Disease Control (CDC) website maintained by the US Dept. of Health. We validate
the results quantitatively as well as qualitatively based on expert feedback
and recently published scientific studies on public health, thus showcasing the
power of our technique in providing actionability (for retail), utility (for
MNIST) and insight (on CDC datasets), which arguably are the hallmarks of an
effective data mining method.
Comment: Accepted for publication in International Conference on Data Mining
(ICDM) 201
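As a rough illustration of the greedy selection under weak submodularity that the abstract describes, the sketch below picks prototypes one at a time by the largest gradient of a kernel-based objective and refits non-negative importance weights on the selected set. This is an illustrative simplification, not the authors' published ProtoDash implementation; the RBF kernel, step sizes, and iteration counts are assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def protodash_sketch(X, m, gamma=1.0):
    """Greedy weighted prototype selection (sketch of the ProtoDash idea).

    Approximately maximizes l(w) = w^T mu - 0.5 w^T K w over non-negative
    weights w supported on at most m prototypes, where mu_i is the mean
    kernel similarity of candidate i to all data points.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    mu = K.mean(axis=1)          # mean similarity to the data distribution
    S, w = [], np.zeros(0)
    for _ in range(m):
        # Gradient of l at the current w; exclude already-selected candidates.
        grad = mu - (K[:, S] @ w if S else np.zeros(n))
        grad[S] = -np.inf
        S.append(int(np.argmax(grad)))
        # Refit non-negative weights on the selected set (projected gradient).
        Ks, mus = K[np.ix_(S, S)], mu[S]
        w = np.full(len(S), 1.0 / len(S))
        step = 1.0 / np.linalg.norm(Ks, 2)
        for _ in range(200):
            w = np.maximum(0.0, w + step * (mus - Ks @ w))
    return S, w
```

The non-negative weights are what distinguish this from unweighted prototype selection: a large weight marks an important representative, while candidates that never earn positive weight behave like the criticisms/outliers mentioned in the abstract.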
Spectral Unmixing with Multiple Dictionaries
Spectral unmixing aims at recovering the spectral signatures of materials,
called endmembers, mixed in a hyperspectral or multispectral image, along with
their abundances. A typical assumption is that the image contains one pure
pixel per endmember, in which case spectral unmixing reduces to identifying
these pixels. Many fully automated methods have been proposed in recent years,
but little work has been done to allow users to select areas where pure pixels
are present, either manually or with a segmentation algorithm. Additionally, in
a non-blind approach, several spectral libraries may be available rather than a
single one, with a fixed number (or an upper or lower bound) of endmembers to
choose from each. In this paper, we propose a multiple-dictionary constrained
low-rank matrix approximation model that addresses these two problems. We propose
an algorithm to compute this model, dubbed M2PALS, and its performance is
discussed on both synthetic and real hyperspectral images.
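To make the multiple-dictionary setting concrete, the sketch below selects a fixed number of endmembers from each of several spectral libraries (here by a simple correlation score) and then fits non-negative abundances. This one-pass heuristic is a stand-in for illustration only, not the authors' M2PALS algorithm; all function names, the scoring rule, and the solver are assumptions.

```python
import numpy as np

def nnls_pgd(A, B, iters=300):
    # Non-negative least squares min ||A X - B||_F via projected gradient.
    X = np.zeros((A.shape[1], B.shape[1]))
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
    for _ in range(iters):
        X = np.maximum(0.0, X - step * (A.T @ (A @ X - B)))
    return X

def multi_dictionary_unmix(Y, dicts, counts):
    """Sketch of spectral unmixing with several libraries (hypothetical).

    Y      : (bands, pixels) image matrix
    dicts  : list of (bands, atoms_k) spectral libraries
    counts : number of endmembers to draw from each library
    Scores each atom by its mean |correlation| with the pixels, keeps the
    top counts[k] atoms per library, then fits non-negative abundances.
    """
    Yn = Y / (np.linalg.norm(Y, axis=0, keepdims=True) + 1e-12)
    chosen = []
    for D, c in zip(dicts, counts):
        Dn = D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-12)
        score = np.abs(Dn.T @ Yn).mean(axis=1)   # mean |correlation| per atom
        chosen.append(D[:, np.argsort(score)[-c:]])
    E = np.hstack(chosen)                        # selected endmembers
    A = nnls_pgd(E, Y)                           # non-negative abundances
    return E, A
```

Enforcing "exactly counts[k] atoms from library k" is what distinguishes this setting from ordinary sparse unmixing over one concatenated dictionary, where the solver is free to draw all endmembers from a single library.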
Correntropy Maximization via ADMM - Application to Robust Hyperspectral Unmixing
In hyperspectral images, some spectral bands suffer from low signal-to-noise
ratio due to noisy acquisition and atmospheric effects, thus requiring robust
techniques for the unmixing problem. This paper presents a robust supervised
spectral unmixing approach for hyperspectral images. The robustness is achieved
by writing the unmixing problem as the maximization of the correntropy
criterion subject to the most commonly used constraints. Two unmixing problems
are derived: the first problem considers the fully-constrained unmixing, with
both the non-negativity and sum-to-one constraints, while the second one deals
with non-negativity and a sparsity-promoting constraint on the abundances. The
corresponding optimization problems are solved efficiently using an alternating
direction method of multipliers (ADMM) approach. Experiments on synthetic and
real hyperspectral images validate the performance of the proposed algorithms
for different scenarios, demonstrating that the correntropy-based unmixing is
robust to outlier bands.
Comment: 23 page
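The robustness mechanism behind correntropy maximization can be sketched with a half-quadratic reweighting loop: bands whose residuals are large receive exponentially small weights, so noisy bands barely influence the fit. This is a simplified stand-in for intuition, not the authors' ADMM solver, and it enforces only the non-negativity constraint; sigma and the iteration counts are assumptions.

```python
import numpy as np

def correntropy_unmix(y, E, sigma=0.1, iters=50):
    """Robust supervised unmixing of one pixel y given known endmembers E.

    Approximately maximizes sum_b exp(-(y - E a)_b^2 / (2 sigma^2)) subject
    to a >= 0, via half-quadratic reweighting: each outer pass computes
    per-band weights from the residuals, then takes projected-gradient
    steps on the resulting weighted least-squares problem.
    """
    a = np.full(E.shape[1], 1.0 / E.shape[1])
    for _ in range(iters):
        r = y - E @ a
        w = np.exp(-r ** 2 / (2 * sigma ** 2))   # outlier bands -> weight ~0
        step = 1.0 / (np.linalg.norm((E * w[:, None]).T @ E, 2) + 1e-12)
        for _ in range(50):
            a = np.maximum(0.0, a - step * (E.T @ (w * (E @ a - y))))
    return a
```

A plain least-squares fit would let a single corrupted band drag the whole abundance estimate; here that band's weight collapses to near zero after the first reweighting pass.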
High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
The goal of supervised feature selection is to find a subset of input
features that are responsible for predicting output values. The least absolute
shrinkage and selection operator (Lasso) allows computationally efficient
feature selection based on linear dependency between input features and output
values. In this paper, we consider a feature-wise kernelized Lasso for
capturing non-linear input-output dependency. We first show that, with
particular choices of kernel functions, non-redundant features with strong
statistical dependence on output values can be found in terms of kernel-based
independence measures. We then show that the globally optimal solution can be
efficiently computed; this makes the approach scalable to high-dimensional
problems. The effectiveness of the proposed method is demonstrated through
feature selection experiments with thousands of features.
Comment: 18 page
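The feature-wise kernelized Lasso described above can be sketched as a non-negative Lasso over vectorized, centered Gram matrices: each feature contributes one centered kernel matrix, and the regression target is the centered kernel matrix of the output. The sketch below uses an RBF kernel and projected ISTA; kernel widths, the regularization level, and the solver are assumptions rather than the authors' exact setup.

```python
import numpy as np

def centered_gram(v, gamma=1.0):
    # Centered RBF Gram matrix of one variable (HSIC building block).
    K = np.exp(-gamma * (v[:, None] - v[None, :]) ** 2)
    H = np.eye(len(v)) - 1.0 / len(v)
    return H @ K @ H

def hsic_lasso(X, y, lam=0.1, iters=500):
    """Feature-wise kernelized Lasso (HSIC Lasso) sketch.

    Solves min_{a >= 0} 0.5 ||vec(Lbar) - sum_k a_k vec(Kbar_k)||^2 + lam ||a||_1,
    where Kbar_k and Lbar are centered Gram matrices of feature k and the
    output. Non-redundant, output-dependent features receive positive weight.
    """
    n, d = X.shape
    Phi = np.stack([centered_gram(X[:, k]).ravel() for k in range(d)], axis=1)
    Phi /= np.linalg.norm(Phi, axis=0, keepdims=True) + 1e-12  # unit columns
    l = centered_gram(y).ravel()
    a = np.zeros(d)
    step = 1.0 / (np.linalg.norm(Phi, 2) ** 2)
    for _ in range(iters):
        grad = Phi.T @ (Phi @ a - l)
        a = np.maximum(0.0, a - step * (grad + lam))  # non-negative ISTA step
    return a  # nonzero entries mark selected features
```

Because each feature enters only through its own Gram matrix, the problem stays a convex Lasso in d variables even though the input-output dependency being captured is non-linear.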