Efficient Data Representation by Selecting Prototypes with Importance Weights

Authors: Aggarwal Charu; Cecchi Guillermo; Dhurandhar Amit; Gurumoorthy Karthik S.
Publication date: 12/08/2019

Prototypical examples that best summarize and compactly represent an underlying complex data distribution communicate meaningful insights to humans in domains where simple explanations are hard to extract. In this paper we present algorithms with strong theoretical guarantees to mine these data sets and select prototypes, a.k.a. representatives, that optimally describe them. Our work notably generalizes the recent work by Kim et al. (2016): in addition to selecting prototypes, we also associate non-negative weights with them that indicate their importance. This extension provides a single coherent framework under which both prototypes and criticisms (i.e. outliers) can be found. Furthermore, our framework works for any symmetric positive definite kernel, thus addressing one of the key open questions laid out in Kim et al. (2016). By establishing that our objective function enjoys the key property of weak submodularity, we present a fast ProtoDash algorithm and also derive approximation guarantees for it. We demonstrate the efficacy of our method on diverse domains such as retail, digit recognition (MNIST) and 40 publicly available health questionnaires obtained from the Center for Disease Control (CDC) website maintained by the US Dept. of Health. We validate the results quantitatively as well as qualitatively, based on expert feedback and recently published scientific studies on public health, thus showcasing the power of our technique in providing actionability (for retail), utility (for MNIST) and insight (on CDC datasets), which arguably are the hallmarks of an effective data mining method.

Comment: Accepted for publication in International Conference on Data Mining (ICDM) 201
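The idea of greedily selecting prototypes and attaching non-negative importance weights can be illustrated with a small sketch. This is not the authors' ProtoDash implementation: the quadratic objective w·mu - 0.5 w·K·w, the Gaussian kernel, the coordinate-ascent weight solver, and every function name below are assumptions made for illustration, since the abstract does not spell them out.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian kernel, one example of a symmetric positive definite kernel."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_weights(K, mu, support, iters=500):
    """Maximise w.mu - 0.5 * w.K.w over w >= 0, restricted to `support`.

    Uses cyclic coordinate ascent: each step maximises the objective
    exactly in one coordinate and clips the result at zero.
    """
    w = {j: 0.0 for j in support}
    for _ in range(iters):
        for j in support:
            rest = sum(K[j][i] * w[i] for i in support if i != j)
            w[j] = max(0.0, (mu[j] - rest) / K[j][j])
    return w

def select_prototypes(data, candidates, m, gamma=1.0):
    """Greedily pick m candidates (m <= len(candidates)) with weights.

    Hypothetical helper: at each step it adds the candidate with the
    largest gradient of the assumed objective, then refits all weights.
    Returns a list of (candidate_index, weight) pairs in selection order.
    """
    n = len(data)
    # mu[j]: mean kernel similarity between candidate j and the data.
    mu = [sum(rbf(x, z, gamma) for x in data) / n for z in candidates]
    K = [[rbf(a, b, gamma) for b in candidates] for a in candidates]
    support, w = [], {}
    for _ in range(m):
        grad = [mu[j] - sum(K[j][i] * w[i] for i in support)
                for j in range(len(candidates))]
        remaining = [j for j in range(len(candidates)) if j not in support]
        best = max(remaining, key=lambda j: grad[j])
        support.append(best)
        w = fit_weights(K, mu, support)
    return [(j, w[j]) for j in support]

if __name__ == "__main__":
    # Toy data: a cluster of four points and a cluster of two points.
    points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5)]
    for index, weight in select_prototypes(points, points, 2):
        print(index, round(weight, 3))
```

On the toy data, one prototype is drawn from each cluster, and the prototype from the larger cluster receives the larger weight, which is the sense in which the weights indicate importance.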

arXiv.org e-Print Archive

Crossref

A similarity-based community detection method with multiple prototype representation

Authors: Martin Arnaud; Pan Quan; Zhou Kuang
Publication venue: Elsevier BV
Publication date: 01/01/2015

Communities are of great importance for understanding graph structures in social networks. Some existing community detection algorithms use a single prototype to represent each group. In real applications, this may not adequately model the different types of communities and hence limits the clustering performance on social networks. To address this problem, a Similarity-based Multi-Prototype (SMP) community detection approach is proposed in this paper. In SMP, vertices in each community carry various weights to describe their degree of representativeness. This mechanism enables each community to be represented by more than one node. The centrality of nodes is used to calculate prototype weights, while similarity is used to guide the partitioning of the graph. Experimental results on computer-generated and real-world networks clearly show that SMP performs well for detecting communities. Moreover, with the help of prototype weights, the method can provide richer information about the inner structure of the detected communities than existing community detection models.

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Adaptive probability scheme for behaviour monitoring of the elderly using a specialised ambient device

Authors: Jiang Ping; Winkley Jonathan
Publication venue: Springer Science and Business Media LLC
Publication date: 05/10/2012

A Hidden Markov Model (HMM) modified to work in combination with a Fuzzy System is utilised to determine the current behavioural state of the user from information obtained with specialised hardware. Because the Fuzzy System and the sensor data that inform the state decision are high-dimensional and not linearly separable, a new method is devised to update the HMM and replace the initial Fuzzy System, so that subsequent state decisions are based on the most recent information. The resultant system first reduces the dimensionality of the original information by using a manifold representation in the high dimension, which is unfolded in the lower dimension. The data is then linearly separable in the lower dimension, where a simple linear classifier, such as the perceptron used here, is applied to determine the probability of the observations belonging to a state. Experiments using the new system verify its applicability in a real scenario.
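The linear-classifier step named in the abstract can be sketched with the classic perceptron learning rule. This is a minimal illustration only: the toy two-dimensional points stand in for the unfolded low-dimensional data, the function names are hypothetical, and the sketch returns hard class labels rather than the state probabilities the paper refers to, whose computation the abstract does not describe.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Train a perceptron; labels must be +1 or -1.

    Stops early once a full pass over the data makes no mistakes,
    which is guaranteed to happen eventually for linearly separable data.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:
                # Misclassified (or on the boundary): move the hyperplane.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

if __name__ == "__main__":
    # Toy, linearly separable points (illustrative, not from the paper).
    xs = [(2, 2), (3, 3), (2, 3), (-1, -1), (-2, -1), (-1, -2)]
    ys = [1, 1, 1, -1, -1, -1]
    weights, bias = train_perceptron(xs, ys)
    print([predict(weights, bias, x) for x in xs])
```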

Repository@Hull - Worktribe

Crossref

Automatic Discovery, Association Estimation and Learning of Semantic Attributes for a Thousand Categories

Authors: Al-Halah Ziad; Stiefelhagen Rainer
Publication date: 11/04/2017

Attribute-based recognition models, due to their impressive performance and their ability to generalize well on novel categories, have been widely adopted for many computer vision applications. However, usually both the attribute vocabulary and the class-attribute associations have to be provided manually by domain experts or a large number of annotators. This is very costly and not necessarily optimal regarding recognition performance, and most importantly, it limits the applicability of attribute-based models to large-scale data sets. To tackle this problem, we propose an end-to-end unsupervised attribute learning approach. We utilize online text corpora to automatically discover a salient and discriminative vocabulary that correlates well with the human concept of semantic attributes. Moreover, we propose a deep convolutional model to optimize class-attribute associations with a linguistic prior that accounts for noise and missing data in text. In a thorough evaluation on ImageNet, we demonstrate that our model is able to efficiently discover and learn semantic attributes at a large scale. Furthermore, we demonstrate that our model outperforms the state-of-the-art in zero-shot learning on three data sets: ImageNet, Animals with Attributes and aPascal/aYahoo. Finally, we enable attribute-based learning on ImageNet and will share the attributes and associations for future research.

Comment: Accepted as a conference paper at CVPR 201

arXiv.org e-Print Archive

Crossref