Efficient Data Representation by Selecting Prototypes with Importance Weights
Prototypical examples that best summarize and compactly represent an
underlying complex data distribution communicate meaningful insights to humans
in domains where simple explanations are hard to extract. In this paper we
present algorithms with strong theoretical guarantees to mine these data sets
and select prototypes, a.k.a. representatives, that optimally describe them. Our
work notably generalizes the recent work by Kim et al. (2016): in addition to
selecting prototypes, we also assign them non-negative weights that indicate
their importance. This extension provides a single coherent
framework under which both prototypes and criticisms (i.e. outliers) can be
found. Furthermore, our framework works for any symmetric positive definite
kernel thus addressing one of the key open questions laid out in Kim et al.
(2016). By establishing that our objective function enjoys the key property of
weak submodularity, we present a fast algorithm, ProtoDash, and derive
approximation guarantees for it. We demonstrate the efficacy of
our method on diverse domains such as retail, digit recognition (MNIST) and 40
publicly available health questionnaires obtained from the Center for
Disease Control (CDC) website maintained by the US Dept. of Health. We validate
the results quantitatively as well as qualitatively based on expert feedback
and recently published scientific studies on public health, thus showcasing the
power of our technique in providing actionability (for retail), utility (for
MNIST) and insight (on CDC datasets) which arguably are the hallmarks of an
effective data mining method.
Comment: Accepted for publication in International Conference on Data Mining
(ICDM) 201
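
The abstract above describes greedily selecting prototypes and attaching non-negative importance weights to them. A minimal Python sketch of that general idea follows. It is not the authors' ProtoDash implementation: the RBF kernel, the `gamma` value, the projected-gradient weight fit, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Symmetric positive definite kernel (assumed choice for this sketch).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_weights(K_ss, mu_s, n_iter=500):
    # Maximise w^T mu - 0.5 w^T K w subject to w >= 0
    # by projected gradient ascent (hypothetical helper).
    step = 1.0 / np.linalg.eigvalsh(K_ss).max()
    w = np.zeros(len(mu_s))
    for _ in range(n_iter):
        w = np.maximum(w + step * (mu_s - K_ss @ w), 0.0)
    return w

def select_prototypes(X, m, gamma=1.0):
    K = rbf_kernel(X, X, gamma)
    mu = K.mean(axis=1)  # mean kernel similarity of each point to the data
    selected, w = [], np.zeros(0)
    for _ in range(m):
        grad = mu.copy()
        if selected:
            grad -= K[:, selected] @ w   # gradient of the objective
            grad[selected] = -np.inf     # never pick the same point twice
        selected.append(int(np.argmax(grad)))
        w = fit_weights(K[np.ix_(selected, selected)], mu[selected])
    return selected, w

# Toy usage: two well-separated clusters, three prototypes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
selected, w = select_prototypes(X, m=3)
```

Each greedy step adds the candidate with the largest gradient and then refits all weights, so a prototype's weight can shrink when a later pick covers the same region.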
A similarity-based community detection method with multiple prototype representation
Communities are of great importance for understanding graph structures in
social networks. Some existing community detection algorithms use a single
prototype to represent each group. In real applications, this may not
adequately model the different types of communities and hence limits the
clustering performance on social networks. To address this problem, a
Similarity-based Multi-Prototype (SMP) community detection approach is proposed
in this paper. In SMP, vertices in each community carry various weights to
describe their degree of representativeness. This mechanism enables each
community to be represented by more than one node. The centrality of nodes is
used to calculate prototype weights, while similarity is used to guide the
partitioning of the graph. Experimental results on computer-generated and
real-world networks clearly show that SMP performs well for detecting
communities. Moreover, with the help of prototype weights, the method can
provide richer information about the inner structure of the detected
communities than existing community detection models.
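
The abstract above says node centrality is used to compute prototype weights within each community. The following Python sketch illustrates one way this could look. It assumes degree centrality restricted to the community, which the abstract does not specify, and `prototype_weights` is a hypothetical name rather than part of SMP.

```python
def prototype_weights(adjacency, community):
    """Weight each member by its share of within-community degree."""
    members = set(community)
    degree = {
        v: sum(1 for u in adjacency[v] if u in members) for v in community
    }
    total = sum(degree.values())
    if total == 0:
        # No internal edges: fall back to uniform weights.
        return {v: 1.0 / len(community) for v in community}
    return {v: d / total for v, d in degree.items()}

# Toy graph as an adjacency dict; node "e" lies outside the community.
adjacency = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a", "e"],
    "e": ["d"],
}
weights = prototype_weights(adjacency, ["a", "b", "c", "d"])
```

Because the weights sum to one, several nodes can share the role of representative instead of a single prototype standing for the whole community.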
Adaptive probability scheme for behaviour monitoring of the elderly using a specialised ambient device
A Hidden Markov Model (HMM) modified to work in combination with a Fuzzy System is utilised to determine the current behavioural state of the user from information obtained with specialised hardware. Due to the high dimensionality and non-linearly-separable nature of the Fuzzy System and of the sensor data obtained with the hardware, which informs the state decision, a new method is devised to update the HMM and replace the initial Fuzzy System such that subsequent state decisions are based on the most recent information. The resultant system first reduces the dimensionality of the original information by using a manifold representation in the high dimension which is unfolded in the lower dimension. The data is then linearly separable in the lower dimension, where a simple linear classifier, such as the perceptron used here, is applied to determine the probability of the observations belonging to a state. Experiments using the new system verify its applicability in a real scenario.
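
The abstract above applies a perceptron once the data is linearly separable in the lower dimension. A minimal Python sketch of the classic perceptron rule on already-separable toy data follows. The manifold unfolding, the HMM update, and the mapping from classifier output to state probabilities are not shown, and the data and function names are assumptions for illustration only.

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    # y holds labels in {-1, +1}; stop early once an epoch makes no mistakes.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified or on the boundary
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:
            break
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

# Toy low-dimensional observations for two states (assumed data).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
predictions = predict(X, w, b)
```

On linearly separable data the update rule is guaranteed to converge, which is why the dimensionality reduction step matters before a classifier this simple is applied.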
Automatic Discovery, Association Estimation and Learning of Semantic Attributes for a Thousand Categories
Attribute-based recognition models, due to their impressive performance and
their ability to generalize well on novel categories, have been widely adopted
for many computer vision applications. However, usually both the attribute
vocabulary and the class-attribute associations have to be provided manually by
domain experts or a large number of annotators. This is very costly and not
necessarily optimal regarding recognition performance, and most importantly, it
limits the applicability of attribute-based models to large scale data sets. To
tackle this problem, we propose an end-to-end unsupervised attribute learning
approach. We utilize online text corpora to automatically discover a salient
and discriminative vocabulary that correlates well with the human concept of
semantic attributes. Moreover, we propose a deep convolutional model to
optimize class-attribute associations with a linguistic prior that accounts for
noise and missing data in text. In a thorough evaluation on ImageNet, we
demonstrate that our model is able to efficiently discover and learn semantic
attributes at a large scale. Furthermore, we demonstrate that our model
outperforms the state-of-the-art in zero-shot learning on three data sets:
ImageNet, Animals with Attributes and aPascal/aYahoo. Finally, we enable
attribute-based learning on ImageNet and will share the attributes and
associations for future research.
Comment: Accepted as a conference paper at CVPR 201