Information-Maximization Clustering based on Squared-Loss Mutual Information
Information-maximization clustering learns a probabilistic classifier in an
unsupervised manner so that mutual information between feature vectors and
cluster assignments is maximized. A notable advantage of this approach is that
it only involves continuous optimization of model parameters, which is
substantially easier to solve than discrete optimization of cluster
assignments. However, existing methods still involve non-convex optimization
problems, and therefore finding a good local optimal solution is not
straightforward in practice. In this paper, we propose an alternative
information-maximization clustering method based on a squared-loss variant of
mutual information. This novel approach gives a clustering solution
analytically in a computationally efficient way via kernel eigenvalue
decomposition. Furthermore, we provide a practical model selection procedure
that allows us to objectively optimize tuning parameters included in the kernel
function. Through experiments, we demonstrate the usefulness of the proposed
approach.
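The analytic route described in this abstract — obtaining cluster assignments from the eigendecomposition of a kernel matrix — can be illustrated with a minimal sketch. This is a deliberately simplified illustration, not the paper's exact SMI-based estimator: the Gaussian kernel, the naive argmax assignment, and the name `smic_sketch` are all our assumptions.

```python
import numpy as np

def smic_sketch(X, n_clusters, sigma=1.0):
    """Cluster by eigendecomposition of a Gaussian kernel matrix.

    Simplified sketch of information-maximization clustering via
    kernel eigenvalue decomposition (not the paper's exact method).
    """
    # Pairwise squared distances and the Gaussian kernel matrix
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    K = np.exp(-d2 / (2 * sigma ** 2))
    # Eigenvectors of the largest eigenvalues give a continuous
    # cluster embedding (no discrete optimization needed)
    vals, vecs = np.linalg.eigh(K)          # ascending eigenvalue order
    emb = vecs[:, -n_clusters:]
    # Assign each point to the eigenvector that dominates it
    return np.argmax(np.abs(emb), axis=1)
```

Note that the continuous step (eigendecomposition) does all the work; only the final assignment is discrete, which is the advantage the abstract highlights.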
Algorithms for Efficient Mining of Statistically Significant Attribute Association Information
Knowledge of the association information between the attributes in a data set
provides insight into the underlying structure of the data and explains the
relationships (independence, synergy, redundancy) between the attributes and
class (if present). Complex models learnt computationally from the data are
more interpretable to a human analyst when such interdependencies are known. In
this paper, we focus on mining two types of association information among the
attributes - correlation information and interaction information for both
supervised (class attribute present) and unsupervised analysis (class attribute
absent). Identifying the statistically significant attribute associations is a
computationally challenging task: the number of possible associations grows
exponentially with the number of attributes, and many associations carry
redundant information when correlated attributes are present. In this paper, we
explore efficient data mining methods to discover non-redundant attribute sets
that contain significant association information indicating the presence of
informative patterns in the data.

Comment: 16 pages, 7 figures
CRAFT: ClusteR-specific Assorted Feature selecTion
We present a framework for clustering with cluster-specific feature
selection. The framework, CRAFT, is derived from asymptotic log posterior
formulations of nonparametric MAP-based clustering models. CRAFT handles
assorted data, i.e., both numeric and categorical data, and the underlying
objective functions are intuitively appealing. The resulting algorithm is
simple to implement and scales nicely, requires minimal parameter tuning,
obviates the need to specify the number of clusters a priori, and compares
favorably with other methods on real datasets.
Supervised Dictionary Learning and Sparse Representation-A Review
Dictionary learning and sparse representation (DLSR) is a recent and
successful mathematical model for data representation that achieves
state-of-the-art performance in various fields such as pattern recognition,
machine learning, computer vision, and medical imaging. The original
formulation for DLSR is based on the minimization of the reconstruction error
between the original signal and its sparse representation in the space of the
learned dictionary. Although this formulation is optimal for solving problems
such as denoising, inpainting, and coding, it may not lead to an optimal
solution in classification tasks, where the ultimate goal is to make the
learned dictionary and corresponding sparse representation as discriminative as
possible. This motivated the emergence of a new category of techniques,
appropriately called supervised dictionary learning and sparse representation
(S-DLSR), which yields dictionaries and sparse representations better suited to
classification tasks. Despite many research efforts for
S-DLSR, the literature lacks a comprehensive view of these techniques, their
connections, advantages and shortcomings. In this paper, we address this gap
and provide a review of the recently proposed algorithms for S-DLSR. We first
present a taxonomy of these algorithms into six categories based on the
approach taken to include label information into the learning of the dictionary
and/or sparse representation. For each category, we draw connections between
the algorithms in this category and present a unified framework for them. We
then provide guidelines for applied researchers on how to represent and learn
the building blocks of an S-DLSR solution based on the problem at hand. This
review provides a broad yet deep view of the state-of-the-art methods for
S-DLSR and allows for the advancement of research and development in this
emerging area.
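The reconstruction-error formulation that S-DLSR methods extend can be sketched with a toy alternating scheme: ISTA-style sparse coding with the dictionary fixed, then a least-squares dictionary update with the codes fixed. This is a minimal unsupervised baseline under simplified assumptions (no atom renormalization inside the loop, hand-picked step counts), not any specific algorithm from the review; supervised variants add label-dependent terms to the same objective.

```python
import numpy as np

def dlsr_sketch(X, n_atoms, lam=0.1, n_iter=15, seed=0):
    """Toy DLSR: alternate sparse coding and dictionary updates for
    min_{D,A}  0.5 * ||X - D A||_F^2 + lam * ||A||_1.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = rng.standard_normal((d, n_atoms))
    D /= np.linalg.norm(D, axis=0)                # unit-norm initial atoms
    A = np.zeros((n_atoms, n))
    for _ in range(n_iter):
        # Sparse coding: ISTA steps on A with D fixed
        L = np.linalg.norm(D, 2) ** 2 + 1e-12     # Lipschitz constant of grad
        for _ in range(10):
            A = A - D.T @ (D @ A - X) / L                          # gradient step
            A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # soft-threshold
        # Dictionary update: ridge-regularized least squares with A fixed
        D = X @ A.T @ np.linalg.inv(A @ A.T + 1e-8 * np.eye(n_atoms))
    return D, A
```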
Feature Selection for multi-labeled variables via Dependency Maximization
Feature selection and reducing the dimensionality of data is an essential
step in data analysis. In this work, we propose a new criterion for feature
selection that is formulated as conditional information between features given
the labeled variable. Instead of using the standard mutual information measure
based on Kullback-Leibler divergence, we use our proposed criterion to filter
out redundant features for the purpose of multiclass classification. This
approach results in an efficient and fast non-parametric implementation of
feature selection as it can be directly estimated using a geometric measure of
dependency, the global Friedman-Rafsky (FR) multivariate run test statistic
constructed by a global minimal spanning tree (MST). We demonstrate the
advantages of our proposed feature selection approach through simulation. In
addition, the proposed feature selection method is applied to the MNIST data
set.

Comment: 5 pages, 3 figures, 1 table
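The FR statistic itself is simple to compute, which is part of the efficiency claim above: build a Euclidean minimum spanning tree on the pooled sample and count edges joining points from different samples. A minimal sketch, assuming distinct points and a dense distance matrix (SciPy treats zero entries as absent edges):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def fr_statistic(X, Y):
    """Friedman-Rafsky run statistic: the number of edges in the
    Euclidean MST of the pooled sample whose endpoints come from
    different samples. Few cross-edges -> well-separated samples
    (strong dependency); many cross-edges -> well-mixed samples.
    """
    Z = np.vstack([X, Y])
    labels = np.concatenate([np.zeros(len(X)), np.ones(len(Y))])
    diff = Z[:, None, :] - Z[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))    # dense distance matrix
    mst = minimum_spanning_tree(dist).tocoo()     # zeros read as no edge
    # Count MST edges whose endpoints carry different sample labels
    return int(np.sum(labels[mst.row] != labels[mst.col]))
```

For two well-separated samples the MST contains a single bridging edge; for interleaved samples nearly every edge crosses, which is what makes the count usable as a dependency measure.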
Unsupervised Model Selection for Variational Disentangled Representation Learning
Disentangled representations have recently been shown to improve fairness,
data efficiency and generalisation in simple supervised and reinforcement
learning tasks. To extend the benefits of disentangled representations to more
complex domains and practical applications, it is important to enable
hyperparameter tuning and model selection of existing unsupervised approaches
without requiring access to ground truth attribute labels, which are not
available for most datasets. This paper addresses this problem by introducing a
simple yet robust and reliable method for unsupervised disentangled model
selection. Our approach, Unsupervised Disentanglement Ranking (UDR), leverages
the recent theoretical results that explain why variational autoencoders
disentangle (Rolinek et al., 2019) to quantify the quality of disentanglement
by performing pairwise comparisons between trained model representations. We
show that our approach performs comparably to the existing supervised
alternatives across 5,400 models from six state-of-the-art unsupervised
disentangled representation learning model classes. Furthermore, we show that
the ranking produced by our approach correlates well with the final task
performance on two different domains.
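The pairwise-comparison idea can be sketched loosely: match latent dimensions across two trained models by absolute correlation, average the matched correlations, and score each model by its median similarity to the others. This is our illustrative simplification (greedy matching, plain correlation, the name `udr_style_score`), not UDR's exact similarity measure.

```python
import numpy as np

def udr_style_score(latents):
    """Rank models by agreement of their latent representations:
    models that disentangle well should agree up to a permutation
    of dimensions, while entangled models agree with nothing."""
    def similarity(A, B):
        # |correlation| between every latent dim of A and of B
        C = np.abs(np.corrcoef(A.T, B.T)[:A.shape[1], A.shape[1]:])
        # Greedy one-to-one dimension matching (Hungarian in practice)
        sim, used = 0.0, set()
        for i in np.argsort(-C.max(axis=1)):
            j = max((j for j in range(C.shape[1]) if j not in used),
                    key=lambda j: C[i, j])
            used.add(j)
            sim += C[i, j]
        return sim / C.shape[0]
    m = len(latents)
    return np.array([np.median([similarity(latents[i], latents[j])
                                for j in range(m) if j != i])
                     for i in range(m)])
```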
Easy Transfer Learning By Exploiting Intra-domain Structures
Transfer learning aims at transferring knowledge from a well-labeled domain
to a similar but different domain with limited or no labels. Unfortunately,
existing learning-based methods often involve intensive model selection and
hyperparameter tuning to obtain good results. Moreover, cross-validation is not
possible for tuning hyperparameters since there are often no labels in the
target domain. This restricts the wide applicability of transfer learning,
especially on computationally constrained devices such as wearables. In this
paper, we propose a practically Easy Transfer Learning (EasyTL) approach which
requires no model selection and hyperparameter tuning, while achieving
competitive performance. By exploiting intra-domain structures, EasyTL is able
to learn both non-parametric transfer features and classifiers. Extensive
experiments demonstrate that, compared to state-of-the-art traditional and deep
methods, EasyTL satisfies the Occam's Razor principle: it is extremely easy to
implement and use while achieving comparable or better performance in
classification accuracy and much better computational efficiency. Additionally,
it is shown that EasyTL can increase the performance of existing transfer
feature learning methods.

Comment: Camera-ready version of IEEE International Conference on Multimedia
and Expo (ICME) 2019; code available at
http://transferlearning.xyz/code/traditional/EasyT
Information-Theoretical Learning of Discriminative Clusters for Unsupervised Domain Adaptation
We study the problem of unsupervised domain adaptation, which aims to adapt
classifiers trained on a labeled source domain to an unlabeled target domain.
Many existing approaches first learn domain-invariant features and then
construct classifiers with them. We propose a novel approach that jointly
learns both. Specifically, while the method identifies a feature space where
data in the source and the target domains are similarly distributed, it also
learns the feature space discriminatively, optimizing an information-theoretic
metric as a proxy for the expected misclassification error on the target
domain. We
show how this optimization can be effectively carried out with simple
gradient-based methods and how hyperparameters can be cross-validated without
demanding any labeled data from the target domain. Empirical studies on
benchmark tasks of object recognition and sentiment analysis validated our
modeling assumptions and demonstrated significant improvements of our method
over competing ones in classification accuracy.

Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012).
Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review
Pattern analysis often requires a pre-processing stage for extracting or
selecting features in order to help the classification, prediction, or
clustering stage discriminate or represent the data in a better way. The reason
for this requirement is that the raw data are complex and difficult to process
without extracting or selecting appropriate features beforehand. This paper
reviews theory and motivation of different common methods of feature selection
and extraction and introduces some of their applications. Some numerical
implementations are also shown for these methods. Finally, the methods in
feature selection and extraction are compared.

Comment: 14 pages, 1 figure, 2 tables, survey (literature review) paper
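The extraction/selection distinction this review draws can be made concrete with two textbook methods, simplified here: PCA (extraction: new features as projections onto directions of maximal variance) versus a variance filter (selection: a subset of the original features). Function names and the variance criterion are our illustrative choices.

```python
import numpy as np

def pca_extract(X, k):
    """Feature extraction by PCA: project centered data onto the
    top-k right singular vectors (directions of maximal variance)."""
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # (n_samples, k) new features

def select_by_variance(X, k):
    """Feature selection: keep the k highest-variance original features."""
    idx = np.argsort(-X.var(axis=0))[:k]
    return X[:, idx], idx
```

The contrast is visible in the outputs: selection returns columns of the original matrix and their indices (interpretable), while extraction returns linear combinations (often more compact but harder to interpret), which is the trade-off the review discusses.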
Survey of state-of-the-art mixed data clustering algorithms
Mixed data comprises both numeric and categorical features, and mixed
datasets occur frequently in many domains, such as health, finance, and
marketing. Clustering is often applied to mixed datasets to find structures and
to group similar objects for further analysis. However, clustering mixed data
is challenging because it is difficult to directly apply mathematical
operations, such as summation or averaging, to the feature values of these
datasets. In this paper, we present a taxonomy for the study of mixed data
clustering algorithms by identifying five major research themes. We then
present a state-of-the-art review of the research works within each research
theme. We analyze the strengths and weaknesses of these methods with pointers
for future research directions. Lastly, we present an in-depth analysis of the
overall challenges in this field, highlight open research questions and discuss
guidelines to make progress in the field.

Comment: 20 pages, 2 columns, 6 tables, 209 references
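The averaging difficulty noted above is often sidestepped with a Gower-style dissimilarity that mixes range-normalized numeric differences with 0/1 categorical mismatches, so no arithmetic is ever applied to categorical values. A minimal sketch, assuming equal feature weights and precomputed numeric ranges (the helper name and signature are ours):

```python
def gower_distance(a, b, num_idx, cat_idx, num_range):
    """Gower-style dissimilarity between two mixed-type records:
    the mean of range-normalized absolute numeric differences and
    0/1 categorical mismatches, so every feature contributes on a
    common [0, 1] scale."""
    parts = []
    for i in num_idx:
        # Numeric feature: absolute difference scaled by feature range
        parts.append(abs(float(a[i]) - float(b[i])) / num_range[i])
    for i in cat_idx:
        # Categorical feature: simple match / mismatch indicator
        parts.append(0.0 if a[i] == b[i] else 1.0)
    return sum(parts) / len(parts)
```

A dissimilarity of this form can then be fed to any distance-based clustering algorithm, which is the route several of the research themes surveyed here take.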