7,427 research outputs found
Unsupervised High-level Feature Learning by Ensemble Projection for Semi-supervised Image Classification and Image Clustering
This paper investigates the problem of image classification with limited or
no annotations, but abundant unlabeled data. The setting exists in many tasks
such as semi-supervised image classification, image clustering, and image
retrieval. Unlike previous methods, which develop or learn sophisticated
regularizers for classifiers, our method learns a new image representation by
exploiting the distribution patterns of all available data for the task at
hand. Specifically, a rich set of visual prototypes is sampled from all
available data and taken as surrogate classes to train discriminative
classifiers; images are projected via these classifiers; and the projected
values, i.e., similarities to the prototypes, are stacked to build the new
feature vector.
Each such training set is noisy. Hence, in the spirit of ensemble learning, we
create a collection of diverse training sets, leading to diverse
classifiers. The method is dubbed Ensemble Projection (EP). EP captures not
only the characteristics of individual images, but also the relationships among
images. It is conceptually simple and computationally efficient, yet effective
and flexible. Experiments on eight standard datasets show that: (1) EP
outperforms previous methods for semi-supervised image classification; (2) EP
produces promising results for self-taught image classification, where
unlabeled samples are a random collection of images rather than being from the
same distribution as the labeled ones; and (3) EP improves over the original
features for image clustering. The code of the method is available on the
project page.
Comment: 22 pages, 8 figures
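The prototype-sampling and projection pipeline can be sketched in a few lines. The following is a toy stand-in (random data, a scikit-learn logistic regression, and arbitrary choices of ensemble count T, prototypes r, and neighbours k), not the authors' released implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))     # stand-in for image features

T, r, k = 5, 8, 4                  # ensembles, prototypes, neighbours per prototype

feats = []
for t in range(T):
    # sample r prototypes; each prototype and its k nearest neighbours
    # form one (noisy) surrogate class
    proto_idx = rng.choice(len(X), size=r, replace=False)
    Xs, ys = [], []
    for c, p in enumerate(proto_idx):
        nn = np.argsort(np.linalg.norm(X - X[p], axis=1))[:k]
        Xs.append(X[nn])
        ys.append(np.full(k, c))
    clf = LogisticRegression(max_iter=1000).fit(np.vstack(Xs), np.concatenate(ys))
    # project all images: class probabilities act as similarities to prototypes
    feats.append(clf.predict_proba(X))

ep_features = np.hstack(feats)     # stacked projections: T * r new dimensions
```

Each image's new representation is its vector of similarities to all sampled prototypes, concatenated across the diverse ensembles.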
Robust Multiple Manifolds Structure Learning
We present a robust multiple manifolds structure learning (RMMSL) scheme to
robustly estimate data structures under the multiple low intrinsic dimensional
manifolds assumption. In the local learning stage, RMMSL efficiently estimates
local tangent space by weighted low-rank matrix factorization. In the global
learning stage, we propose a robust manifold clustering method based on local
structure learning results. The proposed clustering method is designed to get
the flattest manifolds clusters by introducing a novel curved-level similarity
function. Our approach is evaluated and compared to state-of-the-art methods on
synthetic data, handwritten digit images, human motion capture data and
motorbike videos. We demonstrate the effectiveness of the proposed approach,
which yields higher clustering accuracy, and produces promising results for
challenging tasks of human motion segmentation and motion flow learning from
videos.
Comment: ICML201
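The local learning stage can be illustrated with a plain (unweighted) local SVD standing in for the paper's weighted low-rank matrix factorization; the circle data, neighbourhood size, and manifold dimension below are invented for the toy example:

```python
import numpy as np

def local_tangent(X, i, k=10, d=1):
    # estimate the tangent space at point i from its k nearest neighbours
    # via a local SVD (an unweighted low-rank factorization)
    dist = np.linalg.norm(X - X[i], axis=1)
    nbrs = X[np.argsort(dist)[:k]]
    centered = nbrs - nbrs.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:d]                  # d orthonormal basis vectors of the tangent

# toy data: a slightly noisy unit circle (a 1-D manifold in R^2)
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
X = np.c_[np.cos(t), np.sin(t)] + 0.01 * rng.normal(size=(200, 2))

# at (1, 0) the true tangent direction is (0, 1)
T0 = local_tangent(X, 0, k=12, d=1)
```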
Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization
Image clustering is one of the most important computer vision applications,
which has been extensively studied in the literature. However, current clustering
methods mostly suffer from lack of efficiency and scalability when dealing with
large-scale and high-dimensional data. In this paper, we propose a new
clustering model, called DEeP Embedded RegularIzed ClusTering (DEPICT), which
efficiently maps data into a discriminative embedding subspace and precisely
predicts cluster assignments. DEPICT generally consists of a multinomial
logistic regression function stacked on top of a multi-layer convolutional
autoencoder. We define a clustering objective function using relative entropy
(KL divergence) minimization, regularized by a prior for the frequency of
cluster assignments. An alternating strategy is then derived to optimize the
objective by updating parameters and estimating cluster assignments.
Furthermore, we employ the reconstruction loss functions of our autoencoder as
a data-dependent regularization term, to prevent the deep embedding function
from overfitting. In order to benefit from end-to-end optimization and
eliminate the necessity for layer-wise pretraining, we introduce a joint
learning framework to minimize the unified clustering and reconstruction loss
functions together and train all network layers simultaneously. Experimental
results indicate the superiority and faster running time of DEPICT in
real-world clustering tasks, where no labeled data is available for
hyper-parameter tuning.
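The relative-entropy objective can be sketched without the convolutional autoencoder. The target-distribution form below is one plausible frequency-regularized choice in the spirit of DEPICT, not necessarily the exact formula of the paper, and the soft assignments are random stand-ins:

```python
import numpy as np

def target_distribution(q):
    # auxiliary target: sharpen confident assignments while normalizing by
    # soft cluster frequency (a prior discouraging degenerate clusters)
    f = q.sum(axis=0)                       # soft cluster frequencies
    p = q / np.sqrt(f)
    return p / p.sum(axis=1, keepdims=True)

def kl_clustering_loss(q, p):
    # relative entropy KL(p || q), minimized w.r.t. the network parameters
    return float(np.sum(p * np.log(p / np.clip(q, 1e-12, None))))

rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 10))         # stand-in for network outputs
q = np.exp(logits)
q /= q.sum(axis=1, keepdims=True)           # softmax cluster assignments
p = target_distribution(q)
loss = kl_clustering_loss(q, p)
```

Alternating optimization would fix p, update the network to shrink the KL term, then recompute p from the new assignments.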
Local Regularization of Noisy Point Clouds: Improved Global Geometric Estimates and Data Analysis
Several data analysis techniques employ similarity relationships between data
points to uncover the intrinsic dimension and geometric structure of the
underlying data-generating mechanism. In this paper we work under the model
assumption that the data is made of random perturbations of feature vectors
lying on a low-dimensional manifold. We study two questions: how to define the
similarity relationship over noisy data points, and what is the resulting
impact of the choice of similarity in the extraction of global geometric
information from the underlying manifold. We provide concrete mathematical
evidence that using a local regularization of the noisy data to define the
similarity improves the approximation of the hidden Euclidean distance between
unperturbed points. Furthermore, graph-based objects constructed with the
locally regularized similarity function satisfy better error bounds in their
recovery of global geometric quantities. Our theory is supported by numerical
experiments that demonstrate that the gain in geometric understanding
facilitated by local regularization translates into a gain in classification
accuracy in simulated and real data.
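A minimal form of local regularization replaces each noisy point by the mean of its k nearest neighbours before any similarity is computed; the construction in the paper may differ in detail, and the circle data and parameters below are invented:

```python
import numpy as np

def locally_regularize(X, k=10):
    # replace each point by the mean of its k nearest neighbours
    # (a simple local regularization of the noisy cloud)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    nn = np.argsort(D, axis=1)[:, :k]       # neighbour indices (self included)
    return X[nn].mean(axis=1)

# toy data: noisy samples around the unit circle (a 1-D manifold in R^2)
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 300)
noisy = np.c_[np.cos(t), np.sin(t)] + 0.2 * rng.normal(size=(300, 2))
denoised = locally_regularize(noisy, k=15)

# average distance of each cloud from the underlying manifold (radius 1)
err_noisy = np.abs(np.linalg.norm(noisy, axis=1) - 1.0).mean()
err_denoised = np.abs(np.linalg.norm(denoised, axis=1) - 1.0).mean()
```

Similarities computed on the regularized points then better approximate the hidden distances between the unperturbed points.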
Clustering with Similarity Preserving
Graph-based clustering has shown promising performance in many tasks. A key
step of graph-based approaches is the construction of the similarity graph. In general,
learning graph in kernel space can enhance clustering accuracy due to the
incorporation of nonlinearity. However, most existing kernel-based graph
learning mechanisms are not similarity-preserving, and hence lead to sub-optimal
performance. To overcome this drawback, we propose a more discriminative graph
learning method which can preserve the pairwise similarities between samples in
an adaptive manner for the first time. Specifically, we require the learned
graph to be close to a kernel matrix, which serves as a measure of similarity in
raw data. Moreover, the structure is adaptively tuned so that the number of
connected components of the graph is exactly equal to the number of clusters.
Finally, our method unifies clustering and graph learning, so that cluster
indicators can be obtained directly from the graph itself without performing a
further clustering step. The effectiveness of this approach is examined in both
single- and multiple-kernel learning scenarios on several datasets.
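The role of the kernel matrix and the connected-component constraint can be illustrated crudely: below, the "learned" graph is merely a thresholded RBF kernel, sparsified until its components match the desired cluster count (the paper instead tunes the graph structure adaptively; the blob data, bandwidth, and threshold are invented):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(0)
# two well-separated Gaussian blobs, 30 points each
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])

# similarity in kernel space: an RBF kernel matrix over the raw data
D2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-D2 / (2 * 1.0 ** 2))

# crude stand-in for the adaptive structure: threshold the kernel so the
# graph's connected components directly give the cluster indicators
S = np.where(K > 0.5, K, 0.0)
n_comp, labels = connected_components(csr_matrix(S), directed=False)
```

When the component count equals the cluster count, no further clustering step (e.g. k-means on an embedding) is needed.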
Information-Maximization Clustering based on Squared-Loss Mutual Information
Information-maximization clustering learns a probabilistic classifier in an
unsupervised manner so that mutual information between feature vectors and
cluster assignments is maximized. A notable advantage of this approach is that
it only involves continuous optimization of model parameters, which is
substantially easier to solve than discrete optimization of cluster
assignments. However, existing methods still involve non-convex optimization
problems, and therefore finding a good local optimal solution is not
straightforward in practice. In this paper, we propose an alternative
information-maximization clustering method based on a squared-loss variant of
mutual information. This novel approach gives a clustering solution
analytically in a computationally efficient way via kernel eigenvalue
decomposition. Furthermore, we provide a practical model selection procedure
that allows us to objectively optimize tuning parameters included in the kernel
function. Through experiments, we demonstrate the usefulness of the proposed
approach.
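The analytic solution via kernel eigenvalue decomposition can be caricatured as follows: take the leading eigenvectors of a kernel matrix and round them into cluster labels. The actual SMI-based estimator is more refined; the blob data and kernel bandwidth here are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
# two well-separated blobs, 40 points each
X = np.vstack([rng.normal(-2, 0.4, (40, 2)), rng.normal(2, 0.4, (40, 2))])

# Gaussian kernel matrix and its eigenvalue decomposition
D2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-D2 / 2.0)
w, V = np.linalg.eigh(K)
top = V[:, np.argsort(w)[::-1][:2]]     # leading eigenvectors, one per cluster

# round the continuous analytic solution into discrete cluster labels
labels = np.argmax(np.abs(top), axis=1)
```

The appeal of this family of methods is that the eigendecomposition is the whole optimization: there is no non-convex iterative fitting of cluster assignments.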
Effectiveness of self-supervised pre-training for speech recognition
We compare self-supervised representation learning algorithms which either
explicitly quantize the audio data or learn representations without
quantization. We find the former to be more accurate since it builds a good
vocabulary of the data through vq-wav2vec [1] to enable learning of effective
representations in subsequent BERT training. Unlike previous work, we
directly fine-tune the pre-trained BERT models on transcribed speech using a
Connectionist Temporal Classification (CTC) loss instead of feeding the
representations into a task-specific model. We also propose a BERT-style model
learning directly from the continuous audio data and compare pre-training on
raw audio to spectral features. Fine-tuning a BERT model on 10 hours of labeled
Librispeech data with a vq-wav2vec vocabulary is almost as good as the best
known reported system trained on 100 hours of labeled data on test-clean, while
achieving a 25% WER reduction on test-other. When using only 10 minutes of
labeled data, WER is 25.2 on test-other and 16.3 on test-clean. This
demonstrates that self-supervision can enable speech recognition systems
trained on a near-zero amount of transcribed data.
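The explicit-quantization idea amounts to mapping continuous audio frames onto a discrete codebook, yielding a token "vocabulary" for subsequent BERT training. The sketch below uses random stand-ins for both frames and codes; vq-wav2vec learns the codebook end-to-end rather than fixing it:

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 8))       # stand-in for audio feature frames
codebook = rng.normal(size=(16, 8))     # stand-in for learned codes

# quantization: assign each frame to its nearest codebook entry, turning
# continuous audio into a sequence of discrete tokens
d = ((frames[:, None] - codebook[None]) ** 2).sum(-1)
tokens = d.argmin(axis=1)               # one discrete token per frame
```

The resulting token sequence plays the role that word pieces play in text BERT, which is why a good codebook helps the downstream masked-prediction pre-training.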
High-Fidelity Image Generation With Fewer Labels
Deep generative models are becoming a cornerstone of modern machine learning.
Recent work on conditional generative adversarial networks has shown that
learning complex, high-dimensional distributions over natural images is within
reach. While the latest models are able to generate high-fidelity, diverse
natural images at high resolution, they rely on a vast quantity of labeled
data. In this work we demonstrate how one can benefit from recent work on self-
and semi-supervised learning to outperform the state of the art in both
unsupervised ImageNet synthesis and the conditional setting. In
particular, the proposed approach is able to match the sample quality (as
measured by FID) of the current state-of-the-art conditional model BigGAN on
ImageNet using only 10% of the labels and outperform it using 20% of the
labels.
Comment: Mario Lucic, Michael Tschannen, and Marvin Ritter contributed equally
to this work. ICML 2019 camera-ready version. Code available at
https://github.com/google/compare_ga
Learning Discrete Representations via Information Maximizing Self-Augmented Training
Learning discrete representations of data is a central machine learning task
because of the compactness of the representations and ease of interpretation.
The task includes clustering and hash learning as special cases. Deep neural
networks are promising candidates because they can model the non-linearity of
data and scale to large datasets. However, their model complexity is huge, and
therefore, we need to carefully regularize the networks in order to learn
useful representations that exhibit intended invariance for applications of
interest. To this end, we propose a method called Information Maximizing
Self-Augmented Training (IMSAT). In IMSAT, we use data augmentation to impose
the invariance on discrete representations. More specifically, we encourage the
predicted representations of augmented data points to be close to those of the
original data points in an end-to-end fashion. At the same time, we maximize
the information-theoretic dependency between data and their predicted discrete
representations. Extensive experiments on benchmark datasets show that IMSAT
produces state-of-the-art results for both clustering and unsupervised hash
learning.
Comment: To appear at ICML 201
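The two terms of the IMSAT objective can be sketched with a toy linear classifier and Gaussian perturbation standing in for the network and the data augmentation; the trade-off weight is invented:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(16, 10))          # toy linear "network"
X = rng.normal(size=(64, 16))
X_aug = X + 0.05 * rng.normal(size=X.shape)  # stand-in for data augmentation

p, p_aug = softmax(X @ W), softmax(X_aug @ W)

# self-augmented training term: predictions on augmented points should
# stay close to those on the originals (mean per-example KL divergence)
r_sat = np.mean(np.sum(p * np.log(p / p_aug), axis=1))

# information-maximization term: high marginal entropy (balanced clusters)
# minus high conditional entropy (i.e. prefer confident predictions)
p_bar = p.mean(axis=0)
mi = -np.sum(p_bar * np.log(p_bar)) + np.mean(np.sum(p * np.log(p), axis=1))

loss = r_sat - 0.1 * mi    # jointly minimized in IMSAT-style training
```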
Graph Clustering with Dynamic Embedding
Graph clustering (or community detection) has long drawn enormous attention
from research on web mining and information networks. Recent literature on
this topic has reached a consensus that node contents and link structures
should be integrated for reliable graph clustering, especially in an
unsupervised setting. However, existing methods based on shallow models often
suffer from content noise and sparsity. In this work, we propose to utilize
deep embedding for graph clustering, motivated by the well-recognized power of
neural networks in learning intrinsic content representations. Upon that, we
capture the dynamic nature of networks through the principle of influence
propagation and calculate the dynamic network embedding. Network clusters are
then detected based on the stable state of such an embedding. Unlike most
existing embedding methods that are task-agnostic, we simultaneously solve for
the underlying node representations and the optimal clustering assignments in
an end-to-end manner. To provide more insight, we theoretically analyze our
interpretation of network clusters and find its underlying connections with two
widely applied approaches for network modeling. Extensive experimental results
on six real-world datasets including both social networks and citation networks
demonstrate the superiority of our proposed model over the state of the art.
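The influence-propagation step can be sketched as iterated neighbourhood averaging with a restart term, run until the embedding reaches a stable state. The random graph, embedding size, and mixing weight below are invented, and a real model would learn the initial content embedding Z0 rather than sampling it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
A = (rng.random((n, n)) < 0.1).astype(float)   # random undirected graph
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)
Z0 = rng.normal(size=(n, 8))                   # stand-in content embedding

# influence propagation: repeatedly mix each node's embedding with its
# neighbours', keeping a restart term toward the content embedding
deg = A.sum(1, keepdims=True).clip(min=1)
P = A / deg                                    # row-normalized adjacency
Z = Z0.copy()
for _ in range(50):
    Z = 0.5 * P @ Z + 0.5 * Z0
```

Because the propagation operator is a contraction, the iteration converges to a stable embedding; clusters are then read off from that fixed point.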