60,276 research outputs found
Building high-level features using large scale unsupervised learning
We consider the problem of building high-level, class-specific feature
detectors from only unlabeled data. For example, is it possible to learn a face
detector using only unlabeled images? To answer this, we train a 9-layered
locally connected sparse autoencoder with pooling and local contrast
normalization on a large dataset of images (the model has 1 billion
connections, the dataset has 10 million 200x200 pixel images downloaded from
the Internet). We train this network using model parallelism and asynchronous
SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to
what appears to be a widely-held intuition, our experimental results reveal
that it is possible to train a face detector without having to label images as
containing a face or not. Control experiments show that this feature detector
is robust not only to translation but also to scaling and out-of-plane
rotation. We also find that the same network is sensitive to other high-level
concepts such as cat faces and human bodies. Starting with these learned
features, we trained our network to obtain 15.8% accuracy in recognizing 20,000
object categories from ImageNet, a leap of 70% relative improvement over the
previous state-of-the-art
Unsupervised feature learning for electrocardiogram data using the convolutional variational autoencoder
Most existing electrocardiogram (ECG) feature extraction methods rely on rule-based approaches. It is difficult to manually define all ECG features. We propose an unsupervised feature learning method using a convolutional variational autoencoder (CVAE) that can extract ECG features with unlabeled data. We used 596,000 ECG samples from 1,278 patients archived in biosignal databases from intensive care units to train the CVAE. Three external datasets were used for feature validation using two approaches. First, we explored the features without an additional training process. Clustering, latent space exploration, and anomaly detection were conducted. We confirmed that CVAE features reflected the various types of ECG rhythms. Second, we applied CVAE features to new tasks as input data and CVAE weights to weight initialization for different models for transfer learning for the classification of 12 types of arrhythmias. The f1-score for arrhythmia classification with extreme gradient boosting was 0.86 using CVAE features only. The f1-score of the model in which weights were initialized with the CVAE encoder was 5% better than that obtained with random initialization. Unsupervised feature learning with CVAE can extract the characteristics of various types of ECGs and can be an alternative to the feature extraction method for ECGs.ope
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
While it is nearly effortless for humans to quickly assess the perceptual
similarity between two images, the underlying processes are thought to be quite
complex. Despite this, the most widely used perceptual metrics today, such as
PSNR and SSIM, are simple, shallow functions, and fail to account for many
nuances of human perception. Recently, the deep learning community has found
that features of the VGG network trained on ImageNet classification has been
remarkably useful as a training loss for image synthesis. But how perceptual
are these so-called "perceptual losses"? What elements are critical for their
success? To answer these questions, we introduce a new dataset of human
perceptual similarity judgments. We systematically evaluate deep features
across different architectures and tasks and compare them with classic metrics.
We find that deep features outperform all previous metrics by large margins on
our dataset. More surprisingly, this result is not restricted to
ImageNet-trained VGG features, but holds across different deep architectures
and levels of supervision (supervised, self-supervised, or even unsupervised).
Our results suggest that perceptual similarity is an emergent property shared
across deep visual representations.Comment: Accepted to CVPR 2018; Code and data available at
https://www.github.com/richzhang/PerceptualSimilarit
Unsupervised Feature Learning through Divergent Discriminative Feature Accumulation
Unlike unsupervised approaches such as autoencoders that learn to reconstruct
their inputs, this paper introduces an alternative approach to unsupervised
feature learning called divergent discriminative feature accumulation (DDFA)
that instead continually accumulates features that make novel discriminations
among the training set. Thus DDFA features are inherently discriminative from
the start even though they are trained without knowledge of the ultimate
classification problem. Interestingly, DDFA also continues to add new features
indefinitely (so it does not depend on a hidden layer size), is not based on
minimizing error, and is inherently divergent instead of convergent, thereby
providing a unique direction of research for unsupervised feature learning. In
this paper the quality of its learned features is demonstrated on the MNIST
dataset, where its performance confirms that indeed DDFA is a viable technique
for learning useful features.Comment: Corrected citation formattin
Labeling the Features Not the Samples: Efficient Video Classification with Minimal Supervision
Feature selection is essential for effective visual recognition. We propose
an efficient joint classifier learning and feature selection method that
discovers sparse, compact representations of input features from a vast sea of
candidates, with an almost unsupervised formulation. Our method requires only
the following knowledge, which we call the \emph{feature sign}---whether or not
a particular feature has on average stronger values over positive samples than
over negatives. We show how this can be estimated using as few as a single
labeled training sample per class. Then, using these feature signs, we extend
an initial supervised learning problem into an (almost) unsupervised clustering
formulation that can incorporate new data without requiring ground truth
labels. Our method works both as a feature selection mechanism and as a fully
competitive classifier. It has important properties, low computational cost and
excellent accuracy, especially in difficult cases of very limited training
data. We experiment on large-scale recognition in video and show superior speed
and performance to established feature selection approaches such as AdaBoost,
Lasso, greedy forward-backward selection, and powerful classifiers such as SVM.Comment: arXiv admin note: text overlap with arXiv:1411.771
Sparse Coding on Stereo Video for Object Detection
Deep Convolutional Neural Networks (DCNN) require millions of labeled
training examples for image classification and object detection tasks, which
restrict these models to domains where such datasets are available. In this
paper, we explore the use of unsupervised sparse coding applied to stereo-video
data to help alleviate the need for large amounts of labeled data. We show that
replacing a typical supervised convolutional layer with an unsupervised
sparse-coding layer within a DCNN allows for better performance on a car
detection task when only a limited number of labeled training examples is
available. Furthermore, the network that incorporates sparse coding allows for
more consistent performance over varying initializations and ordering of
training examples when compared to a fully supervised DCNN. Finally, we compare
activations between the unsupervised sparse-coding layer and the supervised
convolutional layer, and show that the sparse representation exhibits an
encoding that is depth selective, whereas encodings from the convolutional
layer do not exhibit such selectivity. These result indicates promise for using
unsupervised sparse-coding approaches in real-world computer vision tasks in
domains with limited labeled training data
Unsupervised Learning of Visual Structure using Predictive Generative Networks
The ability to predict future states of the environment is a central pillar
of intelligence. At its core, effective prediction requires an internal model
of the world and an understanding of the rules by which the world changes.
Here, we explore the internal models developed by deep neural networks trained
using a loss based on predicting future frames in synthetic video sequences,
using a CNN-LSTM-deCNN framework. We first show that this architecture can
achieve excellent performance in visual sequence prediction tasks, including
state-of-the-art performance in a standard 'bouncing balls' dataset (Sutskever
et al., 2009). Using a weighted mean-squared error and adversarial loss
(Goodfellow et al., 2014), the same architecture successfully extrapolates
out-of-the-plane rotations of computer-generated faces. Furthermore, despite
being trained end-to-end to predict only pixel-level information, our
Predictive Generative Networks learn a representation of the latent structure
of the underlying three-dimensional objects themselves. Importantly, we find
that this representation is naturally tolerant to object transformations, and
generalizes well to new tasks, such as classification of static images. Similar
models trained solely with a reconstruction loss fail to generalize as
effectively. We argue that prediction can serve as a powerful unsupervised loss
for learning rich internal representations of high-level object features.Comment: under review as conference paper at ICLR 201
- โฆ