Autoencoder Node Saliency: Selecting Relevant Latent Representations
The autoencoder is an artificial neural network model that learns hidden
representations of unlabeled data. With a linear transfer function it is
similar to principal component analysis (PCA). While both methods use
weight vectors for linear transformations, the autoencoder does not come with
any indication similar to the eigenvalues in PCA that are paired with the
eigenvectors. We propose a novel supervised node saliency (SNS) method that
ranks the hidden nodes by comparing class distributions of latent
representations against a fixed reference distribution. The latent
representations of a hidden node can be described using a one-dimensional
histogram. We apply normalized entropy difference (NED) to measure the
"interestingness" of the histograms, and derive a property of NED values that
identifies a good classifying node. By applying our methods to real data sets,
we demonstrate the ability of SNS to explain what the trained autoencoders
have learned.
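The node-ranking idea can be illustrated with a toy score. The sketch below is only one plausible reading of NED, assuming it is defined as one minus the histogram entropy normalized by its maximum; the paper's exact definition, bin count, and reference distribution may differ.

```python
import numpy as np

def normalized_entropy(activations, bins=10):
    """Histogram a hidden node's latent activations and return the entropy
    of the bin distribution, normalized to [0, 1] by log2(bins)."""
    counts, _ = np.histogram(activations, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty bins; 0*log(0) is taken as 0
    return -np.sum(p * np.log2(p)) / np.log2(bins)

def ned(activations, bins=10):
    """Normalized entropy difference (assumed form): 1 minus normalized
    entropy. High NED = activations concentrated in few bins, i.e. a
    more "interesting" histogram for separating classes."""
    return 1.0 - normalized_entropy(activations, bins)
```

Ranking hidden nodes would then amount to sorting them by `ned` of their per-sample activations.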
Unsupervised Creation of Parameterized Avatars
We study the problem of mapping an input image to a tied pair consisting of a
vector of parameters and an image that is created using a graphical engine from
the vector of parameters. The mapping's objective is to have the output image
as similar as possible to the input image. During training, no supervision is
given in the form of matching inputs and outputs.
This learning problem extends two literature problems: unsupervised domain
adaptation and cross domain transfer. We define a generalization bound that is
based on discrepancy, and employ a GAN to implement a network solution that
corresponds to this bound. Experimentally, our method is shown to solve the
problem of automatically creating avatars.
Comment: v2 -- a change in the references due to a request from author
Kernels and Submodels of Deep Belief Networks
We study the mixtures of factorizing probability distributions represented as
visible marginal distributions in stochastic layered networks. We take the
perspective of kernel transitions of distributions, which gives a unified
picture of distributed representations arising from Deep Belief Networks (DBN)
and other networks without lateral connections. We describe combinatorial and
geometric properties of the set of kernels and products of kernels realizable
by DBNs as the network parameters vary. We describe explicit classes of
probability distributions, including exponential families, that can be learned
by DBNs. We use these submodels to bound the maximal and the expected
Kullback-Leibler approximation errors of DBNs from above depending on the
number of hidden layers and units that they contain.
Comment: 13 pages, 4 figures, deep learning and unsupervised feature learning
NIPS workshop 201
A Deep Learning Approach to Unsupervised Ensemble Learning
We show how deep learning methods can be applied in the context of
crowdsourcing and unsupervised ensemble learning. First, we prove that the
popular model of Dawid and Skene, which assumes that all classifiers are
conditionally independent, is {\em equivalent} to a Restricted Boltzmann
Machine (RBM) with a single hidden node. Hence, under this model, the posterior
probabilities of the true labels can be instead estimated via a trained RBM.
Next, to address the more general case, where classifiers may strongly violate
the conditional independence assumption, we propose to apply an RBM-based Deep
Neural Net (DNN). Experimental results on various simulated and real-world
datasets demonstrate that our proposed DNN approach outperforms other
state-of-the-art methods, in particular when the data violates the conditional
independence assumption.
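The single-hidden-node equivalence means the posterior over the true label reduces to a logistic function of a weighted sum of classifier votes. The sketch below assumes binary +/-1 votes; the weights and bias are illustrative placeholders, not values learned by the authors' procedure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rbm_posterior(votes, weights, bias):
    """Posterior P(y = 1 | votes) under a single-hidden-node RBM:
    a sigmoid of a weighted sum of the classifiers' +/-1 votes.
    Under the Dawid-Skene equivalence, each weight reflects how
    reliable the corresponding classifier is."""
    return sigmoid(bias + votes @ weights)
```

Flipping every vote flips the posterior symmetrically, which is the intuition behind treating the hidden unit as the unknown true label.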
Tiny Descriptors for Image Retrieval with Unsupervised Triplet Hashing
A typical image retrieval pipeline starts with the comparison of global
descriptors from a large database to find a short list of candidate matches. A
good image descriptor is key to the retrieval pipeline and should reconcile two
contradictory requirements: providing recall rates as high as possible and
being as compact as possible for fast matching. Following the recent successes
of Deep Convolutional Neural Networks (DCNN) for large scale image
classification, descriptors extracted from DCNNs are increasingly used in place
of the traditional hand crafted descriptors such as Fisher Vectors (FV) with
better retrieval performances. Nevertheless, the dimensionality of a typical
DCNN descriptor --extracted either from the visual feature pyramid or the
fully-connected layers-- remains quite high at several thousands of scalar
values. In this paper, we propose Unsupervised Triplet Hashing (UTH), a fully
unsupervised method to compute extremely compact binary hashes --in the 32-256
bits range-- from high-dimensional global descriptors. UTH consists of two
successive deep learning steps. First, Stacked Restricted Boltzmann Machines
(SRBM), a type of unsupervised deep neural nets, are used to learn binary
embedding functions able to bring the descriptor size down to the desired
bitrate. SRBMs typically achieve a very high compression rate at the expense
of losing some desirable metric properties of the original DCNN descriptor
space. Then triplet networks, a rank-learning scheme based on weight-sharing
nets, are used to fine-tune the binary embedding functions to retain as much
as possible of the useful metric properties of the original space. A thorough
empirical evaluation conducted on multiple publicly available datasets using
DCNN descriptors shows that our method significantly outperforms
state-of-the-art unsupervised schemes in the target bit range.
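The fine-tuning stage can be pictured with a standard triplet margin objective. This is a minimal sketch, not the authors' exact loss: the squared-distance metric, the margin value, and sign-based binarization are all assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet objective: pull the anchor's embedding toward a
    'positive' (e.g. a perturbed copy of the same image) and push it
    away from a 'negative', up to a margin."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def binarize(embedding):
    """Quantize a real-valued embedding to a compact binary hash
    by thresholding at zero (one common choice)."""
    return (embedding > 0).astype(np.uint8)
```

The loss is zero once the negative is farther than the positive by the margin, so training only reshapes the embedding where the ranking is violated.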
Detection of Unknown Anomalies in Streaming Videos with Generative Energy-based Boltzmann Models
Abnormal event detection is one of the important objectives in research and
practical applications of video surveillance. However, there are still three
challenging problems for most anomaly detection systems in practical settings:
limited labeled data, ambiguous definition of "abnormal" and expensive feature
engineering steps. This paper introduces a unified detection framework to
handle these challenges using energy-based models, which are powerful tools for
unsupervised representation learning. Our proposed models are first trained on
unlabeled raw pixels of image frames from an input video rather than on
hand-crafted visual features, and then identify the locations of abnormal
objects based on the errors between the input video and its reconstruction
produced by the models. To handle video streams, we develop an online version
of our framework, wherein the model parameters are updated incrementally as
image frames arrive on the fly. Our experiments show that our detectors,
using Restricted Boltzmann Machines (RBMs) and Deep Boltzmann Machines (DBMs)
as core modules, achieve superior anomaly detection performance to unsupervised
baselines and obtain accuracy comparable with state-of-the-art approaches
when evaluated at the pixel level. More importantly, we discover that our
system trained with DBMs is able to simultaneously perform scene clustering and
scene reconstruction. This capacity not only distinguishes our method from
other existing detectors but also offers a unique tool to investigate and
understand how the model works.
Comment: This manuscript is under consideration at Pattern Recognition Letters
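The reconstruction-error idea can be sketched with a single mean-field up-down pass through an RBM. Everything here is a simplified stand-in for the trained detector: the sigmoid mean-field pass, the squared-error map, and the threshold are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rbm_reconstruct(v, W, b_hid, b_vis):
    """One mean-field up-down pass of an RBM: visible -> hidden -> visible.
    A model trained on normal frames reconstructs them accurately."""
    h = sigmoid(v @ W + b_hid)
    return sigmoid(h @ W.T + b_vis)

def anomaly_mask(v, W, b_hid, b_vis, threshold=0.3):
    """Per-pixel squared reconstruction error, thresholded: pixels the
    model cannot reconstruct well are flagged as candidate anomalies."""
    err = (v - rbm_reconstruct(v, W, b_hid, b_vis)) ** 2
    return err > threshold
```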
DeepSat - A Learning framework for Satellite Imagery
Satellite image classification is a challenging problem that lies at the
crossroads of remote sensing, computer vision, and machine learning. Due to the
high variability inherent in satellite data, most of the current object
classification approaches are not suitable for handling satellite datasets. The
progress of satellite image analytics has also been inhibited by the lack of a
single labeled high-resolution dataset with multiple class labels. The
contributions of this paper are twofold: (1) we present two new satellite
datasets, called SAT-4 and SAT-6, and (2) we propose a
classification framework that extracts features from an input image, normalizes
them and feeds the normalized feature vectors to a Deep Belief Network for
classification. On the SAT-4 dataset, our best network produces a
classification accuracy of 97.95% and outperforms three state-of-the-art object
recognition algorithms, namely - Deep Belief Networks, Convolutional Neural
Networks and Stacked Denoising Autoencoders by ~11%. On SAT-6, it produces a
classification accuracy of 93.9% and outperforms the other algorithms by ~15%.
Comparative studies with a Random Forest classifier show the advantage of an
unsupervised learning approach over traditional supervised learning techniques.
A statistical analysis based on Distribution Separability Criterion and
Intrinsic Dimensionality Estimation substantiates the effectiveness of our
approach in learning better representations for satellite imagery.
Comment: Paper was accepted at ACM SIGSPATIAL 201
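The pipeline normalizes extracted feature vectors before handing them to the Deep Belief Network. The abstract does not pin down the normalization scheme; per-feature z-scoring, shown below, is just one common choice and is an assumption here.

```python
import numpy as np

def normalize_features(X, eps=1e-8):
    """Zero-mean, unit-variance normalization of each feature column
    (rows = samples) before classification. `eps` guards against
    constant features with zero standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps)
```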
Energy-based Models for Video Anomaly Detection
Automated detection of abnormalities in data has been an active research area
in recent years because of its diverse applications in practice, including
video surveillance, industrial damage detection and network intrusion
detection. However, building an effective anomaly detection system is a
non-trivial task, since it requires tackling the challenging issues of a
shortage of annotated data, the inability to define anomalous objects
explicitly, and the expensive cost of the feature engineering procedure.
Unlike existing approaches, which only partially solve these problems, we
develop a unified framework to cope with all of them simultaneously. Instead
of wrestling with an ambiguous definition of anomalous objects, we propose to
work with regular patterns, for which unlabeled data are abundant and usually
easy to collect in practice. This allows our system to be trained in a
completely unsupervised procedure and liberates us from the need for costly
data annotation. By learning a generative model that captures the normality
distribution of the data, we can isolate abnormal data points that receive low
normality scores (high abnormality scores). Moreover, by leveraging the power
of generative networks, i.e. energy-based models, we are also able to learn
the feature representation automatically rather than relying on hand-crafted
features, which have dominated anomaly detection research for many decades. We
demonstrate our proposal on the specific application of video anomaly
detection, and the experimental results indicate that our method performs
better than baselines and is comparable with state-of-the-art methods on many
benchmark video anomaly detection datasets.
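For an energy-based model, the normality score can come directly from the model's free energy rather than from reconstruction error. The sketch below uses the standard binary-binary RBM free energy as an example of such a score; treating a free-energy threshold as the anomaly criterion is an assumption, not necessarily the paper's exact scoring rule.

```python
import numpy as np

def free_energy(v, W, b_hid, b_vis):
    """Free energy F(v) of a binary-binary RBM. Lower free energy means
    higher model probability, so -F(v) can serve as a normality score:
    inputs with unusually high free energy are candidate anomalies."""
    return -(v @ b_vis) - np.sum(np.logaddexp(0.0, v @ W + b_hid))
```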
Learning Robust Visual-Semantic Embeddings
Many of the existing methods for learning joint embeddings of images and text
use only supervised information from paired images and their textual
attributes.
Taking advantage of the recent success of unsupervised learning in deep neural
networks, we propose an end-to-end learning framework that is able to extract
more robust multi-modal representations across domains. The proposed method
combines representation learning models (i.e., auto-encoders) together with
cross-domain learning criteria (i.e., Maximum Mean Discrepancy loss) to learn
joint embeddings for semantic and visual features. A novel technique of
unsupervised-data adaptation inference is introduced to construct more
comprehensive embeddings for both labeled and unlabeled data. We evaluate our
method on the Animals with Attributes and Caltech-UCSD Birds 200-2011 datasets
with a wide range of applications, including zero- and few-shot image
recognition and retrieval, from inductive to transductive settings.
Empirically, we show that
our framework improves over the current state of the art on many of the
considered tasks.
Comment: 12 pages
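The cross-domain criterion can be illustrated with the Maximum Mean Discrepancy between the two modalities' embedding samples. The biased empirical estimator with an RBF kernel below is a standard formulation; the kernel choice and bandwidth `gamma` are assumptions, not the paper's settings.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between two sets of embeddings (rows)."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    """Biased squared Maximum Mean Discrepancy between e.g. image-side
    and text-side embedding samples; driving this toward zero during
    training aligns the two domains' distributions."""
    return (rbf_kernel(X, X, gamma).mean() + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())
```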
Gamma Belief Networks
To infer multilayer deep representations of high-dimensional discrete and
nonnegative real vectors, we propose an augmentable gamma belief network (GBN)
that factorizes each of its hidden layers into the product of a sparse
connection weight matrix and the nonnegative real hidden units of the next
layer. The GBN's hidden layers are jointly trained with an upward-downward
Gibbs sampler that solves each layer with the same subroutine. The
gamma-negative binomial process combined with a layer-wise training strategy
allows inferring the width of each layer given a fixed budget on the width of
the first layer. Example results illustrate interesting relationships between
the width of the first layer and the inferred network structure, and
demonstrate that the GBN can add more layers to improve its performance in
both unsupervised feature extraction and held-out data prediction. For
exploratory
data analysis, we extract trees and subnetworks from the learned deep network
to visualize how the very specific factors discovered at the first hidden layer
and the increasingly more general factors discovered at deeper hidden layers
are related to each other, and we generate synthetic data by propagating random
variables through the deep network from the top hidden layer back to the bottom
data layer.
Comment: 44 pages, 24 figures
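The top-down generation step can be sketched as follows. This is a rough caricature of the described generative process, assuming each hidden layer draws gamma units whose shape is the connection matrix times the layer above, with Poisson counts at the bottom; the scale parameterization `c` and the count likelihood are assumptions.

```python
import numpy as np

def sample_gbn(phis, top_activation, c=1.0, rng=None):
    """Propagate a top-layer activation down an (assumed) gamma belief
    network: each hidden layer draws gamma units with shape
    Phi_t @ theta_{t+1}, and the bottom layer emits Poisson counts.
    `phis` is a list of nonnegative connection matrices, top to bottom."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(top_activation, dtype=float)
    for phi in phis[:-1]:
        # small epsilon keeps the gamma shape strictly positive
        theta = rng.gamma(shape=phi @ theta + 1e-12, scale=1.0 / c)
    return rng.poisson(phis[-1] @ theta)
```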