1,495 research outputs found
Regularizing Deep Networks by Modeling and Predicting Label Structure
We construct custom regularization functions for use in supervised training
of deep neural networks. Our technique is applicable when the ground-truth
labels themselves exhibit internal structure; we derive a regularizer by
learning an autoencoder over the set of annotations. Training thereby becomes a
two-phase procedure. The first phase models labels with an autoencoder. The
second phase trains the actual network of interest by attaching an auxiliary
branch that must predict output via a hidden layer of the autoencoder. After
training, we discard this auxiliary branch.
We experiment in the context of semantic segmentation, demonstrating this
regularization strategy leads to consistent accuracy boosts over baselines,
both when training from scratch, or in combination with ImageNet pretraining.
Gains are also consistent over different choices of convolutional network
architecture. As our regularizer is discarded after training, our method has
zero cost at test time; the performance improvements are essentially free. We
are simply able to learn better network weights by building an abstract model
of the label space, and then training the network to understand this
abstraction alongside the original task.Comment: to appear at CVPR 201
Colorization as a Proxy Task for Visual Understanding
We investigate and improve self-supervision as a drop-in replacement for
ImageNet pretraining, focusing on automatic colorization as the proxy task.
Self-supervised training has been shown to be more promising for utilizing
unlabeled data than other, traditional unsupervised learning methods. We build
on this success and evaluate the ability of our self-supervised network in
several contexts. On VOC segmentation and classification tasks, we present
results that are state-of-the-art among methods not using ImageNet labels for
pretraining representations.
Moreover, we present the first in-depth analysis of self-supervision via
colorization, concluding that formulation of the loss, training details and
network architecture play important roles in its effectiveness. This
investigation is further expanded by revisiting the ImageNet pretraining
paradigm, asking questions such as: How much training data is needed? How many
labels are needed? How much do features change when fine-tuned? We relate these
questions back to self-supervision by showing that colorization provides a
similarly powerful supervisory signal as various flavors of ImageNet
pretraining.Comment: CVPR 2017 (Project page:
http://people.cs.uchicago.edu/~larsson/color-proxy/
Reconstructive Sparse Code Transfer for Contour Detection and Semantic Labeling
We frame the task of predicting a semantic labeling as a sparse
reconstruction procedure that applies a target-specific learned transfer
function to a generic deep sparse code representation of an image. This
strategy partitions training into two distinct stages. First, in an
unsupervised manner, we learn a set of generic dictionaries optimized for
sparse coding of image patches. We train a multilayer representation via
recursive sparse dictionary learning on pooled codes output by earlier layers.
Second, we encode all training images with the generic dictionaries and learn a
transfer function that optimizes reconstruction of patches extracted from
annotated ground-truth given the sparse codes of their corresponding image
patches. At test time, we encode a novel image using the generic dictionaries
and then reconstruct using the transfer function. The output reconstruction is
a semantic labeling of the test image.
Applying this strategy to the task of contour detection, we demonstrate
performance competitive with state-of-the-art systems. Unlike almost all prior
work, our approach obviates the need for any form of hand-designed features or
filters. To illustrate general applicability, we also show initial results on
semantic part labeling of human faces.
The effectiveness of our approach opens new avenues for research on deep
sparse representations. Our classifiers utilize this representation in a novel
manner. Rather than acting on nodes in the deepest layer, they attach to nodes
along a slice through multiple layers of the network in order to make
predictions about local patches. Our flexible combination of a generatively
learned sparse representation with discriminatively trained transfer
classifiers extends the notion of sparse reconstruction to encompass arbitrary
semantic labeling tasks.Comment: to appear in Asian Conference on Computer Vision (ACCV), 201
Sparsely Aggregated Convolutional Networks
We explore a key architectural aspect of deep convolutional neural networks:
the pattern of internal skip connections used to aggregate outputs of earlier
layers for consumption by deeper layers. Such aggregation is critical to
facilitate training of very deep networks in an end-to-end manner. This is a
primary reason for the widespread adoption of residual networks, which
aggregate outputs via cumulative summation. While subsequent works investigate
alternative aggregation operations (e.g. concatenation), we focus on an
orthogonal question: which outputs to aggregate at a particular point in the
network. We propose a new internal connection structure which aggregates only a
sparse set of previous outputs at any given depth. Our experiments demonstrate
this simple design change offers superior performance with fewer parameters and
lower computational requirements. Moreover, we show that sparse aggregation
allows networks to scale more robustly to 1000+ layers, thereby opening future
avenues for training long-running visual processes.Comment: Accepted to ECCV 201
- …