5,254 research outputs found
Dynamic Steerable Blocks in Deep Residual Networks
Filters in convolutional networks are typically parameterized in a pixel
basis, that does not take prior knowledge about the visual world into account.
We investigate the generalized notion of frames designed with image properties
in mind, as alternatives to this parametrization. We show that frame-based
ResNets and Densenets can improve performance on Cifar-10+ consistently, while
having additional pleasant properties like steerability. By exploiting these
transformation properties explicitly, we arrive at dynamic steerable blocks.
They are an extension of residual blocks, that are able to seamlessly transform
filters under pre-defined transformations, conditioned on the input at training
and inference time. Dynamic steerable blocks learn the degree of invariance
from data and locally adapt filters, allowing them to apply a different
geometrical variant of the same filter to each location of the feature map.
When evaluated on the Berkeley Segmentation contour detection dataset, our
approach outperforms all competing approaches that do not utilize pre-training.
Our results highlight the benefits of image-based regularization to deep
networks
Reconstructive Sparse Code Transfer for Contour Detection and Semantic Labeling
We frame the task of predicting a semantic labeling as a sparse
reconstruction procedure that applies a target-specific learned transfer
function to a generic deep sparse code representation of an image. This
strategy partitions training into two distinct stages. First, in an
unsupervised manner, we learn a set of generic dictionaries optimized for
sparse coding of image patches. We train a multilayer representation via
recursive sparse dictionary learning on pooled codes output by earlier layers.
Second, we encode all training images with the generic dictionaries and learn a
transfer function that optimizes reconstruction of patches extracted from
annotated ground-truth given the sparse codes of their corresponding image
patches. At test time, we encode a novel image using the generic dictionaries
and then reconstruct using the transfer function. The output reconstruction is
a semantic labeling of the test image.
Applying this strategy to the task of contour detection, we demonstrate
performance competitive with state-of-the-art systems. Unlike almost all prior
work, our approach obviates the need for any form of hand-designed features or
filters. To illustrate general applicability, we also show initial results on
semantic part labeling of human faces.
The effectiveness of our approach opens new avenues for research on deep
sparse representations. Our classifiers utilize this representation in a novel
manner. Rather than acting on nodes in the deepest layer, they attach to nodes
along a slice through multiple layers of the network in order to make
predictions about local patches. Our flexible combination of a generatively
learned sparse representation with discriminatively trained transfer
classifiers extends the notion of sparse reconstruction to encompass arbitrary
semantic labeling tasks.Comment: to appear in Asian Conference on Computer Vision (ACCV), 201
DCTNet : A Simple Learning-free Approach for Face Recognition
PCANet was proposed as a lightweight deep learning network that mainly
leverages Principal Component Analysis (PCA) to learn multistage filter banks
followed by binarization and block-wise histograming. PCANet was shown worked
surprisingly well in various image classification tasks. However, PCANet is
data-dependence hence inflexible. In this paper, we proposed a
data-independence network, dubbed DCTNet for face recognition in which we adopt
Discrete Cosine Transform (DCT) as filter banks in place of PCA. This is
motivated by the fact that 2D DCT basis is indeed a good approximation for high
ranked eigenvectors of PCA. Both 2D DCT and PCA resemble a kind of modulated
sine-wave patterns, which can be perceived as a bandpass filter bank. DCTNet is
free from learning as 2D DCT bases can be computed in advance. Besides that, we
also proposed an effective method to regulate the block-wise histogram feature
vector of DCTNet for robustness. It is shown to provide surprising performance
boost when the probe image is considerably different in appearance from the
gallery image. We evaluate the performance of DCTNet extensively on a number of
benchmark face databases and being able to achieve on par with or often better
accuracy performance than PCANet.Comment: APSIPA ASC 201
A Family of Maximum Margin Criterion for Adaptive Learning
In recent years, pattern analysis plays an important role in data mining and
recognition, and many variants have been proposed to handle complicated
scenarios. In the literature, it has been quite familiar with high
dimensionality of data samples, but either such characteristics or large data
have become usual sense in real-world applications. In this work, an improved
maximum margin criterion (MMC) method is introduced firstly. With the new
definition of MMC, several variants of MMC, including random MMC, layered MMC,
2D^2 MMC, are designed to make adaptive learning applicable. Particularly, the
MMC network is developed to learn deep features of images in light of simple
deep networks. Experimental results on a diversity of data sets demonstrate the
discriminant ability of proposed MMC methods are compenent to be adopted in
complicated application scenarios.Comment: 14 page
Programmable Spectrometry -- Per-pixel Classification of Materials using Learned Spectral Filters
Many materials have distinct spectral profiles. This facilitates estimation
of the material composition of a scene at each pixel by first acquiring its
hyperspectral image, and subsequently filtering it using a bank of spectral
profiles. This process is inherently wasteful since only a set of linear
projections of the acquired measurements contribute to the classification task.
We propose a novel programmable camera that is capable of producing images of a
scene with an arbitrary spectral filter. We use this camera to optically
implement the spectral filtering of the scene's hyperspectral image with the
bank of spectral profiles needed to perform per-pixel material classification.
This provides gains both in terms of acquisition speed --- since only the
relevant measurements are acquired --- and in signal-to-noise ratio --- since
we invariably avoid narrowband filters that are light inefficient. Given
training data, we use a range of classical and modern techniques including SVMs
and neural networks to identify the bank of spectral profiles that facilitate
material classification. We verify the method in simulations on standard
datasets as well as real data using a lab prototype of the camera
Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition
Facial micro-expression (ME) recognition has posed a huge challenge to
researchers for its subtlety in motion and limited databases. Recently,
handcrafted techniques have achieved superior performance in micro-expression
recognition but at the cost of domain specificity and cumbersome parametric
tunings. In this paper, we propose an Enriched Long-term Recurrent
Convolutional Network (ELRCN) that first encodes each micro-expression frame
into a feature vector through CNN module(s), then predicts the micro-expression
by passing the feature vector through a Long Short-term Memory (LSTM) module.
The framework contains two different network variants: (1) Channel-wise
stacking of input data for spatial enrichment, (2) Feature-wise stacking of
features for temporal enrichment. We demonstrate that the proposed approach is
able to achieve reasonably good performance, without data augmentation. In
addition, we also present ablation studies conducted on the framework and
visualizations of what CNN "sees" when predicting the micro-expression classes.Comment: Published in Micro-Expression Grand Challenge 2018, Workshop of 13th
IEEE Facial & Gesture 201
- …