9,223 research outputs found
Residual Attention Network for Image Classification
In this work, we propose "Residual Attention Network", a convolutional neural
network using attention mechanism which can incorporate with state-of-art feed
forward network architecture in an end-to-end training fashion. Our Residual
Attention Network is built by stacking Attention Modules which generate
attention-aware features. The attention-aware features from different modules
change adaptively as layers going deeper. Inside each Attention Module,
bottom-up top-down feedforward structure is used to unfold the feedforward and
feedback attention process into a single feedforward process. Importantly, we
propose attention residual learning to train very deep Residual Attention
Networks which can be easily scaled up to hundreds of layers. Extensive
analyses are conducted on CIFAR-10 and CIFAR-100 datasets to verify the
effectiveness of every module mentioned above. Our Residual Attention Network
achieves state-of-the-art object recognition performance on three benchmark
datasets including CIFAR-10 (3.90% error), CIFAR-100 (20.45% error) and
ImageNet (4.8% single model and single crop, top-5 error). Note that, our
method achieves 0.6% top-1 accuracy improvement with 46% trunk depth and 69%
forward FLOPs comparing to ResNet-200. The experiment also demonstrates that
our network is robust against noisy labels.Comment: accepted to CVPR201
Anchor Loss: Modulating Loss Scale Based on Prediction Difficulty
We propose a novel loss function that dynamically re-scales the cross entropy based on prediction difficulty regarding a sample. Deep neural network architectures in image classification tasks struggle to disambiguate visually similar objects. Likewise, in human pose estimation symmetric body parts often confuse the network with assigning indiscriminative scores to them. This is due to the output prediction, in which only the highest confidence label is selected without taking into consideration a measure of uncertainty. In this work, we define the prediction difficulty as a relative property coming from the confidence score gap between positive and negative labels. More precisely, the proposed loss function penalizes the network to avoid the score of a false prediction being significant. To demonstrate the efficacy of our loss function, we evaluate it on two different domains: image classification and human pose estimation. We find improvements in both applications by achieving higher accuracy compared to the baseline methods
Learning from Synthetic Humans
Estimating human pose, shape, and motion from images and videos are
fundamental challenges with many applications. Recent advances in 2D human pose
estimation use large amounts of manually-labeled training data for learning
convolutional neural networks (CNNs). Such data is time consuming to acquire
and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion
is impractical. In this work we present SURREAL (Synthetic hUmans foR REAL
tasks): a new large-scale dataset with synthetically-generated but realistic
images of people rendered from 3D sequences of human motion capture data. We
generate more than 6 million frames together with ground truth pose, depth
maps, and segmentation masks. We show that CNNs trained on our synthetic
dataset allow for accurate human depth estimation and human part segmentation
in real RGB images. Our results and the new dataset open up new possibilities
for advancing person analysis using cheap and large-scale synthetic data.Comment: Appears in: 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2017). 9 page
- …