Sacrificing Accuracy for Reduced Computation: Cascaded Inference Based on Softmax Confidence
We study the tradeoff between computational effort and accuracy in a cascade
of deep neural networks. During inference, early termination in the cascade is
controlled by confidence levels derived directly from the softmax outputs of
intermediate classifiers. The advantage of early termination is that
classification is performed using less computation, thus adjusting the
computational effort to the complexity of the input. Moreover, dynamic
modification of confidence thresholds allows one to trade accuracy for
computational effort without requiring retraining. Basing early termination
on softmax classifier outputs is justified by experiments that demonstrate
an almost linear relation between confidence levels in intermediate classifiers
and accuracy. Our experimentation with architectures based on ResNet obtained
the following results. (i) A speedup of 1.5 that sacrifices 1.4% accuracy with
respect to the CIFAR-10 test set. (ii) A speedup of 1.19 that sacrifices 0.7%
accuracy with respect to the CIFAR-100 test set. (iii) A speedup of 2.16 that
sacrifices 1.4% accuracy with respect to the SVHN test set.
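The early-termination rule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names softmax, cascaded_predict and toy_stages are hypothetical, each stage is assumed to be a callable that returns a logit vector, and the toy stages ignore their input. In a real cascade, later stages would reuse features computed by earlier ones.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cascaded_predict(x, stages, threshold):
    """Run the stages in order and stop at the first confident one.

    `stages` is a list of callables mapping an input to logits (an
    assumption of this sketch). Returns (label, stages_used, confidence).
    If no stage reaches the threshold, the last stage's answer is used.
    """
    if not stages:
        raise ValueError("the cascade needs at least one stage")
    for used, stage in enumerate(stages, start=1):
        probs = softmax(stage(x))
        confidence = max(probs)
        if confidence >= threshold:
            break  # early termination: later stages are never evaluated
    return probs.index(confidence), used, confidence


# Toy cascade: a cheap, unsure first stage and a costlier, confident second one.
toy_stages = [lambda x: [1.0, 1.2, 0.9], lambda x: [0.0, 5.0, 0.0]]

strict = cascaded_predict(None, toy_stages, threshold=0.9)
lenient = cascaded_predict(None, toy_stages, threshold=0.3)
print(strict[:2], lenient[:2])
```

Lowering the threshold makes the cascade exit earlier, and raising it pushes more inputs to the deeper stages. Because only the threshold changes, no retraining is needed.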
Multigrid Backprojection Super-Resolution and Deep Filter Visualization
We introduce a novel deep-learning architecture for image upscaling by large
factors (e.g. 4x, 8x) based on examples of pristine high-resolution images. Our
goal is to reconstruct high-resolution images from their downscaled versions.
The proposed system performs a multi-level progressive upscaling, starting from
small factors (2x) and updating for higher factors (4x and 8x). The system is
recursive as it repeats the same procedure at each level. It is also residual
since we use the network to update the outputs of a classic upscaler. The
network residuals are improved by Iterative Back-Projections (IBP) computed in
the features of a convolutional network. To work in multiple levels we extend
the standard back-projection algorithm using a recursion analogous to
Multi-Grid algorithms commonly used as solvers of large systems of linear
equations. We also show how the network can be interpreted as a standard
upsampling-and-filter upscaler with a space-variant filter that adapts to the
geometry. This approach allows us to visualize how the network learns to
upscale. Finally, our system reaches state-of-the-art quality for models with
relatively few parameters.
Comment: Spotlight paper in the Thirty-Third AAAI Conference on Artificial
Intelligence (AAAI-19)
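For readers unfamiliar with back-projection, the classic pixel-domain iteration can be sketched as follows. This is not the paper's method, which computes back-projections in the feature space of a convolutional network and recurses over multiple levels. The sketch works on a 1-D signal, and the pair-averaging downsample and sample-repeating upsample operators are assumptions chosen for simplicity.

```python
def downsample(signal):
    """Halve the length by averaging neighbouring pairs (assumed operator)."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]


def upsample(signal):
    """Double the length by repeating every sample (assumed operator)."""
    out = []
    for value in signal:
        out.extend([value, value])
    return out


def back_project(low_res, initial, iterations=3):
    """Refine `initial` until its downsampled version matches `low_res`.

    Each iteration measures the residual in low resolution, upsamples it
    and adds it to the current high-resolution estimate.
    """
    if len(initial) != 2 * len(low_res):
        raise ValueError("initial estimate must be twice as long as low_res")
    estimate = list(initial)
    for _ in range(iterations):
        residual = [y - d for y, d in zip(low_res, downsample(estimate))]
        correction = upsample(residual)
        estimate = [e + c for e, c in zip(estimate, correction)]
    return estimate


low_res = [1.0, 3.0]
refined = back_project(low_res, initial=[0.5, 1.5, 2.0, 5.0])
print(refined, downsample(refined))
```

With this particular operator pair, downsampling an upsampled residual returns the residual unchanged, so the estimate becomes consistent with the low-resolution input after a single iteration. Other operator pairs generally need several iterations.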
Anytime Stereo Image Depth Estimation on Mobile Devices
Many applications of stereo depth estimation in robotics require the
generation of accurate disparity maps in real time under significant
computational constraints. Current state-of-the-art algorithms force a choice
between either generating accurate mappings at a slow pace, or quickly
generating inaccurate ones, and additionally these methods typically require
far too many parameters to be usable on power- or memory-constrained devices.
Motivated by these shortcomings, we propose a novel approach for disparity
prediction in the anytime setting. In contrast to prior work, our end-to-end
learned approach can trade off computation and accuracy at inference time.
Depth estimation is performed in stages, during which the model can be queried
at any time to output its current best estimate. Our final model can process
1242×375 resolution images within a range of 10-35 FPS on an NVIDIA
Jetson TX2 module with only marginal increases in error -- using two orders of
magnitude fewer parameters than the most competitive baseline. The source code
is available at https://github.com/mileyan/AnyNet.
Comment: Accepted by ICRA201
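The anytime behaviour described in this abstract, where the model can be queried at any stage for its current best estimate, can be sketched with a lazy generator. This is an illustrative sketch only: anytime_estimates, query and the toy refiners are hypothetical names, the budget is counted in stages rather than wall-clock time, and each stage is assumed to add a correction to the previous estimate.

```python
def anytime_estimates(coarse, refiners):
    """Yield the current best estimate after each stage.

    `coarse` is the first, cheapest estimate; every callable in `refiners`
    maps the current estimate to an additive correction (an assumption of
    this sketch).
    """
    estimate = list(coarse)
    yield list(estimate)
    for refine in refiners:
        residual = refine(estimate)
        estimate = [e + r for e, r in zip(estimate, residual)]
        yield list(estimate)


def query(coarse, refiners, budget):
    """Return the best estimate available after at most `budget` stages."""
    if budget < 1:
        raise ValueError("budget must allow at least the coarse stage")
    best = None
    for used, estimate in enumerate(anytime_estimates(coarse, refiners), start=1):
        best = estimate
        if used >= budget:
            break  # the generator is lazy, so later refiners never run
    return best


# Toy disparity values for two pixels and two hypothetical refinement stages.
toy_coarse = [10.0, 20.0]
toy_refiners = [lambda est: [1.0, -1.0], lambda est: [0.5, 0.5]]

for budget in (1, 2, 3):
    print(budget, query(toy_coarse, toy_refiners, budget))
```

A small budget returns the coarse estimate quickly, while a larger one spends more computation for a more refined result.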
Learning Fully Dense Neural Networks for Image Semantic Segmentation
Semantic segmentation is pixel-wise classification which retains critical
spatial information. "Feature map reuse" has been commonly adopted in CNN-based
approaches to take advantage of feature maps in the early layers for the
later spatial reconstruction. Along this direction, we go a step further by
proposing a fully dense neural network with an encoder-decoder structure that
we abbreviate as FDNet. For each stage in the decoder module, feature maps of
all the previous blocks are adaptively aggregated and fed forward as input. On
the one hand, this reconstructs the spatial boundaries accurately. On the other
hand, it learns more efficiently because gradients backpropagate more
efficiently. In addition, we propose a boundary-aware loss function that
focuses more attention on the pixels near the boundary, which improves the
labeling of "hard examples". FDNet outperforms previous works on two benchmark
datasets, PASCAL VOC 2012 and NYUDv2, when training on other datasets is not
considered.
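One way to focus a segmentation loss on pixels near class boundaries is to up-weight their cross-entropy terms, as sketched below. The abstract does not give FDNet's actual formulation, so everything here is an assumption: the 4-neighbour boundary test, the single boundary_weight factor, and the names boundary_mask and boundary_weighted_loss are all hypothetical.

```python
import math


def boundary_mask(labels):
    """Mark pixels that have a 4-neighbour with a different label."""
    height, width = len(labels), len(labels[0])
    mask = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < height and 0 <= cc < width
                if inside and labels[rr][cc] != labels[r][c]:
                    mask[r][c] = True
    return mask


def boundary_weighted_loss(probs, labels, boundary_weight=2.0):
    """Weighted mean cross-entropy; boundary pixels count `boundary_weight` times.

    probs[r][c] is the list of predicted class probabilities for pixel
    (r, c), and labels[r][c] is the integer ground-truth class.
    """
    mask = boundary_mask(labels)
    total, weight_sum = 0.0, 0.0
    for r, row in enumerate(labels):
        for c, label in enumerate(row):
            weight = boundary_weight if mask[r][c] else 1.0
            total += -weight * math.log(probs[r][c][label])
            weight_sum += weight
    return total / weight_sum


# One image row with two classes. The two middle pixels sit on the boundary
# and are predicted less confidently than the interior pixels.
toy_labels = [[0, 0, 1, 1]]
toy_probs = [[[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.1, 0.9]]]

plain = boundary_weighted_loss(toy_probs, toy_labels, boundary_weight=1.0)
weighted = boundary_weighted_loss(toy_probs, toy_labels, boundary_weight=3.0)
print(plain, weighted)
```

With boundary_weight set to 1.0 the function reduces to ordinary mean cross-entropy. Larger values make errors near boundaries dominate the loss.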