FastDepth: Fast Monocular Depth Estimation on Embedded Systems
Depth sensing is a critical function for robotic tasks such as localization,
mapping and obstacle detection. There has been a significant and growing
interest in depth estimation from a single RGB image, due to the relatively low
cost and size of monocular cameras. However, state-of-the-art single-view depth
estimation algorithms are based on fairly complex deep neural networks that are
too slow for real-time inference on an embedded platform, for instance, mounted
on a micro aerial vehicle. In this paper, we address the problem of fast depth
estimation on embedded systems. We propose an efficient and lightweight
encoder-decoder network architecture and apply network pruning to further
reduce computational complexity and latency. In particular, we focus on the
design of a low-latency decoder. Our methodology demonstrates that it is
possible to achieve similar accuracy as prior work on depth estimation, but at
inference speeds that are an order of magnitude faster. Our proposed network,
FastDepth, runs at 178 fps on an NVIDIA Jetson TX2 GPU and at 27 fps when using
only the TX2 CPU, with active power consumption under 10 W. FastDepth achieves
close to state-of-the-art accuracy on the NYU Depth v2 dataset. To the best of
the authors' knowledge, this paper demonstrates real-time monocular depth
estimation using a deep neural network with the lowest latency and highest
throughput on an embedded platform that can be carried by a micro aerial
vehicle.
Comment: Accepted for presentation at ICRA 2019. 8 pages, 6 figures, 7 tables.
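FastDepth reduces latency with network pruning on top of its compact encoder-decoder. Its actual pipeline is NetAdapt-based; as a minimal sketch of the underlying idea only, here is simple magnitude-based output-channel pruning in NumPy (the function name and `keep_ratio` parameter are illustrative, not from the paper):

```python
import numpy as np

def prune_output_channels(weight, keep_ratio=0.5):
    """Keep the output channels of a conv weight with the largest L1 norms.

    weight: array of shape (out_channels, in_channels, kh, kw).
    Returns the pruned weight and the indices of the kept channels.
    """
    scores = np.abs(weight).sum(axis=(1, 2, 3))           # L1 norm per output channel
    k = max(1, int(round(weight.shape[0] * keep_ratio)))  # number of channels to keep
    keep = np.sort(np.argsort(scores)[-k:])               # top-k, in original order
    return weight[keep], keep
```

In a real pipeline the matching input channels of the following layer are pruned as well, and the network is fine-tuned after each pruning step to recover accuracy.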
MobileNetV2: Inverted Residuals and Linear Bottlenecks
In this paper we describe a new mobile architecture, MobileNetV2, that
improves the state of the art performance of mobile models on multiple tasks
and benchmarks as well as across a spectrum of different model sizes. We also
describe efficient ways of applying these mobile models to object detection in
a novel framework we call SSDLite. Additionally, we demonstrate how to build
mobile semantic segmentation models through a reduced form of DeepLabv3 which
we call Mobile DeepLabv3.
The MobileNetV2 architecture is based on an inverted residual structure where
the input and output of the residual block are thin bottleneck layers opposite
to traditional residual models which use expanded representations in the input.
MobileNetV2 uses lightweight depthwise convolutions to filter features in
the intermediate expansion layer. Additionally, we find that it is important to
remove non-linearities in the narrow layers in order to maintain
representational power. We demonstrate that this improves performance and
provide an intuition that led to this design. Finally, our approach allows
decoupling of the input/output domains from the expressiveness of the
transformation, which provides a convenient framework for further analysis. We
measure our performance on Imagenet classification, COCO object detection, and
VOC image segmentation. We evaluate the trade-offs between accuracy and number
of operations measured by multiply-adds (MAdd), as well as the number of
parameters.
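The multiply-add (MAdd) accounting mentioned above can be done per layer. A back-of-envelope sketch for one stride-1 inverted residual block versus a dense 3x3 convolution, ignoring bias and batch-norm terms (the expansion factor of 6 matches the paper's default; the function names are illustrative):

```python
def inverted_residual_madds(h, w, c_in, c_out, expansion=6, kernel=3):
    """Multiply-adds for one stride-1 inverted residual block."""
    c_mid = expansion * c_in
    expand = h * w * c_in * c_mid              # 1x1 pointwise expansion
    depthwise = h * w * c_mid * kernel ** 2    # depthwise conv on expanded channels
    project = h * w * c_mid * c_out            # 1x1 linear bottleneck projection
    return expand + depthwise + project

def standard_conv_madds(h, w, c_in, c_out, kernel=3):
    """Multiply-adds for a regular dense convolution at the same resolution."""
    return h * w * kernel ** 2 * c_in * c_out
```

Note that at equal input/output width the inverted block is not automatically cheaper than a dense 3x3 convolution; the savings come from keeping the bottleneck widths small while the expensive spatial filtering happens depthwise on the expanded representation.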
clcNet: Improving the Efficiency of Convolutional Neural Network using Channel Local Convolutions
Depthwise convolution and grouped convolution have been successfully applied
to improve the efficiency of convolutional neural network (CNN). We suggest
that these models can be considered as special cases of a generalized
convolution operation, named channel local convolution (CLC), where an output
channel is computed using a subset of the input channels. This definition
entails computation dependency relations between input and output channels,
which can be represented by a channel dependency graph (CDG). By modifying the
CDG of grouped convolution, a new CLC kernel named interlaced grouped
convolution (IGC) is created. Stacking IGC and GC kernels results in a
convolution block (named CLC Block) for approximating regular convolution. By
resorting to the CDG as an analysis tool, we derive the rule for setting the
meta-parameters of IGC and GC and the framework for minimizing the
computational cost. A new CNN model named clcNet is then constructed using CLC
blocks, which shows significantly higher computational efficiency and fewer
parameters compared to state-of-the-art networks, when being tested using the
ImageNet-1K dataset. Source code is available at
https://github.com/dqzhang17/clcnet.torch
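The channel dependency graph can be explored in a few lines of pure Python. The sketch below (function names are illustrative, not from the paper's code) builds the input-channel dependency set of each output channel under grouped and interlaced grouping, then composes two layers to check that stacking IGC and GC makes every output depend on every input:

```python
def grouped_deps(channels, groups):
    """Grouped convolution: each output channel sees one contiguous input block."""
    size = channels // groups
    return {c: set(range((c // size) * size, (c // size) * size + size))
            for c in range(channels)}

def interlaced_deps(channels, groups):
    """Interlaced grouped convolution: each group takes every `groups`-th channel."""
    return {c: set(range(c % groups, channels, groups)) for c in range(channels)}

def stacked_deps(first, second):
    """Compose two layers' channel dependency graphs."""
    return {c: set().union(*(first[m] for m in deps))
            for c, deps in second.items()}
```

Because the interlaced groups cut across the contiguous groups, the union in `stacked_deps` covers all input channels, which is exactly the full-dependency condition the CLC block is designed to satisfy.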
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
We introduce an extremely computation-efficient CNN architecture named
ShuffleNet, which is designed specially for mobile devices with very limited
computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new
operations, pointwise group convolution and channel shuffle, to greatly reduce
computation cost while maintaining accuracy. Experiments on ImageNet
classification and MS COCO object detection demonstrate the superior
performance of ShuffleNet over other structures, e.g. lower top-1 error
(absolute 7.8%) than recent MobileNet on ImageNet classification task, under
the computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet
achieves ~13x actual speedup over AlexNet while maintaining comparable
accuracy.
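The channel shuffle operation itself is just a reshape-transpose-reshape. A minimal NumPy sketch, assuming NCHW layout:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups: (N, C, H, W) -> (N, C, H, W)."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by the group count"
    x = x.reshape(n, groups, c // groups, h, w)  # split channels into groups
    x = x.transpose(0, 2, 1, 3, 4)               # swap group and sub-channel axes
    return x.reshape(n, c, h, w)                 # flatten back to NCHW
```

Placed after a pointwise group convolution, this lets the next grouped layer receive channels originating from every group, restoring cross-group information flow at negligible cost.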
Transfer Learning with Binary Neural Networks
Previous work has shown that it is possible to train deep neural networks
with low precision weights and activations. In the extreme case it is even
possible to constrain the network to binary values. The costly floating point
multiplications are then reduced to fast logical operations. High end smart
phones such as Google's Pixel 2 and Apple's iPhone X are already equipped with
specialised hardware for image processing and it is very likely that other
future consumer hardware will also have dedicated accelerators for deep neural
networks. Binary neural networks are attractive in this case because the
logical operations are very fast and efficient when implemented in hardware. We
propose a transfer learning based architecture where we first train a binary
network on Imagenet and then retrain part of the network for different tasks
while keeping most of the network fixed. The fixed binary part could be
implemented in a hardware accelerator while the last layers of the network are
evaluated in software. We show that a single binary neural network trained on
the Imagenet dataset can indeed be used as a feature extractor for other
datasets.
Comment: Machine Learning on the Phone and other Consumer Devices, NIPS 2017
Workshop.
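The speedup behind binary networks comes from replacing floating-point multiplies with bitwise operations: for vectors constrained to {-1, +1}, the dot product reduces to an XNOR followed by a popcount. A small NumPy sketch of that identity (the packing into machine words used by real accelerators is omitted):

```python
import numpy as np

def binarize(x):
    """Map real-valued weights or activations to {-1, +1}."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def xnor_dot(a, b):
    """Dot product of two {-1,+1} vectors via agreement counting:
    dot = (#agreements) - (#disagreements) = 2 * popcount(xnor) - n."""
    agree = int(np.count_nonzero(a == b))  # XNOR = 1 where bits agree
    return 2 * agree - a.size
```

On packed bit arrays, `agree` becomes a hardware popcount over word-wide XNORs, which is what makes the fixed binary feature extractor so cheap to run in an accelerator.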