Xception: Deep Learning with Depthwise Separable Convolutions
We present an interpretation of Inception modules in convolutional neural
networks as being an intermediate step in-between regular convolution and the
depthwise separable convolution operation (a depthwise convolution followed by
a pointwise convolution). In this light, a depthwise separable convolution can
be understood as an Inception module with a maximally large number of towers.
This observation leads us to propose a novel deep convolutional neural network
architecture inspired by Inception, where Inception modules have been replaced
with depthwise separable convolutions. We show that this architecture, dubbed
Xception, slightly outperforms Inception V3 on the ImageNet dataset (which
Inception V3 was designed for), and significantly outperforms Inception V3 on a
larger image classification dataset comprising 350 million images and 17,000
classes. Since the Xception architecture has the same number of parameters as
Inception V3, the performance gains are not due to increased capacity but
rather to a more efficient use of model parameters.
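A depthwise separable convolution factors a standard convolution into a per-channel spatial filter (depthwise step) followed by a 1x1 cross-channel mix (pointwise step). The sketch below is a minimal PyTorch illustration of that factorization, not the Xception implementation itself; the class name, channel counts, and kernel size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv2d(nn.Module):
    """Depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise: one spatial filter per input channel (groups == in_channels).
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size,
            stride=stride, padding=kernel_size // 2,
            groups=in_channels, bias=False)
        # Pointwise: 1x1 convolution that mixes information across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Rough weight count for 128 -> 256 channels with a 3x3 kernel:
#   regular conv:   3*3*128*256 = 294,912 weights
#   separable conv: 3*3*128 + 128*256 = 33,920 weights
block = DepthwiseSeparableConv2d(128, 256)
y = block(torch.randn(1, 128, 32, 32))  # -> shape (1, 256, 32, 32)
```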
3D Depthwise Convolution: Reducing Model Parameters in 3D Vision Tasks
Standard 3D convolution operations require far more memory and computation than
2D convolution operations. This fact has hindered the
development of deep neural nets in many 3D vision tasks. In this paper, we
investigate the possibility of applying depthwise separable convolutions in 3D
scenarios and introduce the use of 3D depthwise convolution. A 3D depthwise
convolution splits a single standard 3D convolution into two separate steps,
which drastically reduces the number of parameters in 3D convolutions by
more than one order of magnitude. We experiment with 3D depthwise convolution
on popular CNN architectures and also compare it with a similar structure
called pseudo-3D convolution. The results demonstrate that, with 3D depthwise
convolutions, 3D vision tasks like classification and reconstruction can be
carried out with more lightweight neural networks while still delivering
comparable performance.
Comment: Work in progress.
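The 3D depthwise convolution described above splits one standard 3D convolution into a per-channel 3D spatial step and a 1x1x1 channel-mixing step. Below is a minimal PyTorch sketch of that split; the layer names and channel counts are illustrative assumptions rather than details from the paper, and the comment gives a rough parameter comparison consistent with the order-of-magnitude reduction the abstract mentions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    """Standard 3D convolution split into a depthwise 3D step and a pointwise step."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        # Depthwise: each input channel gets its own k x k x k spatial filter.
        self.depthwise = nn.Conv3d(
            in_channels, in_channels, kernel_size,
            padding=kernel_size // 2, groups=in_channels, bias=False)
        # Pointwise: 1x1x1 convolution combines the channels.
        self.pointwise = nn.Conv3d(in_channels, out_channels, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# For 64 -> 64 channels with a 3x3x3 kernel:
#   standard Conv3d:       3*3*3*64*64 = 110,592 weights
#   depthwise + pointwise: 3*3*3*64 + 64*64 = 5,824 weights (~19x fewer)
block = DepthwiseSeparableConv3d(64, 64)
y = block(torch.randn(1, 64, 16, 32, 32))  # input layout (N, C, D, H, W)
```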
FastDepth: Fast Monocular Depth Estimation on Embedded Systems
Depth sensing is a critical function for robotic tasks such as localization,
mapping and obstacle detection. There has been a significant and growing
interest in depth estimation from a single RGB image, due to the relatively low
cost and size of monocular cameras. However, state-of-the-art single-view depth
estimation algorithms are based on fairly complex deep neural networks that are
too slow for real-time inference on an embedded platform, for instance, mounted
on a micro aerial vehicle. In this paper, we address the problem of fast depth
estimation on embedded systems. We propose an efficient and lightweight
encoder-decoder network architecture and apply network pruning to further
reduce computational complexity and latency. In particular, we focus on the
design of a low-latency decoder. Our methodology demonstrates that it is
possible to achieve similar accuracy as prior work on depth estimation, but at
inference speeds that are an order of magnitude faster. Our proposed network,
FastDepth, runs at 178 fps on an NVIDIA Jetson TX2 GPU and at 27 fps when using
only the TX2 CPU, with active power consumption under 10 W. FastDepth achieves
close to state-of-the-art accuracy on the NYU Depth v2 dataset. To the best of
the authors' knowledge, this paper demonstrates real-time monocular depth
estimation using a deep neural network with the lowest latency and highest
throughput on an embedded platform that can be carried by a micro aerial
vehicle.
Comment: Accepted for presentation at ICRA 2019. 8 pages, 6 figures, 7 tables.
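The abstract does not spell out the decoder, but the stated design goal of a low-latency decoder built from lightweight layers can be illustrated with cheap nearest-neighbor upsampling followed by depthwise separable convolutions. The sketch below is a hypothetical PyTorch illustration in that spirit; the stage layout, kernel sizes, and channel widths are assumptions, not the released FastDepth architecture.

```python
import torch
import torch.nn as nn

# Hypothetical low-latency decoder stage: nearest-neighbor upsampling
# followed by a depthwise separable convolution. All sizes are illustrative.
def decoder_stage(in_channels, out_channels):
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(in_channels, in_channels, 5, padding=2,
                  groups=in_channels, bias=False),          # depthwise
        nn.Conv2d(in_channels, out_channels, 1, bias=False),  # pointwise
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )

decoder = nn.Sequential(
    decoder_stage(512, 256),
    decoder_stage(256, 128),
    decoder_stage(128, 64),
    nn.Conv2d(64, 1, 1),  # single-channel depth map
)
depth = decoder(torch.randn(1, 512, 7, 7))  # -> (1, 1, 56, 56)
```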
