Recent Advances in Convolutional Neural Network Acceleration
In recent years, convolutional neural networks (CNNs) have shown great
performance in fields such as image classification, pattern recognition, and
multimedia compression. Two of their characteristic properties, local
connectivity and weight sharing, reduce the number of parameters and increase
processing speed during training and inference. However, as data
dimensionality grows and CNN architectures become more complicated,
end-to-end and combined CNN pipelines become computationally intensive, which
limits further deployment of CNNs. It is therefore necessary and urgent to
make CNNs faster. In this paper, we first summarize acceleration methods that
apply to, but are not limited to, CNNs by reviewing a broad variety of
research papers. We propose a taxonomy of acceleration methods with three
levels: structure level, algorithm level, and implementation level. We also
analyze these methods in terms of CNN architecture compression, algorithm
optimization, and hardware-based improvement. Finally, we discuss different
perspectives on these acceleration and optimization methods within each
level. The discussion shows that the methods at each level still leave large
room for exploration. By incorporating such a wide range of disciplines, we
expect to provide a comprehensive reference for researchers interested in CNN
acceleration. Comment: submitted to Neurocomputing
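The parameter savings from local connectivity and weight sharing mentioned in the abstract can be made concrete with a back-of-the-envelope count. The layer sizes below are illustrative assumptions, not figures from the paper:

```python
# Parameter-count comparison illustrating why local connectivity and
# weight sharing make convolutional layers cheap. All sizes are
# illustrative: a 32x32 RGB input mapped to 16 output channels.
H, W, C_in, C_out, K = 32, 32, 3, 16, 3

# Fully connected: every output unit sees every input unit.
fc_params = (H * W * C_in) * (H * W * C_out)

# Convolutional: one KxK kernel per (input, output) channel pair,
# shared across all spatial positions (weight sharing).
conv_params = K * K * C_in * C_out

print(fc_params)    # → 50331648
print(conv_params)  # → 432
```

The five-orders-of-magnitude gap is why convolutional layers remain tractable even as inputs grow, and why the remaining cost (the repeated application of the shared kernels) is what acceleration methods target.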
O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis
We present O-CNN, an Octree-based Convolutional Neural Network (CNN) for 3D
shape analysis. Built upon the octree representation of 3D shapes, our method
takes the average normal vectors of a 3D model sampled in the finest leaf
octants as input and performs 3D CNN operations on the octants occupied by the
3D shape surface. We design a novel octree data structure to efficiently store
the octant information and CNN features into the graphics memory and execute
the entire O-CNN training and evaluation on the GPU. O-CNN supports various CNN
structures and works for 3D shapes in different representations. By
restricting computation to the octants occupied by 3D surfaces, the memory
and computational costs of the O-CNN grow quadratically as the depth of the octree
increases, which makes the 3D CNN feasible for high-resolution 3D models. We
compare the performance of the O-CNN with other existing 3D CNN solutions and
demonstrate the efficiency and efficacy of O-CNN in three shape analysis tasks,
including object classification, shape retrieval, and shape segmentation.
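The core idea the abstract describes, storing and computing only on octants occupied by the shape surface, can be sketched in a few lines. This is a conceptual illustration, not the paper's GPU octree data structure; the function name and the sphere sampling are my own assumptions:

```python
import numpy as np

def occupied_octants(points, depth):
    """Return the set of leaf-octant indices at the given octree depth
    touched by points in the unit cube [0,1)^3. Only occupied octants
    are stored, so memory scales with the surface, not the volume."""
    n = 2 ** depth                                   # grid resolution at this depth
    idx = np.clip((points * n).astype(int), 0, n - 1)
    return {tuple(i) for i in idx}

# A spherical surface sampled at 10k points: the number of occupied
# octants grows far more slowly than the dense n^3 grid.
rng = np.random.default_rng(0)
v = rng.normal(size=(10000, 3))
pts = 0.5 + 0.25 * v / np.linalg.norm(v, axis=1, keepdims=True)
for d in (3, 4, 5):
    print(d, len(occupied_octants(pts, d)), (2 ** d) ** 3)
```

Because a surface is two-dimensional, the occupied count roughly quadruples per depth level while the dense grid grows eightfold, matching the abstract's claim of quadratic rather than cubic growth with octree depth.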
PZnet: Efficient 3D ConvNet Inference on Manycore CPUs
Convolutional nets have been shown to achieve state-of-the-art accuracy in
many biomedical image analysis tasks. Many tasks in the biomedical analysis
domain involve analyzing volumetric (3D) data acquired by CT, MRI, and
microscopy. To deploy convolutional nets in practical
working systems, it is important to solve the efficient inference problem.
Namely, one should be able to apply an already-trained convolutional network to
many large images using limited computational resources. In this paper we
present PZnet, a CPU-only engine that can be used to perform inference for a
variety of 3D convolutional net architectures. PZnet outperforms MKL-based CPU
implementations of PyTorch and TensorFlow by more than 3.5x for the popular
U-net architecture. Moreover, for 3D convolutions with low feature map counts,
cloud CPU inference with PZnet outperforms cloud GPU inference in terms of cost
efficiency.
Deep and Wide Multiscale Recursive Networks for Robust Image Labeling
Feedforward multilayer networks trained by supervised learning have recently
demonstrated state-of-the-art performance on image labeling problems such as
boundary prediction and scene parsing. As even very low error rates can limit
practical usage of such systems, methods that perform closer to human accuracy
remain desirable. In this work, we propose a new type of network with the
following properties that address what we hypothesize to be limiting aspects of
existing methods: (1) a `wide' structure with thousands of features, (2) a
large field of view, (3) recursive iterations that exploit statistical
dependencies in label space, and (4) a parallelizable architecture that can be
trained in a fraction of the time compared to benchmark multilayer
convolutional networks. For the specific image labeling problem of boundary
prediction, we also introduce a novel example weighting algorithm that improves
segmentation accuracy. Experiments in the challenging domain of connectomic
reconstruction of neural circuitry from 3D electron microscopy data show that
these "Deep And Wide Multiscale Recursive" (DAWMR) networks lead to new levels
of image labeling performance. The highest performing architecture has twelve
layers, interwoven supervised and unsupervised stages, and uses an input field
of view of 157,464 voxels to make a prediction at each image location.
We present an associated open source software package that enables the simple
and flexible creation of DAWMR networks.
Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges
With the emergence of big data applications such as machine learning, speech
recognition, artificial intelligence, and DNA sequencing in recent years, the
computer architecture research community is facing explosive growth in data
scale. To achieve high efficiency in data-intensive computing, studies of
heterogeneous accelerators targeting the latest applications have become a
hot topic in the computer architecture domain. At present, the implementation
of heterogeneous accelerators mainly relies on heterogeneous computing units
such as Application-Specific Integrated Circuits (ASICs), Graphics Processing
Units (GPUs), and Field-Programmable Gate Arrays (FPGAs). Among these
architectures, FPGA-based reconfigurable accelerators have two merits. First,
the FPGA fabric contains a large number of reconfigurable circuits, which
satisfy requirements for high performance and low power consumption when
specific applications are running. Second, FPGA-based reconfigurable
architectures enable rapid prototyping and offer excellent customizability
and reconfigurability. A batch of acceleration works based on FPGAs and other
reconfigurable architectures is now emerging in top-tier computer
architecture conferences. To review this recent work, this survey takes the
latest research on reconfigurable accelerator architectures and their
algorithm applications as its basis. We compare hot research issues and
domains of concern, and analyze the advantages, disadvantages, and challenges
of reconfigurable accelerators. Finally, we discuss likely future directions
for accelerator architectures, hoping to provide a reference for computer
architecture researchers.
Dual Path Networks
In this work, we present a simple, highly efficient and modularized Dual Path
Network (DPN) for image classification which presents a new topology of
connection paths internally. By revealing the equivalence of the
state-of-the-art Residual Network (ResNet) and Densely Connected
Convolutional Network (DenseNet) within the HORNN framework, we find that
ResNet enables feature re-use while DenseNet enables new feature
exploration, both of which are
important for learning good representations. To enjoy the benefits from both
path topologies, our proposed Dual Path Network shares common features while
maintaining the flexibility to explore new features through dual path
architectures. Extensive experiments on three benchmark datasets, ImageNet-1k,
Places365, and PASCAL VOC, clearly demonstrate superior performance of the
proposed DPN over state-of-the-art methods. In particular, on the ImageNet-1k dataset,
a shallow DPN surpasses the best ResNeXt-101(64x4d) with 26% smaller model
size, 25% less computational cost and 8% lower memory consumption, and a deeper
DPN (DPN-131) further pushes the state-of-the-art single model performance with
about 2 times faster training speed. Experiments on the Places365 large-scale
scene dataset, PASCAL VOC detection dataset, and PASCAL VOC segmentation
dataset also demonstrate its consistently better performance than DenseNet,
ResNet, and the latest ResNeXt model over various applications. Comment: for code and models, see https://github.com/cypw/DPN
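The dual-path idea the abstract describes, a residual (addition) path for feature re-use plus a dense (concatenation) path for new-feature exploration, can be sketched with plain numpy. This is an illustrative simplification, not the paper's architecture: the layer sizes, the shared linear-plus-ReLU transform, and the width of two new features per block are all my assumptions:

```python
import numpy as np

def dpn_block(x_res, x_dense, w, res_width):
    """One simplified dual-path step. A single shared transform feeds
    both paths: the first `res_width` outputs are added to the residual
    path (feature re-use, as in ResNet) and the rest are concatenated
    onto the dense path (new-feature exploration, as in DenseNet)."""
    h = np.maximum(0.0, np.concatenate([x_res, x_dense]) @ w)  # shared ReLU transform
    x_res = x_res + h[:res_width]                        # residual (add) path
    x_dense = np.concatenate([x_dense, h[res_width:]])   # dense (concat) path
    return x_res, x_dense

rng = np.random.default_rng(0)
x_res, x_dense = rng.normal(size=8), rng.normal(size=4)
for _ in range(3):
    # each block emits 8 residual features + 2 new dense features
    w = rng.normal(size=(x_res.size + x_dense.size, 8 + 2))
    x_res, x_dense = dpn_block(x_res, x_dense, w, res_width=8)
print(x_res.shape, x_dense.shape)  # residual width stays 8; dense grows by 2 per block
```

Sharing one transform between the two paths is what keeps the model smaller than running a ResNet and a DenseNet side by side.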
Fast and Accurate 3D Medical Image Segmentation with Data-swapping Method
Deep neural network models used for medical image segmentation are large
because they are trained with high-resolution three-dimensional (3D) images.
Graphics processing units (GPUs) are widely used to accelerate the trainings.
However, the memory on a GPU is not large enough to train the models. A popular
approach to tackling this problem is the patch-based method, which divides a large
image into small patches and trains the models with these small patches.
However, this method would degrade the segmentation quality if a target object
spans multiple patches. In this paper, we propose a novel approach for 3D
medical image segmentation that utilizes data swapping, which swaps
intermediate data out of GPU memory to CPU memory to enlarge the effective
GPU memory size, enabling training on high-resolution 3D medical images
without patching.
We carefully tuned parameters in the data-swapping method to obtain the best
training performance for 3D U-Net, a widely used deep neural network model for
medical image segmentation. We applied our tuning to train 3D U-Net with
full-size images of 192 x 192 x 192 voxels in a brain tumor dataset. As a result,
communication overhead, which is the most important issue, was reduced by
17.1%. Compared with the patch-based method using patches of 128 x 128 x 128
voxels, our training on full-size images improved the mean Dice score by
4.48% and 5.32% for detecting the whole-tumor and tumor-core sub-regions,
respectively. The total training time was reduced from 164 hours to 47 hours,
a 3.53x speedup. Comment: 13 pages
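The data-swapping idea, evicting intermediate activations from device memory during the forward pass and restoring them for the backward pass, can be illustrated conceptually. The sketch below only simulates the device-to-host transfer with dictionary copies; the class and method names are hypothetical, and a real implementation would use asynchronous GPU-runtime memory copies overlapped with computation:

```python
import numpy as np

class SwapStore:
    """Conceptual sketch of activation swapping (not the paper's code).
    Activations needed only for the backward pass are moved to a
    host-side store to free device memory, then swapped back just
    before the gradient computation that consumes them."""
    def __init__(self):
        self.host = {}                       # stand-in for CPU memory

    def swap_out(self, layer, activation):
        self.host[layer] = activation.copy() # device -> host

    def swap_in(self, layer):
        return self.host.pop(layer)          # host -> device

store = SwapStore()
x = np.ones((4, 4))
# forward: compute an activation, then evict it from "device" memory
a1 = x * 2
store.swap_out("layer1", a1)
del a1                                       # device memory freed
# backward: restore the activation exactly when it is needed
a1 = store.swap_in("layer1")
print(a1.sum())  # → 32.0
```

The tuning the abstract mentions amounts to choosing which activations to swap and when, so that transfers hide behind computation and the 17.1% overhead reduction becomes possible.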
Accelerating Very Deep Convolutional Networks for Classification and Detection
This paper aims to accelerate the test-time computation of convolutional
neural networks (CNNs), especially very deep CNNs that have substantially
impacted the computer vision community. Unlike previous methods that are
designed for approximating linear filters or linear responses, our method takes
the nonlinear units into account. We develop an effective solution to the
resulting nonlinear optimization problem without the need of stochastic
gradient descent (SGD). More importantly, while previous methods mainly focus
on optimizing one or two layers, our nonlinear method enables an asymmetric
reconstruction that reduces the rapidly accumulated error when multiple (e.g.,
>=10) layers are approximated. For the widely used very deep VGG-16 model, our
method achieves a whole-model speedup of 4x with merely a 0.3% increase of
top-5 error in ImageNet classification. Our 4x accelerated VGG-16 model also
shows a graceful accuracy degradation for object detection when plugged into
the Fast R-CNN detector. Comment: TPAMI, accepted. arXiv admin note: substantial text overlap with
arXiv:1411.422
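For context, the linear low-rank baseline that such approximation methods build on can be sketched with a truncated SVD; the paper's asymmetric, nonlinearity-aware reconstruction goes beyond this. The function name and sizes below are illustrative assumptions:

```python
import numpy as np

def low_rank_factor(W, rank):
    """Factor W (d_out x d_in) into two thin matrices via truncated SVD.
    Replacing one layer W @ x with A @ (B @ x) cuts multiply-adds when
    rank * (d_out + d_in) < d_out * d_in. This is only the linear
    baseline: the paper's asymmetric method additionally accounts for
    the nonlinearity and for error accumulated from earlier layers."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # d_out x rank, columns scaled by singular values
    B = Vt[:rank]                # rank x d_in
    return A, B

rng = np.random.default_rng(0)
W = (rng.normal(size=(64, 64)) @ rng.normal(size=(64, 64))) * 0.01
A, B = low_rank_factor(W, rank=16)
x = rng.normal(size=64)
err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)
print(err)  # approximation error at rank 16; zero at full rank
```

Approximating each layer in isolation this way is exactly what lets error accumulate across many layers, which motivates the whole-model asymmetric reconstruction the abstract describes.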
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks
In many robotics and VR/AR applications, 3D-videos are readily-available
sources of input (a continuous sequence of depth images, or LIDAR scans).
However, those 3D-videos are processed frame-by-frame either through 2D
convnets or 3D perception algorithms. In this work, we propose 4-dimensional
convolutional neural networks for spatio-temporal perception that can directly
process such 3D-videos using high-dimensional convolutions. For this, we adopt
sparse tensors and propose the generalized sparse convolution that encompasses
all discrete convolutions. To implement the generalized sparse convolution, we
create an open-source auto-differentiation library for sparse tensors that
provides extensive functions for high-dimensional convolutional neural
networks. We create 4D spatio-temporal convolutional neural networks using the
library and validate them on various 3D semantic segmentation benchmarks and
proposed 4D datasets for 3D-video perception. To overcome challenges in the 4D
space, we propose the hybrid kernel, a special case of the generalized sparse
convolution, and the trilateral-stationary conditional random field that
enforces spatio-temporal consistency in the 7D space-time-chroma space.
Experimentally, we show that convolutional neural networks with only
generalized 3D sparse convolutions can outperform 2D or 2D-3D hybrid methods by
a large margin. Also, we show that on 3D-videos, 4D spatio-temporal
convolutional neural networks are robust to noise, outperform 3D convolutional
neural networks, and are faster than their 3D counterparts in some cases. Comment: CVPR'19
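A generalized sparse convolution evaluates the kernel only at occupied coordinates. The minimal scalar sketch below is not the Minkowski Engine API; the names and the scatter-style formulation are my own simplification:

```python
from collections import defaultdict

def sparse_conv(features, kernel, offsets):
    """Minimal generalized sparse convolution sketch (scatter form).
    `features` maps integer coordinate tuples -> scalar values;
    `kernel` maps each offset in `offsets` -> a scalar weight.
    Output is produced only at coordinates reachable from occupied
    inputs, so work scales with the number of non-empty sites rather
    than the size of the full (possibly 4D+) grid."""
    out = defaultdict(float)
    for coord, val in features.items():
        for off in offsets:
            target = tuple(c + o for c, o in zip(coord, off))
            out[target] += kernel[off] * val
    return dict(out)

# 1D example: two occupied sites, a 3-tap kernel
feats = {(0,): 1.0, (2,): 2.0}
kern = {(-1,): 0.5, (0,): 1.0, (1,): 0.5}
print(sparse_conv(feats, kern, list(kern)))
```

Because coordinates are tuples of arbitrary length, the same loop works unchanged in 3D or 4D, which is the sense in which the generalized sparse convolution "encompasses all discrete convolutions" over sparse sites.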
A Survey on Deep Learning Methods for Robot Vision
Deep learning has allowed a paradigm shift in pattern recognition, from using
hand-crafted features together with statistical classifiers to using
general-purpose learning procedures for learning data-driven representations,
features, and classifiers together. The application of this new paradigm has
been particularly successful in computer vision, in which the development of
deep learning methods for vision applications has become a hot research topic.
Given that deep learning has already attracted the attention of the robot
vision community, the main purpose of this survey is to address the use of deep
learning in robot vision. To achieve this, a comprehensive overview of deep
learning and its usage in computer vision is given, which includes a description
of the most frequently used neural models and their main application areas.
Then, the standard methodology and tools used for designing deep-learning based
vision systems are presented. Afterwards, a review of the principal work using
deep learning in robot vision is presented, as well as current and future
trends related to the use of deep learning in robotics. This survey is intended
to be a guide for developers of robot vision systems.