26,781 research outputs found
Practical Block-wise Neural Network Architecture Generation
Convolutional neural networks have achieved remarkable success in computer
vision. However, most usable network architectures are hand-crafted and usually
require expertise and elaborate design. In this paper, we provide a block-wise
network generation pipeline called BlockQNN which automatically builds
high-performance networks using the Q-learning paradigm with an epsilon-greedy
exploration strategy. The optimal network block is constructed by the learning
agent which is trained sequentially to choose component layers. We stack the
block to construct the whole auto-generated network. To accelerate the
generation process, we also propose a distributed asynchronous framework and an
early stop strategy. The block-wise generation brings unique advantages: (1) it
achieves results competitive with the hand-crafted state-of-the-art networks on
image classification; notably, the best network generated by BlockQNN achieves
a 3.54% top-1 error rate on CIFAR-10, beating all existing auto-generated
networks; (2) it offers a tremendous reduction of the search space for
designing networks, requiring only 3 days with 32 GPUs; and (3) it generalizes
strongly: the network built on CIFAR also performs well on the larger-scale
ImageNet dataset.
Comment: Accepted to CVPR 201
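The epsilon-greedy exploration the abstract describes can be sketched in a few lines. The layer vocabulary, Q-table, and values below are hypothetical stand-ins for BlockQNN's actual state/action encoding, not the paper's implementation.

```python
import random

# Candidate component layers the agent can pick for the next block position
# (illustrative names, not BlockQNN's real action space).
LAYERS = ["conv3x3", "conv5x5", "maxpool", "avgpool", "identity"]

def choose_layer(q_values, epsilon):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.choice(LAYERS)                    # explore: random layer
    return max(LAYERS, key=lambda a: q_values[a])       # exploit: best known

# Toy Q-table after some imagined training updates (illustrative values).
q = {"conv3x3": 0.9, "conv5x5": 0.6, "maxpool": 0.4,
     "avgpool": 0.3, "identity": 0.1}
random.seed(0)
print(choose_layer(q, epsilon=0.0))  # greedy choice: conv3x3
```

With epsilon annealed toward 0 over training, early choices explore the layer space while late choices exploit the learned Q-values.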
Automatically Evolving CNN Architectures Based on Blocks
The performance of Convolutional Neural Networks (CNNs) relies heavily on
their architectures. Designing a CNN with promising performance requires
extensive expertise in both CNNs and the investigated problem, which is not
necessarily held by every user interested in CNNs or the problem domain.
In this paper, we propose to automatically evolve CNN architectures by using a
genetic algorithm based on ResNet blocks and DenseNet blocks. The proposed
algorithm is completely automatic in designing CNN architectures; in
particular, neither pre-processing before it starts nor post-processing of the
designed CNN is needed. Furthermore, the proposed algorithm does not require
users to have domain knowledge of CNNs, the investigated problem, or even
genetic algorithms. The proposed algorithm is evaluated on CIFAR10 and CIFAR100
against 18 state-of-the-art peer competitors. Experimental results show that it
outperforms both hand-crafted state-of-the-art CNNs and CNNs designed by
automatic peer competitors in terms of classification accuracy, and achieves
classification accuracy competitive with semi-automatic peer competitors. In
addition, the proposed algorithm consumes much less time than most peer
competitors in finding the best CNN architectures.
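The evolutionary loop the abstract describes can be sketched as below. The genome of block types, the truncation selection, and the toy fitness function (standing in for actual train-and-validate accuracy) are all hypothetical illustrations, not the paper's algorithm.

```python
import random

# Block types a genome position can take (illustrative).
BLOCKS = ["resnet", "densenet", "pool"]

def fitness(genome):
    # Toy proxy for validation accuracy: reward mixing both block families.
    return genome.count("resnet") * genome.count("densenet")

def crossover(a, b):
    """One-point crossover of two equal-length genomes."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.1):
    """Independently re-sample each gene with a small probability."""
    return [random.choice(BLOCKS) if random.random() < rate else g
            for g in genome]

def evolve(pop, generations=20):
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: len(pop) // 2]       # truncation selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in parents]
        pop = parents + children             # elitism: parents survive
    return max(pop, key=fitness)

random.seed(1)
pop = [[random.choice(BLOCKS) for _ in range(6)] for _ in range(10)]
best = evolve(pop)
print(best, fitness(best))
```

In the real setting each fitness evaluation would train the decoded CNN, which is why the abstract emphasizes the algorithm's wall-clock cost.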
Automated flow for compressing convolution neural networks for efficient edge-computation with FPGA
Deep convolutional neural network (CNN) based solutions are the current
state-of-the-art for computer vision tasks. Due to the large size of these
models, they are typically run on clusters of CPUs or GPUs. However, power
requirements and cost budgets can be a major hindrance to the adoption of CNNs
for IoT applications. Recent research highlights that CNNs contain significant
redundancy in their structure and can be quantized to lower bit-width
parameters and activations, while maintaining acceptable accuracy. Low
bit-width and especially single-bit-width (binary) CNNs are particularly
suitable for mobile applications based on FPGA implementation, due to the
bitwise logic operations involved in binarized CNNs. Moreover, the transition to
lower bit-widths opens new avenues for performance optimizations and model
improvement. In this paper, we present an automatic flow from trained
TensorFlow models to FPGA system on chip implementation of binarized CNN. This
flow involves quantization of model parameters and activations, generation of
network and model in embedded-C, followed by automatic generation of the FPGA
accelerator for binary convolutions. The automated flow is demonstrated through
implementation of binarized "YOLOV2" on the low-cost, low-power Cyclone-V FPGA
device. Experiments on object detection using binarized YOLOV2 demonstrate
significant performance benefit in terms of model size and inference speed on
FPGA as compared to CPU and mobile CPU platforms. Furthermore, the entire
automated flow from trained models to FPGA synthesis can be completed within
one hour.
Comment: 7 pages, 9 figures. Accepted and presented at MLPCD workshop, NIPS 2017 (Long Beach, California)
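The bitwise operations that make binary CNNs FPGA-friendly can be illustrated with a generic XNOR-popcount dot product; this is a textbook sketch of the technique, not the paper's flow. Encoding -1 as bit 0 and +1 as bit 1, XNOR counts agreeing positions, and the dot product is 2*popcount - n.

```python
def binarize(values):
    """Map real-valued weights/activations to {-1, +1} via the sign function."""
    return [1 if v >= 0 else -1 for v in values]

def xnor_popcount_dot(a_bits, b_bits):
    """Dot product of two {-1, +1} vectors using only bit operations."""
    n = len(a_bits)
    # Pack +1 entries as set bits (bit i set iff entry i is +1).
    a = sum((1 << i) for i, v in enumerate(a_bits) if v == 1)
    b = sum((1 << i) for i, v in enumerate(b_bits) if v == 1)
    agree = ~(a ^ b) & ((1 << n) - 1)       # XNOR, masked to n bits
    return 2 * bin(agree).count("1") - n    # agreements minus disagreements

w = binarize([0.3, -1.2, 0.7, -0.1])
x = binarize([0.5, 0.4, -0.2, -0.9])
# The bitwise result matches the plain arithmetic dot product.
assert xnor_popcount_dot(w, x) == sum(wi * xi for wi, xi in zip(w, x))
print(xnor_popcount_dot(w, x))  # 0 for these vectors
```

On an FPGA the XNOR and popcount map directly to LUT logic, which is the source of the model-size and inference-speed gains the abstract reports.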
Transformation Consistent Self-ensembling Model for Semi-supervised Medical Image Segmentation
Deep convolutional neural networks have achieved remarkable progress on a
variety of medical image computing tasks. A common problem when applying
supervised deep learning methods to medical images is the lack of labeled data,
which is very expensive and time-consuming to collect. In this paper, we
present a novel semi-supervised method for medical image segmentation, where
the network is optimized by the weighted combination of a common supervised
loss for labeled inputs only and a regularization loss for both labeled and
unlabeled data. To utilize the unlabeled data, our method encourages the
consistent predictions of the network-in-training for the same input under
different regularizations. For the semi-supervised segmentation problem, we
introduce a transformation-consistent strategy (including rotation and
flipping) in our self-ensembling model to enhance the regularization effect
for pixel-level predictions. We have extensively validated the proposed
semi-supervised method
on three typical yet challenging medical image segmentation tasks: (i) skin
lesion segmentation from dermoscopy images on International Skin Imaging
Collaboration (ISIC) 2017 dataset, (ii) optic disc segmentation from fundus
images on Retinal Fundus Glaucoma Challenge (REFUGE) dataset, and (iii) liver
segmentation from volumetric CT scans on Liver Tumor Segmentation Challenge
(LiTS) dataset. Compared to state-of-the-art methods, our proposed method shows
superior segmentation performance on challenging 2D/3D medical images,
demonstrating the effectiveness of our semi-supervised method for medical image
segmentation.
Comment: Accepted at IEEE Transactions on Neural Networks and Learning Systems
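The transformation-consistency idea can be sketched as an equivariance check: the prediction for a rotated input should match the rotation of the prediction. The tiny elementwise "network" and 2x2 prediction maps below are illustrative stand-ins, not the paper's model or loss weighting.

```python
import math

def rotate90(grid):
    """Rotate a 2D list of values 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def mse(a, b):
    """Mean squared error between two 2D maps."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    return sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa)

def consistency_loss(net, x):
    """Penalize disagreement between net(rotate(x)) and rotate(net(x)).

    This unsupervised term needs no labels, so it applies to both labeled
    and unlabeled images, as in the weighted objective the abstract describes.
    """
    return mse(net(rotate90(x)), rotate90(net(x)))

# An elementwise "network" commutes with rotation, so its loss is zero;
# a real segmentation network would incur a nonzero penalty to minimize.
net = lambda g: [[math.tanh(v) for v in row] for row in g]
x = [[0.2, -0.5], [1.0, 0.3]]
print(consistency_loss(net, x))  # 0.0 for this rotation-equivariant toy net
```

The full objective would be supervised_loss + w(t) * consistency_loss, with w(t) typically ramped up over training (the ramp schedule is an assumption here, not stated in the abstract).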
Music Genre Classification with Paralleling Recurrent Convolutional Neural Network
Deep learning has demonstrated its effectiveness and efficiency in music
genre classification. However, existing approaches still have several
shortcomings which impair performance on this classification task. In this
paper, we propose a hybrid architecture which consists of parallel CNN and
Bi-RNN blocks, which extract spatial features and temporal frame-order
features, respectively. The two outputs are then fused into one powerful
representation of the musical signal and fed into a softmax function for
classification. The parallel design ensures that the extracted features are
robust enough to represent music. Moreover, the experiments show that our
proposed architecture improves music genre classification performance and that
the additional Bi-RNN block is a useful supplement to CNNs.
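The fuse-then-classify step the abstract describes can be sketched as concatenating the two branch outputs and applying a linear softmax classifier. The feature vectors and weight matrix below are toy stand-ins; the real branches would be a trained CNN and Bi-RNN.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_and_classify(cnn_feat, rnn_feat, weights):
    fused = cnn_feat + rnn_feat        # concatenate the two branch features
    logits = [sum(w * f for w, f in zip(row, fused)) for row in weights]
    return softmax(logits)             # per-genre probabilities

cnn_feat = [0.4, 0.1]   # spatial features from the CNN branch (toy values)
rnn_feat = [0.3, 0.7]   # temporal features from the Bi-RNN branch (toy values)
weights = [[1, 0, 0, 1],               # toy linear classifier, 2 genres
           [0, 1, 1, 0]]
probs = fuse_and_classify(cnn_feat, rnn_feat, weights)
print(probs)
```

Because fusion happens before the classifier, the softmax weights can mix spatial and temporal evidence for each genre.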
Densely Connected Convolutional Networks for Speech Recognition
This paper presents our latest investigation on Densely Connected
Convolutional Networks (DenseNets) for acoustic modelling (AM) in automatic
speech recognition. DenseNets are very deep, compact convolutional neural
networks which have demonstrated remarkable improvements over
state-of-the-art results on several data sets in computer vision. Our
experimental results show that DenseNets can be used for AM, significantly
outperforming other neural-based models such as DNNs, CNNs, and VGGs.
Furthermore, results on Wall Street Journal revealed that with only half of
the training data, DenseNet was able to outperform other models trained with
the full data set by a large margin.
Comment: 5 pages, 3 figures, the 13th ITG Conference on Speech Communication
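The dense connectivity that gives DenseNets their compactness can be sketched as each layer receiving the concatenation of all earlier feature maps. The toy "layers" below operate on flat lists standing in for feature maps (growth rate 1, purely illustrative).

```python
def dense_block(x, layers):
    """DenseNet-style block: every layer sees all preceding feature maps."""
    features = [x]                       # running list of all feature maps
    for layer in layers:
        concat = [v for f in features for v in f]   # concatenate all inputs
        features.append(layer(concat))              # append the new features
    return [v for f in features for v in f]         # output: every map so far

# Toy layers, each producing one new "feature" from the concatenated input.
layers = [lambda c: [sum(c)],   # e.g. a weighted sum stand-in
          lambda c: [max(c)],   # a pooling-like stand-in
          lambda c: [len(c)]]   # just counts its inputs
out = dense_block([1.0, 2.0], layers)
print(out)  # input values plus one new feature per layer
```

Because each layer reuses all earlier maps instead of recomputing them, each layer can be narrow, which is the source of the parameter efficiency the abstract highlights.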
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.
Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201
Semi-Supervised Segmentation of Salt Bodies in Seismic Images using an Ensemble of Convolutional Neural Networks
Seismic image analysis plays a crucial role in a wide range of industrial
applications and has been receiving significant attention. One of the essential
challenges of seismic imaging is detecting subsurface salt structure which is
indispensable for identification of hydrocarbon reservoirs and drill path
planning. Unfortunately, exact identification of large salt deposits is
notoriously difficult and professional seismic imaging often requires expert
human interpretation of salt bodies. Convolutional neural networks (CNNs) have
been successfully applied in many fields, and several attempts have been made
in the field of seismic imaging. But the high cost of manual annotations by
geophysics experts and scarce publicly available labeled datasets hinder the
performance of the existing CNN-based methods. In this work, we propose a
semi-supervised method for segmentation (delineation) of salt bodies in seismic
images which utilizes unlabeled data for multi-round self-training. To reduce
error amplification during self-training we propose a scheme which uses an
ensemble of CNNs. We show that our approach outperforms state-of-the-art on the
TGS Salt Identification Challenge dataset and ranks first among the 3,234
competing methods.
Comment: Accepted at GCPR 2019, Source code: https://github.com/ybabakhin/kaggle_salt_bes_phalan
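One round of the ensemble-gated self-training the abstract describes can be sketched as pseudo-labeling only where the ensemble's averaged prediction is confident. The toy "models" and the 0.8 threshold are illustrative assumptions, not the paper's configuration.

```python
def pseudo_label(models, unlabeled, threshold=0.8):
    """Assign pseudo-labels only where the ensemble agrees confidently.

    Averaging several models' predictions before thresholding is what
    limits the error amplification single-model self-training suffers from.
    """
    labeled = []
    for x in unlabeled:
        p = sum(m(x) for m in models) / len(models)  # ensemble mean prob.
        if p >= threshold:
            labeled.append((x, 1))          # confident salt / foreground
        elif p <= 1 - threshold:
            labeled.append((x, 0))          # confident background
        # otherwise: stay unlabeled, revisit in the next self-training round
    return labeled

# Toy "models" mapping an input to a foreground probability (illustrative).
models = [lambda x: x,
          lambda x: min(1.0, x + 0.1),
          lambda x: max(0.0, x - 0.1)]
unlabeled = [0.95, 0.5, 0.05]
print(pseudo_label(models, unlabeled))  # the ambiguous 0.5 sample is skipped
```

In the multi-round scheme, the newly pseudo-labeled samples are added to the training set and the ensemble is retrained before the next round.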
FPGA Implementation of Convolutional Neural Networks with Fixed-Point Calculations
Neural network-based methods for image processing are becoming widely used in
practical applications. Modern neural networks are computationally expensive
and require specialized hardware, such as graphics processing units. Since such
hardware is not always available in real life applications, there is a
compelling need for the design of neural networks for mobile devices. Mobile
neural networks typically have a reduced number of parameters and require a
relatively small number of arithmetic operations. However, they are usually
still executed at the software level and use floating-point calculations. The use
of mobile networks without further optimization may not provide sufficient
performance when high processing speed is required, for example, in real-time
video processing (30 frames per second). In this study, we suggest
optimizations to speed up computations in order to efficiently use already
trained neural networks on a mobile device. Specifically, we propose an
approach for speeding up neural networks by moving computation from software to
hardware and by using fixed-point calculations instead of floating-point. We
propose a number of methods for neural network architecture design to improve
the performance with fixed-point calculations. We also show an example of how
existing datasets can be modified and adapted for the recognition task in hand.
Finally, we present the design and implementation of a field-programmable gate
array (FPGA) based device to solve the practical problem of real-time
handwritten digit classification from a mobile camera video feed.
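The core fixed-point trick the abstract relies on can be sketched as mapping floats to integers with a fixed number of fractional bits (a Q-format), so that all arithmetic becomes integer arithmetic. The 8 fractional bits below are an illustrative choice, not the paper's word length.

```python
def to_fixed(x, frac_bits=8):
    """Quantize a float to a signed fixed-point integer (Q-format)."""
    return int(round(x * (1 << frac_bits)))

def from_fixed(q, frac_bits=8):
    """Recover the approximate float value of a fixed-point integer."""
    return q / (1 << frac_bits)

def fixed_mul(a, b, frac_bits=8):
    """Multiply two fixed-point numbers, shifting to restore the scale."""
    return (a * b) >> frac_bits

# 1.5 * -0.25 computed entirely in integer arithmetic.
a, b = to_fixed(1.5), to_fixed(-0.25)
print(from_fixed(fixed_mul(a, b)))  # -0.375, exact for these values
```

On hardware, the shift and integer multiply are far cheaper than a floating-point unit; the architecture-design methods the abstract mentions exist to keep accuracy acceptable despite the quantization error this introduces.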
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context
Convolutional neural networks (CNN) have shown promising results for
end-to-end speech recognition, albeit still behind other state-of-the-art
methods in performance. In this paper, we study how to bridge this gap and go
beyond with a novel CNN-RNN-transducer architecture, which we call ContextNet.
ContextNet features a fully convolutional encoder that incorporates global
context information into convolution layers by adding squeeze-and-excitation
modules. In addition, we propose a simple scaling method that scales the width
of ContextNet, achieving a good trade-off between computation and accuracy. We
demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves
a word error rate (WER) of 2.1%/4.6% without external language model (LM),
1.9%/4.1% with LM and 2.9%/7.0% with only 10M parameters on the clean/noisy
LibriSpeech test sets. This compares to the previous best published system of
2.0%/4.6% with LM and 3.9%/11.3% with 20M parameters. The superiority of the
proposed ContextNet model is also verified on a much larger internal dataset.
Comment: Submitted to Interspeech 202
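The squeeze-and-excitation step that injects global context can be sketched over per-channel feature sequences: global average pooling "squeezes" each channel to a scalar, and a gate derived from that scalar rescales the channel. The single-weight gate below replaces the usual small bottleneck network for brevity; all values are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squeeze_excite(channels, gate_weight=1.0):
    """Rescale each channel by a gate computed from its global context.

    channels: list of per-channel value sequences (e.g. over time frames).
    """
    # Squeeze: global average pool each channel down to one scalar.
    context = [sum(ch) / len(ch) for ch in channels]
    # Excite: turn the global context into a per-channel gate in (0, 1).
    gates = [sigmoid(gate_weight * c) for c in context]
    # Rescale: multiply every value in a channel by its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, channels)]

channels = [[1.0, 3.0],    # two toy channels over two time steps
            [0.0, 0.0]]
out = squeeze_excite(channels)
print(out)
```

Because the gate depends on a whole-sequence average, every convolution output it rescales is influenced by global context, which is how the encoder sidesteps the limited receptive field of plain convolutions.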