3,125 research outputs found
XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference
Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to
conventional deep neural networks at a fraction of the cost in terms of memory
and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully
digital configurable hardware accelerator IP for BNNs, integrated within a
microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid
SRAM / standard cell memory. The XNE is able to fully compute convolutional and
dense layers in autonomy or in cooperation with the core in the MCU to realize
more complex behaviors. We show post-synthesis results in 65nm and 22nm
technology for the XNE IP and post-layout results in 22nm for the full MCU
indicating that this system can drop the energy cost per binary operation to
21.6fJ per operation at 0.4V, and at the same time is flexible and performant
enough to execute state-of-the-art BNN topologies such as ResNet-34 in less
than 2.2mJ per frame at 8.9 fps.Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation
at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design
of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issu
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
SMASH: One-Shot Model Architecture Search through HyperNetworks
Designing architectures for deep neural networks requires expert knowledge
and substantial computation time. We propose a technique to accelerate
architecture selection by learning an auxiliary HyperNet that generates the
weights of a main model conditioned on that model's architecture. By comparing
the relative validation performance of networks with HyperNet-generated
weights, we can effectively search over a wide range of architectures at the
cost of a single training run. To facilitate this search, we develop a flexible
mechanism based on memory read-writes that allows us to define a wide range of
network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as
special cases. We validate our method (SMASH) on CIFAR-10 and CIFAR-100,
STL-10, ModelNet10, and Imagenet32x32, achieving competitive performance with
similarly-sized hand-designed networks. Our code is available at
https://github.com/ajbrock/SMAS
U-Net and its variants for medical image segmentation: theory and applications
U-net is an image segmentation technique developed primarily for medical
image analysis that can precisely segment images using a scarce amount of
training data. These traits provide U-net with a very high utility within the
medical imaging community and have resulted in extensive adoption of U-net as
the primary tool for segmentation tasks in medical imaging. The success of
U-net is evident in its widespread use in all major image modalities from CT
scans and MRI to X-rays and microscopy. Furthermore, while U-net is largely a
segmentation tool, there have been instances of the use of U-net in other
applications. As the potential of U-net is still increasing, in this review we
look at the various developments that have been made in the U-net architecture
and provide observations on recent trends. We examine the various innovations
that have been made in deep learning and discuss how these tools facilitate
U-net. Furthermore, we look at image modalities and application areas where
U-net has been applied.Comment: 42 pages, in IEEE Acces
- …