21,146 research outputs found
Octopus: A Heterogeneous In-network Computing Accelerator Enabling Deep Learning for network
Deep learning (DL) models for networking have achieved excellent performance in
the field and are becoming a promising component of future intelligent network
systems. Programmable in-network computing devices have great potential for
deploying such models; however, existing devices cannot afford to run a DL
model. The main challenges for a data plane supporting DL-based network models
lie in computing power, task granularity, model generality, and feature
extraction.
To address these problems, we propose Octopus: a heterogeneous in-network
computing accelerator enabling DL for network models. A feature extractor is
designed for fast and efficient feature extraction. A vector accelerator and a
systolic array work in a heterogeneous, collaborative way, offering
low-latency, high-throughput general computing ability for packet-based and
flow-based tasks. Octopus also contains an on-chip memory fabric for storage
and interconnection, and a RISC-V core for global control. The proposed Octopus
accelerator design is implemented on an FPGA. The functionality and performance
of Octopus are validated in several use cases, achieving 31 Mpkt/s feature
extraction, 207 ns packet-based computing latency, and 90 kflow/s flow-based
computing throughput.
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.
Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)
Neuromorphic computing systems comprise networks of neurons that use
asynchronous events for both computation and communication. This type of
representation offers several advantages in terms of bandwidth and power
consumption in neuromorphic electronic systems. However, managing the traffic
of asynchronous events in large-scale systems is a daunting task, both in terms
of circuit complexity and memory requirements. Here we present a novel routing
methodology that employs both hierarchical and mesh routing strategies and
combines heterogeneous memory structures for minimizing both memory
requirements and latency, while maximizing programming flexibility to support a
wide range of event-based neural network architectures, through parameter
configuration. We validated the proposed scheme in a prototype multi-core
neuromorphic processor chip that employs hybrid analog/digital circuits for
emulating synapse and neuron dynamics together with asynchronous digital
circuits for managing the address-event traffic. We present a theoretical
analysis of the proposed connectivity scheme, describe the methods and circuits
used to implement such a scheme, and characterize the prototype chip. Finally, we
demonstrate the use of the neuromorphic processor with a convolutional neural
network for the real-time classification of visual symbols being flashed to a
dynamic vision sensor (DVS) at high speed.
Comment: 17 pages, 14 figures