2,314 research outputs found
Heterogeneous processor pipeline for a product cipher application
Processing data received as a stream is a task commonly performed by modern
embedded devices, in a wide range of applications such as multimedia
(encoding/decoding/ playing media), networking (switching and routing), digital
security, scientific data processing, etc. Such processing normally tends to be
calculation intensive and therefore requiring significant processing power.
Therefore, hardware acceleration methods to increase the performance of such
applications constitute an important area of study. In this paper, we present
an evaluation of one such method to process streaming data, namely
multi-processor pipeline architecture. The hardware is based on a
Multiple-Processor System on Chip (MPSoC), using a data encryption algorithm as
a case study. The algorithm is partitioned on a coarse grained level and mapped
on to an MPSoC with five processor cores in a pipeline, using specifically
configured Xtensa LX3 cores. The system is then selectively optimized by
strengthening and pruning the resources of each processor core. The optimized
system is evaluated and compared against an optimal single-processor System on
Chip (SoC) for the same application. The multiple-processor pipeline system for
data encryption algorithms used was observed to provide significant speed ups,
up to 4.45 times that of the single-processor system, which is close to the
ideal speed up from a five-stage pipeline
Network Virtual Machine (NetVM): A New Architecture for Efficient and Portable Packet Processing Applications
A challenge facing network device designers, besides increasing the speed of network gear, is improving its programmability in order to simplify the implementation of new applications (see for example, active networks, content networking, etc). This paper presents our work on designing and implementing a virtual network processor, called NetVM, which has an instruction set optimized for packet processing applications, i.e., for handling network traffic. Similarly to a Java Virtual Machine that virtualizes a CPU, a NetVM virtualizes a network processor. The NetVM is expected to provide a compatibility layer for networking tasks (e.g., packet filtering, packet counting, string matching) performed by various packet processing applications (firewalls, network monitors, intrusion detectors) so that they can be executed on any network device, ranging from expensive routers to small appliances (e.g. smart phones). Moreover, the NetVM will provide efficient mapping of the elementary functionalities used to realize the above mentioned networking tasks upon specific hardware functional units (e.g., ASICs, FPGAs, and network processing elements) included in special purpose hardware systems possibly deployed to implement network devices
The "MIND" Scalable PIM Architecture
MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a
Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on
each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND
architecture
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
- …