1,733 research outputs found
DyPS: Dynamic Processor Switching for Energy-Aware Video Decoding on Multi-core SoCs
In addition to General Purpose Processors (GPP), Multicore SoCs equipping
modern mobile devices contain specialized Digital Signal Processor designed
with the aim to provide better performance and low energy consumption
properties. However, the experimental measurements we have achieved revealed
that system overhead, in case of DSP video decoding, causes drastic
performances drop and energy efficiency as compared to the GPP decoding. This
paper describes DyPS, a new approach for energy-aware processor switching (GPP
or DSP) according to the video quality . We show the pertinence of our solution
in the context of adaptive video decoding and describe an implementation on an
embedded Linux operating system with the help of the GStreamer framework. A
simple case study showed that DyPS achieves 30% energy saving while sustaining
the decoding performanc
Homogeneous and heterogeneous MPSoC architectures with network-on-chip connectivity for low-power and real-time multimedia signal processing
Two multiprocessor system-on-chip (MPSoC) architectures are proposed and compared in the paper with reference to audio and video processing applications. One architecture exploits a homogeneous topology; it consists of 8 identical tiles, each made of a 32-bit RISC core enhanced by a 64-bit DSP coprocessor with local memory. The other MPSoC architecture exploits a heterogeneous-tile topology with on-chip distributed memory resources; the tiles act as application specific processors supporting a different class of algorithms. In both architectures, the multiple tiles are interconnected by a network-on-chip (NoC) infrastructure, through network interfaces and routers, which allows parallel operations of the multiple tiles. The functional performances and the implementation complexity of the NoC-based MPSoC architectures are assessed by synthesis results in submicron CMOS technology. Among the large set of supported algorithms, two case studies are considered: the real-time implementation of an H.264/MPEG AVC video codec and of a low-distortion digital audio amplifier. The heterogeneous architecture ensures a higher power efficiency and a smaller area occupation and is more suited for low-power multimedia processing, such as in mobile devices. The homogeneous scheme allows for a higher flexibility and easier system scalability and is more suited for general-purpose DSP tasks in power-supplied devices
Distributed modular RT-systems for detector DAQ, trigger and control applications
A modular approach to development of distributed modular system architecture for detector control, data acquisition and trigger data processing is proposed. A multilevel parallel-pipeline model of data acquisition, processing and control is proposed and discussed. Multiprocessor architecture with SCI-based interconnections is proposed as good high-performance system for parallel-pipeline data processing. A network (Ethernet -100) can be used for loading, monitoring and diagnostic purposes independent of basic interconnections. The modular cPCI-based structures with high speed modular interconnections are proposed for DAQ and control applications. For distributed control RT-systems, to construct the effective (cost-performance) systems the same platform of an Intel compatible processor board should be used. The basic computer multiprocessor nodes consist of high-power PC MB (Industrial Computer Systems), which are interconnected by SCI modules and link to embedded microprocessor-based sub-systems for control applications. The required number of multiprocessor nodes should be interconnected by SCI for parallel-pipeline data processing in real time (according to the multilevel model) and link to RT-systems for embedded control. (19 refs)
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
Low-Power Embedded Design Solutions and Low-Latency On-Chip Interconnect Architecture for System-On-Chip Design
This dissertation presents three design solutions to support several key system-on-chip (SoC) issues to achieve low-power and high performance. These are: 1) joint source and channel decoding (JSCD) schemes for low-power SoCs used in portable multimedia systems, 2) efficient on-chip interconnect architecture for massive multimedia data streaming on multiprocessor SoCs (MPSoCs), and 3) data processing architecture for low-power SoCs in distributed sensor network (DSS) systems and its implementation.
The first part includes a low-power embedded low density parity check code (LDPC) - H.264 joint decoding architecture to lower the baseband energy consumption of a channel decoder using joint source decoding and dynamic voltage and frequency scaling (DVFS). A low-power multiple-input multiple-output (MIMO) and H.264 video joint detector/decoder design that minimizes energy for portable, wireless embedded systems is also designed.
In the second part, a link-level quality of service (QoS) scheme using unequal error protection (UEP) for low-power network-on-chip (NoC) and low latency on-chip network designs for MPSoCs is proposed. This part contains WaveSync, a low-latency focused network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs and a simultaneous dual-path routing (SDPR) scheme utilizing path diversity present in typical mesh topology network-on-chips. SDPR is akin to having a higher link width but without the significant hardware overhead associated with simple bus width scaling.
The last part shows data processing unit designs for embedded SoCs. We propose a data processing and control logic design for a new radiation detection sensor system generating data at or above Peta-bits-per-second level. Implementation results show that the intended clock rate is achieved within the power target of less than 200mW. We also present a digital signal processing (DSP) accelerator supporting configurable MAC, FFT, FIR, and 3-D cross product operations for embedded SoCs. It consumes 12.35mW along with 0.167mm2 area at 333MHz
Energy Saving Techniques for Phase Change Memory (PCM)
In recent years, the energy consumption of computing systems has increased
and a large fraction of this energy is consumed in main memory. Towards this,
researchers have proposed use of non-volatile memory, such as phase change
memory (PCM), which has low read latency and power; and nearly zero leakage
power. However, the write latency and power of PCM are very high and this,
along with limited write endurance of PCM present significant challenges in
enabling wide-spread adoption of PCM. To address this, several
architecture-level techniques have been proposed. In this report, we review
several techniques to manage power consumption of PCM. We also classify these
techniques based on their characteristics to provide insights into them. The
aim of this work is encourage researchers to propose even better techniques for
improving energy efficiency of PCM based main memory.Comment: Survey, phase change RAM (PCRAM
Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device
Currently, most designers face a daunting task to
research different design flows and learn the intricacies of
specific software from various manufacturers in
hardware/software co-design. An urgent need of creating a
scalable hardware/software co-design platform has become a key
strategic element for developing hardware/software integrated
systems. In this paper, we propose a new design flow for building
a scalable co-design platform on FPGA-based system-on-chip.
We employ an integrated approach to implement a histogram
oriented gradients (HOG) and a support vector machine (SVM)
classification on a programmable device for pedestrian tracking.
Not only was hardware resource analysis reported, but the
precision and success rates of pedestrian tracking on nine open
access image data sets are also analysed. Finally, our proposed
design flow can be used for any real-time image processingrelated
products on programmable ZYNQ-based embedded
systems, which benefits from a reduced design time and provide a
scalable solution for embedded image processing products
- …