Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device
Currently, most designers face the daunting task of
researching different design flows and learning the intricacies of
specific software from various manufacturers in
hardware/software co-design. Creating a
scalable hardware/software co-design platform has become a key
strategic element for developing hardware/software integrated
systems. In this paper, we propose a new design flow for building
a scalable co-design platform on FPGA-based system-on-chip.
We employ an integrated approach to implement a histogram of
oriented gradients (HOG) descriptor and support vector machine (SVM)
classification on a programmable device for pedestrian tracking.
We report a hardware resource analysis and also analyse the
precision and success rates of pedestrian tracking on nine
open-access image data sets. Finally, our proposed
design flow can be used for any real-time image-processing-related
products on programmable ZYNQ-based embedded
systems; it benefits from a reduced design time and provides a
scalable solution for embedded image processing products.
FPGA based remote code integrity verification of programs in distributed embedded systems
The explosive growth of networked embedded systems has made ubiquitous and pervasive computing a reality. However, there are still a number of new challenges to its widespread adoption, including scalability, availability, and, especially, security of software. Among the different challenges in software security, the problem of remote code integrity verification is still waiting for efficient solutions. This paper proposes the use of reconfigurable computing to build a consistent architecture for generating attestations (proofs) of code integrity for an executing program and for delivering them to the designated verification entity. Remote dynamic update of reconfigurable devices is also exploited to increase the complexity of mounting attacks in a real-world environment. The proposed solution is well suited to embedded devices, which nowadays are commonly equipped with reconfigurable hardware components that are exploited to solve different computational problems.
Multi-task Implementation for Image Reconstruction of an AER Communication
Address-Event-Representation (AER) is a communication protocol
for transferring spikes between bio-inspired chips. Such systems may consist of
a hierarchical structure with several chips that transmit spikes among them in
real time, while performing some processing. There exist several AER tools to
help in developing and testing AER based systems. These tools require the use
of a computer to allow the processing of the event information, reaching very
high bandwidth at the AER communication level. We propose to use an
embedded platform based on a multi-task operating system to allow both
AER communication and AER processing without a laptop or a computer.
We have connected and programmed a Gumstix computer to process
Address-Event information and measured its performance relative to previous AER
tool solutions. In this paper, we present and study the performance of a new
approach to a frame-grabber AER tool based on a multi-task environment,
composed of the Intel XScale processor governed by an embedded GNU/Linux
system.
Ministerio de Ciencia e Innovación TEC2006-11730-C03-0
Design of Digital Advanced Systems Based on Programmable System on Chip
This chapter provides an advanced analysis of state-of-the-art design in programmable SoC systems, giving designers a critical overview for implementing real-time operating systems and concurrent processing. The content of the chapter is divided into the following four main sections.
First, the evolution timeline of FPGA-based systems is covered, from its beginning to the latest AP SoC chips. These are complex devices, and a thorough understanding is necessary to use them as efficiently as possible.
Second, the most important advanced digital system structures and architectures are described. The embedded AP SoCs are analysed and the main design methodologies are covered, focusing on hardware and co-design strategies.
Third, the development of a real open-source application is described. It covers the fundamental parts of the design of a SoC system, ranging from the hardware development to the software design involving the embedded operating system and the user interface application.
Finally, the system described in the previous section is tested in a real scientific experiment and the results are evaluated.
An FPGA Implementation of HW/SW Codesign Architecture for H.263 Video Coding
Chapter 12 http://www.intechopen.com/download/pdf/pdfs_id/1574
Performance evaluation over HW/SW co-design SoC memory transfers for a CNN accelerator
Many FPGA vendors have recently included embedded
processors in their devices, like Xilinx with ARM Cortex-A
cores, together with programmable logic cells. These devices
are known as Programmable System on Chip (PSoC). Their ARM
cores (embedded in the processing system or PS) communicate
with the programmable logic cells (PL) using ARM-standard AXI
buses. In this paper we analyse the performance of exhaustive
data transfers between PS and PL for a Xilinx Zynq FPGA
in a real co-design scenario for a Convolutional Neural Network
(CNN) accelerator, which processes, in dedicated hardware, a
stream of visual information from a neuromorphic visual sensor
for classification. On the PS side, a Linux operating system is
running, which collects visual events from the neuromorphic
sensor into a normalized frame, then transfers these
frames to the multi-layer CNN accelerator and reads back the results,
using an AXI-DMA bus in a per-layer way. As this kind of
accelerator tries to process information as quickly as possible, data
bandwidth becomes critical, and maintaining a well-balanced
data throughput rate requires some considerations. We present
and evaluate several data partitioning techniques to improve the
balance between RX and TX transfers, and two different ways
of managing transfers: through a polling routine at the user
level of the OS, and through a dedicated interrupt-based kernel-level
driver. We demonstrate that, for long enough packets,
the kernel-level driver solution achieves better timing when computing a
CNN classification example. The main advantages of using a kernel-level
driver are safer solutions and OS task scheduling
that can manage other important processes for our application,
like frame collection from sensors and their normalization.
Ministerio de Economía y Competitividad TEC2016-77785-
From FPGA to ASIC: A RISC-V processor experience
This work documents a correct design flow using these tools in the Lagarto RISC-V processor, and the RTL design considerations that must be taken into account to move from a design for FPGA to a design for ASIC.