Evaluating Rapid Application Development with Python for Heterogeneous Processor-based FPGAs
As modern FPGAs evolve to include more heterogeneous processing elements,
such as ARM cores, it makes sense to consider these devices as processors first
and FPGA accelerators second. As such, the conventional FPGA development
environment must also adapt to support more software-like programming
functionality. While high-level synthesis tools can help reduce FPGA
development time, there still remains a large expertise gap in order to realize
highly performing implementations. At the system level, the skill set necessary to
integrate multiple custom IP hardware cores, interconnects, memory interfaces,
and now heterogeneous processing elements is complex. Rather than drive FPGA
development from the hardware up, we consider the impact of leveraging Python
to accelerate application development. Python offers highly optimized
libraries from an incredibly large developer community, yet is limited to the
performance of the hardware system. In this work we evaluate the impact of
using PYNQ, a Python development environment for application development on the
Xilinx Zynq devices, the performance implications, and bottlenecks associated
with it. We compare our results against existing C-based and hand-coded
implementations to better understand if Python can be the glue that binds
together software and hardware developers.
Comment: To appear in 2017 IEEE 25th Annual International Symposium on
Field-Programmable Custom Computing Machines (FCCM'17)
Microprocessor and FPGA interfaces for in-system co-debugging in field programmable hybrid systems
Modern trends in technology require efficient control and processing platforms based on connected software-hardware subsystems. Due to their complexity and size, algorithms implemented on these platforms are difficult to test and verify. When these types of solutions are being designed, it is necessary to provide information about the internal values of registers and memories of both the software and the hardware during the execution of the complete system. The final architecture of the targeted design and its debugging capabilities strongly depend on how the hybrid system is connected and clocked. This article discusses different architectural strategies that have been adopted for a hybrid hardware-software platform that is built ready for debugging and uses easily found components with few special features. All the solutions have been implemented and evaluated using the UNSHADES-2 framework.
GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs
In recent years, architectures combining a reconfigurable fabric and a
general purpose processor on a single chip have become increasingly popular. Such
hybrid architectures allow extending embedded software with application
specific hardware accelerators to improve performance and/or energy efficiency.
Aiding system designers and programmers in handling the complexity of the
required process of hardware/software (HW/SW) partitioning is an important
issue. Current methods are often restricted, either to bare-metal systems, to
subsets of mainstream programming languages, or require special coding
guidelines, e.g., via annotations. These restrictions still represent a high
entry barrier for the wider community of programmers that new hybrid
architectures are intended for. In this paper we revisit HW/SW partitioning and
present a seamless programming flow for unrestricted, legacy C code. It
consists of a retargetable GCC plugin that automatically identifies code
sections for hardware acceleration and generates code accordingly. The proposed
workflow was evaluated on the Xilinx Zynq platform using unmodified code from
an embedded benchmark suite.
Comment: Presented at Second International Workshop on FPGAs for Software
Programmers (FSP 2015) (arXiv:1508.06320)
Document Classification Systems in Heterogeneous Computing Environments
Datacenter workloads demand high-throughput, low-cost, and power-efficient solutions. In most data centers, operating costs dominate infrastructure costs. The ever-growing amounts of data and the critical need for higher-throughput, more energy-efficient document classification solutions motivated us to investigate alternatives to the traditional homogeneous CPU-based implementations of document classification systems. Several heterogeneous systems have been investigated in the past, in which CPUs were combined with GPUs and FPGAs as system accelerators. The increasing complexity of FPGAs has made them interesting devices in heterogeneous computing environments but also difficult to program using hardware description languages. We explore the trade-offs between high-level synthesis (HLS) and low-level synthesis when programming FPGAs. Low-level synthesis uses fewer hardware resources on the FPGA and offers higher throughput than an HLS tool. With an HLS tool, on the other hand, different heterogeneous computing devices such as multicore CPUs and GPUs can also be targeted. Through our implementation experience and empirical results for data-centric applications, we conclude that power-efficient results can be achieved for this set of applications by using either low-level or high-level synthesis to program FPGAs.
The development of a node for a hardware reconfigurable parallel processor
This dissertation concerns the design and implementation of a node for a hardware reconfigurable parallel processor. The hardware that was developed allows for the further development of a parallel processor with configurable hardware acceleration. Each node in the system has a standard microprocessor and a reconfigurable logic device, and has high-speed communications channels for inter-node communication. The design of the node provided high-speed serial communications channels, allowing the implementation of various network topologies. The node also provided a PCI master interface to provide an external interface and to communicate with local nodes on the bus. A high-speed RISC processor provided communication and system control functions, and the reconfigurable logic device provided communication interfaces and data processing functions. The node was designed and implemented as a PCI card that interfaced with a standard PCI bus. VHDL designs were developed for the logic devices that provided system support and for the reconfigurable logic FPGA, and software, including drivers and system software, was written for the node. The 64-bit version of the Linux operating system was then ported to the processor, providing a UNIX environment for the system. The node functioned as specified, and parallel and hardware-accelerated processing was demonstrated. The hardware acceleration was shown to provide substantial performance benefits for the system.