1,977 research outputs found

    Evaluating Rapid Application Development with Python for Heterogeneous Processor-based FPGAs

    Full text link
    As modern FPGAs evolve to include more het- erogeneous processing elements, such as ARM cores, it makes sense to consider these devices as processors first and FPGA accelerators second. As such, the conventional FPGA develop- ment environment must also adapt to support more software- like programming functionality. While high-level synthesis tools can help reduce FPGA development time, there still remains a large expertise gap in order to realize highly performing implementations. At a system-level the skill set necessary to integrate multiple custom IP hardware cores, interconnects, memory interfaces, and now heterogeneous processing elements is complex. Rather than drive FPGA development from the hardware up, we consider the impact of leveraging Python to ac- celerate application development. Python offers highly optimized libraries from an incredibly large developer community, yet is limited to the performance of the hardware system. In this work we evaluate the impact of using PYNQ, a Python development environment for application development on the Xilinx Zynq devices, the performance implications, and bottlenecks associated with it. We compare our results against existing C-based and hand-coded implementations to better understand if Python can be the glue that binds together software and hardware developers.Comment: To appear in 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM'17

    Microprocessor and FPGA interfaces for in-system co-debugging in field programmable hybrid systems

    Get PDF
    Modern trends in technology require efficient control and processing platforms based on connected software-hardware subsystems. Due to their complexity and size, algorithms implemented on these platforms are difficult to test and verify. When these types of solution are being designed, it is necessary to provide information of the internal values of registers and memories of both the software and hardware during the execution of the complete system. The final architecture of the targeted design and its debugging capabilities strongly depends on how the hybrid system is connected and clocked. This article discusses different architectural strategies that have been adopted for a hybrid hardware-software platform, built ready for debugging, and that uses components that can be easily found with a few special features. All the solutions have been implemented and evaluated using the UNSHADES-2 framework

    GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs

    Full text link
    In recent years, architectures combining a reconfigurable fabric and a general purpose processor on a single chip became increasingly popular. Such hybrid architectures allow extending embedded software with application specific hardware accelerators to improve performance and/or energy efficiency. Aiding system designers and programmers at handling the complexity of the required process of hardware/software (HW/SW) partitioning is an important issue. Current methods are often restricted, either to bare-metal systems, to subsets of mainstream programming languages, or require special coding guidelines, e.g., via annotations. These restrictions still represent a high entry barrier for the wider community of programmers that new hybrid architectures are intended for. In this paper we revisit HW/SW partitioning and present a seamless programming flow for unrestricted, legacy C code. It consists of a retargetable GCC plugin that automatically identifies code sections for hardware acceleration and generates code accordingly. The proposed workflow was evaluated on the Xilinx Zynq platform using unmodified code from an embedded benchmark suite.Comment: Presented at Second International Workshop on FPGAs for Software Programmers (FSP 2015) (arXiv:1508.06320

    Document Classification Systems in Heterogeneous Computing Environments

    Get PDF
    Datacenter workloads demand high throughput, low cost and power efficient solutions. In most data centers the operating costs dominates the infrastructure cost. The ever growing amounts of data and the critical need for higher throughput, more energy efficient document classification solutions motivated us to investigate alternatives to the traditional homogeneous CPU based implementations of document classification systems. Several heterogeneous systems were investigated in the past where CPUs were combined with GPUs and FPGAs as system accelerators. The increasing complexity of FPGAs made them an interesting device in the heterogeneous computing environments and on the other hand difficult to program using Hardware Description languages. We explore the trade-offs when using high level synthesis and low level synthesis when programming FPGAs. Using low level synthesis results in less hardware resource usage on FPGAs and also offers the higher throughput compared to using HLS tool. While using HLS tool different heterogeneous computing devices such as multicore CPU and GPU targeted. Through our implementation experience and empirical results for data centric applications, we conclude that we can achieve power efficient results for these set of applications by either using low level synthesis or high level synthesis for programming FPGAs

    The development of a node for a hardware reconfigurable parallel processor

    Get PDF
    This dissertation concerns the design and implementation of a node for a hardware reconfigurable parallel processor. The hardware that was developed allows for the further development of a parallel processor with configurable hardware acceleration. Each node in the system has a standard microprocessor and reconfigurable logic device and has high speed communications channels for inter-node communication. The design of the node provided high-speed serial communications channels allowing the implementation of various network topographies. The node also provided a PCI master interface to provide an external interface and communicate with local nodes on the bus. A high speed RlSC processor provided communication and system control functions and the reconfigurable logic device provided communication interfaces and data processing functions. The node was designed and implemented as a PCI card that interfaced a standard PCI bus. VHDL designs for logic devices that provided system support were developed, VHDL designs for the reconfigurable logic FPGA and software including drivers and system software were written for the node. The 64-bit version Linux operating system was then ported to the processor providing a UNIX environment for the system. The node functioned as specified and parallel and hardware accelerated processing was demonstrated. The hardware acceleration was shown to provide substantial performance benefits for the system
    corecore