12,949 research outputs found

    Empowering parallel computing with field programmable gate arrays

    Get PDF
    After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements

    Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification

    Full text link
    Modern Systems-on-Chip (SoC) designs are increasingly heterogeneous and contain specialized semi-programmable accelerators in addition to programmable processors. In contrast to the pre-accelerator era, when the ISA played an important role in verification by enabling a clean separation of concerns between software and hardware, verification of these "accelerator-rich" SoCs presents new challenges. From the perspective of hardware designers, there is a lack of a common framework for the formal functional specification of accelerator behavior. From the perspective of software developers, there exists no unified framework for reasoning about software/hardware interactions of programs that interact with accelerators. This paper addresses these challenges by providing a formal specification and high-level abstraction for accelerator functional behavior. It formalizes the concept of an Instruction Level Abstraction (ILA), developed informally in our previous work, and shows its application in modeling and verification of accelerators. This formal ILA extends the familiar notion of instructions to accelerators and provides a uniform, modular, and hierarchical abstraction for modeling software-visible behavior of both accelerators and programmable processors. We demonstrate the applicability of the ILA through several case studies of accelerators (for image processing, machine learning, and cryptography), and a general-purpose processor (RISC-V). We show how the ILA model facilitates equivalence checking between two ILAs, and between an ILA and its hardware finite-state machine (FSM) implementation. Further, this equivalence checking supports accelerator upgrades using the notion of ILA compatibility, similar to processor upgrades using ISA compatibility.Comment: 24 pages, 3 figures, 3 table

    Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device

    Get PDF
    Currently, most designers face a daunting task to research different design flows and learn the intricacies of specific software from various manufacturers in hardware/software co-design. An urgent need of creating a scalable hardware/software co-design platform has become a key strategic element for developing hardware/software integrated systems. In this paper, we propose a new design flow for building a scalable co-design platform on FPGA-based system-on-chip. We employ an integrated approach to implement a histogram oriented gradients (HOG) and a support vector machine (SVM) classification on a programmable device for pedestrian tracking. Not only was hardware resource analysis reported, but the precision and success rates of pedestrian tracking on nine open access image data sets are also analysed. Finally, our proposed design flow can be used for any real-time image processingrelated products on programmable ZYNQ-based embedded systems, which benefits from a reduced design time and provide a scalable solution for embedded image processing products

    Evaluating Rapid Application Development with Python for Heterogeneous Processor-based FPGAs

    Full text link
    As modern FPGAs evolve to include more het- erogeneous processing elements, such as ARM cores, it makes sense to consider these devices as processors first and FPGA accelerators second. As such, the conventional FPGA develop- ment environment must also adapt to support more software- like programming functionality. While high-level synthesis tools can help reduce FPGA development time, there still remains a large expertise gap in order to realize highly performing implementations. At a system-level the skill set necessary to integrate multiple custom IP hardware cores, interconnects, memory interfaces, and now heterogeneous processing elements is complex. Rather than drive FPGA development from the hardware up, we consider the impact of leveraging Python to ac- celerate application development. Python offers highly optimized libraries from an incredibly large developer community, yet is limited to the performance of the hardware system. In this work we evaluate the impact of using PYNQ, a Python development environment for application development on the Xilinx Zynq devices, the performance implications, and bottlenecks associated with it. We compare our results against existing C-based and hand-coded implementations to better understand if Python can be the glue that binds together software and hardware developers.Comment: To appear in 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM'17
    corecore