2 research outputs found

    FUSE: Front-End User Framework for O/S Abstraction of Hardware Accelerators

    Abstract: SoCs can be implemented on a single FPGA, offering designers a unique opportunity for Embedded Systems. Instead of defining a fixed architecture early in the design process, the reconfigurable platform allows architectural redesign to meet the system’s specific needs. However, the ability to instantiate new modules in the reconfigurable hardware provides a unique set of challenges for integration, particularly to the software (SW) designer. Specifically, the Operating System (OS) cannot automatically abstract these platform changes without redesign. In this paper, we present FUSE, a framework for HW accelerator abstraction that provides: 1) transparency to the SW designer at the application level; and 2) OS support for easy HW accelerator integration. We illustrate FUSE as an API for an embedded Linux OS with POSIX threads on Xilinx’s MicroBlaze on a Virtex5. For three different applications and HW accelerators, we achieve performance speedups ranging from 6.4x to 37x.

    High-performance architectures for accelerating sparse LU computation

    Sparse Lower-Upper (LU) Triangular Decomposition is important to many different applications, including power system analysis. High-performance sparse linear algebra software packages, executing on general-purpose processors, experience lower performance when processing power system matrices. This observation motivated previous work on the design of custom hardware, implemented in an FPGA, to improve the performance of sparse LU. While improved performance was obtained, significant effort was required to design and implement the hardware. This thesis investigates the combination of general-purpose architectures and a hardware accelerator, for a crucial component of sparse LU, to achieve similar performance results without the design overhead. One architecture, combining a general-purpose processor with a hardware accelerator, achieves a 1.29X speedup over software for a 26K-bus power system. The second architecture, a modification of the Data Pump Architecture, provides a 2.27X speedup over software on the 26K-bus power system. These results show that speedup for sparse LU is possible without designing a complete custom hardware solution, using a small hardware accelerator, provided a tightly coupled architecture is available to feed data to the accelerator.
    M.S., Computer Engineering -- Drexel University, 201