454 research outputs found

    Rapid codesign of a soft vector processor and its compiler

    Despite a decade of activity in the development of soft vector processors for FPGAs, high-level language support remains thin. We attribute this problem to a design method in which the high-level vector programming interface is only really considered once the processor architecture has been perfected, by which point the designer may be committed to the time-consuming development of a complicated compiler. In this paper, we present the codesign of a soft vector processor and a lightweight compiler, which together lift the level of abstraction for the programmer while allowing a rapid compiler implementation phase. We demonstrate the effectiveness of our approach on a range of applications from digital signal processing, neuroscience, and machine learning. This work is sponsored by EPSRC grant EP/G015783/1. This is the accepted manuscript version. The final version is available at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6927425&tag=1. © IEEE 201

    A C++-embedded Domain-Specific Language for programming the MORA soft processor array

    MORA is a novel platform for high-level FPGA programming of streaming vector and matrix operations, aimed at multimedia applications. It consists of a soft array of pipelined low-complexity SIMD processors-in-memory (PIM). We present a Domain-Specific Language (DSL) for high-level programming of the MORA soft processor array. The DSL is embedded in C++, providing designers with a familiar language framework and the ability to compile designs using a standard compiler for functional testing before generating the FPGA bitstream using the MORA toolchain. The paper discusses the MORA-C++ DSL and the compilation route into the assembly for the MORA machine, and provides examples to illustrate the programming model and performance.

    High throughput accelerator interface framework for a linear time-multiplexed FPGA overlay

    Coarse-grained FPGA overlays improve design productivity through software-like programmability and fast compilation. However, the effectiveness of overlays as accelerators depends on suitable interface and programming integration into a typically processor-based computing system, an aspect which has often been neglected in evaluations of overlays. We explore the integration of a time-multiplexed FPGA overlay over a server-class PCI Express interface. We show how this integration can be optimised to maximise performance, and evaluate the area overhead. We also propose a user-friendly programming model for such an overlay accelerator system.

    An Early-Stage Statement-Level Metric for Energy Characterization of Embedded Processors

    This work presents an early-stage statement-level metric for energy characterization of embedded processors. The definition of the metric and the framework for its evaluation are provided. In particular, the metric is based on an existing assembly-level analysis and some profiling activities performed on a given C benchmark, and it is related to the average energy consumption of a generic C statement for a given target processor. Its evaluation is performed with a one-time effort and, once available, it can be used to rapidly estimate the energy consumption of a given C function for all the considered processors. Two reference embedded processors are then considered in order to show an example of usage of the proposed metric and framework.

    Fast Simulation of Programmable Network Forwarding Plane Devices

    With the evolution of the Internet, processing packets at routers while retaining the flexibility to deploy new protocols and services has become a major concern. Programmable forwarding elements with high processing capability have emerged as a solution. The main challenge, however, is to find the optimal hardware architecture while taking into account constraints such as different packet processing functions, task scheduling options, electrical power consumption and quality-of-service (QoS) guarantees. It is therefore essential to investigate methods that help in identifying limitations and bottlenecks before physical fabrication. An appropriate model gives designers a progressive path to narrow the design space and establish credible and feasible alternatives before deciding on an implementation. In this thesis, we propose a flexible and fast instruction-accurate host-compiled simulator that makes it possible to explore wide ranges of architectures and application scenarios in order to find the optimal configuration that meets given performance, throughput and latency requirements for programmable forwarding elements. Application developers can use the simulator as a virtual prototype to simulate and debug their applications before hardware is available. Moreover, forwarding device architects can use the simulator to evaluate the trade-offs between different hardware/software design decisions.