Rapid codesign of a soft vector processor and its compiler
Despite a decade of activity in the development of soft vector processors for FPGAs, high-level language support remains thin. We attribute this problem to a design method in which the high-level vector programming interface is only really considered once the processor architecture has been perfected, by which point the designer may be committed to the time-consuming development of a complicated compiler. In this paper, we present the codesign of a soft vector processor and a lightweight compiler, which together lift the level of abstraction for the programmer while allowing a rapid compiler implementation phase. We demonstrate the effectiveness of our approach on a range of applications from digital signal processing, neuroscience, and machine learning. This work is sponsored by EPSRC grant EP/G015783/1. This is the accepted manuscript version. The final version is available at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6927425&tag=1. © IEEE 201
A C++-embedded Domain-Specific Language for programming the MORA soft processor array
MORA is a novel platform for high-level FPGA programming of streaming vector and matrix operations, aimed at multimedia applications. It consists of a soft array of pipelined, low-complexity SIMD processors-in-memory (PIM). We present a Domain-Specific Language (DSL) for high-level programming of the MORA soft processor array. The DSL is embedded in C++, providing designers with a familiar language framework and the ability to compile designs using a standard compiler for functional testing before generating the FPGA bitstream using the MORA toolchain. The paper discusses the MORA-C++ DSL and the compilation route into the assembly for the MORA machine, and provides examples to illustrate the programming model and performance.
High throughput accelerator interface framework for a linear time-multiplexed FPGA overlay
Coarse-grained FPGA overlays improve design productivity through software-like programmability and fast compilation. However, the effectiveness of overlays as accelerators depends on suitable interface and programming integration into a typically processor-based computing system, an aspect that has often been neglected in evaluations of overlays. We explore the integration of a time-multiplexed FPGA overlay over a server-class PCI Express interface. We show how this integration can be optimised to maximise performance, and evaluate the area overhead. We also propose a user-friendly programming model for such an overlay accelerator system.
An Early-Stage Statement-Level Metric for Energy Characterization of Embedded Processors
This work presents an early-stage statement-level metric for energy characterization of embedded processors. A definition of the metric and a framework for its evaluation are provided. In particular, the metric is based on an existing assembly-level analysis and some profiling activities performed on a given C benchmark, and it is related to the average energy consumption of a generic C statement for a given target processor. Its evaluation is performed with a one-time effort and, once available, it can be used to rapidly estimate the energy consumption of a given C function for all the considered processors. Two reference embedded processors are then considered in order to show an example of usage of the proposed metric and framework.
Fast Simulation of Programmable Network Forwarding Plane Devices
With the evolution of the Internet, the processing of packets at the routers while providing
flexibility in deploying new protocols and services at the same time has become a major concern.
Programmable forwarding elements with high processing capability have emerged as a solution.
But the main challenge is to find the optimal hardware architecture while taking into account
constraints such as different packet processing functions, task scheduling options, electrical
power consumption and providing quality-of-service (QoS) guarantees. Therefore, it is essential
to investigate methods that help in identifying limitations and bottlenecks before physical
fabrication. Having an appropriate model provides designers a progressive path to narrow the
design space and establish credible and feasible alternatives before deciding on an
implementation.
In this thesis, we propose a flexible and fast instruction-accurate host-compiled simulator to
make it possible to explore wide ranges of architectures and application scenarios to find the
optimal configuration that meets given performance, throughput and latency requirements for programmable
forwarding elements. Application developers can use the simulator as a virtual prototype to
simulate and debug their applications before hardware availability. Moreover, forwarding device
architects can use the simulator to evaluate the trade-offs between different hardware/software
design decisions.
- …