1,777 research outputs found
Minimum entropy restoration using FPGAs and high-level techniques
One of the greatest perceived barriers to the widespread use of FPGAs in image processing is the difficulty for application specialists of developing algorithms on reconfigurable hardware. Minimum entropy deconvolution (MED) techniques have been shown to be effective in the restoration of star-field images. This paper reports on an attempt to implement a MED algorithm using simulated annealing, first on a microprocessor, then on an FPGA. The FPGA implementation uses DIME-C, a C-to-gates compiler, coupled with a low-level core library to simplify the design task. Analysis of the C code and output from the DIME-C compiler guided the code optimisation. The paper reports on the design effort that this entailed and the resultant performance improvements
GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs
In recent years, architectures combining a reconfigurable fabric and a
general purpose processor on a single chip became increasingly popular. Such
hybrid architectures allow extending embedded software with application
specific hardware accelerators to improve performance and/or energy efficiency.
Aiding system designers and programmers at handling the complexity of the
required process of hardware/software (HW/SW) partitioning is an important
issue. Current methods are often restricted, either to bare-metal systems, to
subsets of mainstream programming languages, or require special coding
guidelines, e.g., via annotations. These restrictions still represent a high
entry barrier for the wider community of programmers that new hybrid
architectures are intended for. In this paper we revisit HW/SW partitioning and
present a seamless programming flow for unrestricted, legacy C code. It
consists of a retargetable GCC plugin that automatically identifies code
sections for hardware acceleration and generates code accordingly. The proposed
workflow was evaluated on the Xilinx Zynq platform using unmodified code from
an embedded benchmark suite.Comment: Presented at Second International Workshop on FPGAs for Software
Programmers (FSP 2015) (arXiv:1508.06320
Type-driven automated program transformations and cost modelling for optimising streaming programs on FPGAs
In this paper we present a novel approach to program optimisation based on compiler-based type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs for the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages and the cost model and demonstrate the effectiveness of our approach by comparison with a commercial toolchain
An empirical evaluation of High-Level Synthesis languages and tools for database acceleration
High Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we utilize to accelerate commonly-used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS depends on a set of orthogonal characteristics, which we highlight for each HLS framework.Peer ReviewedPostprint (author’s final draft
Packet Transactions: High-level Programming for Line-Rate Switches
Many algorithms for congestion control, scheduling, network measurement,
active queue management, security, and load balancing require custom processing
of packets as they traverse the data plane of a network switch. To run at line
rate, these data-plane algorithms must be in hardware. With today's switch
hardware, algorithms cannot be changed, nor new algorithms installed, after a
switch has been built.
This paper shows how to program data-plane algorithms in a high-level
language and compile those programs into low-level microcode that can run on
emerging programmable line-rate switching chipsets. The key challenge is that
these algorithms create and modify algorithmic state. The key idea to achieve
line-rate programmability for stateful algorithms is the notion of a packet
transaction : a sequential code block that is atomic and isolated from other
such code blocks. We have developed this idea in Domino, a C-like imperative
language to express data-plane algorithms. We show with many examples that
Domino provides a convenient and natural way to express sophisticated
data-plane algorithms, and show that these algorithms can be run at line rate
with modest estimated die-area overhead.Comment: 16 page
P4CEP: Towards In-Network Complex Event Processing
In-network computing using programmable networking hardware is a strong trend
in networking that promises to reduce latency and consumption of server
resources through offloading to network elements (programmable switches and
smart NICs). In particular, the data plane programming language P4 together
with powerful P4 networking hardware has spawned projects offloading services
into the network, e.g., consensus services or caching services. In this paper,
we present a novel case for in-network computing, namely, Complex Event
Processing (CEP). CEP processes streams of basic events, e.g., stemming from
networked sensors, into meaningful complex events. Traditionally, CEP
processing has been performed on servers or overlay networks. However, we argue
in this paper that CEP is a good candidate for in-network computing along the
communication path avoiding detouring streams to distant servers to minimize
communication latency while also exploiting processing capabilities of novel
networking hardware. We show that it is feasible to express CEP operations in
P4 and also present a tool to compile CEP operations, formulated in our P4CEP
rule specification language, to P4 code. Moreover, we identify challenges and
problems that we have encountered to show future research directions for
implementing full-fledged in-network CEP systems.Comment: 6 pages. Author's versio
- …