168 research outputs found
AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture
CPU-FPGA heterogeneous architectures are attracting ever-increasing attention
in an attempt to advance computational capabilities and energy efficiency in
today's datacenters. These architectures provide programmers with the ability
to reprogram the FPGAs for flexible acceleration of many workloads.
Nonetheless, this advantage is often overshadowed by the poor programmability
of FPGAs, whose programming is conventionally an RTL design practice. Although
recent advances in high-level synthesis (HLS) have significantly improved FPGA
programmability, programmers still face the challenge of
identifying the optimal design configuration in a tremendous design space.
This paper aims to address this challenge and pave the path from software
programs towards high-quality FPGA accelerators. Specifically, we first propose
the composable, parallel and pipeline (CPP) microarchitecture as a template of
accelerator designs. Such a well-defined template is able to support efficient
accelerator designs for a broad class of computation kernels, and more
importantly, drastically reduce the design space. Also, we introduce an
analytical model to capture the performance and resource trade-offs among
different design configurations of the CPP microarchitecture, which lays the
foundation for fast design space exploration. On top of the CPP
microarchitecture and its analytical model, we develop the AutoAccel framework
to make the entire accelerator generation automated. AutoAccel accepts a
software program as an input and performs a series of code transformations
based on the result of the analytical-model-based design space exploration to
construct the desired CPP microarchitecture. Our experiments show that the
AutoAccel-generated accelerators outperform their corresponding software
implementations by an average of 72x for a broad class of computation kernels.
Low Power Implementation of Non Power-of-Two FFTs on Coarse-Grain Reconfigurable Architectures
The DRM standard for digital radio broadcast in the AM band requires integrated devices for radio receivers at very low power. A System on Chip (SoC) called DiMITRI was developed based on a dual ARM9 RISC core architecture. Analyses showed that most computation power is used in the Coded Orthogonal Frequency Division Multiplexing (COFDM) demodulation to compute Fast Fourier Transforms (FFT) and inverse transforms (IFFT) on complex samples. These FFTs have to be computed on non-power-of-two numbers of samples, which is very uncommon in the signal processing world. The results obtained with this chip lead to the objective of decreasing the power dissipated by the COFDM demodulation part by using a coarse-grain reconfigurable structure as a coprocessor. This paper introduces two different coarse-grain architectures, PACT XPP technology and the Montium, developed by the University of Twente, and presents the implementation of a Fast Fourier Transform on 1920 complex samples. The implementation result on the Montium shows a saving of a factor of 35 in processing time and 14 in power consumption compared to the RISC implementation, as well as a smaller area. In conclusion, the paper presents the next steps of the development and some development issues.
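As an illustration of what a non-power-of-two FFT involves, the following is a minimal sketch of a mixed-radix decimation-in-time FFT in Python. It is not the Montium or PACT XPP implementation described in the abstract, and the function names `smallest_factor` and `mixed_radix_fft` are hypothetical. It only shows how a length such as 1920 = 2^7 · 3 · 5 can be split into small-radix stages.

```python
import cmath


def smallest_factor(n):
    """Return the smallest prime factor of n (for n >= 2)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n


def mixed_radix_fft(x):
    """Recursive decimation-in-time FFT for a sequence of any length.

    Illustrative sketch only: the length n is split as n = p * m, where p is
    the smallest prime factor of n. A prime length falls back to a plain DFT.
    """
    n = len(x)
    if n == 1:
        return list(x)
    p = smallest_factor(n)
    m = n // p
    # Transform the p interleaved sub-sequences, each of length m.
    subs = [mixed_radix_fft(x[r::p]) for r in range(p)]
    out = [0j] * n
    for k in range(n):
        acc = 0j
        for r in range(p):
            # Twiddle factor exp(-2*pi*i*r*k/n); each sub-transform is
            # periodic in k with period m, hence the index k % m.
            acc += cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k % m]
        out[k] = acc
    return out


if __name__ == "__main__":
    # A unit impulse of length 1920 transforms to a flat spectrum.
    impulse = [1 + 0j] + [0j] * 1919
    spectrum = mixed_radix_fft(impulse)
    print(len(spectrum), abs(spectrum[100]))
```

With this factor ordering, a 1920-point transform passes through seven radix-2 stages, one radix-3 stage, and one radix-5 stage. A hardware mapping would typically choose the radix order and twiddle storage to suit the target architecture, which this sketch does not attempt.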
Data Processing with FPGAs on Modern Architectures
Trends in hardware, the prevalence of the cloud, and the rise of highly
demanding applications have ushered in an era of specialization that quickly
changes how data is processed at scale. These changes are likely to continue
and accelerate in the next years as new technologies are adopted and deployed:
smart NICs, smart storage, smart memory, disaggregated storage, disaggregated
memory, specialized accelerators (GPUs, TPUs, FPGAs), and a wealth of ASICs
specifically created to deal with computationally expensive tasks (e.g.,
cryptography or compression). In this tutorial, we focus on data processing on
FPGAs, a technology that has received less attention than, e.g., TPUs or GPUs
but that is, however, increasingly being deployed in the cloud for data
processing tasks due to the architectural flexibility of FPGAs, along with
their ability to process data at line rate, something not possible with other
types of processors or accelerators.
In the tutorial, we will cover what FPGAs are, their characteristics, their
advantages and disadvantages, as well as examples from deployments in the
industry and how they are used in various data processing tasks. We will
introduce FPGA programming with high-level languages and describe hardware and
software resources available to researchers. The tutorial includes case studies
borrowed from research done in collaboration with companies that illustrate the
potential of FPGAs in data processing and how software and hardware are
evolving to take advantage of the possibilities offered by FPGAs. The use cases
include: (1) approximate nearest neighbor search, which is relevant to
databases and machine learning, (2) remote disaggregated memory, showing how
the cloud architecture is evolving and demonstrating the potential for operator
offloading and line rate data processing, and (3) recommendation system as an
application with tight latency constraints.
Cognitive Radio Programming: Existing Solutions and Open Issues
Software defined radio (SDR) technology has evolved rapidly and is now reaching market maturity, providing solutions for cognitive radio applications. Still, many issues have yet to be studied. In this paper, we highlight the constraints imposed by recent radio protocols and present current architectures and solutions for programming SDR. We also list the challenges to overcome in order to master future cognitive radio systems.