Data Processing with FPGAs on Modern Architectures
Trends in hardware, the prevalence of the cloud, and the rise of highly
demanding applications have ushered in an era of specialization that is quickly
changing how data is processed at scale. These changes are likely to continue
and accelerate in the coming years as new technologies are adopted and deployed:
smart NICs, smart storage, smart memory, disaggregated storage, disaggregated
memory, specialized accelerators (GPUs, TPUs, FPGAs), and a wealth of ASICs
specifically created to deal with computationally expensive tasks (e.g.,
cryptography or compression). In this tutorial, we focus on data processing on
FPGAs, a technology that has received less attention than, e.g., TPUs or GPUs,
but that is increasingly being deployed in the cloud for data processing tasks
due to the architectural flexibility of FPGAs, along with their ability to
process data at line rate, something not possible with other types of
processors or accelerators.
In the tutorial, we will cover what FPGAs are, their characteristics, their
advantages and disadvantages, as well as examples of industry deployments and
how FPGAs are used in various data processing tasks. We will
introduce FPGA programming with high-level languages and describe hardware and
software resources available to researchers. The tutorial includes case studies
borrowed from research done in collaboration with companies that illustrate the
potential of FPGAs in data processing and how software and hardware are
evolving to take advantage of the possibilities offered by FPGAs. The use cases
include: (1) approximate nearest neighbor search, which is relevant to
databases and machine learning; (2) remote disaggregated memory, showing how
the cloud architecture is evolving and demonstrating the potential for operator
offloading and line-rate data processing; and (3) a recommendation system, an
application with tight latency constraints.
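As background for use case (1), approximate nearest neighbor search trades exactness for speed by restricting comparisons to a small set of likely candidates. A minimal sketch using random-hyperplane locality-sensitive hashing follows; this is one common ANN technique, not necessarily the one covered in the tutorial, and all function names here are illustrative:

```python
import random

def random_hyperplanes(dim, n_bits, seed=0):
    # Each hyperplane is a random Gaussian normal vector.
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vec, planes):
    # One bit per hyperplane: which side of the plane the vector lies on.
    return tuple(1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def build_index(points, planes):
    # Bucket point indices by their hash signature.
    index = {}
    for i, p in enumerate(points):
        index.setdefault(lsh_signature(p, planes), []).append(i)
    return index

def query(q, points, index, planes):
    # Search only the query's bucket; fall back to all points if it is empty.
    candidates = index.get(lsh_signature(q, planes), range(len(points)))
    return min(candidates,
               key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], q)))
```

Because each signature bit is a single dot-product-and-compare, this style of hashing maps naturally onto parallel hardware, which is part of why ANN search is an attractive FPGA workload.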
Empowering parallel computing with field programmable gate arrays
After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumann-like architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements.
Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL
Recent technological advances have greatly increased the available computing
power, memory, and speed of modern Central Processing Units (CPUs), Graphics
Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs).
Consequently, the performance and complexity of Artificial Neural Networks
(ANNs) are burgeoning. While GPU-accelerated Deep Neural Networks (DNNs)
currently offer state-of-the-art performance, they consume large amounts of
power. Training such networks on CPUs is inefficient, as data throughput and
parallel computation are limited. FPGAs are considered a suitable candidate for
performance-critical, low-power systems, e.g., Internet of Things (IoT) edge
devices. Using the Xilinx SDAccel or Intel FPGA SDK for OpenCL development
environment, networks described using the high-level OpenCL framework can be
accelerated on heterogeneous platforms. Moreover, the resource utilization and
power consumption of DNNs can be further reduced by utilizing regularization
techniques that binarize network weights. In this paper, we introduce, to the
best of our knowledge, the first FPGA-accelerated stochastically binarized DNN
implementations, and compare them to implementations accelerated using both
GPUs and FPGAs. Our developed networks are trained and benchmarked using the
popular MNIST and CIFAR-10 datasets, and achieve near state-of-the-art
performance, while offering a >16-fold improvement in power consumption,
compared to conventional GPU-accelerated networks. Both our FPGA-accelerated
deterministic and stochastic BNNs reduce inference times on MNIST and CIFAR-10
by >9.89x and >9.91x, respectively.
Comment: 4 pages, 3 figures, 1 table
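The two weight-binarization schemes this abstract contrasts can be sketched briefly: the deterministic variant takes the sign of each real-valued weight, while the stochastic variant samples +1 with a probability given by the "hard sigmoid" of the weight. This is the standard BNN formulation; the sketch below is illustrative and is not the paper's implementation:

```python
import random

def hard_sigmoid(x):
    # Clip (x + 1) / 2 to the interval [0, 1].
    return max(0.0, min(1.0, (x + 1.0) / 2.0))

def binarize_deterministic(w):
    # Deterministic binarization: the sign of the weight.
    return 1.0 if w >= 0 else -1.0

def binarize_stochastic(w, rng=random):
    # Stochastic binarization: +1 with probability hard_sigmoid(w), else -1.
    return 1.0 if rng.random() < hard_sigmoid(w) else -1.0
```

Binarized weights turn multiply-accumulate operations into sign flips and additions (or XNOR/popcount at the bit level), which is why BNNs map so efficiently onto FPGA fabric.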
FPGA-based operational concept and payload data processing for the Flying Laptop satellite
Flying Laptop is the first small satellite developed by the Institute of Space Systems at the Universität Stuttgart. It is a test bed for a reconfigurable, redundant, and self-controlling on-board computer with high computational capability, based on field programmable gate arrays (FPGAs). This Technical Note presents the operational concept and the on-board payload data processing of the satellite. The designed operational concept of Flying Laptop enables the achievement of mission goals such as technical demonstration and scientific Earth observation, and supports the payload data processing methods. All these capabilities expand its scientific usage and enable new possibilities for real-time applications. The hierarchical architecture of the operational modes of subsystems and modules is developed in a state-machine diagram and tested by means of the MathWorks Simulink/Stateflow Toolbox. Furthermore, the concept of the on-board payload data processing, its implementation, and possible applications are described.
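The hierarchical operational-mode architecture described above amounts to a state machine with guarded transitions. A miniature sketch of that pattern follows; the mode names (SAFE, NOMINAL, PAYLOAD) and event names are hypothetical placeholders, not Flying Laptop's actual operational modes:

```python
class ModeStateMachine:
    """Minimal operational-mode state machine with an explicit transition
    table. Illustrative only: the modes and events are invented placeholders,
    not the satellite's real mode hierarchy."""

    TRANSITIONS = {
        ("SAFE", "checkout_passed"): "NOMINAL",
        ("NOMINAL", "start_observation"): "PAYLOAD",
        ("PAYLOAD", "observation_done"): "NOMINAL",
        # A detected fault always drops the system back to the safe mode.
        ("NOMINAL", "fault_detected"): "SAFE",
        ("PAYLOAD", "fault_detected"): "SAFE",
    }

    def __init__(self):
        self.mode = "SAFE"  # power-on default

    def dispatch(self, event):
        # Events with no entry in the table are ignored in the current mode.
        self.mode = self.TRANSITIONS.get((self.mode, event), self.mode)
        return self.mode
```

Encoding the modes as an explicit table keeps the reachable transitions auditable, which mirrors why state-machine diagrams (and tools such as Stateflow) are used to verify such operational concepts before flight.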