29,524 research outputs found
PROGRAPE-1: A Programmable, Multi-Purpose Computer for Many-Body Simulations
We have developed PROGRAPE-1 (PROgrammable GRAPE-1), a programmable
multi-purpose computer for many-body simulations. The main difference between
PROGRAPE-1 and "traditional" GRAPE systems is that the former uses FPGA (Field
Programmable Gate Array) chips as the processing elements, while the latter
rely on the hardwired pipeline processor specialized to gravitational
interactions. Since the logic implemented in FPGA chips can be reconfigured, we
can use PROGRAPE-1 to calculate not only gravitational interactions but also
other forms of interactions such as van der Waals force, hydrodynamical
interactions in SPH calculation and so on. PROGRAPE-1 comprises two Altera
EPF10K100 FPGA chips, each of which contains nominally 100,000 gates. To
evaluate the programmability and performance of PROGRAPE-1, we implemented a
pipeline for gravitational interaction similar to that of GRAPE-3. One pipeline
fitted into a single FPGA chip, which operated at 16 MHz clock. Thus, for
gravitational interaction, PROGRAPE-1 provided the speed of 0.96
Gflops-equivalent. PROGRAPE will prove to be useful for wide-range of
particle-based simulations in which the calculation cost of interactions other
than gravity is high, such as the evaluation of SPH interactions.Comment: 20 pages with 9 figures; submitted to PAS
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
Empowering parallel computing with field programmable gate arrays
After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements
Evaluation of Single-Chip, Real-Time Tomographic Data Processing on FPGA - SoC Devices
A novel approach to tomographic data processing has been developed and
evaluated using the Jagiellonian PET (J-PET) scanner as an example. We propose
a system in which there is no need for powerful, local to the scanner
processing facility, capable to reconstruct images on the fly. Instead we
introduce a Field Programmable Gate Array (FPGA) System-on-Chip (SoC) platform
connected directly to data streams coming from the scanner, which can perform
event building, filtering, coincidence search and Region-Of-Response (ROR)
reconstruction by the programmable logic and visualization by the integrated
processors. The platform significantly reduces data volume converting raw data
to a list-mode representation, while generating visualization on the fly.Comment: IEEE Transactions on Medical Imaging, 17 May 201
- …