Search CORE

555 research outputs found

Convey HC-1 Hybrid Core Computer - The Potential of FPGAs in Numerical Simulation

Author: Austin Werner
Heuveline Vincent
Weiss Jan-Philipp
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2010
Field of study

A cross-platform OpenVX library for FPGA accelerators

Author: Davila-Guzman M. A.
Gran Tejero R.
Göhringer D.
Kalms L.
Suarez Gracia D.
Villarroya-Gaudo M.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022
Field of study

FPGAs are an excellent platform to implement computer vision applications, since these applications tend to offer a high level of parallelism with many data-independent operations. However, the freedom in the solution design space of FPGAs represents a problem because each solution must be individually designed, verified, and tuned. The emergence of High Level Synthesis (HLS) helps solving this problem and has allowed the implementation of open programming standards as OpenVX for computer vision applications on FPGAs, such as the HiF1ipVX library developed exclusively for Xilinx devices. Although with the HiF1ipVX library, designers can develop solutions efficiently on Xilinx, they do not have an approach to port and run their code on FPGAs from other manufacturers. This work extends the HiFlipVX capabilities in two significant ways: supporting Intel FPGA devices and enabling execution on discrete FPGA accelerators. To provide both without affecting user-facing code, the new carried out implementation combines two HLS programming models: C++, using Intel''s system of tasks, and OpenCL, which provides the CPU interoperability. Comparing with pure OpenCL implementations, this work reduces kernel dispatch resources, saving up to 24% of ALUT resources for each kernel in a graph, and improves performance 2.6 x and energy consumption 1.6 x on average for a set of representative applications, compared with state-of-the-art frameworks

Repositorio Universidad de Zaragoza

Adaptation of High Performance and High Capacity Reconfigurable Systems to OpenCL Programming Environments

Author: Russo Davide
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 18/09/2020
Field of study

[EN] In this work, we adapt a reconfigurable computer system based on FPGA technologies to OpenCL programming environments. The reconfigurable system is part of a compute prototype of the MANGO European project that includes 96 FPGAs. To optimize the use and to obtain its maximum performance, it is essential to adapt it to heterogeneous systems programming environments such as OpenCL, which simplifies its programming. In this work, all the necessary activities for correct implementation of the software and hardware layer required for its use in OpenCL will be carried out, as well as an evaluation of the performance obtained and the flexibility offered by the solution provided. This work has been performed during an internship of 5 months. The internship is linked to an agreement between UPV and UniNa (Università degli Studi di Napoli Federico II).[ES] En este trabajo se va a realizar la adaptación de un sistema reconfigurable de cómputo basado en tecnologías de FPGAs hacia entornos de programación en OpenCL. El sistema reconfigurable forma parte de un prototipo de cálculo del proyecto Europeo MANGO que incluye 96 FPGAs. Con el fin de optimizar el uso y de obtener sus máximas prestaciones, se hace imprescindible una adaptación a entornos de programación de sistemas heterogéneos como OpenCL, lo cual simplifica su programación y uso. En este trabajo se realizarán todas las actividades necesarias para una correcta implementación de la capa software y hardware necesaria para su uso en OpenCL así como una evaluación de las prestaciones obtenidas y de la flexibilidad ofrecida por la solución aportada. Este trabajo se ha llevado a término durante una estancia de cinco meses en la Universitat Politécnica de Valéncia. Esta estancia está vinculada a un acuerdo entre la Universitat Politécnica de Valéncia y la Università degli Studi di Napoli Federico IIRusso, D. (2020). Adaptation of High Performance and High Capacity Reconfigurable Systems to OpenCL Programming Environments. http://hdl.handle.net/10251/150393TFG

RiuNet

Blocks:Challenging SIMDs and VLIWs With a Reconfigurable Architecture

Author: Corporaal H.
Kumar A.
Wijtvliet M.
Publication venue
Publication date: 01/09/2022
Field of study

Demand for coarse grain reconfigurable architectures (CGRAs) has significantly increased in recent years as architectures need to be both energy efficient and flexible. However, most CGRAs are optimized for performance instead of energy efficiency. In this work, a novel paradigm for reconfigurable architectures, Blocks, is presented. Blocks uses two separate circuit-switched networks, one for control and one for the data path. This enables the runtime construction of energy-efficient application-specific VLIW-SIMD processors on a reconfigurable fabric. Its energy efficiency is demonstrated by comparing Blocks to four reference architectures, a VLIW, an SIMD, a commercial low-power microprocessor, and a traditional CGRA. All comparisons are based on commercial low-power 40-nm CMOS layout, including memories. Results show that Blocks can achieve a mean total energy reduction of 2.05 × , 1.84 × , 8.01 × , and 1.22 × over a VLIW, an SIMD, an energy-efficient microprocessor and a traditional CGRA, respectively. At the same time, Blocks delivers equal or higher performance per area due to its ability to adapt to applications by reconfiguration.</p

Pure OAI Repository

Dynamic memory management exploration σε συστήματα πολλών accelerators με χρήση Vivado HLS

Author: Davourli Angeliki
Δαβουρλή Αγγελική
Publication venue
Publication date: 29/06/2016
Field of study

DSpace at NTUA

Lightweight asynchronous scheduling in heterogeneous reconfigurable systems

Author: Asenjo R.
Gran Tejero R.
Navarro A.
Nikov K.
Nunez-Yanez J.
Rodríguez A.
Suárez Gracia D.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022
Field of study

The trend for heterogeneous embedded systems is the integration of accelerators and general-purpose CPU cores on the same die. In these integrated architectures, like the Zynq UltraScale+ board (CPU+FPGA) that we target in this work, hardware support for shared memory and low-overhead synchronization between the accelerator and the CPU cores make the case for exploring strategies that exploit a tight collaboration between the CPUs and the accelerator. In this paper we propose a novel lightweight scheduling strategy, FastFit, targeted to FPGA accelerators, and a new scheduler based on it, named MultiFastFit, which asynchronously tackles heterogeneous systems comprised of a variety of CPU cores and FPGA IPs. Our strategy significantly reduces the overhead to automatically compute the near-optimal chunksizes when compared to a previous state-of-the-art auto-tuned approach, which makes our approach more suitable for fine-grained applications. Additionally, our scheduler MultiFastFit has been designed to enable the efficient co-execution of work among compute devices in such a way that all the devices are busy while minimizing the load unbalance. Our approaches have been evaluated using four benchmarks carefully tuned for the low-power UltraScale+ platform. Our experiments demonstrate that the FastFit strategy always finds the near-optimal FPGA chunksize for any device configuration at a reasonable cost, even for fine-grained and irregular applications, and that heterogeneous CPU+FPGA co-executions that exploit all the compute devices are usually faster and more energy efficient than the CPU-only and FPGA-only executions. We have also compared MultiFastFit with other state-of-the-art scheduling strategies, finding that it outperforms other auto-tuned approach up to 2x and it achieves similar results to manually-tuned schedulers without requiring an offline search of the ideal CPU-FPGA partition or FPGA chunk granularity. © 2022 The Author

Repositorio Universidad de Zaragoza

Explore Bristol Research

An Exploration of FPGAs as Accelerators for Graph Analysis via High-Level Synthesis

Author: Pedro Filipe Vilhena de Campos Oliveira e Silva
Publication venue
Publication date: 14/01/2021
Field of study

Repositório Aberto da Universidade do Porto

Recommended from our members

AN ARCHITECTURE EVALUATION AND IMPLEMENTATION OF A SOFT GPGPU FOR FPGAs

Author: Andryc Kevin
Publication venue: ScholarWorks@UMass Amherst
Publication date: 25/10/2018
Field of study

Embedded and mobile systems must be able to execute a variety of different types of code, often with minimal available hardware. Many embedded systems now come with a simple processor and an FPGA, but not more energy-hungry components, such as a GPGPU. In this dissertation we present FlexGrip, a soft architecture which allows for the execution of GPGPU code on an FPGA without the need to recompile the design. The architecture is optimized for FPGA implementation to effectively support the conditional and thread-based execution characteristics of GPGPU execution without FPGA design recompilation. This architecture supports direct CUDA compilation to a binary which is executable on the FPGA-based GPGPU. Our architecture is customizable, thus providing the FPGA designer with a selection of GPGPU cores which display performance versus area tradeoffs. This dissertation describes the FlexGrip architecture in detail and showcases the benefits by evaluating the design for a collection of five standard CUDA benchmarks which are compiled using standard GPGPU compilation tools. Speedups of 23x, on average, versus a MicroBlaze microprocessor are achieved for designs which take advantage of the conditional execution capabilities offered by FlexGrip. We also show FlexGrip can achieve an 80% average reduction of dynamic energy versus the MicroBlaze microprocessor. The dissertation furthers discussion by exploring application-customized versions of the soft GPGPU, thus exploiting the overlay architecture. We expand the architecture to multiple processors per GPGPU and optimizing away features which are not needed by certain classes of applications. These optimizations, which include the effective use of block RAMs and DSP blocks, are critical to the performance of FlexGrip. By implementing a 2 GPGPU design, we show speedups of 44x on average versus a MicroBlaze microprocessor. Application-customized versions of the soft GPGPU can be used to further reduce dynamic energy consumption by an average of 14%. To complete this thesis, we augmented a GPGPU cycle accurate simulator to emulate FlexGrip and evaluate different levels of cache design spaces. We show performance increases for select benchmarks, however, we also show that 64% and 45% of benchmarks exhibited performance decreases when L1D cache was enabled for the 1 SMP and 2 SMP configurations, and only one benchmark showed performance improvement when the L2 cache was enabled

ScholarWorks@UMass Amherst