Search CORE

279 research outputs found

FASTCUDA: Open Source FPGA Accelerator &amp; Hardware-Software Codesign Toolset for CUDA Kernels

Author: de la Torre E.()
Lavagno L.()
Lazarescu M.()
Mavroidis I. ()
Papaefstathiou I.()
Papaefstathiou Ioannis(http://users.isc.tuc.gr/~ipapaefstathiou)
Schafer F.()
Παπαευσταθιου Ιωαννης(http://users.isc.tuc.gr/~ipapaefstathiou)
Publication venue: IEEE / Institute of Electrical and Electronics Engineers Incorporated:445 Hoes Lane:Piscataway, NJ 08854:(800)701-4333, (732)981-0060, EMAIL: [email protected], INTERNET: http://www.ieee.org, Fax: (732)981-9667
Publication date: 01/01/2012
Field of study

Using FPGAs as hardware accelerators that communicate with a central CPU is becoming a common practice in the embedded design world but there is no standard methodology and toolset to facilitate this path yet. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphical Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Institutional Repository of the Technical University of Crete

A toolset for the analysis and optimization of motion estimation algorithms and processors

Author: Nunez-Yanez JL
Spiteri T
Vafiadis G
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2009
Field of study

Crossref

Explore Bristol Research

Translating Timing into an Architecture: The Synergy of COTSon and HLS (Domain Expertise: Designing a Computer Architecture via HLS)

Author: Giorgi Roberto
KHALILI MAYBODI Farnam
Procaccini Marco
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2019
Field of study

Translating a system requirement into a low-level representation (e.g., register transfer level or RTL) is the typical goal of the design of FPGA-based systems. However, the Design Space Exploration (DSE) needed to identify the final architecture may be time consuming, even when using high-level synthesis (HLS) tools. In this article, we illustrate our hybrid methodology, which uses a frontend for HLS so that the DSE is performed more rapidly by using a higher level abstraction, but without losing accuracy, thanks to the HP-Labs COTSon simulation infrastructure in combination with our DSE tools (MYDSE tools). In particular, this proposed methodology proved useful to achieve an appropriate design of a whole system in a shorter time than trying to design everything directly in HLS. Our motivating problem was to deploy a novel execution model called data-flow threads (DF-Threads) running on yet-to-be-designed hardware. For that goal, directly using the HLS was too premature in the design cycle. Therefore, a key point of our methodology consists in defining the first prototype in our simulation framework and gradually migrating the design into the Xilinx HLS after validating the key performance metrics of our novel system in the simulator. To explain this workflow, we first use a simple driving example consisting in the modelling of a two-way associative cache. Then, we explain how we generalized this methodology and describe the types of results that we were able to analyze in the AXIOM project, which helped us reduce the development time from months/weeks to days/hours

Archivio della Ricerca - Università degli Studi di Siena

LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

Author: Alvarez Carlos
Bautista Leonardo
Becker Tobias
Billung-Meyer Gunnar
Carpenter Paul
Christmann Wolfgang
Cristal Adrian
De La Cruz Raul
Dubhashi Devdatt
Etsion Yoav
Felber Pascal
Fetzer Christof
Gaydadjiev Georgi
Göttel Christian
Hadar Elad
Hagemeyer Jens
Jimenez Daniel
Jungeblut Thorsten
Kaiser Martin
Klawonn Frank
Krupop Stefan
Kucza Nils
Madonar Sergi
Martorell Xavier
Mihklafi Amani
Mudge Trevor
Mudge Trevor
Pasin Marcelo
Pericàs Miquel
Pnevmatikatos Dionisios N.
Porrmann Mario
Port Oron
Rocha Isabelly
Salami Behzad
Salomonsson Hans
Schiavoni Valerio
Trancoso Pedro
Unsal Osman S.
vor dem Berge Micha
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Chalmers Research

Publications at Bielefeld University

Smart technologies for effective reconfiguration: the FASTER approach

Author: Becker Tobias
Bonetto A
Cazzaniga A
Davidson Tom
Durelli GC
Gaydadjiev Georgi
Luk Wayne
Papadimitriou Kyprianos
Pilato Christiano
Pnevmatikatos Dionisios
Santambrogio Marco D
Sciuto Donatella
Stroobandt Dirk
Todman Tim
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

Current and future computing systems increasingly require that their functionality stays flexible after the system is operational, in order to cope with changing user requirements and improvements in system features, i.e. changing protocols and data-coding standards, evolving demands for support of different user applications, and newly emerging applications in communication, computing and consumer electronics. Therefore, extending the functionality and the lifetime of products requires the addition of new functionality to track and satisfy the customers needs and market and technology trends. Many contemporary products along with the software part incorporate hardware accelerators for reasons of performance and power efficiency. While adaptivity of software is straightforward, adaptation of the hardware to changing requirements constitutes a challenging problem requiring delicate solutions. The FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) project aims at introducing a complete methodology to allow designers to easily implement a system specification on a platform which includes a general purpose processor combined with multiple accelerators running on an FPGA, taking as input a high-level description and fully exploiting, both at design time and at run time, the capabilities of partial dynamic reconfiguration. The goal is that for selected application domains, the FASTER toolchain will be able to reduce the design and verification time of complex reconfigurable systems providing additional novel verification features that are not available in existing tool flows

Ghent University Academic Bibliography

Translating Timing into an Architecture: The Synergy of COTSon and HLS (Domain Expertise—Designing a Computer Architecture via HLS)

Author: Farnam Khalili
Farnam Khalili
Marco Procaccini
Roberto Giorgi
Publication venue
Publication date: 03/11/2019
Field of study

Open Access Repository

Exploiting run-time reconfigurable hardware in the development of automatic fingerprint-based personal recognition applications

Author: Francisco Fons
Mariano Fons
Publication venue: 'IntechOpen'
Publication date: 27/07/2011
Field of study

IntechOpen

Can my chip behave like my brain?

Author: George Suma
Publication venue: Georgia Institute of Technology
Publication date: 27/05/2016
Field of study

Many decades ago, Carver Mead established the foundations of neuromorphic systems. Neuromorphic systems are analog circuits that emulate biology. These circuits utilize subthreshold dynamics of CMOS transistors to mimic the behavior of neurons. The objective is to not only simulate the human brain, but also to build useful applications using these bio-inspired circuits for ultra low power speech processing, image processing, and robotics. This can be achieved using reconfigurable hardware, like field programmable analog arrays (FPAAs), which enable configuring different applications on a cross platform system. As digital systems saturate in terms of power efficiency, this alternate approach has the potential to improve computational efficiency by approximately eight orders of magnitude. These systems, which include analog, digital, and neuromorphic elements combine to result in a very powerful reconfigurable processing machine.Ph.D

Scholarly Materials And Research @ Georgia Tech

The hArtes Tool Chain

Author: A. Antola
A. Cerruto
A. Lattanzi
A. Michelotti
A. Morea
C. Pilato
D. Sciuto
E. Ciavattini
F. Bettarelli
F. Ferrandi
J.G.F. Coutinho
K. Bertels
K. Sigdel
M. Lattuada
M.T. Chiaradia
R. Nutricato
R.J. Meeuws
T. Todman
V.M. Sima
W. Luk
Y. Yankova
Y.M. Lam
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

This chapter describes the different design steps needed to go from legacy code to a transformed application that can be efficiently mapped on the hArtes platform

Archivio istituzionale della ricerca - Politecnico di Milano