71 research outputs found

High performance computing with FPGAs

Author: Beyls Kristof
D'Hollander Erik
Publication venue: 'IOS Press'
Publication date: 01/01/2009

Field-programmable gate arrays represent an army of logical units which can be organized in a highly parallel or pipelined fashion to implement an algorithm in hardware. The flexibility of this new medium creates new challenges in finding the right processing paradigm, one which takes into account the natural constraints of FPGAs: clock frequency, memory footprint and communication bandwidth. In this paper, the use of an FPGA as a multiprocessor on a chip and its use as a highly functional coprocessor are first compared, and the programming tools for hardware/software codesign are discussed. Next, a number of techniques are presented to maximize the parallelism and optimize the data locality in nested loops. These include unimodular transformations, data-locality-improving loop transformations and the use of smart buffers. Finally, the use of these techniques on a number of examples is demonstrated. The results in the paper and in the literature show that, with the proper programming tool set, FPGAs can speed up computation kernels significantly with respect to traditional processors.

Ghent University Academic Bibliography

An empirical evaluation of High-Level Synthesis languages and tools for database acceleration

Author: Arcas Abella Oriol
Armejach Adrià
Cristal Kestelman Adrián
Ghasempour Mohsen
Lujan Mikel
Mawer John
Navaridas Javier
Ndu Geoffrey
Song Wei
Sönmez Nehir
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

High Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we utilize to accelerate commonly-used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS depends on a set of orthogonal characteristics, which we highlight for each HLS framework.

Peer Reviewed. Postprint (author’s final draft)

Crossref

UPCommons. Portal del coneixement obert de la UPC

A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment

Author: Benkrid Abdsamad
Benkrid K.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2009

Portsmouth University Research Portal (Pure)

A framework for automatically generating optimized digital designs from C-language loops

Author: Holland Wesley James
Publication venue: Scholars Junction
Publication date: 03/05/2008

Reconfigurable computing has the potential for providing significant performance increases to a number of computing applications. However, realizing these benefits requires digital design experience and knowledge of hardware description languages (HDLs). While a number of tools have focused on translation of high-level languages (HLLs) to HDLs, the tools do not always create optimized digital designs that are competitive with hand-coded solutions. This work describes an automatic optimization in the C-to-HDL transformation that reorganizes operations between pipeline stages in order to reduce critical path lengths. The effects of this optimization are examined on the MD5, SHA-1, and Smith-Waterman algorithms. Results show that the optimization results in performance gains of 13%-37% and that the automatically-generated implementations perform comparably to hand-coded implementations.

Scholars Junction - Mississippi State University Institutional Repository

A framework for automatically generating optimized digital designs from C-language loops

Author: Holland Wesley James
Publication venue: Scholars Junction
Publication date: 01/05/2008

Mississippi State University Libraries ETD database

Scholars Junction - Mississippi State University Institutional Repository

Run-time reconfigurable acceleration for genetic programming fitness evaluation in trading strategies

Author: Burovskiy P
Funie AI
Grigoras P
Luk W
Salmon M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/04/2017

Genetic programming can be used to identify complex patterns in financial markets which may lead to more advanced trading strategies. However, the computationally intensive nature of genetic programming makes it difficult to apply to real world problems, particularly in real-time constrained scenarios. In this work we propose the use of Field Programmable Gate Array technology to accelerate the fitness evaluation step, one of the most computationally demanding operations in genetic programming. We propose to develop a fully-pipelined, mixed precision design using run-time reconfiguration to accelerate fitness evaluation. We show that run-time reconfiguration can reduce resource consumption by a factor of 2 compared to previous solutions on certain configurations. The proposed design is up to 22 times faster than an optimised, multithreaded software implementation while achieving comparable financial returns.

Spiral - Imperial College Digital Repository

Intelligent systems engineering with reconfigurable computing

Author: Skliarova Iouliia
Publication date: 01/08/2006

Intelligent computing systems comprising microprocessor cores, memory and reconfigurable user-programmable logic represent a promising technology which is well-suited for applications such as digital signal and image processing, cryptography and encryption, etc. These applications frequently employ recursive algorithms, which are particularly appropriate when the underlying problem is defined in recursive terms and it is difficult to reformulate it as an iterative procedure. It is known, however, that hardware description languages (such as VHDL) as well as system-level specification languages (such as Handel-C) that are usually employed for specifying the required functionality of reconfigurable systems do not provide direct support for recursion. In this paper a method allowing recursive algorithms to be easily described in Handel-C and implemented in an FPGA (field-programmable gate array) is proposed. The recursive search algorithm for the knapsack problem is considered as an example.

Applications in Artificial Intelligence - Knowledge Engineering. Red de Universidades con Carreras en Informática (RedUNCI)

Hardware acceleration of the trace transform for vision applications

Author: Fahmy Suhaib A.
Publication venue: Department of Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2008

Computer Vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images in their pixels and bits serve no purpose of their own. It is only by interpreting the data and extracting higher-level information that a scene can be understood. The algorithms that enable this process are often complex and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real-time processing. The Trace transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined, allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform for real-time performance, while also allowing an element of flexibility, which highly suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a fully flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters is presented, as required for a number of Trace transform implementations. Finally, an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration.

Spiral - Imperial College Digital Repository

High performance reconfigurable architectures for bioinformatics and computational biology applications

Author: Kasap Server
Publication venue: The University of Edinburgh
Publication date: 01/01/2010

Edinburgh Research Archive

Automated Generating of Processing Elements for FPGA

Author: Lengál Ondřej
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2008

Some information processing applications, such as computer network monitoring, need to continuously process rapidly incoming data. As the speed of the incoming data increases, it becomes desirable to perform the processing in hardware. This work proposes a configuration system that generates a VHDL specification of a hardware data processing circuit based on a user-provided definition of data and computation operations. The system focuses on network traffic monitoring in multi-gigabit computer networks.

Digital library of Brno University of Technology

National Repository of Grey Literature