339 research outputs found

    A Convolve-And-MErge Approach for Exact Computations on High-Performance Reconfigurable Computers

    This work presents an approach for accelerating arbitrary-precision arithmetic on high-performance reconfigurable computers (HPRCs). Although faster and smaller, fixed-precision arithmetic has inherent rounding and overflow problems that can cause errors in scientific or engineering applications. This recurring phenomenon is usually referred to as numerical nonrobustness. Therefore, there is an increasing interest in the paradigm of exact computation, based on arbitrary-precision arithmetic. There are a number of libraries and/or languages supporting this paradigm, for example, the GNU multiprecision (GMP) library. However, the performance of computations is significantly reduced in comparison to that of fixed-precision arithmetic. In order to reduce this performance gap, this paper investigates the acceleration of arbitrary-precision arithmetic on HPRCs. A Convolve-And-MErge approach is proposed that implements virtual convolution schedules derived from the formal representation of the arbitrary-precision multiplication problem. Additionally, dynamic (nonlinear) pipeline techniques are also exploited in order to achieve speedups ranging from 5x (addition) to 9x (multiplication), while keeping resource usage of the reconfigurable device low, ranging from 11% to 19%.
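
    As a plain-software illustration of the convolve-then-merge idea described above (a sketch only, not the authors' pipelined HPRC design), the Python fragment below multiplies two arbitrary-precision integers held as base-2^16 limbs: the "convolve" step forms carry-free column sums of limb products, and the "merge" step resolves them with a single carry-propagation pass.

        # Sketch of convolve-then-merge multiplication on base-2**16 limbs.
        # Illustrative Python only, not the paper's pipelined FPGA datapath.
        BASE = 1 << 16

        def to_limbs(x):
            # Split a non-negative integer into little-endian base-2**16 limbs.
            limbs = []
            while True:
                limbs.append(x % BASE)
                x //= BASE
                if x == 0:
                    return limbs

        def convolve(a, b):
            # Convolve: column sums of limb products, carries deferred.
            cols = [0] * (len(a) + len(b) - 1)
            for i, ai in enumerate(a):
                for j, bj in enumerate(b):
                    cols[i + j] += ai * bj
            return cols

        def merge(cols):
            # Merge: one carry-propagation pass turns column sums into the product.
            result, carry = 0, 0
            for k, c in enumerate(cols):
                total = c + carry
                result += (total % BASE) << (16 * k)
                carry = total // BASE
            return result + (carry << (16 * len(cols)))

        x, y = 123456789123456789, 987654321987654321
        assert merge(convolve(to_limbs(x), to_limbs(y))) == x * y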

    An empirical evaluation of High-Level Synthesis languages and tools for database acceleration

    High-Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we utilize to accelerate commonly-used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS depends on a set of orthogonal characteristics, which we highlight for each HLS framework.
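
    For orientation, one of the kernels listed above, the classic build/probe hash join, is sketched below in plain Python; the paper evaluates HLS implementations of such kernels on FPGAs, so this function is only a reference description of the operation, not any of the compared designs.

        # Equi-join of two row sets on a key column: build a hash table on one
        # relation, then probe it while streaming the other. Plain-software sketch.
        from collections import defaultdict

        def hash_join(build_rows, probe_rows, build_key, probe_key):
            # Build phase: hash the (ideally smaller) relation on its join key.
            table = defaultdict(list)
            for row in build_rows:
                table[row[build_key]].append(row)
            # Probe phase: stream the other relation and emit matching pairs.
            for row in probe_rows:
                for match in table.get(row[probe_key], []):
                    yield {**match, **row}

        users = [{"uid": 1, "name": "ada"}, {"uid": 2, "name": "bob"}]
        orders = [{"uid": 1, "item": "fpga"}, {"uid": 1, "item": "dram"}]
        print(list(hash_join(users, orders, "uid", "uid")))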

    Reconfigurable Processing for Satellite On-Board Automatic Cloud Cover Assessment (ACCA)

    Clouds play a critical role in many studies such as weather- and climate-related investigations. However, they represent a source of errors in many applications, and the presence of cloud contamination can hinder the use of satellite data. In addition, sending cloudy data to ground stations can result in an inefficient utilization of the communication bandwidth. This requires satellite on-board cloud detection capability to mask out cloudy pixels from further processing. Remote sensing satellite missions have always required smaller size, lower cost, more flexibility, and higher computational power. Reconfigurable Computers (RCs) combine the flexibility of traditional microprocessors with the power of Field Programmable Gate Arrays (FPGAs). Therefore, RCs are a promising candidate for on-board preprocessing. This paper presents the design and implementation of an RC-based real-time cloud detection system. We investigate the potential of using RCs for on-board preprocessing by prototyping the Landsat 7 ETM+ ACCA algorithm on one of the state-of-the-art reconfigurable platforms, SRC-6. It will be shown that our work provides higher detection accuracy and over one order of magnitude improvement in performance when compared to previously reported investigations.
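
    To make the per-pixel processing concrete, the sketch below applies spectral threshold tests in the spirit of ACCA's first pass. Band names follow Landsat 7 ETM+, but the filters and constants shown are illustrative approximations from the published algorithm's description, not a verified reimplementation of the on-board design.

        # Illustrative per-pixel cloud screening in the spirit of ACCA pass one.
        # Thresholds are approximate placeholders; the real ACCA applies several
        # further spectral filters and a thermally driven second pass.
        import numpy as np

        def cloud_mask(band2_green, band3_red, band5_swir, band6_temp_k):
            bright = band3_red > 0.08            # clouds reflect strongly in the red band
            ndsi = (band2_green - band5_swir) / (band2_green + band5_swir + 1e-9)
            not_snow = ndsi < 0.7                # reject snow, which is also bright
            cold = band6_temp_k < 300.0          # clouds are colder than most surfaces
            return bright & not_snow & cold

        green = np.array([[0.28, 0.06], [0.22, 0.45]])
        red = np.array([[0.30, 0.05], [0.25, 0.40]])
        swir = np.array([[0.20, 0.04], [0.03, 0.42]])
        temp = np.array([[272.0, 291.0], [281.0, 309.0]])
        print(cloud_mask(green, red, swir, temp))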

    ShenZhen transportation system (SZTS): a novel big data benchmark suite

    Data analytics is at the core of the supply chain for both products and services in modern economies and societies. Big data workloads, however, are placing unprecedented demands on computing technologies, calling for a deep understanding and characterization of these emerging workloads. In this paper, we propose ShenZhen Transportation System (SZTS), a novel big data Hadoop benchmark suite comprising real-life transportation analysis applications with real-life input data sets from Shenzhen in China. SZTS uniquely focuses on a specific and real-life application domain, whereas other existing Hadoop benchmark suites, such as HiBench and CloudRank-D, consist of generic algorithms with synthetic inputs. We perform a cross-layer workload characterization at the microarchitecture level, the operating system (OS) level, and the job level, revealing unique characteristics of SZTS compared to existing Hadoop benchmarks as well as general-purpose multi-core PARSEC benchmarks. We also study the sensitivity of workload behavior with respect to input data size, and we propose a methodology for identifying representative input data sets.

    Throughput-optimal systolic arrays from recurrence equations

    Many compute-bound software kernels have seen order-of-magnitude speedups on special-purpose accelerators built on specialized architectures such as field-programmable gate arrays (FPGAs). These architectures are particularly good at implementing dynamic programming algorithms that can be expressed as systems of recurrence equations, which in turn can be realized as systolic array designs. To efficiently find good realizations of an algorithm for a given hardware platform, we pursue software tools that can search the space of possible parallel array designs to optimize various design criteria. Most existing design tools in this area produce a design that is latency-space optimal. However, we instead wish to target applications that operate on a large collection of small inputs, e.g., a database of biological sequences. For such applications, overall throughput rather than latency per input is the most important measure of performance. In this work, we introduce a new procedure to optimize throughput of a systolic array subject to resource constraints, in this case the area and bandwidth constraints of an FPGA device. We show that the throughput of an array is dependent on the maximum number of lattice points executed by any processor in the array, which to a close approximation is determined solely by the array’s projection vector. We describe a bounded search process to find throughput-optimal projection vectors and a tool to perform automated design space exploration, discovering a range of array designs that are optimal for inputs of different sizes. We apply our techniques to the Nussinov RNA folding algorithm to generate multiple mappings of this algorithm into systolic arrays. By combining our library of designs with run-time reconfiguration of an FPGA device to dynamically switch among them, we predict significant speedup over a single, latency-space optimal array.
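
    The central quantity named in the abstract, the maximum number of lattice points executed by any one processor under a given projection vector, can be illustrated by direct enumeration. The rectangular domain and candidate vectors below are hypothetical stand-ins for illustration, not the Nussinov mapping derived in the paper.

        # Count iteration points per processor for a 2-D domain and projection
        # vector p = (p1, p2): all points on a line parallel to p execute on the
        # same processor, so the allocation value p2*i - p1*j is constant along p.
        # The largest bucket bounds how busy any processor is per input, and
        # throughput scales roughly as its inverse.
        from collections import Counter
        from itertools import product

        def max_points_per_processor(n, proj):
            p1, p2 = proj
            buckets = Counter(p2 * i - p1 * j for i, j in product(range(n), range(n)))
            return max(buckets.values())

        n = 64
        for proj in [(1, 0), (1, 1), (1, 2)]:
            print(proj, "max points per processor:", max_points_per_processor(n, proj))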

    Just-in-time Hardware generation for abstracted reconfigurable computing

    This thesis addresses the use of reconfigurable hardware in computing platforms, in order to harness the performance benefits of dedicated hardware whilst maintaining the flexibility associated with software. Although the reconfigurable computing concept is not new, the low-level nature of the supporting tools normally used, together with the consequent limited level of abstraction and resultant lack of backwards compatibility, has prevented the widespread adoption of this technology. In addition, bandwidth and architectural limitations have seriously constrained the potential improvements in performance. A review of existing approaches and tool flows is conducted to highlight the current problems being faced in this field. The objective of the work presented in this thesis is to introduce a radically new approach to reconfigurable computing tool flows. The runtime-based tool flow introduces complete abstraction between the application developer and the underlying hardware. This new technique eliminates the ease-of-use and backwards-compatibility issues that have plagued the reconfigurable computing concept, and could pave the way for viable mainstream reconfigurable computing platforms. An easy-to-use, cycle-accurate behavioural modelling system is also presented, which was used extensively during the early exploration of new concepts and architectures. Some performance improvements produced by the new reconfigurable computing tool flow, when applied to both a MIPS-based embedded platform and the Cray XD1, are also presented. These results are then analyzed, and the hardware and software factors affecting the performance increases that were obtained are discussed, together with potential techniques that could be used to further increase the performance of the system. Lastly, a heterogeneous computing concept is proposed, in which a computer system containing multiple types of computational resource, each with its own strengths and weaknesses (e.g. DSPs, CPUs, FPGAs), is envisaged. A revolutionary new method of fully exploiting the potential of such a system, whilst maintaining scalability, backwards compatibility, and ease of use, is also presented.

    Modelling, Synthesis, and Configuration of Networks-on-Chips


    Accelerating Exact Stochastic Simulation of Biochemical Systems

    The ability to accurately and efficiently simulate computer models of biochemical systems is of growing importance to the molecular biology and pharmaceutical research communities. Exact stochastic simulation is a popular approach for simulating such systems because it properly represents genetic noise and it accurately represents systems with small populations of chemical species. Unfortunately, the computational demands of exact stochastic simulation often limit its applicability. To enable next-generation whole-cell and multi-cell stochastic modeling, advanced tools and techniques must be developed to increase simulation efficiency. This work assesses the applicability of a variety of hardware and software acceleration approaches for exact stochastic simulation, including serial algorithm improvements, parallel computing, reconfigurable computing, and cluster computing. Through this analysis, improved simulation techniques for biological systems are explored and evaluated.
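
    Exact stochastic simulation of biochemical systems is typically performed with Gillespie-style stochastic simulation algorithms; the minimal direct-method sketch below, written for a toy birth-death system, is purely for orientation and is unrelated to the accelerated implementations assessed in this work.

        # Gillespie direct method: draw the waiting time to the next reaction from
        # an exponential with rate equal to the total propensity, then choose which
        # reaction fires with probability proportional to its propensity.
        import random

        def gillespie(state, reactions, t_end):
            t, trajectory = 0.0, [(0.0, dict(state))]
            while t < t_end:
                props = [rate(state) for rate, _ in reactions]
                total = sum(props)
                if total == 0.0:
                    break                        # nothing can fire; system is frozen
                t += random.expovariate(total)   # time to next event
                r = random.uniform(0.0, total)   # pick a reaction by propensity mass
                for p, (_, update) in zip(props, reactions):
                    r -= p
                    if r <= 0.0:
                        update(state)
                        break
                trajectory.append((t, dict(state)))
            return trajectory

        # Toy birth-death process for one species A: 0 -> A at rate k1, A -> 0 at rate k2*[A].
        k1, k2 = 5.0, 0.1
        reactions = [
            (lambda s: k1,          lambda s: s.__setitem__("A", s["A"] + 1)),
            (lambda s: k2 * s["A"], lambda s: s.__setitem__("A", s["A"] - 1)),
        ]
        print(gillespie({"A": 0}, reactions, t_end=10.0)[-1])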