1,044 research outputs found

Seeing Shapes in Clouds: On the Performance-Cost trade-off for Heterogeneous Infrastructure-as-a-Service

Author: Constantinides George
Inggs Gordon
Luk Wayne
Thomas David B.
Publication date: 27/08/2015

In the near future FPGAs will be available by the hour. However, this new Infrastructure as a Service (IaaS) usage mode presents both an opportunity and a challenge. The opportunity is that programmers can potentially trade resources for performance on a much larger scale, and for much shorter periods of time, than before. The challenge is in finding and traversing the trade-off for heterogeneous IaaS that guarantees increased resources result in the greatest possible increase in performance. Such a trade-off is Pareto optimal. The Pareto optimal trade-off for clusters of heterogeneous resources can be found by solving multiple multi-objective optimisation problems, resulting in an optimal allocation of tasks to the available platforms. These optimisation problems can be solved using simple heuristic approaches or formal Mixed Integer Linear Programming (MILP) techniques. When pricing 128 financial options using a Monte Carlo algorithm upon a heterogeneous cluster of Multicore CPU, GPU and FPGA platforms, the MILP approach produces a trade-off that is up to 110% faster than a heuristic approach, and over 50% cheaper. These results suggest that high quality performance-resource trade-offs of heterogeneous IaaS are best realised through a formal optimisation approach. Comment: Presented at Second International Workshop on FPGAs for Software Programmers (FSP 2015) (arXiv:1508.06320)

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository
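
The Pareto-optimal trade-off mentioned in the abstract of "Seeing Shapes in Clouds" can be illustrated with a small sketch. It is not the paper's MILP formulation: it simply enumerates every task-to-platform allocation for a handful of tasks and keeps the allocations that no other allocation beats on both latency and cost. The platform names, per-task runtimes, hourly prices, and the pay-only-for-busy-time cost model are all illustrative assumptions.

```python
from itertools import product

# Hypothetical per-task runtime (seconds) and hourly price per platform.
# These figures are illustrative assumptions, not results from the paper.
PLATFORMS = {
    "cpu": {"seconds_per_task": 8.0, "price_per_hour": 0.10},
    "gpu": {"seconds_per_task": 2.0, "price_per_hour": 0.65},
    "fpga": {"seconds_per_task": 1.0, "price_per_hour": 1.50},
}


def evaluate(allocation):
    """Return (makespan_seconds, cost) for a tuple of platform names, one per task."""
    busy = {}
    for name in allocation:
        busy[name] = busy.get(name, 0.0) + PLATFORMS[name]["seconds_per_task"]
    # Platforms run in parallel, so latency is the busiest platform's time.
    makespan = max(busy.values())
    # Assumed cost model: each platform is billed only for its busy time.
    cost = sum(
        busy[name] / 3600.0 * PLATFORMS[name]["price_per_hour"]
        for name in sorted(busy)
    )
    return makespan, cost


def pareto_front(points):
    """Keep the points that no other point matches or beats in both objectives."""
    front = set()
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.add(p)
    return sorted(front)


def trade_off(num_tasks):
    """Brute-force the latency/cost Pareto front for num_tasks identical tasks."""
    allocations = product(PLATFORMS, repeat=num_tasks)
    return pareto_front([evaluate(a) for a in allocations])


if __name__ == "__main__":
    for makespan, cost in trade_off(3):
        print(f"{makespan:6.1f} s  {cost:.6f} per run")
```

Exhaustive enumeration grows exponentially with the number of tasks, which is why a heuristic or a formal solver is needed at the scale of the 128 tasks the abstract reports.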

Firmware and gateway for the ACE1 reconfigurable accelerator card

Author: Thorne Nicholas James
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2011

This thesis describes the continued work on the in-house designed, FPGA-based co-processor daughtercard referred to as ACE1. The aim is to create a seamless ecosystem incorporating firmware, bootstrapping code, drivers and a development environment. Challenges in setting up and debugging the interface that connects the coprocessor daughtercard to the host server include problems with the power network, problems with the edge connectors, and timing problems with the primary protocol, which prevented host-based communications. The options include allowing the daughtercard to function in a stand-alone fashion, and we present a gateware solution that allows users to select from a number of alternatives for each of the layers in the Open Systems Interconnect networking model.

Cape Town University OpenUCT

Generation of logic designs for efficiently solving ordinary differential equations on field programmable gate arrays

Author: Bartel Silas
Korch Matthias
Publication venue: 'Wiley'
Publication date: 19/10/2021

EPub Bayreuth

The future of computing beyond Moore's Law.

Author: Shalf John
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements, and how it affects options available to continue scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

Ezid

eScholarship - University of California

White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing

Author: Boku Taisuke
Domke Jens
Fujita Norihisa
Fukaya Takeshi
Hoshi Takeo
Huthmann Jens
Iakymchuk Roman
Imamura Toshiyuki
Jézéquel Fabienne
Kudo Shuhei
Mukunoki Daichi
Murakami Yuki
Nakata Maho
Ogita Takeshi
Ohlhus Kai Torben
Podobas Artur
Sano Kentaro
Tan Yiyu
Publication date: 07/04/2020

In numerical computations, the precision of floating-point computations is a key factor in determining both performance (speed and energy efficiency) and reliability (accuracy and reproducibility). However, precision generally affects the two in opposite directions. Therefore, the ultimate concept for maximizing both at the same time is minimal-precision computing through precision-tuning, which adjusts the precision of each operation and data item to the optimum. Several studies have already addressed this (e.g. Precimonious and Verrou), but their scope is limited to precision-tuning alone. Hence, in 2019 we started the Minimal-Precision Computing project to propose a broader concept of a minimal-precision computing system with precision-tuning, involving both the hardware and software stack. Specifically, our system combines (1) a precision-tuning method based on Discrete Stochastic Arithmetic (DSA), (2) arbitrary-precision arithmetic libraries, (3) fast and accurate numerical libraries, and (4) Field-Programmable Gate Array (FPGA) with High-Level Synthesis (HLS). In this white paper, we aim to provide an overview of various technologies related to minimal- and mixed-precision, to outline the future direction of the project, as well as to discuss current challenges together with our project members and guest speakers at the LSPANC 2020 workshop; https://www.r-ccs.riken.jp/labs/lpnctrt/lspanc2020jan/

arXiv.org e-Print Archive

HAL Descartes

Applications for Ultrascale Computing

Author: Bongo Lars Ailo
Ciegis Raimondas
Frasheri Neki
Gong Jing
Kimovski Dragi
Kropf Peter
Margenov Svetozar
Mihajlovic Milan
Neytcheva Maya
Rauber Thomas
Runger Gudula
Trobec Roman
Wuyts Roel
Wyrzykowski Roman
Publication venue: 'FSAEIHE South Ural State University (National Research University)'
Publication date: 01/01/2015

The University of Manchester - Institutional Repository

FPGA-Based Acceleration of the Self-Organizing Map (SOM) Algorithm using High-Level Synthesis

Author: Oninda Mohammad Abdul Moin
Publication venue: 'University of Windsor Leddy Library'
Publication date: 17/11/2019

One of the fastest growing and the most demanding areas of computer science is Machine Learning (ML). Self-Organizing Map (SOM), categorized as unsupervised ML, is a popular data-mining algorithm widely used in Artificial Neural Network (ANN) for mapping high dimensional data into low dimensional feature maps. SOM, being computationally intensive, requires high computational time and power when dealing with large datasets. Acceleration of many computationally intensive algorithms can be achieved using Field-Programmable Gate Arrays (FPGAs) but it requires extensive hardware knowledge and longer development time when employing traditional Hardware Description Language (HDL) based design methodology. Open Computing Language (OpenCL) is a standard framework for writing parallel computing programs that execute on heterogeneous computing systems. Intel FPGA Software Development Kit for OpenCL (IFSO) is a High-Level Synthesis (HLS) tool that provides a more efficient alternative to HDL-based design. This research presents an optimized OpenCL implementation of SOM algorithm on Stratix V and Arria 10 FPGAs using IFSO. Compared to recent SOM implementations on Central Processing Unit (CPU) and Graphics Processing Unit (GPU), our OpenCL implementation on FPGAs provides superior speed performance and power consumption results. Stratix V achieves speedup of 1.41x - 16.55x compared to AMD and Intel CPU and 2.18x compared to Nvidia GPU whereas Arria 10 achieves speedup of 1.63x - 19.15x compared to AMD and Intel CPU and 2.52x compared to Nvidia GPU. In terms of power consumption, Stratix V is 35.53x and 42.53x whereas Arria 10 is 15.82x and 15.93x more power efficient compared to CPU and GPU respectively

Scholarship at UWindsor
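
For readers unfamiliar with the Self-Organizing Map named in the abstract above, a single training step can be sketched as follows. This is the textbook SOM update in plain Python, not the thesis's OpenCL implementation; the function names, the Gaussian neighbourhood, and the default learning rate and radius are assumptions made for illustration.

```python
import math


def best_matching_unit(weights, sample):
    """Return (row, col) of the node whose weight vector is closest to sample."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Squared Euclidean distance is enough to rank candidates.
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, sample))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_step(weights, sample, learning_rate=0.5, radius=1.0):
    """Pull the best matching unit and its grid neighbours towards sample, in place."""
    br, bc = best_matching_unit(weights, sample)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Gaussian neighbourhood: influence decays with distance on the grid.
            grid_dist_sq = (r - br) ** 2 + (c - bc) ** 2
            influence = math.exp(-grid_dist_sq / (2.0 * radius ** 2))
            for i, xi in enumerate(sample):
                w[i] += learning_rate * influence * (xi - w[i])
    return br, bc


if __name__ == "__main__":
    # A 2x2 map of 2-dimensional weight vectors.
    grid = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    print(train_step(grid, [0.9, 0.9]))
    print(grid)
```

The distance search over every node and the per-node weight update are independent across nodes, which is the kind of data parallelism that makes the algorithm a candidate for FPGA or GPU acceleration.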