Search CORE

1,504 research outputs found

Cycle-accurate multicore performance models on FPGAs

Author: Pellauer Michael (Michael Ignatius)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 159-165).The goal of this project is to improve computer architecture by accelerating cycle-accurate performance modeling of multicore processors using FPGAs. Contributions include a distributed technique controlling simulation on a highly-parallel substrate, hardware design techniques to reduce development effort, and a specific framework for modeling shared-memory multicore processors paired with realistic On-Chip Networks.by Michael Pellauer.Ph.D

DSpace@MIT

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

Author: Ergin Oguz
Kestelman Adrian Cristal
Koc Fahrettin
Mutlu Onur
Onural Erhan Baturay
Salami Behzad
Sarbazi-Azad Hamid
Unsal Osman S.
Yuksel Ismail Emir
Publication venue
Publication date: 01/01/2020
Field of study

We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.Comment: To appear at the DSN 2020 conferenc

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

TOBB ETÜ Institutional Repository

Seeing Shapes in Clouds: On the Performance-Cost trade-off for Heterogeneous Infrastructure-as-a-Service

Author: Constantinides George
Inggs Gordon
Luk Wayne
Thomas David B.
Publication venue
Publication date: 27/08/2015
Field of study

In the near future FPGAs will be available by the hour, however this new Infrastructure as a Service (IaaS) usage mode presents both an opportunity and a challenge: The opportunity is that programmers can potentially trade resources for performance on a much larger scale, for much shorter periods of time than before. The challenge is in finding and traversing the trade-off for heterogeneous IaaS that guarantees increased resources result in the greatest possible increased performance. Such a trade-off is Pareto optimal. The Pareto optimal trade-off for clusters of heterogeneous resources can be found by solving multiple, multi-objective optimisation problems, resulting in an optimal allocation of tasks to the available platforms. Solving these optimisation programs can be done using simple heuristic approaches or formal Mixed Integer Linear Programming (MILP) techniques. When pricing 128 financial options using a Monte Carlo algorithm upon a heterogeneous cluster of Multicore CPU, GPU and FPGA platforms, the MILP approach produces a trade-off that is up to 110% faster than a heuristic approach, and over 50% cheaper. These results suggest that high quality performance-resource trade-offs of heterogeneous IaaS are best realised through a formal optimisation approach.Comment: Presented at Second International Workshop on FPGAs for Software Programmers (FSP 2015) (arXiv:1508.06320

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

High-Level Debugging and Verification for FPGA-Based Multicore Architectures

Author: Arcas Oriol
Cristal Kestelman Adrián
Unsal Osman S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/05/2015
Field of study

Simulators are key tools for computer architecture research. However, multicore architectures represent a highly complex challenge for software simulators, which may suffer from fidelity loss and long execution times. FPGAs can simulate multicore architectures with scalable performance and high accuracy, but the difficulty of debugging could hinder their adoption. In this paper we propose several techniques for inspection, debugging and verification of multicore architectures, both for software-based and FPGA-based simulations. These debugging extensions are cycle-accurate and unobtrusive. As a proof of concept, we have developed a 24-core RISC multiprocessor that runs the Linux Kernel, for which we provide three simulation modes: a fast, functional simulation; a detailed, cycle-accurate simulation; and a FPGA-based simulation. Our platform can run up to 24 cores and perform full-system verification at 17 million instructions per second.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

A general technique for deterministic model-cycle-level debugging

Author: Khan Asif Imtiaz
Mithal Arvind
Vijayaraghavan Muralidaran
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2012
Field of study

Efficient use of FPGA resources requires FPGA-based performance models of complex hardware to implement one model cycle, i.e., one time-step of the original synchronous system, in several implementation cycles. Generally implementation cycles have no simple relationship with model cycles, and it is tricky to reconstruct the state of the synchronous system at the model-cycle boundaries if only implementation-cycle-level control and information is provided. A good debugging facility needs to provide: complete control over the functioning of the target design being simulated; fast and easy access to all the significant target design state for both monitoring and modification; and some means of accomplishing deterministic execution when the target design is a multicore processor running a parallel application. Moreover, these features need to be provided in a manner which does not incur substantial resource and performance penalties. In this paper, we present a debugging technique based on the LI-BDN theory. We show how the technique facilitates deterministic model-cycle-level debugging. We used it to build the debugging infrastructure for Arete, which is an FPGA-based cycle-accurate multicore simulator. The resource and performance penalties of our debugging technique are minimal; in Arete the debugging infrastructure has area and performance overheads of 5% and 6%, respectively.IBM Researc

DSpace@MIT

Performance Aspects of Synthesizable Computing Systems

Author: Schleuniger Pascal
Publication venue: Technical University of Denmark
Publication date: 01/01/2014
Field of study

Online Research Database In Technology