152 research outputs found

Area-energy aware dataflow optimisation of visual tracking systems

Authors: A Rheinländer, D Comaniciu, E Schulte, EA Lee, J Sérot, J Teifel, P Turcza, R Stewart, Robert Stewart
Publication date: 01/01/2018

This paper presents an orderly dataflow-optimisation approach suitable for area-energy aware computer vision applications on FPGAs. Vision systems are increasingly being deployed in power-constrained scenarios, where the dataflow model of computation has become popular for describing complex algorithms. The dataflow model allows processing datapaths to be composed of several independent and well-defined computations. However, compilers often fail to identify domain-specific optimisation opportunities, resulting in wasted resources and power. We present a methodology for optimising dataflow networks according to patterns often found in computer vision systems, focusing on optimisations that an optimising compiler does not discover automatically. Code transformation using profiling and refactoring provides opportunities to optimise the design, targeting FPGA implementations and focusing on reducing area and power. Applying the transformations of our refactoring methodology to a complex visual tracking algorithm resulted in a significant reduction in power consumption and resource usage.

A RISC-V-based FPGA Overlay to Simplify Embedded Accelerator Deployment

Authors: Bellocchi Gianluca, Capotondi Alessandro, Conti Francesco, Marongiu Andrea
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2021

Modern cyber-physical systems (CPS) are increasingly adopting heterogeneous systems-on-chip (HeSoCs) as a computing platform to satisfy the demands of their sophisticated workloads. FPGA-based HeSoCs can reach high performance and energy efficiency at the cost of increased design complexity. High-Level Synthesis (HLS) can ease IP design, but automated tools still lack the maturity to efficiently and easily tackle system-level integration of the many hardware and software blocks included in a modern CPS. We present an innovative hardware overlay offering plug-and-play integration of HLS-compiled or handcrafted acceleration IPs, thanks to a customizable wrapper that is attached to the overlay interconnect and provides shared-memory communication with the overlay cores. These cores are based on the open RISC-V ISA and offer simplified software management of the acceleration IP. Deploying the proposed overlay on a Xilinx ZU9EG shows ≈ 20% LUT usage and ≈ 4× speedup compared to program execution on the ARM host core.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Simulated annealing based datapath synthesis

Author: Neil John Paul
Publication venue: The University of Edinburgh
Publication date: 01/01/1994

Predicting Critical Warps in Near-Threshold GPGPU Applications Using a Dynamic Choke Point Analysis

Author: Sanyal Sourav
Publication venue: DigitalCommons@USU
Publication date: 01/08/2019

General-purpose graphics processing units (GPGPUs), owing to their enormous thread-level parallelism, can significantly reduce power consumption in the near-threshold (NTC) operating region while offering close to super-threshold performance. However, process variation (PV) can drastically reduce GPU performance at NTC. This work explores choke points, a unique device-level characteristic of PV at NTC, which can exacerbate the warp criticality problem in GPUs. It shows that modern warp schedulers cannot tackle the choke-point-induced critical warps in an NTC GPU. Additionally, it proposes the Choke Point Aware Warp Speculator, a circuit-architectural solution that dynamically predicts the critical warps in GPUs and accelerates them in their respective execution units. The best scheme achieves an average improvement of ∼39% in performance and ∼31% in energy efficiency over one state-of-the-art warp scheduler across 15 GPGPU applications, while incurring marginal hardware overheads.