13 research outputs found

Analytical Bounds for Optimal Tile Size Selection

Author: C. Hsu
J. Ferrante
J. Ramanujam
J. Xue
J.A. Nelder
M. Luersen
P. Boulet
P.M.W. Knijnenburg
R.C. Whaley
S. Ghosh
T.W. Barr
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Inter-Tile Reuse Optimization Applied to Bandwidth Constrained Embedded Accelerators

Author: Corporaal H.
Mesman B.
Peemen M.C.J.
Publication venue: 'EDAA'
Publication date: 01/01/2015

The adoption of High-Level Synthesis (HLS) tools has significantly reduced accelerator design time. A complex scaling problem that remains is the data transfer bottleneck. To scale up performance, accelerators require huge amounts of data and are often limited by interconnect resources. In addition, the energy spent by the accelerator is often dominated by the transfer of data, either in the form of memory references or data movement on the interconnect. In this paper we drastically reduce accelerator communication by exploring computation reordering and local buffer usage. Consequently, we present a new analytical methodology to optimize nested loops for inter-tile data reuse with loop transformations such as interchange and tiling. We focus on embedded accelerators that can be used in a multi-accelerator System on Chip (SoC), so performance, area, and energy are key in this exploration. 1) On three common embedded applications in the image/video processing domain (demosaicing, block matching, object detection), we show that our methodology reduces data movement by up to 2.1x compared to the best case of intra-tile optimization. 2) We demonstrate that our small accelerators (1-3% FPGA resources) can boost a simple MicroBlaze soft-core to the performance level of a high-end Intel i7 processor.

Crossref

Hardware aware tiling optimization for multi-core systems

Author: Adamski Dominik
Jabłoński Grzegorz
Publication venue: 'AGHU University of Science and Technology Press'
Publication date: 01/01/2017

This paper proposes a new tool that improves tiling efficiency for a given hardware architecture. It also describes the correlation between changes in hardware architecture and methods of software optimization. The first chapter briefly describes the changes in hardware architecture that have occurred over the last 10 years. The second chapter provides an overview of the tools that will be used in further research. The subsequent sections describe the proposed hardware-aware tool for optimal tiling.

AGH (Akademia Górniczo-Hutnicza) University of Science and Technology: Journals

Computer Science Journal (AGH University of Science and Technology, Krakow)

Biblioteka Nauki - repozytorium artykułów

Crossref

Efficient Tiled Sparse Matrix Multiplication through Matrix Signatures

Author: Emre Süreyya
Rastello Fabrice
Sadayappan Ponnuswamy
Sukumaran-Rajam Aravind
Publication venue: HAL CCSD
Publication date: 09/11/2020

International audience. Tiling is a key technique to reduce data movement in matrix computations. While tiling is well understood and widely used for dense matrix/tensor computations, effective tiling of sparse matrix computations remains a challenging problem. This paper proposes a novel method to efficiently summarize the impact of the sparsity structure of a matrix on achievable data reuse as a one-dimensional signature, which is then used to build an analytical cost model for tile size optimization for sparse matrix computations. The proposed model-driven approach to sparse tiling is evaluated on two key sparse matrix kernels: Sparse Matrix-Dense Matrix Multiplication (SpMM) and Sampled Dense-Dense Matrix Multiplication (SDDMM). Experimental results demonstrate that model-based tiled SpMM and SDDMM achieve high performance relative to the current state-of-the-art.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

Software system for automating the creation of optimized user applications for parallel embedded systems

Author: Душабаєв Рустам Толкинбайович
Publication venue: Kyiv
Publication date: 01/06/2022

The master's dissertation contains 102 pages, 75 figures, 14 tables, 1 appendix, and 30 sources. Object of research: parallel embedded systems. Purpose of the master's dissertation: to increase the efficiency of application optimization by tiling. Subject of research: an automated system for creating optimized user applications for parallel embedded systems. The scientific novelty of the results obtained in the master's dissertation lies in improving the efficiency of application optimization by tiling, namely in implementing the search for optimal tile sizes with a genetic algorithm.

Electronic Archive of Kyiv Polytechnic Institute

HETEROGENEOUS SYSTEM DESIGN AND OPTIMISATION FOR EMBEDDED VISION SYSTEMS

Author: Jiang Chao
Publication date: 01/08/2020

The University of Manchester - Institutional Repository

Analytical cost metrics: days of future past

Author: Prajapati Nirmal
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2019

2019 Summer. Includes bibliographical references. Future exascale high-performance computing (HPC) systems are expected to be increasingly heterogeneous, consisting of several multi-core CPUs and a large number of accelerators, special-purpose hardware that will increase the computing power of the system in a very energy-efficient way. Specialized, energy-efficient accelerators are also an important component in many diverse systems beyond HPC: gaming machines, general purpose workstations, tablets, phones and other media devices. With Moore's law driving the evolution of hardware platforms towards exascale, the dominant performance metric (time efficiency) has now expanded to also incorporate power/energy efficiency. This work builds analytical cost models for cost metrics such as time, energy, memory access, and silicon area. These models are used to predict the performance of applications, for performance tuning, and chip design. The idea is to work with domain specific accelerators where analytical cost models can be accurately used for performance optimization. The performance optimization problems are formulated as mathematical optimization problems. This work explores the analytical cost modeling and mathematical optimization approach in a few ways. For stencil applications and GPU architectures, the analytical cost models are developed for execution time as well as energy. The models are used for performance tuning over existing architectures, and are coupled with silicon area models of GPU architectures to generate highly efficient architecture configurations. For matrix chain products, analytical closed form solutions for off-chip data movement are built and used to minimize the total data movement cost of a minimum op count tree.

Mountain Scholar (Digital Collections of Colorado and Wyoming)