
    Multiple target task sharing support for the OpenMP accelerator model

    The use of GPU accelerators is becoming common in HPC platforms due to their effective performance and energy efficiency. In addition, new generations of multicore processors are being designed with wider vector units and/or larger hardware thread counts, also contributing to the peak performance of the whole system. Although current directive-based paradigms, such as OpenMP or OpenACC, support both accelerators and multicore-based hosts, they do not provide an effective and efficient way to use them concurrently, usually resulting in accelerated programs in which the potential computational performance of the host is not exploited. In this paper we propose an extension to the OpenMP 4.5 directive-based programming model to support the specification and execution of multiple instances of task regions on different devices (i.e., accelerators in conjunction with the vector and heavily multithreaded capabilities of multicore processors). The compiler is responsible for generating device-specific code for each device kind, delegating to the runtime system the dynamic scheduling of tasks to the available devices. The newly proposed clause conveys useful insight to guide the scheduler while keeping a clean, abstract and machine-independent programmer interface. The potential of the proposal is analyzed in a prototype implementation in the OmpSs compiler and runtime infrastructure. Performance evaluation is done using three kernels (N-Body, tiled matrix multiply and Stream) on different GPU-capable systems based on ARM, Intel x86 and IBM Power8. From the evaluation we observe speed-ups in the 8-20% range compared to versions in which only the GPU is used, reaching 96% of the additional peak performance thanks to the reduction of data transfers and the benefits introduced by the OmpSs NUMA-aware scheduler. This work is partially supported by the IBM/BSC Deep Learning Center Initiative, by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through the TIN2015-65316-P project, and by the Generalitat de Catalunya (contract 2014-SGR-1051).
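    The abstract does not give the syntax of the proposed clause, so the sketch below shows only the standard OpenMP 4.5 baseline it extends: a deferrable target task offloaded to a single device, which under the proposal the runtime could also schedule onto the multicore host. The array size and kernel are illustrative, not taken from the paper.

        /* Standard OpenMP 4.5: a deferrable target task on one device.
           The paper's extension would let the runtime run instances of
           such regions on the GPU and the host concurrently. */
        #include <stdio.h>
        #define N 1024

        int main(void) {
            static float x[N], y[N];
            for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 0.0f; }

            /* nowait turns the offload into a deferred target task */
            #pragma omp target teams distribute parallel for \
                    map(to: x[0:N]) map(tofrom: y[0:N]) nowait
            for (int i = 0; i < N; i++)
                y[i] += 2.0f * x[i];

            #pragma omp taskwait   /* wait for the deferred target task */
            printf("y[%d] = %f\n", N - 1, y[N - 1]);
            return 0;
        }

    Without an offload-capable device the region falls back to the host, which is exactly the dual-execution path the paper's scheduler exploits.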

    NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference

    The inherent diversity of computation types within individual deep neural network (DNN) models necessitates a corresponding variety of computation units within hardware processors, leading to a significant constraint on computation efficiency during neural network execution. In this study, we introduce NeuralMatrix, a framework that transforms the computation of entire DNNs into linear matrix operations, effectively enabling their execution with one general-purpose matrix multiplication (GEMM) accelerator. By surmounting the constraints posed by the diverse computation types required by individual network models, this approach provides both generality, allowing a wide range of DNN models to be executed using a single GEMM accelerator, and application-specific levels of acceleration without extra special function units; both are validated on mainstream DNNs and their variant models. Comment: 12 pages, 4 figures. Submitted to the 11th International Conference on Learning Representations.
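    The abstract does not detail NeuralMatrix's actual operator mapping. As a hedged illustration of the general idea, the sketch below replaces one nonlinear activation (tanh, chosen arbitrarily) with a piecewise-linear approximation y = a[k]*x + b[k], reducing it to the multiply-accumulate arithmetic a GEMM unit already provides. The segment count and input range are assumptions made for this sketch.

        /* Illustrative only: linearize tanh over [-4, 4] into SEGS
           segments so each evaluation is one multiply and one add,
           i.e., the same arithmetic a GEMM accelerator performs. */
        #include <math.h>
        #include <stdio.h>

        #define SEGS 64
        #define LO (-4.0f)
        #define HI ( 4.0f)

        static float a_tab[SEGS], b_tab[SEGS]; /* slope/intercept tables */

        /* Precompute per-segment slope and intercept from tanh samples. */
        static void build_tables(void) {
            float step = (HI - LO) / SEGS;
            for (int k = 0; k < SEGS; k++) {
                float x0 = LO + k * step, x1 = x0 + step;
                float y0 = tanhf(x0),     y1 = tanhf(x1);
                a_tab[k] = (y1 - y0) / (x1 - x0);
                b_tab[k] = y0 - a_tab[k] * x0;
            }
        }

        /* Linearized activation: select a segment, apply a*x + b. */
        static float tanh_linear(float x) {
            if (x <= LO) return -1.0f;
            if (x >= HI) return  1.0f;
            int k = (int)((x - LO) / (HI - LO) * SEGS);
            if (k == SEGS) k = SEGS - 1;  /* guard float rounding */
            return a_tab[k] * x + b_tab[k];
        }

        int main(void) {
            build_tables();
            for (float x = -2.0f; x <= 2.0f; x += 1.0f)
                printf("x=%5.2f approx=%8.5f exact=%8.5f\n",
                       x, tanh_linear(x), tanhf(x));
            return 0;
        }

    Batching the slope/intercept application across a tensor turns the activation into the same dense multiply-accumulate pattern as the surrounding matrix multiplies, which is what lets a single GEMM accelerator cover the whole network.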