
    A Novel Set of Directives for Multi-device Programming with OpenMP

    This work was supported by the MEEP project, which has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 946002. The JU receives support from the European Union's Horizon 2020 research and innovation programme and from Spain, Croatia, and Turkey. Peer reviewed. Postprint (author's final draft).

    Towards Composable GPU Programming: Programming GPUs with Eager Actions and Lazy Views

    In this paper, we advocate a composable approach to programming systems with Graphics Processing Units (GPUs): programs are developed as compositions of generic, reusable patterns. Current GPU programming approaches either rely on low-level, monolithic code without patterns (CUDA and OpenCL), which achieves high performance at the cost of cumbersome and error-prone programming, or they improve programmability by using pattern-based abstractions (e.g., Thrust) but pay a performance penalty due to inefficient implementations of pattern composition. We develop an API for GPU programming based on C++ with STL-style patterns, together with its compiler-based implementation. Our API gives application developers native C++ means (views and actions) to specify precisely which pattern compositions should be automatically fused during code generation into a single efficient GPU kernel, thereby ensuring high target performance. We implement our approach by extending the range-v3 library, which is currently being developed for the forthcoming C++ standards. Composable programming in our approach is done exclusively in standard C++14, with STL algorithms used as patterns that we re-implemented in parallel for the GPU. Our compiler implementation is based on the LLVM and Clang frameworks, and we use advanced multi-stage programming techniques for aggressive runtime optimizations. We experimentally evaluate our approach using a set of benchmark applications and a real-world case study from the area of image processing. Our codes achieve performance competitive with monolithic CUDA implementations, and we outperform pattern-based codes written using Nvidia's Thrust.
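    As an illustration of the views-then-eager-action style described above, the following minimal CPU-only sketch composes a lazy range-v3 view and forces its evaluation with an eager algorithm. The dot-product data and the CPU-side execution are assumptions introduced here for illustration; the paper's compiler-driven fusion into a single GPU kernel is not reproduced.

        // Minimal sketch (assumption: range-v3 is available; runs on the CPU only).
        #include <range/v3/all.hpp>
        #include <iostream>
        #include <vector>

        int main() {
            std::vector<float> xs = {1.f, 2.f, 3.f, 4.f};
            std::vector<float> ys = {4.f, 3.f, 2.f, 1.f};

            // Lazy view: nothing is computed yet; the composition only records
            // the pattern (element-wise multiplication) for later evaluation.
            auto products = ranges::views::zip_with(
                [](float x, float y) { return x * y; }, xs, ys);

            // Eager algorithm: forces evaluation of the composed pipeline.
            float dot = ranges::accumulate(products, 0.f);

            std::cout << "dot = " << dot << '\n';  // 1*4 + 2*3 + 3*2 + 4*1 = 20
        }

    In the approach the abstract describes, such a composition would be fused at code-generation time into a single GPU kernel rather than evaluated element by element on the host.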

    parMERASA – Multi-Core Execution of Parallelised Hard Real-Time Applications Supporting Analysability

    Engineers who design hard real-time embedded systems express a need for several times the performance available today while keeping safety as a major criterion. A breakthrough in performance is expected from parallelizing hard real-time applications and running them on an embedded multi-core processor, which enables combining the requirement for high performance with timing-predictable execution. parMERASA will provide a timing-analyzable system of parallel hard real-time applications running on a scalable multicore processor. parMERASA goes one step beyond mixed-criticality demands: it targets future complex control algorithms by parallelizing hard real-time programs to run on predictable multi-/many-core processors. We aim to achieve a breakthrough in techniques for the parallelization of industrial hard real-time programs, to provide hard real-time support in system software, WCET analysis and verification tools for multi-cores, and to develop techniques for predictable multi-core designs with up to 64 cores.

    Advances in Engineering Software for Multicore Systems

    The vast amounts of data to be processed by today's applications demand higher computational power. To meet application requirements and achieve reasonable application performance, it becomes increasingly profitable, or even necessary, to exploit any available hardware parallelism. For both new and legacy applications, successful parallelization often comes at a high cost. This chapter proposes a set of methods that employ an optimistic semi-automatic approach, enabling programmers to exploit parallelism on modern hardware architectures. It provides a set of methods, including an LLVM-based tool, to help programmers identify the most promising parallelization targets and understand the key types of parallelism. The approach reduces the manual effort needed for parallelization. One contribution of this work is an efficient profiling method that determines the control and data dependences needed for parallelism discovery and other types of code analysis. Another contribution is a method for detecting code sections where parallel design patterns might be applicable and suggesting relevant code transformations. Our approach efficiently reports detailed runtime data dependences. It accurately identifies opportunities for parallelism and the appropriate type of parallelism to use, whether task-based or loop-based.
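    To make the loop-based case concrete, the hypothetical kernel below (not taken from the chapter) has no loop-carried data dependence: each iteration reads x[i] and y[i] and writes only y[i], so a dependence profiler of the kind described would classify the loop as parallelizable, and an OpenMP annotation is one transformation such a tool might suggest. A recursive traversal of independent sub-structures would, by contrast, point to task-based parallelism.

        // Hypothetical example; compile with an OpenMP-capable compiler (e.g. -fopenmp).
        #include <cstddef>
        #include <cstdio>
        #include <vector>

        // Each iteration touches only index i, so a runtime dependence profile
        // records no loop-carried data dependences -- a loop-parallel candidate.
        void saxpy(std::vector<float>& y, const std::vector<float>& x, float a) {
            #pragma omp parallel for  // one transformation a discovery tool might suggest
            for (std::size_t i = 0; i < y.size(); ++i) {
                y[i] = a * x[i] + y[i];
            }
        }

        int main() {
            std::vector<float> x(1000, 1.0f), y(1000, 2.0f);
            saxpy(y, x, 3.0f);
            std::printf("y[0] = %f\n", y[0]);  // 3*1 + 2 = 5
        }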

    Thermal-aware resource allocation in earliest deadline first using fluid scheduling

    Thermal issues in microprocessors have become a major design constraint because of their adverse effects on the reliability, performance, and cost of the system. This article proposes an improvement to earliest deadline first, a uni-processor scheduling algorithm, that reduces thermal peaks and variations without compromising its optimality. This is done by introducing a factor of fairness into the earliest deadline first algorithm, which inserts idle intervals during execution and allows the workload to be distributed uniformly over time. The technique notably lowers the number of context switches compared with a previous thermal-aware scheduling algorithm based on the same amount of fairness. Although the algorithm is proposed for a uni-processor environment, it is also applicable to partitioned scheduling in a multi-processor environment, which first converts the multi-processor scheduling problem into a set of uni-processor scheduling problems and then applies a uni-processor scheduling technique to each. The simulation results show that the proposed approach reduces temperature peaks and variations by up to 5% in a uni-processor environment, and reduces the temperature spatial gradient and the average temperature by up to 7% and 6%, respectively, in a multi-processor environment.
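    The following simulation sketch illustrates the general idea only: pure EDF would run the earliest-deadline job for as long as work remains, whereas a fluid-style fairness factor caps each task's progress at its utilization share of the elapsed time, which inserts the idle (cooling) intervals mentioned above. The task parameters, the time quantum, and the exact fairness rule are illustrative assumptions, not the algorithm of the article.

        // Illustrative sketch of EDF with a fluid-fairness cap (assumed parameters).
        #include <cstddef>
        #include <cstdio>
        #include <vector>

        struct Task {
            double wcet;        // worst-case execution time
            double period;      // period == relative deadline
            double done = 0;    // work completed in the current job
            double release = 0; // release time of the current job
        };

        int main() {
            std::vector<Task> tasks = {{2.0, 5.0}, {3.0, 10.0}};  // assumed task set
            const double dt = 0.5;                                // scheduling quantum

            for (double t = 0; t < 20.0; t += dt) {
                int pick = -1;
                double best_deadline = 1e9;
                for (std::size_t i = 0; i < tasks.size(); ++i) {
                    Task& tk = tasks[i];
                    if (t >= tk.release + tk.period) {  // new job released
                        tk.release += tk.period;
                        tk.done = 0;
                    }
                    // Fluid budget: the work a perfectly proportional schedule
                    // would have completed by the end of this quantum.
                    double budget = (tk.wcet / tk.period) * (t - tk.release + dt);
                    // Fairness rule: a job may only run while it has not outpaced
                    // its fluid progress -- this is what creates the idle intervals.
                    if (tk.done < tk.wcet && tk.done < budget &&
                        tk.release + tk.period < best_deadline) {
                        best_deadline = tk.release + tk.period;
                        pick = static_cast<int>(i);
                    }
                }
                if (pick >= 0) {
                    tasks[pick].done += dt;
                    std::printf("t=%4.1f  run task %d\n", t, pick);
                } else {
                    std::printf("t=%4.1f  idle (cooling interval)\n", t);
                }
            }
        }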

    Machine learning-aided numerical linear Algebra: Convolutional neural networks for the efficient preconditioner generation


    Paving the way towards high-level parallel pattern interfaces for data stream processing

    The emergence of Internet of Things (IoT) data stream applications has posed a number of new challenges to existing infrastructures, processing engines, and programming models. In this sense, high-level interfaces, encapsulating algorithmic aspects in pattern-based constructions, have considerably reduced the development and parallelization effort for this type of application. An example of a parallel pattern interface is GrPPI, a generic high-level C++ library that acts as a layer between developers and existing parallel programming frameworks, such as C++ threads, OpenMP, and Intel TBB. In this paper, we complement the basic patterns supported by GrPPI with the new stream operators Split-Join and Window, and the advanced parallel patterns Stream-Pool, Windowed-Farm, and Stream-Iterator for the aforementioned back ends. Thanks to these new stream operators, complex compositions among streaming patterns can be expressed. The collection of advanced patterns, in turn, allows users to tackle domain-specific applications, ranging from evolutionary to real-time computing, where compositions of basic patterns cannot fully mimic the algorithmic behavior of their original sequential codes. The experimental evaluation of the new advanced patterns and stream operators on a set of domain-specific use cases, using different back ends and pattern-specific parameters, reports considerable performance gains with respect to the sequential versions. Additionally, we demonstrate the benefits of the GrPPI pattern interface from the usability, flexibility, and readability points of view. This work was partially supported by the EU project ICT 644235 "RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications" and the project TIN2013-41350-P "Scalable Data Management Techniques for High-End Computing Systems" from the Ministerio de Economía y Competitividad, Spain.
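    To give a concrete feel for what a count-based Window operator does when composed with a downstream stage, the plain C++14 sketch below groups a stream into fixed-size windows and hands each completed window to the next stage. The name window_by_count and the averaging consumer are hypothetical illustrations of the concept and do not reproduce GrPPI's actual interface.

        // Conceptual sketch only; not the GrPPI API.
        #include <cstddef>
        #include <functional>
        #include <iostream>
        #include <numeric>
        #include <vector>

        // Count-based window: buffers stream items and emits each full window
        // to the downstream stage, mimicking the Window operator's role in a
        // streaming pattern composition.
        template <typename T>
        class window_by_count {
        public:
            window_by_count(std::size_t size,
                            std::function<void(const std::vector<T>&)> next)
                : size_(size), next_(std::move(next)) {}

            void push(const T& item) {
                buffer_.push_back(item);
                if (buffer_.size() == size_) {
                    next_(buffer_);   // emit the completed window downstream
                    buffer_.clear();
                }
            }

        private:
            std::size_t size_;
            std::vector<T> buffer_;
            std::function<void(const std::vector<T>&)> next_;
        };

        int main() {
            // Downstream stage: average each window (e.g. smoothing a sensor stream).
            window_by_count<double> win(4, [](const std::vector<double>& w) {
                double avg = std::accumulate(w.begin(), w.end(), 0.0) / w.size();
                std::cout << "window average = " << avg << '\n';
            });

            for (int i = 1; i <= 12; ++i) {
                win.push(static_cast<double>(i));  // simulated stream items
            }
        }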