Search CORE

4,482 research outputs found

A Modeling Approach based on UML/MARTE for GPU Architecture

Author: Dekeyser Jean-Luc
Guyomarc'H Frédéric
Rodrigues Antonio Wendell De Oliveira
Publication venue
Publication date: 10/05/2011
Field of study

Nowadays, the High Performance Computing is part of the context of embedded systems. Graphics Processing Units (GPUs) are more and more used in acceleration of the most part of algorithms and applications. Over the past years, not many efforts have been done to describe abstractions of applications in relation to their target architectures. Thus, when developers need to associate applications and GPUs, for example, they find difficulty and prefer using API for these architectures. This paper presents a metamodel extension for MARTE profile and a model for GPU architectures. The main goal is to specify the task and data allocation in the memory hierarchy of these architectures. The results show that this approach will help to generate code for GPUs based on model transformations using Model Driven Engineering (MDE).Comment: Symposium en Architectures nouvelles de machines (SympA'14) (2011

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

A C-DAG task model for scheduling complex real-time tasks on heterogeneous platforms: preemption matters

Author: Bertogna Marko
Capodieci Nicola
Cavicchioli Roberto
Lipari Giuseppe
Zahaf Houssam-Eddine
Publication venue
Publication date: 07/01/2019
Field of study

Recent commercial hardware platforms for embedded real-time systems feature heterogeneous processing units and computing accelerators on the same System-on-Chip. When designing complex real-time application for such architectures, the designer needs to make a number of difficult choices: on which processor should a certain task be implemented? Should a component be implemented in parallel or sequentially? These choices may have a great impact on feasibility, as the difference in the processor internal architectures impact on the tasks' execution time and preemption cost. To help the designer explore the wide space of design choices and tune the scheduling parameters, in this paper we propose a novel real-time application model, called C-DAG, specifically conceived for heterogeneous platforms. A C-DAG allows to specify alternative implementations of the same component of an application for different processing engines to be selected off-line, as well as conditional branches to model if-then-else statements to be selected at run-time. We also propose a schedulability analysis for the C-DAG model and a heuristic allocation algorithm so that all deadlines are respected. Our analysis takes into account the cost of preempting a task, which can be non-negligible on certain processors. We demonstrate the effectiveness of our approach on a large set of synthetic experiments by comparing with state of the art algorithms in the literature

arXiv.org e-Print Archive

HAL Descartes

XinuPi3: Teaching Multicore Concepts Using Embedded Xinu

Author: Bansal Priva
Brylow Dennis
Latinovich Rade
Lazar Tom
McGee Patrick J.
Publication venue: e-Publications@Marquette
Publication date: 01/01/2017
Field of study

As computer platforms become more advanced, the need to teach advanced computing concepts grows accordingly. This paper addresses one such need by presenting XinuPi3, a port of the lightweight instructional operating system Embedded Xinu to the Raspberry Pi 3. The Raspberry Pi 3 improves upon previous generations of inexpensive, credit card-sized computers by including a quad-core, ARM-based processor, opening the door for educators to demonstrate essential aspects of modern computing like inter-core communication and genuine concurrency. Embedded Xinu has proven to be an effective teaching tool for demonstrating low-level concepts on single-core platforms, and it is currently used to teach a range of systems courses at multiple universities. As of this writing, no other bare metal educational operating system supports multicore computing. XinuPi3 provides a suitable learning environment for beginners on genuinely concurrent hardware. This paper provides an overview of the key features of the XinuPi3 system, as well as the novel embedded system education experiences it makes possible

epublications@Marquette

PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

Author: Ahmed Fasih
Andreas Klöckner
Bell
Bryan Catanzaro
Buck
Chandler
Dalcín
Eich
Feldman
Flanagan
Frigo
Group
Hestenes
Hesthaven
Kennedy
Klöckner
Lam
Langtangen
Lindholm
McCarthy
McCool
Nicolas Pinto
Oliphant
Owens
Paul Ivanov
Pinto
Pinto
Prud’homme
Reynders
Seiler
Stein
Valiant
van Hateren
Veldhuizen
Wang
Whaley
Yunsup Lee
Publication venue: 'Elsevier BV'
Publication date: 29/03/2011
Field of study

High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.Comment: Submitted to Parallel Computing, Elsevie

arXiv.org e-Print Archive

Crossref

Safety-related challenges and opportunities for GPUs in the automotive domain

Author: Abella Ferrer Jaume
Alcaide Portet Sergi
Cazorla Almeida Francisco Javier
Hernández Luz Carles
Kosmidis Leonidas
Tabani Hamid
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

GPUs have been shown to cover the computing performance needs of autonomous driving (AD) systems. However, since the GPUs used for AD build on designs for the mainstream market, they may lack fundamental properties for correct operation under automotive's safety regulations. In this paper, we analyze some of the main challenges in hardware and software design to embrace GPUs as the reference computing solution for AD, with the emphasis in ISO 26262 functional safety requirements.Authors would like to thank Guillem Bernat from Rapita Systems for his technical feedback on this work. The research leading to this work has received funding from the European Re-search Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 772773). This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Jaume Abella has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Carles Hernández is jointly funded by the Spanish Ministry of Economy and Competitiveness and FEDER funds through grant TIN2014-60404-JIN.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Author: Mittal Sparsh
Publication venue
Publication date: 01/01/2014
Field of study

Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

arXiv.org e-Print Archive

Crossref

Actors: The Ideal Abstraction for Programming Kernel-Based Concurrency

Author: Harvey Paul
Hentschel Kristian
Sventek Joseph
Publication venue: 'University of Glasgow'
Publication date: 01/01/2015
Field of study

GPU and multicore hardware architectures are commonly used in many different application areas to accelerate problem solutions relative to single CPU architectures. The typical approach to accessing these hardware architectures requires embedding logic into the programming language used to construct the application; the two primary forms of embedding are: calls to API routines to access the concurrent functionality, or pragmas providing concurrency hints to a language compiler such that particular blocks of code are targeted to the concurrent functionality. The former approach is verbose and semantically bankrupt, while the success of the latter approach is restricted to simple, static uses of the functionality. Actor-based applications are constructed from independent, encapsulated actors that interact through strongly-typed channels. This paper presents a first attempt at using actors to program kernels targeted at such concurrent hardware. Besides the glove-like fit of a kernel to the actor abstraction, quantitative code analysis shows that actor-based kernels are always significantly simpler than API-based coding, and generally simpler than pragma-based coding. Additionally, performance measurements show that the overheads of actor-based kernels are commensurate to API-based kernels, and range from equivalent to vastly improved for pragma-based annotations, both for sample and real-world applications

Enlighten