Search CORE

84,557 research outputs found

The Design and Implementation of a Bytecode for Optimization on Heterogeneous Systems

Author: Seitz Kerry A
Publication venue: Digital Commons @ Trinity
Publication date: 20/04/2012
Field of study

As hardware architectures shift towards more heterogeneous platforms with different vari- eties of multi- and many-core processors and graphics processing units (GPUs) by various manufacturers, programmers need a way to write simple and highly optimized code without worrying about the specifics of the underlying hardware. To meet this need, I have designed a virtual machine and bytecode around the goal of optimized execution on highly variable, heterogeneous hardware, instead of having goals such as small bytecodes as was the ob- jective of the Java R Virtual Machine. The approach used here is to combine elements of the Dalvik R virtual machine with concepts from the OpenCL R heterogeneous computing platform, along with an annotation system so that the results of complex compile time analysis can be available to the Just-In-Time compiler. The annotation format is flexible so that the set of annotations can be expanded as the field of heterogeneous computing continues to grow. An initial implementation of this virtual machine was written in the Scala programming language and makes use of the Java bindings for OpenCL to execute code segments on a GPU. The implementation consists of an assembler that converts an assembly version of the bytecode into its binary representation and an interpreter that runs programs from the assembled binary. Because the bytecode contains valuable optimization information, decisions can be made at runtime to choose how best to execute code segments. To demonstrate this concept, the interpreter uses this information to produce OpenCL ker- nel code for specified bytecode blocks and then builds and executes these kernels to improve performance. This hybrid interpreter/Just-In-Time compiler serves as an initial implemen- tation of a virtual machine that provides optimized code tailored to the available hardware on which the application is running

Trinity University

LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

Author: Alvarez Carlos
Bautista Leonardo
Becker Tobias
Billung-Meyer Gunnar
Carpenter Paul
Christmann Wolfgang
Cristal Adrian
De La Cruz Raul
Dubhashi Devdatt
Etsion Yoav
Felber Pascal
Fetzer Christof
Gaydadjiev Georgi
Göttel Christian
Hadar Elad
Hagemeyer Jens
Jimenez Daniel
Jungeblut Thorsten
Kaiser Martin
Klawonn Frank
Krupop Stefan
Kucza Nils
Madonar Sergi
Martorell Xavier
Mihklafi Amani
Mudge Trevor
Mudge Trevor
Pasin Marcelo
Pericàs Miquel
Pnevmatikatos Dionisios N.
Porrmann Mario
Port Oron
Rocha Isabelly
Salami Behzad
Salomonsson Hans
Schiavoni Valerio
Trancoso Pedro
Unsal Osman S.
vor dem Berge Micha
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Chalmers Research

Publications at Bielefeld University

Next Generation Cloud Computing: New Trends and Research Directions

Author: Buyya Rajkumar
Varghese Blesson
Publication venue
Publication date: 08/09/2017
Field of study

The landscape of cloud computing has significantly changed over the last decade. Not only have more providers and service offerings crowded the space, but also cloud infrastructure that was traditionally limited to single provider data centers is now evolving. In this paper, we firstly discuss the changing cloud infrastructure and consider the use of infrastructure from multiple providers and the benefit of decentralising computing away from data centers. These trends have resulted in the need for a variety of new computing architectures that will be offered by future cloud infrastructure. These architectures are anticipated to impact areas, such as connecting people and devices, data-intensive computing, the service space and self-learning systems. Finally, we lay out a roadmap of challenges that will need to be addressed for realising the potential of next generation cloud systems.Comment: Accepted to Future Generation Computer Systems, 07 September 201

arXiv.org e-Print Archive

Queen's University Belfast Research Portal

Crossref

OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF

Author: A Klöckner
D Charousset
G Agha
G Agha
J Nickolls
JD Owens
K Wu
L Dagum
S Srinivasan
S Wienke
T Desell
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

The actor model of computation has been designed for a seamless support of concurrency and distribution. However, it remains unspecific about data parallel program flows, while available processing power of modern many core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, hence operate in a multi-stage fashion on data resident at the GPU. Developers are thus enabled to build complex data parallel programs from primitives without leaving the actor paradigm, nor sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second duties, the efficiency of offloading was found to largely differ between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API.Comment: 28 page

arXiv.org e-Print Archive

Crossref

REPOSIT

Federated Embedded Systems – a review of the literature in related fields

Author: Axelsson Jakob
Kobetski Avenir
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/2012
Field of study

This report is concerned with the vision of smart interconnected objects, a vision that has attracted much attention lately. In this paper, embedded, interconnected, open, and heterogeneous control systems are in focus, formally referred to as Federated Embedded Systems. To place FES into a context, a review of some related research directions is presented. This review includes such concepts as systems of systems, cyber-physical systems, ubiquitous computing, internet of things, and multi-agent systems. Interestingly, the reviewed fields seem to overlap with each other in an increasing number of ways

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive