13 research outputs found

    CellSs: Scheduling techniques to better exploit memory hierarchy

    ABSTRACT: Cell Superscalar's (CellSs) main goal is to provide a simple, flexible, and easy programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurrency of applications at the task level. The CellSs environment is based on a source-to-source compiler that translates annotated C or Fortran code, and a runtime library tailored for the Cell/B.E. that takes care of the concurrent execution of the application. The first task-scheduling efforts in CellSs were based on very simple heuristics. This paper presents new scheduling techniques developed for CellSs to improve application performance, and details and evaluates the design of a new scheduling algorithm. The CellSs scheduler takes into account an extension of the Cell/B.E. memory hierarchy: a cache memory shared between the SPEs. All new scheduling techniques have been evaluated and show improved behavior of the system.
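
    A minimal sketch of the kind of locality-aware ready-queue policy such a scheduler could apply is shown below. All names (task_t, cache_has, pick_next) are illustrative assumptions, not the CellSs API: among the ready tasks, prefer one whose input block is already resident in the cache shared between the SPEs, falling back to plain FIFO order otherwise.

        #include <stdbool.h>
        #include <stddef.h>

        typedef struct {
            int id;
            const void *input_block;   /* main-memory address of the task's input */
        } task_t;

        /* Placeholder for a lookup in the software cache shared between SPEs;
         * a real runtime would consult its cache directory here. */
        static bool cache_has(const void *block) {
            (void)block;
            return false;
        }

        /* Pick the next task: the first ready task whose input data is already
         * cached (so no DMA from main memory is needed), else the queue head. */
        static task_t *pick_next(task_t *ready[], size_t n) {
            if (n == 0) return NULL;
            for (size_t i = 0; i < n; i++)
                if (cache_has(ready[i]->input_block))
                    return ready[i];   /* data reuse: hit in the shared cache */
            return ready[0];           /* FIFO fallback */
        }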

    CellSs: a Programming Model for the Cell BE Architecture

    No full text
    In this work we present Cell Superscalar (CellSs), which addresses the automatic exploitation of the functional parallelism of a sequential program across the different processing elements of the Cell BE architecture. The focus is on the simplicity and flexibility of the programming model. Based on a simple annotation of the source code, a source-to-source compiler generates the necessary code, and a runtime library exploits the existing parallelism by building a task dependency graph at runtime. The runtime takes care of task scheduling and data handling between the different processors of this heterogeneous architecture. In addition, a locality-aware task scheduling has been implemented to reduce the overhead of data transfers. The approach has been implemented and tested with a set of examples, and the results obtained so far are promising.
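
    The annotation style the abstract describes can be illustrated with a small sketch modeled on published CellSs examples; the block size and the exact clause spellings (input/output/inout) are assumptions here and should be checked against the CellSs manual. Unknown pragmas are ignored by a standard C compiler, so the sketch compiles as plain C.

        #define BS 64

        /* Marking a plain C function as a task; the directionality clauses let
         * the runtime build the task dependency graph automatically. */
        #pragma css task input(a, b) inout(c)
        void block_mul(float a[BS][BS], float b[BS][BS], float c[BS][BS]) {
            for (int i = 0; i < BS; i++)
                for (int j = 0; j < BS; j++)
                    for (int k = 0; k < BS; k++)
                        c[i][j] += a[i][k] * b[k][j];
        }

        /* The caller stays sequential; each call becomes a task instance.
         * Tasks 1 and 2 are independent and can run on different SPEs at the
         * same time, while task 3 reads C and D and so waits for both. */
        void example(float A[BS][BS], float B[BS][BS],
                     float C[BS][BS], float D[BS][BS]) {
            block_mul(A, B, C);    /* task 1 */
            block_mul(A, B, D);    /* task 2: independent of task 1 */
            block_mul(C, D, A);    /* task 3: depends on tasks 1 and 2 */
        }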

    Making the best of temporal locality: Just-in-time renaming and lazy write-back on the Cell/B.E.

    No full text
    Cell Superscalar (CellSs) provides a simple, flexible, and easy programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurrency of applications at a function or task level. The CellSs environment is based on a source-to-source compiler that translates annotated C or Fortran code, and a runtime library tailored for the Cell/B.E. that orchestrates the concurrent execution of the application. We introduce a technique called bypassing that allows CellSs to perform core-to-core Direct Memory Access (DMA) transfers for generic applications. In this review we concisely summarize the bypassing technique and introduce two improvements: just-in-time renaming and lazy write-back. These extensions come at no additional cost and potentially increase performance by improving the perceived bandwidth of the Element Interconnect Bus (EIB). Experiments on five fundamental linear algebra kernels demonstrate the applicability of these techniques and quantify the benefit that can be reaped. We also present performance results for a first prototype of CellSs with bypassing. © The Author(s) 2010.
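
    A hedged sketch of the two ideas, not the CellSs implementation: renaming normally allocates a fresh buffer for each new version of a block to break write-after-read and write-after-write hazards; "just-in-time" defers that allocation until the producing task actually runs, and "lazy write-back" keeps the produced block resident so a consumer on another SPE can fetch it core-to-core instead of waiting for a round trip through main memory. All types and function names below are illustrative.

        #include <stdbool.h>
        #include <stdlib.h>

        typedef struct {
            void  *buf;              /* storage for this version of the block */
            size_t size;
            bool   in_local_store;   /* still resident on some SPE?           */
            bool   written_back;     /* copied back to main memory yet?       */
        } version_t;

        /* Just-in-time renaming: allocate the buffer for a new version only
         * when the producer task starts, not when it is submitted, so queued
         * tasks never tie up memory for values that do not exist yet. */
        static void on_producer_start(version_t *v, size_t size) {
            v->size = size;
            v->buf  = malloc(size);          /* deferred allocation */
            v->in_local_store = true;
            v->written_back = false;
        }

        /* Bypassing: a consumer that finds the version still resident fetches
         * it core-to-core rather than from main memory. */
        static const void *on_consume(const version_t *v) {
            return v->buf;
        }

        /* Lazy write-back: main memory is updated only when the resident copy
         * must finally be evicted. */
        static void on_evict(version_t *v) {
            /* a real runtime would issue the DMA to main memory here */
            v->written_back = true;
            v->in_local_store = false;
        }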

    Just-in-time renaming and lazy write-back on the Cell/B.E.

    No full text
    Cell Superscalar (CellSs) provides a simple, flexible, and easy programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurrency of applications at a function or task level. The CellSs environment is based on a source-to-source compiler that translates annotated C or Fortran code, and a runtime library tailored for the Cell/B.E. that orchestrates the concurrent execution of the application. We have developed a technique called bypassing that allows CellSs to perform core-to-core DMA transfers for generic applications. In this overview paper we concisely summarise the bypassing technique and introduce two improvements: just-in-time renaming and lazy write-back. These extensions come at no additional cost and potentially increase performance by improving the perceived bandwidth of the Element Interconnect Bus (EIB). Although the integration of bypassing with CellSs is work in progress, we present results for four fundamental linear algebra kernels to demonstrate the applicability of these techniques and quantify the benefit that can be reaped.
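
    A back-of-envelope view of why bypassing raises the EIB's perceived bandwidth, with illustrative numbers rather than measured ones: a producer/consumer pair on two SPEs normally costs two main-memory DMA transfers (write-back plus re-fetch), whereas a direct local-store-to-local-store transfer costs one.

        #include <stdio.h>

        int main(void) {
            const double block_kb = 16.0;        /* assumed tile size */
            double via_memory = 2.0 * block_kb;  /* LS -> main memory -> LS   */
            double bypassed   = 1.0 * block_kb;  /* LS -> LS over the EIB     */
            printf("traffic per producer/consumer edge: %.0f KB vs %.0f KB\n",
                   via_memory, bypassed);
            return 0;
        }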

    Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL

    No full text
    In this paper we present OMPSs, a programming model based on OpenMP and StarSs that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell/B.E., and GPUs, showing the wide applicability of the approach. The evaluation uses four different benchmarks: Matrix Multiply, BlackScholes, Perlin Noise, and Julia Set. We compare the results with the execution of the same benchmarks written in OpenCL on the same architectures. The results show that OMPSs greatly outperforms the OpenCL environment: it is more flexible in exploiting multiple accelerators, and, due to the simplicity of the annotations, it increases programmer productivity.
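
    A hedged sketch in the OMPSs style the abstract describes: a task annotation on an OpenCL kernel's declaration lets the runtime schedule it on a device, while the same dependence clauses drive CPU tasks. The clause spellings (target device, copy_deps, input/output) follow OmpSs documentation of that period but are assumptions here and should be verified; the kernel body itself would live in a separate .cl file, so only its interface appears below.

        #define N 1024

        /* Annotating the kernel's interface tells the runtime where it may run
         * and what data it reads and writes. */
        #pragma omp target device(opencl) copy_deps
        #pragma omp task input(a, b) output(c)
        void vec_add(const float a[N], const float b[N], float c[N]);

        void driver(float *a, float *b, float *c) {
            vec_add(a, b, c);        /* becomes a task; the runtime moves data
                                        to the device and back as needed */
            #pragma omp taskwait     /* wait until the result is available */
        }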
