Search CORE

4 research outputs found

ACOTES project: Advanced compiler technologies for embedded streaming

Author: Albert Cohen
Alex Ramírez
Andrea Ornstein
Antoniu Pop
Ayal Zaks
Cupertino Miranda
Cédric Bastoul
David Ródenas
Dorit Nuzman
E. Blossom
E.A. Lee
Eduard Ayguadé
Erven Rohou
Harm Munk
Ira Rosen
J. Hoogerbrugge
Konrad Trifunović
Louis-Noël Pouchet
M. Gschwind
M. Wolfe
Marc Duranton
Marco Cornero
Menno Lindwer
Mohammed Fellahi
Paul Carpenter
Philippe Dumont
R. Allen
R.G. Scarborough
Razya Ladelsky
Roger Ferrer
S. Campanoni
Sebastian Pop
Uzi Shvadron
Xavier Martorell
Zbigniew Chamski
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

Streaming applications are built of data-driven, computational components, consuming and producing unbounded data streams. Streaming oriented systems have become dominant in a wide range of domains, including embedded applications and DSPs. However, programming efficiently for streaming architectures is a challenging task, having to carefully partition the computation and map it to processes in a way that best matches the underlying streaming architecture, taking into account the distributed resources (memory, processing, real-time requirements) and communication overheads (processing and delay). These challenges have led to a number of suggested solutions, whose goal is to improve the programmer’s productivity in developing applications that process massive streams of data on programmable, parallel embedded architectures. StreamIt is one such example. Another more recent approach is that developed by the ACOTES project (Advanced Compiler Technologies for Embedded Streaming). The ACOTES approach for streaming applications consists of compiler-assisted mapping of streaming tasks to highly parallel systems in order to maximize cost-effectiveness, both in terms of energy and in terms of design effort. The analysis and transformation techniques automate large parts of the partitioning and mapping process, based on the properties of the application domain, on the quantitative information about the target systems, and on programmer directives. This paper presents the outcomes of the ACOTES project, a 3-year collaborative work of industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners, and advocates the use of Advanced Compiler Technologies that we developed to support Embedded Streaming.Peer ReviewedPostprint (published version

HAL-CentraleSupelec

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

INRIA a CCSD electronic archive server

HAL-MINES ParisTech

The University of Manchester - Institutional Repository

HAL-Rennes 1

Advances in Parallel-Stage Decoupled Software Pipelining Leveraging Loop Distribution, Stream-Computing and the SSA Form

Author: Cohen Albert
Li Feng
Pop Antoniu
Publication venue: Florent Bouchez and Sebastian Hack and Eelco Visser
Publication date: 02/04/2011
Field of study

8 pages Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors-Compilers, OptimizationInternational audienceDecoupled Software Pipelining (DSWP) is a program partitioning method enabling compilers to extract pipeline parallelism from sequential programs. Parallel Stage DSWP (PS-DSWP) is an extension that also exploits the data parallelism within pipeline filters. This paper presents the preliminary design of a new PS-DSWP method capable of handling arbitrary structured control flow, a slightly better algorithmic complexity, the natural exploitation of nested parallelism with communications across arbitrary levels, with a seamless integration with data-flow parallel programming environments. It is inspired by loop-distribution and supports nested/structured partitioning along with the hierarchy of control dependences. The method relies on a data-flow streaming extension of OpenMP. These advances are made possible thanks to progresses in compiler intermediate representation. We describe our usage of the Static Single Assignment (SSA) form, how we extend it to the context of concurrent streaming tasks, and we discuss the benefits and challenges for PS-DSWP

INRIA a CCSD electronic archive server

HAL-MINES ParisTech

Advances in Parallel-Stage Decoupled Software Pipelining

Author: Antoniu Pop
Cohen Albert
Li Feng
Publication venue: HAL CCSD
Publication date: 02/04/2011
Field of study

Decoupled Software Pipelining (DSWP) is a program partitioning method enabling compilers to extract pipeline parallelism from sequential programs. Parallel Stage DSWP (PS-DSWP) is an extension that also exploits the data parallelism within pipeline ﬁlters. This paper presents the preliminary design of a new PS-DSWP method capable of handling arbitrary structured control ﬂow, a slightly better algorithmic complexity, the natural exploitation of nested parallelism with communications across arbitrary levels, with a seamless integration with data-ﬂow parallel programming environments. It is inspired by loop-distribution and supports nested/structured partitioning along with the hierarchy of control dependences. The method relies on a data-ﬂow streaming extension of OpenMP. These advances are made possible thanks to progresses in compiler intermediate representation. We describe our usage of the Static Single Assignment (SSA) form, how we extend it to the context of concurrent streaming tasks, and we discuss the beneﬁts and challenges for PS-DSWP

INRIA a CCSD electronic archive server

Efficient Evaluation of Data-intensive Batch-queries in Open Simulation Laboratories

Author: Kanov Kalin Nikolaev
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 15/12/2016
Field of study

Better instruments, faster and bigger supercomputers and easier collaboration and sharing of data in the sciences have introduced the need to manage increasingly large datasets. Advances in high-performance computing (HPC) have empowered many science disciplines' computational branches. However, many scientists lack access to HPC facilities or the necessary sophistication to develop and run HPC codes. The benefits of testing new theories and experimenting with large numerical simulations have thus been restricted to a few top users. In this dissertation, I describe the ``remote immersive analysis" approach to computational science and present new techniques and methods for the efficient evaluation of scientific analysis tasks in analysis cluster environments. I will discuss several techniques developed for the efficient evaluation of data-intensive batch-queries in large numerical simulation databases. An I/O streaming method for the evaluation of decomposable kernel computations utilizes partial-sums to evaluate a batch query by performing a single sequential pass over the data. Spatial filtering computations, which use a box filter, share not only data, but also computation and can be evaluated over an intermediate summed volumes dataset derived from the original data. This is more efficient for certain workloads even when the intermediate dataset is computed dynamically. Threshold queries have immense data requirements and potentially operate over entire time-steps of the simulation. An efficient and scalable data-parallel approach evaluates threshold queries of fields derived from the raw simulation data and stores their results in an application-aware semantic cache for fast subsequent retrieval. Finally, synchronization at a mediator, task parallel and data-parallel approaches for the evaluation of particle tracking queries are compared and examined. These techniques are developed, deployed and evaluated in the Johns Hopkins Turbulence Databases (JHTDB), an open simulation laboratory for turbulence research. The JHTDB stores the output of world-class numerical simulations of turbulence and provides public access to and means to explore their complete space-time history. The techniques discussed implement core scientific analysis routines and significantly increase the utility of the service. Additionally, they improve the performance of these routines by up-to an order of magnitude or more when compared with direct implementations or implementations adapted from the simulation code

JScholarship