
    SL: a "quick and dirty" but working intermediate language for SVP systems

    The CSA group at the University of Amsterdam has developed SVP, a framework to manage and program many-core and hardware-multithreaded processors. In this article, we introduce the intermediate language SL, a common vehicle to program SVP platforms. SL is designed as an extension to the standard C language (ISO C99/C11). It includes primitive constructs to bulk-create threads, bulk-synchronize on thread termination, and communicate between threads over word-sized dataflow channels. It is intended for use as a target language for higher-level parallelizing compilers. SL is a research vehicle; as of this writing, it is the only interface language for programming the main SVP platform, the new Microgrid chip architecture. This article provides an overview of the language, to complement a detailed specification available separately. Comment: 22 pages, 3 figures, 18 listings, 1 table.
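    The abstract names SL's core primitives (bulk thread creation, bulk synchronization on termination, word-sized dataflow channels) without giving their syntax, which is defined only in the separate specification. The plain C99 + pthreads sketch below illustrates that programming pattern purely as an analogy; none of it is SL code, and all names in it are invented.

        /* Hypothetical analogy only: SL's real syntax lives in its own
         * specification. This C99 + pthreads sketch mimics the pattern the
         * abstract describes: bulk-create a family of threads, give each a
         * word-sized input, then bulk-synchronize on their termination. */
        #include <pthread.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define FAMILY_SIZE 8            /* number of threads created in bulk */

        typedef struct {
            long in;                     /* word-sized "dataflow" input */
            long out;                    /* word-sized result */
        } channel_t;

        static void *worker(void *arg)
        {
            channel_t *ch = arg;
            ch->out = ch->in * ch->in;   /* trivial per-thread computation */
            return NULL;
        }

        int main(void)
        {
            pthread_t tid[FAMILY_SIZE];
            channel_t ch[FAMILY_SIZE];

            for (int i = 0; i < FAMILY_SIZE; i++) {      /* bulk create */
                ch[i].in = i;
                if (pthread_create(&tid[i], NULL, worker, &ch[i]) != 0) {
                    perror("pthread_create");
                    return EXIT_FAILURE;
                }
            }
            for (int i = 0; i < FAMILY_SIZE; i++)        /* bulk synchronize */
                pthread_join(tid[i], NULL);

            for (int i = 0; i < FAMILY_SIZE; i++)
                printf("%d -> %ld\n", i, ch[i].out);
            return EXIT_SUCCESS;
        }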

    Parallelization of a Six Degree of Freedom Entry Vehicle Trajectory Simulation Using OpenMP and OpenACC

    The art and science of writing parallelized software, using methods such as Open Multi-Processing (OpenMP) and Open Accelerators (OpenACC), is dominated by computer scientists. Engineers and non-computer scientists looking to apply these techniques to their own applications face a steep learning curve, especially when adapting originally single-threaded software to run multi-threaded on graphics processing units (GPUs). Significant changes in mindset must occur, such as how memory is managed, how instructions are organized, and how if statements (branching) are used. The purpose of this work is twofold: 1) to demonstrate the applicability of the parallelized coding methodologies OpenMP and OpenACC to tasks outside of typical large-scale matrix mathematics; and 2) to discuss, from an engineer's perspective, the lessons learned from parallelizing software using these computer science techniques. This work applies OpenMP on both multi-core central processing units (CPUs) and the Intel Xeon Phi 7210, and OpenACC on GPUs. These parallelization techniques are used to tackle the simulation of thousands of entry vehicle trajectories through the integration of six degree of freedom (DoF) equations of motion (EoM). The forces and moments acting on the entry vehicle, and used by the EoM, are estimated using multiple models of varying levels of complexity. Several benchmark comparisons are made for the execution of the six DoF trajectory simulation: a single-thread Intel Xeon E5-2670 CPU, a multi-thread CPU using OpenMP, a multi-thread Xeon Phi 7210 using OpenMP, and a multi-thread NVIDIA Tesla K40 GPU using OpenACC. These benchmarks are run on the Pleiades Supercomputer Cluster at the National Aeronautics and Space Administration (NASA) Ames Research Center (ARC), and on a Xeon Phi 7210 node at NASA Langley Research Center (LaRC).
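    A minimal sketch of the parallelization pattern the abstract describes: many independent trajectory integrations distributed across threads with a single OpenMP directive (an analogous "#pragma acc parallel loop" would offload the same loop with OpenACC). The function, type, and constant names below are placeholders, not the NASA code.

        /* Illustrative sketch only; step_6dof is a stand-in for the real 6-DoF
         * equations of motion. Each trajectory is independent, so the outer
         * loop parallelizes cleanly with OpenMP. */
        #include <stdio.h>

        #define N_TRAJ  10000
        #define N_STEPS 5000

        typedef struct { double state[13]; } traj_t;   /* position, velocity, attitude, rates */

        /* Placeholder for one explicit integration step of the equations of motion. */
        static void step_6dof(traj_t *t, double dt)
        {
            for (int k = 0; k < 13; k++)
                t->state[k] += dt * 1e-6 * (k + 1);     /* stand-in for the real EoM */
        }

        int main(void)
        {
            static traj_t traj[N_TRAJ];                 /* static storage is zero-initialized */

            #pragma omp parallel for schedule(static)   /* one trajectory per iteration */
            for (int i = 0; i < N_TRAJ; i++)
                for (int s = 0; s < N_STEPS; s++)
                    step_6dof(&traj[i], 0.01);

            printf("first state element of trajectory 0: %f\n", traj[0].state[0]);
            return 0;
        }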

    Parallelizing with BDSC, a resource-constrained scheduling algorithm for shared and distributed memory systems

    We introduce a new parallelization framework for scientific computing based on BDSC, an efficient automatic scheduling algorithm for parallel programs in the presence of resource constraints on the number of processors and their local memory size. BDSC extends Yang and Gerasoulis's Dominant Sequence Clustering (DSC) algorithm; it uses sophisticated cost models and addresses both shared and distributed parallel memory architectures. We describe BDSC, its integration within the PIPS compiler infrastructure, and its application to the parallelization of four well-known scientific applications: Harris, ABF, equake and IS. Our experiments suggest that BDSC's focus on efficient resource management leads to significant parallelization speedups on both shared and distributed memory systems, improving upon DSC results, as shown by the comparison of the sequential and parallelized versions of these four applications running on both OpenMP and MPI frameworks.
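    The published BDSC algorithm is considerably more elaborate (dominant-sequence clustering with communication and computation cost models); the sketch below only illustrates the general idea of list scheduling under resource constraints, here a fixed processor count and a per-processor memory budget, with invented task data.

        /* Minimal sketch, not the published BDSC algorithm: greedily place each
         * task (in topological order) on the processor that lets it start
         * earliest while still fitting within that processor's memory budget. */
        #include <stdio.h>

        #define N_TASKS   5
        #define N_PROCS   2
        #define MEM_LIMIT 100            /* per-processor local memory budget */

        typedef struct {
            int cost;                    /* execution time */
            int mem;                     /* memory footprint */
            int pred;                    /* single predecessor, -1 if none */
        } task_t;

        int main(void)
        {
            /* A small task graph with invented costs, listed in topological order. */
            task_t t[N_TASKS] = {
                {10, 40, -1}, {20, 30, 0}, {15, 50, 0}, {5, 20, 1}, {8, 25, 2}
            };
            int proc_ready[N_PROCS] = {0};   /* time each processor becomes free */
            int proc_mem[N_PROCS]   = {0};   /* memory already allocated per processor */
            int finish[N_TASKS], owner[N_TASKS];

            for (int i = 0; i < N_TASKS; i++) {
                int ready = (t[i].pred >= 0) ? finish[t[i].pred] : 0;
                int best = -1, best_start = 0;
                for (int p = 0; p < N_PROCS; p++) {
                    if (proc_mem[p] + t[i].mem > MEM_LIMIT)
                        continue;                         /* memory constraint */
                    int start = (proc_ready[p] > ready) ? proc_ready[p] : ready;
                    if (best < 0 || start < best_start) { best = p; best_start = start; }
                }
                if (best < 0) { printf("task %d does not fit\n", i); return 1; }
                owner[i] = best;
                finish[i] = best_start + t[i].cost;
                proc_ready[best] = finish[i];
                proc_mem[best] += t[i].mem;
            }
            for (int i = 0; i < N_TASKS; i++)
                printf("task %d -> proc %d, finishes at %d\n", i, owner[i], finish[i]);
            return 0;
        }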

    Parallel machine architecture and compiler design facilities

    The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of the Delta project, whose objective is to provide a facility for rapid prototyping of parallelizing compilers that can target different machine architectures, is summarized. Included are surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.

    parMERASA: Multi-Core Execution of Parallelised Hard Real-Time Applications Supporting Analysability

    Engineers who design hard real-time embedded systems express a need for several times the performance available today while keeping safety as a major criterion. A breakthrough in performance is expected from parallelizing hard real-time applications and running them on an embedded multi-core processor, which enables combining the requirements for high performance with timing-predictable execution. parMERASA will provide a timing-analyzable system of parallel hard real-time applications running on a scalable multi-core processor. parMERASA goes one step beyond mixed-criticality demands: it targets future complex control algorithms by parallelizing hard real-time programs to run on predictable multi-/many-core processors. We aim to achieve a breakthrough in techniques for the parallelization of industrial hard real-time programs, to provide hard real-time support in system software as well as WCET analysis and verification tools for multi-cores, and to develop techniques for predictable multi-core designs with up to 64 cores.

    TRACO: Source-to-Source Parallelizing Compiler

    The paper presents TRACO, a source-to-source compiler for the automatic extraction of both coarse- and fine-grained parallelism available in C/C++ loops. The parallelization techniques implemented in TRACO are based on the transitive closure of a relation describing all the dependences in a loop. Coarse- and fine-grained parallelism are represented with synchronization-free slices (space partitions) and a legal loop statement instance schedule (time partitions), respectively. TRACO also enables scalar and array variable privatization as well as parallel reduction. As output, TRACO produces compilable parallel OpenMP C/C++ and/or OpenACC C/C++ code. The effectiveness of TRACO, the efficiency of the parallel code it produces, and the time needed to produce that code are evaluated by means of the NAS Parallel Benchmark and Polyhedral Benchmark suites. These features of TRACO are compared with those of closely related compilers such as ICC, Pluto, Par4All, and Cetus. Future work is outlined.
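    A hypothetical before/after example of the kind of transformation a source-to-source parallelizer such as TRACO performs, combining scalar privatization with a parallel reduction in generated OpenMP code; this is illustrative only, not TRACO's actual output.

        /* Illustrative transformation sketch: the original sequential loop nest
         * is shown in the comment, and the parallel version below privatizes the
         * scalar `tmp` and handles `sum` with a reduction, so iterations of the
         * outer loop run in parallel without synchronization. */
        #include <stdio.h>

        #define N 1024

        int main(void)
        {
            static double a[N][N];               /* static storage, zero-initialized */
            double sum = 0.0;

            /* Original sequential loop nest:
             *   for (i ...) { tmp = ...; for (j ...) { a[i][j] = tmp * j; sum += a[i][j]; } }
             */
            #pragma omp parallel for reduction(+:sum)
            for (int i = 0; i < N; i++) {
                double tmp = (double)i / N;      /* privatized scalar */
                for (int j = 0; j < N; j++) {
                    a[i][j] = tmp * j;
                    sum += a[i][j];
                }
            }
            printf("sum = %f\n", sum);
            return 0;
        }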