33 research outputs found

A Compiler and Runtime Infrastructure for Automatic Program Distribution

Author: Chu Matt
Diaconescu Roxana E.
Mouri Zachary
Wang Lei
Publication date: 01/04/2005

This paper presents the design and implementation of a compiler and runtime infrastructure for automatic program distribution. We are building a research infrastructure that enables experimentation with various program partitioning and mapping strategies and the study of automatic distribution's effect on resource consumption (e.g., CPU, memory, communication). Since many optimization techniques face conflicting optimization targets (e.g., memory and communication), we believe that it is important to be able to study their interaction. We present a set of techniques that enable flexible resource modeling and program distribution: dependence analysis, weighted graph partitioning, code and communication generation, and profiling. We have developed these ideas in the context of the Java language. We present in detail the design and implementation of each technique as part of our compiler and runtime infrastructure. We then evaluate our design and present preliminary experimental data for each component, as well as for the entire system.

Caltech Authors
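
As a rough illustration of the weighted graph partitioning named in the abstract above, the following sketch exhaustively searches two-way partitions of a tiny graph. It is not the paper's implementation. It assumes a hypothetical model in which node weights stand for a per-node resource cost (such as memory) and edge weights stand for communication volume. The names `cut_weight`, `best_bipartition`, and `max_load` are invented for this example.

```python
from itertools import product


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints land in different parts."""
    return sum(w for (u, v), w in edges.items() if assignment[u] != assignment[v])


def best_bipartition(node_weights, edges, max_load):
    """Try every two-way assignment and keep the cheapest one that is balanced.

    node_weights: dict node -> resource cost (assumed, e.g. memory)
    edges: dict (u, v) -> communication cost (assumed)
    max_load: upper bound on the summed node weight of either part
    Returns (cut, assignment) or None if no assignment respects max_load.
    """
    nodes = list(node_weights)
    best = None
    for bits in product((0, 1), repeat=len(nodes)):
        assignment = dict(zip(nodes, bits))
        loads = [0, 0]
        for node, part in assignment.items():
            loads[part] += node_weights[node]
        if max(loads) > max_load:
            continue  # violates the resource bound
        cost = cut_weight(edges, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


node_weights = {"A": 2, "B": 2, "C": 2, "D": 2}
edges = {("A", "B"): 10, ("C", "D"): 10, ("B", "C"): 1}
cost, assignment = best_bipartition(node_weights, edges, max_load=4)
```

Exhaustive search is only practical for a handful of nodes; it is used here to make the trade-off between load balance and communication visible, and larger graphs would need a heuristic partitioner.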

Performance Evaluation of Automatically Generated Data Parallel Programs

Author: Mahéo Yves
Massari Luisa
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1996

In this paper, the problem of evaluating the performance of parallel programs generated by data parallel compilers is studied. These compilers take as input an application written in a sequential language augmented with data distribution directives and produce a parallel version based on the specified partitioning of data. A methodology is described for evaluating the relationships among the program characteristics, the data distribution adopted, and the performance indices measured during program execution. It consists of three phases: a "static" description of the program under study; a "dynamic" description, based on the measurement and analysis of its execution on a real system; and the construction of a workload model using workload characterization techniques. Following this methodology facilitates decisions about which data distribution to adopt. The approach is presented through the use of the Pandore environment, designed for the execution of sequential programs on distributed memory parallel computers. It is composed of a compiler, a runtime system, and tools for trace and profile generation. The results of an experiment illustrating the methodology are presented.

HAL-CentraleSupelec

Archivio Istituzionale della Ricerca - Università degli Studi di Pavia

INRIA a CCSD electronic archive server

HAL-Rennes 1

A Survey on Parallel Architecture and Parallel Programming Languages and Tools

Author: C. Namrata Mahender
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/03/2014

In this paper, we present a brief review of the evolution of parallel computing toward multi-core architecture. The survey briefly covers more than 45 languages, libraries, and tools used to date to increase performance through parallel programming. The survey places more emphasis on the architecture of parallel systems.

International Journal on Recent and Innovation Trends in Computing and Communication

Lessons from Discarded Computer Architectures

Author: E. Smirni
J. Coll
J. Nichol
M. Petersen
S. Hiranandani
Publication venue: Springer Science and Business Media LLC

Crossref

Compiling machine-independent parallel programs

Author: Heinz Ernst A.
Lukowicz Paul
Philippsen Michael
Publication venue: Association for Computing Machinery
Publication date: 02/08/2007

KITopen

Automatic Runtime Calculation of Communications for Data-Parallel Expressions with Periodic Conditions

Author: Boehm
Ebbinghaus
El-Ghazawi
Gonzalez-Escribano
Halstead
Hiranandani
Kwon
McCabe
Mehta
Moreton-Fernandez
Planas
Randall
Stepanov
Upadrasta
Verdoolaege
Publication venue: Wiley
Publication date: 01/01/2018

Many real-world applications feature data accesses on periodic domains. Manually implementing the synchronizations and communications associated with the data dependences in each case is cumbersome and error-prone. It is increasingly interesting to support these applications in high-level parallel programming languages or parallelizing compilers. In this paper, we present a technique that, for distributed-memory systems, calculates the specific communications derived from data-parallel codes with or without periodic boundary conditions on affine access expressions. It makes the management of aggregated communications for the chosen data partition transparent to the programmer. Our technique moves to runtime part of the compile-time analysis typically used to generate the communication code for affine expressions, introducing a completely new technique that also supports periodic boundary conditions. We present an experimental study that evaluates our proposal using several case studies. Our experimental results show that our approach can automatically obtain communication codes as efficient as those found in MPI reference codes, reducing the development effort.

Funding: MICINN (Spain) and the ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H6 Network (TIN2016-81840-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS). Computing facilities were provided by the Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF). CETA-CIEMAT belongs to CIEMAT and the Government of Spain.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Repositorio Documental de la Universidad de Valladolid
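
To make the idea of communications derived from periodic boundary conditions more concrete, the sketch below computes which remote elements each process would need for a simple stencil. It is not the technique from the paper above, which handles general affine access expressions at runtime. It assumes a hypothetical 1-D array of `n` elements, block-distributed over `p` processes, with accesses of the form `i + offset`. The function name `halo_requests` and all parameters are invented for this example.

```python
def halo_requests(n, p, offsets, periodic=True):
    """Map each rank to the set of (owner_rank, global_index) pairs it must receive.

    n: number of array elements
    p: number of processes (block distribution, blocks of ceil(n / p) elements)
    offsets: stencil offsets, e.g. (-1, 1) for left and right neighbours
    periodic: wrap indices around the domain when True, drop them when False
    """
    block = -(-n // p)  # ceiling division
    requests = {rank: set() for rank in range(p)}
    for i in range(n):
        me = i // block
        for off in offsets:
            j = i + off
            if periodic:
                j %= n  # wrap around the periodic domain
            elif not 0 <= j < n:
                continue  # outside a non-periodic domain: nothing to fetch
            other = j // block
            if other != me:
                requests[me].add((other, j))
    return requests


with_wrap = halo_requests(8, 2, (-1, 1), periodic=True)
without_wrap = halo_requests(8, 2, (-1, 1), periodic=False)
```

With two processes and eight elements, the periodic case adds one extra remote element per process compared with the non-periodic case, because the first and last elements become neighbours.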

A Library-Based Approach to Task Parallelism in a Data-Parallel Language

Author: Alok Choudhary
David R. Kohr
Ian Foster
Rakesh Krishnaiyer

Pure data-parallel languages such as High Performance Fortran version 1 (HPF) do not allow efficient expression of mixed task/data-parallel computations or the coupling of separately compiled data-parallel modules. In this paper, we show how these common parallel program structures can be represented, with only minor extensions to the HPF model, by using a coordination library based on the Message Passing Interface (MPI). This library allows data-parallel tasks to exchange distributed data structures using calls to simple communication functions. We present microbenchmark results that characterize the performance of this library and that quantify the impact of optimizations that allow reuse of communication schedules in common situations. In addition, results from two-dimensional FFT, convolution, and multiblock programs demonstrate that the HPF/MPI library can provide performance superior to that of pure HPF. We conclude that this synergistic combination of two parallel programming standards represents a useful approach to task parallelism in a data-parallel framework, increasing the range of problems addressable in HPF without requiring complex compile

CiteSeerX

Reducing fine-grain communication overhead in multithread code generation for heterogeneous MPSoC

Author: Brisolara Lisane
Carro Luigi
Chae Soo-Ik
Guerin Xavier
Han SangIl
Jerraya Ahmed
Reis Ricardo
Publication date: 01/01/2007

Heterogeneous MPSoCs present unique opportunities for emerging embedded applications, which require both high performance and programmability. However, software programming for these MPSoC architectures involves tedious and error-prone tasks, so automatic code generation tools are required. A code generation method based on fine-grain specification can provide more design space and optimization opportunities, such as exploiting fine-level parallelism and more efficient partitions. However, when partitioned, fine-grain models may require a large number of inter-processor communications, decreasing the overall system performance. This paper presents a Simulink-based multithread code generation method that applies the Message Aggregation optimization technique to reduce the number of inter-processor communications. This technique reduces communication overhead in terms of execution time, by reducing the number of messages exchanged, and in terms of memory size, by reducing the number of channels. The paper also presents experimental results for one multimedia application, showing the performance improvements and memory reduction obtained with the Message Aggregation technique.

Crossref

SNU Open Repository and Archive
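
The general idea behind message aggregation, as described in the abstract above, can be sketched as merging every message that shares a sender and a receiver into one. This is only an illustration of the principle, not the paper's Simulink-based code generator. The message format `(source, destination, payload)` and the function name `aggregate_messages` are assumptions made for this example.

```python
from collections import defaultdict


def aggregate_messages(messages):
    """Merge all messages that share a (source, destination) pair into one.

    messages: iterable of (source, destination, payload) tuples
    Returns one (source, destination, [payloads]) tuple per processor pair,
    with payloads kept in their original order.
    """
    grouped = defaultdict(list)
    for src, dst, payload in messages:
        grouped[(src, dst)].append(payload)
    return [(src, dst, payloads) for (src, dst), payloads in grouped.items()]


fine_grain = [
    ("cpu0", "cpu1", "a"),
    ("cpu0", "cpu1", "b"),
    ("cpu1", "cpu2", "c"),
    ("cpu0", "cpu1", "d"),
]
aggregated = aggregate_messages(fine_grain)
```

Here four fine-grain messages collapse into two aggregated ones, so both the message count and the number of distinct channels drop. A real generator would also have to check that delaying a payload until its group is complete does not violate data dependences.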

A framework for argument-based task synchronization with automatic detection of dependencies

Author: Fraguela Basilio B.
González Carlos H.
Publication venue: Elsevier BV
Publication date: 01/01/2013

Synchronization in parallel applications can be achieved either implicitly or explicitly. Implicit synchronization is typical of programming environments that provide predefined, and often simple, patterns of parallelism such as data-parallel libraries and languages and skeletal operations. Nevertheless, more flexible approaches that allow expressing arbitrary task-level parallel computations without a predefined structure require in turn that the user explicitly specify the synchronization needed among the parallel tasks. In this paper we present a library-based approach that enables arbitrary patterns of parallelism with minimal effort for the user. Our proposal is the first generic approach to expressing parallelism we know of that requires neither explicit synchronizations nor a detailed specification of the dependencies of the parallel tasks. Our strategy relies on expressing the parallel tasks as functions that convey their dependencies implicitly by means of their arguments. These function arguments are analyzed by our library, called DepSpawn, when a parallel task is spawned in order to enforce its dependencies. Our experiments indicate that DepSpawn is very competitive, both in terms of performance and programmability, with respect to a widespread high-level approach like OpenMP.

Funding: Xunta de Galicia (INCITE08PXIB105161PR); Ministerio de Ciencia e Innovación (TIN2010-16735); Ministerio de Educación de España (AP2009-475).

Repositorio da Universidade da Coruña

Crossref
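
The notion of tasks that convey their dependencies through their arguments, described in the abstract above, can be illustrated with a small sketch. DepSpawn itself is a C++ library, and nothing below reflects its actual API. The `Out` wrapper, which marks an argument the task may modify, and the function `find_dependencies` are hypothetical names introduced only for this example, and the tasks are analyzed rather than executed.

```python
class Out:
    """Hypothetical marker: the task may modify the wrapped object."""

    def __init__(self, obj):
        self.obj = obj


def _unwrap(arg):
    """Return (object, is_written) for a task argument."""
    return (arg.obj, True) if isinstance(arg, Out) else (arg, False)


def find_dependencies(tasks):
    """For each task, list the indices of earlier tasks it must wait for.

    tasks: list of argument tuples, in spawn order.
    Two tasks conflict when they share an object and at least one writes it.
    """
    dependencies = []
    for i, args in enumerate(tasks):
        waits = []
        for j in range(i):
            conflict = any(
                a_obj is b_obj and (a_written or b_written)
                for a_obj, a_written in map(_unwrap, args)
                for b_obj, b_written in map(_unwrap, tasks[j])
            )
            if conflict:
                waits.append(j)
        dependencies.append(waits)
    return dependencies


x, y, z = [0], [0], [0]
tasks = [
    (Out(x),),       # task 0 writes x
    (Out(y),),       # task 1 writes y
    (x, y, Out(z)),  # task 2 reads x and y, writes z
    (x,),            # task 3 only reads x
]
deps = find_dependencies(tasks)
```

Tasks 0 and 1 share no object and could run concurrently, task 2 waits for both, and task 3 waits only for task 0 because two readers of `x` do not conflict. The sketch is deliberately conservative and does not prune dependencies that are already implied transitively.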

A Compiler and Runtime Infrastructure for Automatic Program Distribution

Publication venue: Defense Technical Information Center (DTIC)

Crossref