Search CORE

48 research outputs found

A Team-Based Methodology of Memory Hierarchy-Aware Runtime Support in Coarray Fortran

Author: Chapman Barbara
Eachempati Deepak
Ge Shiyao
Jouvelot Pierre
Khaldi Dounia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/01/2015
Field of study

International audience—We describe how 2-level memory hierarchies can be exploited to optimize the implementation of teams in the parallel facet of the upcoming Fortran 2015 standard. We focus on reducing the cost associated with moving data within a computing node and between nodes, finding that this distinction is of key importance when looking at performance issues. We introduce a new hardware-aware approach for PGAS, to be used within a runtime system, to optimize the communications in the virtual topologies and clusters that are binding different teams together. We have applied, and implemented into the CAF OpenUH compiler, this methodology to three important collective operations, namely barrier, all-to-all reduction and one-to-all broadcast; this is the first Fortran compiler that both provides teams and handles such a memory hierarchy methodology within teams I. INTRODUCTION The emergence of many-core computing nodes in large-scale distributed systems requires programming model im-plementers to consider more carefully memory hierarchy when looking at performance issues. Most parallel applications are programmed using the Message Passing Interface (MPI) [1], where multiple processes execute in a coordinated manner, communicating by performing send and receive operations. More recently, several languages and libraries have added support for explicit or implicit remote memory access (RMA) using so-called " one-sided communication " , including languages following the Partitioned Global Address Space (PGAS) paradigm as well as MPI (MPI-2 added RMA to the interface and MPI-3 made significant refinements to better support it). Of special note is the Fortran 2008 addition for supporting coarrays, a language mechanism that enables RMA as a natural extension to Fortran's array syntax, informally named CAF 1. In this paradigm, an image is an executing process in a Single Program Multiple Data (SPMD) program with its own cop

CiteSeerX

HAL Descartes

HAL-MINES ParisTech

A framework for unit testing with coarray Fortran

Author: Abdullahi Hassan Ambra
Cardellini Valeria
Filippone Salvatore
Publication venue: The Society for Modeling and Simulation International
Publication date: 26/04/2017
Field of study

Parallelism is a ubiquitous feature of modern computing architectures; indeed, we might even say that serial code is now automatically legacy code. Writing parallel code poses significant challenges to programs, and is often error-prone. Partitioned Global Address Space (PGAS) languages, such as Coarray Fortran (CAF), represent a promising development direction in the quest for a trade-off between simplicity and performance. CAF is a parallel programming model that allows a smooth migration from serial to parallel code. However, despite CAF simplicity, refactoring serial code and migrating it to parallel versions is still error-prone, especially in complex softwares. The combination of unit testing, which drastically reduces defect injection, and CAF is therefore a very appealing prospect; however, it requires appropriate tools to realize its potential. In this paper, we present the first CAF-compatible framework for unit tests, developed as an extension to the Parallel Fortran Unit Test framework (pFUnit)

Cranfield CERES

ART

The seven ages of Fortran

Author: Metcalf Michael
Publication venue
Publication date: 01/04/2011
Field of study

When IBM's John Backus first developed the Fortran programming language, back in 1957, he certainly never dreamt that it would become a world-wide success and still be going strong many years later. Given the oft-repeated predictions of its imminent demise, starting around 1968, it is a surprise, even to some of its most devoted users, that this much-maligned language is not only still with us, but is being further developed for the demanding applications of the future. What has made this programming language succeed where most slip into oblivion? One reason is certainly that the language has been regularly standardized. In this paper we will trace the evolution of the language from its first version and though six cycles of formal revision, and speculate on how this might continue. Now, modern Fortran is a procedural, imperative, compiled language with a syntax well suited to a direct representation of mathematical formulas. Individual procedures may be compiled separately or grouped into modules, either way allowing the convenient construction of very large programs and procedure libraries. Procedures communicate via global data areas or by argument association. The language now contains features for array processing, abstract data types, dynamic data structures, objectoriented programming and parallel processing.Facultad de Informátic

The seven ages of Fortran

Author: Metcalf Michael
Publication venue
Publication date: 30/03/2011
Field of study

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Servicio de Difusión de la Creación Intelectual

Application of Modern Fortran to Spacecraft Trajectory Design and Optimization

Author: Beekman Izaak B.
Falck Robert D.
Williams Jacob
Publication venue
Publication date
Field of study

In this paper, applications of the modern Fortran programming language to the field of spacecraft trajectory optimization and design are examined. Modern object-oriented Fortran has many advantages for scientific programming, although many legacy Fortran aerospace codes have not been upgraded to use the newer standards (or have been rewritten in other languages perceived to be more modern). NASA's Copernicus spacecraft trajectory optimization program, originally a combination of Fortran 77 and Fortran 95, has attempted to keep up with modern standards and makes significant use of the new language features. Various algorithms and methods are presented from trajectory tools such as Copernicus, as well as modern Fortran open source libraries and other projects

NASA Technical Reports Server

XcalableMP PGAS Programming Language

Author: Sato Mitsuhisa
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

XcalableMP is a directive-based parallel programming language based on Fortran and C, supporting a Partitioned Global Address Space (PGAS) model for distributed memory parallel systems. This open access book presents XcalableMP language from its programming model and basic concept to the experience and performance of applications described in XcalableMP.　 XcalableMP was taken as a parallel programming language project in the FLAGSHIP 2020 project, which was to develop the Japanese flagship supercomputer, Fugaku, for improving the productivity of parallel programing. XcalableMP is now available on Fugaku and its performance is enhanced by the Fugaku interconnect, Tofu-D. The global-view programming model of XcalableMP, inherited from High-Performance Fortran (HPF), provides an easy and useful solution to parallelize data-parallel programs with directives for distributed global array and work distribution and shadow communication. The local-view programming adopts coarray notation from Coarray Fortran (CAF) to describe explicit communication in a PGAS model. The language specification was designed and proposed by the XcalableMP Specification Working Group organized in the PC Consortium, Japan. The Omni XcalableMP compiler is a production-level reference implementation of XcalableMP compiler for C and Fortran 2008, developed by RIKEN CCS and the University of Tsukuba. The performance of the XcalableMP program was used in the Fugaku as well as the K computer. A performance study showed that XcalableMP enables a scalable performance comparable to the message passing interface (MPI) version with a clean and easy-to-understand programming style requiring little effort

OAPEN Library

DART-MPI: An MPI-based Implementation of a PGAS Runtime System

Author: Fürlinger Karl
Glass Colin W.
Gracia José
Idrees Kamran
Mhedheb Yousri
Tao Jie
Zhou Huan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

A Partitioned Global Address Space (PGAS) approach treats a distributed system as if the memory were shared on a global level. Given such a global view on memory, the user may program applications very much like shared memory systems. This greatly simplifies the tasks of developing parallel applications, because no explicit communication has to be specified in the program for data exchange between different computing nodes. In this paper we present DART, a runtime environment, which implements the PGAS paradigm on large-scale high-performance computing clusters. A specific feature of our implementation is the use of one-sided communication of the Message Passing Interface (MPI) version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated the performance of the implementation with several low-level kernels in order to determine overheads and limitations in comparison to the underlying MPI-3.Comment: 11 pages, International Conference on Partitioned Global Address Space Programming Models (PGAS14

arXiv.org e-Print Archive

Crossref

Proceedings of the 7th International Conference on PGAS Programming Models

Author
Publication venue: University of Edinburgh
Publication date: 29/10/2013
Field of study

Edinburgh Research Explorer

Function Shipping in a Scalable Parallel Programming Model

Author: Yang Chaoran
Publication venue
Publication date: 01/01/2012
Field of study

Increasingly, a large number of scientific and technical applications exhibit dynamically generated parallelism or irregular data access patterns. These applications pose significant challenges to achieving scalable performance on large scale parallel systems. This thesis explores the advantages of using function shipping as a language level primitive to help simplify writing scalable irregular and dynamic parallel applications. Function shipping provides a mechanism to avoid exposing latency, by enabling users ship data and computation together to a remote worker for execution. In the context of the Coarray Fortran 2.0 Partitioned Global Address Space language, we implement function shipping and the finish synchronization construct, which ensures global completion of a set of shipped function instances. We demonstrate the usability and performance benefits of using function shipping with several benchmarks. Experiments on emerging supercomputers show that function shipping is useful and effective in achieving scalable performance with dynamic and irregular algorithms

DSpace at Rice University

GLB: Lifeline-based Global Load Balancing library in X10

Author: Grove David
Herta Benjamin
Kamada Tomio
Saraswat Vijay
Takeuchi Mikio
Tardieu Olivier
Zhang Wei
Publication venue
Publication date: 19/12/2013
Field of study

We present GLB, a programming model and an associated implementation that can handle a wide range of irregular paral- lel programming problems running over large-scale distributed systems. GLB is applicable both to problems that are easily load-balanced via static scheduling and to problems that are hard to statically load balance. GLB hides the intricate syn- chronizations (e.g., inter-node communication, initialization and startup, load balancing, termination and result collection) from the users. GLB internally uses a version of the lifeline graph based work-stealing algorithm proposed by Saraswat et al. Users of GLB are simply required to write several pieces of sequential code that comply with the GLB interface. GLB then schedules and orchestrates the parallel execution of the code correctly and efficiently at scale. We have applied GLB to two representative benchmarks: Betweenness Centrality (BC) and Unbalanced Tree Search (UTS). Among them, BC can be statically load-balanced whereas UTS cannot. In either case, GLB scales well-- achieving nearly linear speedup on different computer architectures (Power, Blue Gene/Q, and K) -- up to 16K cores

arXiv.org e-Print Archive

CiteSeerX