Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation

Author: Davidson Gavin
Vanderbauwhede Wim
Publication date: 13/11/2017

Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent a powerful and affordable tool for scientists who look to speed up simulations of complex systems. However, porting code to such devices requires a detailed understanding of heterogeneous programming tools and effective strategies for parallelization. In this paper we present a source-to-source compilation approach with whole-program analysis to automatically transform single-threaded FORTRAN 77 legacy code into OpenCL-accelerated programs with parallelized kernels. The main contributions of our work are: (1) whole-source refactoring to allow any subroutine in the code to be offloaded to an accelerator; (2) minimization of the data transfer between the host and the accelerator by eliminating redundant transfers; (3) pragmatic auto-parallelization of the code to be offloaded to the accelerator by identification of parallelizable maps and reductions. We have validated the code transformation performance of the compiler on the NIST FORTRAN 78 test suite and several real-world codes: the Large Eddy Simulator for Urban Flows, a high-resolution turbulent flow model; the shallow water component of the ocean model Gmodel; the Linear Baroclinic Model, an atmospheric climate model; and Flexpart-WRF, a particle dispersion simulator. The automatic parallelization component has been tested on a 2-D Shallow Water model (2DSW) and on the Large Eddy Simulator for Urban Flows (UFLES) and produces a complete OpenCL-enabled code base. The fully OpenCL-accelerated versions of the 2DSW and the UFLES are respectively 9x and 20x faster on GPU than the original code on CPU; in both cases this matches the performance of manually ported code.
Comment: 12 pages, 5 figures, submitted to "Computers and Fluids" as full paper from ParCFD conference entr
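
The "parallelizable maps and reductions" named in contribution (3) can be illustrated with a minimal Python sketch. This is not the authors' compiler or its output; the function names are hypothetical and chosen only for illustration. A map is a loop whose iterations are independent of each other, and a reduction is a loop that accumulates into one value with an associative operator, so partial results can be computed separately and combined afterwards.

```python
from functools import reduce
import operator


def scale_map(values, factor):
    """Map pattern: each output depends only on the matching input,
    so every iteration could run on a separate work item."""
    return [factor * v for v in values]


def chunked_sum(values, chunks=4):
    """Reduction pattern: compute one partial sum per chunk, then combine.
    Associativity of + is what allows the chunks to be processed independently."""
    size = max(1, (len(values) + chunks - 1) // chunks)
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return reduce(operator.add, partials, 0)
```

A loop whose iterations read values written by earlier iterations fits neither pattern and would have to stay sequential in this simplified picture.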

arXiv.org e-Print Archive

Crossref

Enlighten

3D cut-cell modelling for high-resolution atmospheric simulations

Author: Adcroft
Asselin
Baines
Bryan
Gal-Chen
Gallus
Good
Hanazaki
Hiroe Yamazaki
Jablonowski
Janjić
Jebens
Kim
Kirkpatrick
Klein
Klemp
Leuenberger
Lin
Lock
McFarlane
Mesinger
Miyamoto
Nakahashi
Nikolaos Nikiforakis
Pember
Phillips
Quirk
Robert
Sakai
Satoh
Satomura
Schär
Semtner
Simmons
Smith
Steppeler
Takahashi
Takehiko Satomura
Thompson
Udaykumar
Walko
Yamazaki
Ye
Zängl
Publication venue: 'Wiley'
Publication date: 05/01/2016

Owing to the recent, rapid development of computer technology, the resolution of atmospheric numerical models has increased substantially. With the use of next-generation supercomputers, atmospheric simulations using horizontal grid intervals of O(100) m or less will gain popularity. At such high resolution more of the steep gradients in mountainous terrain will be resolved, which may result in large truncation errors in those models using terrain-following coordinates. In this study, a new 3D Cartesian coordinate non-hydrostatic atmospheric model is developed. A cut-cell representation of topography based on finite-volume discretization is combined with a cell-merging approach, in which small cut-cells are merged with neighboring cells either vertically or horizontally. In addition, a block-structured mesh-refinement technique is introduced to achieve a variable resolution on the model grid, with the finest resolution occurring close to the terrain surface. The model successfully reproduces a flow over a 3D bell-shaped hill that shows good agreement with the flow predicted by linear theory. The ability of the model to simulate flows over steep terrain is demonstrated using a hemisphere-shaped hill where the maximum slope angle is resolved at 71 degrees. The advantage of a locally refined grid around a 3D hill, with cut-cells at the terrain surface, is also demonstrated using the hemisphere-shaped hill. The model reproduces smooth mountain waves propagating over varying grid resolution without introducing large errors associated with the change of mesh resolution. At the same time, the model shows good scalability on a locally refined grid with the use of OpenMP.
Comment: 19 pages, 16 figures. Revised version, accepted for publication in QJRM

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository

Non-Local Compressive Sensing Based SAR Tomography

Author: Bamler Richard
Shi Yilei
Zhu Xiao Xiang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/11/2018

Tomographic SAR (TomoSAR) inversion of urban areas is an inherently sparse reconstruction problem and, hence, can be solved using compressive sensing (CS) algorithms. This paper proposes solutions for two notorious problems in this field. 1) TomoSAR requires a high number of data sets, which makes the technique expensive. However, it can be shown that the number of acquisitions and the signal-to-noise ratio (SNR) can be traded off against each other, because it is asymptotically only the product of the number of acquisitions and SNR that determines the reconstruction quality. We propose to increase SNR by integrating non-local estimation into the inversion and show that a reasonable reconstruction of buildings from only seven interferograms is feasible. 2) CS-based inversion is computationally expensive and therefore barely suitable for large-scale applications. We introduce a new fast and accurate algorithm for solving the non-local L1-L2-minimization problem, central to CS-based reconstruction algorithms. The applicability of the algorithm is demonstrated using simulated data and TerraSAR-X high-resolution spotlight images over an area in Munich, Germany.
Comment: 10 page
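
The trade-off stated in point 1) can be made concrete with a small arithmetic sketch. It assumes only what the abstract says (reconstruction quality depends asymptotically on the product of the number of acquisitions and the SNR) and further assumes that SNR is expressed as a linear power ratio; the function name and the example numbers are hypothetical, not taken from the paper.

```python
import math


def snr_gain_db(n_before, n_after):
    """SNR increase (in dB) needed to keep the product N * SNR constant
    when the number of acquisitions drops from n_before to n_after.

    Assumes SNR is a linear power ratio, so N1 * SNR1 = N2 * SNR2
    gives SNR2 / SNR1 = N1 / N2.
    """
    if n_before <= 0 or n_after <= 0:
        raise ValueError("acquisition counts must be positive")
    return 10.0 * math.log10(n_before / n_after)
```

Under these assumptions, going from a hypothetical stack of 70 acquisitions down to 7 would call for roughly a tenfold (10 dB) SNR improvement from the non-local estimation step.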

arXiv.org e-Print Archive

Institute of Transport Research:Publications

FullSWOF_Paral: Comparison of two parallelization strategies (MPI and SKELGIS) on a software designed for hydrology applications

Author: Cordier Stéphane
Coullon Hélène
Delestre Olivier
Laguerre Christian
Le Minh Hoang
Pierre Daniel
Sadaka Georges
Publication venue: 'EDP Sciences'
Publication date: 18/07/2013

In this paper, we compare two approaches for the parallelization of an existing free software package, FullSWOF 2D (http://www.univ-orleans.fr/mapmo/soft/FullSWOF/), which solves the shallow water equations for applications in hydrology. Both approaches are based on a domain decomposition strategy. The first approach is based on the classical MPI library, while the second uses Parallel Algorithmic Skeletons, more precisely a library named SkelGIS (Skeletons for Geographical Information Systems). The first results presented in this article show that the two approaches are similar in terms of performance and scalability. The two implementation strategies are, however, very different, and we discuss the advantages of each one.
Comment: 27 page
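
The domain decomposition strategy mentioned in this abstract can be sketched in a few lines of Python. This is a generic illustration of splitting a 2D grid into contiguous row blocks, one per process; it is not code from FullSWOF, MPI, or SkelGIS, and the function name is hypothetical.

```python
def split_rows(n_rows, n_ranks):
    """Split n_rows grid rows into n_ranks contiguous blocks of near-equal size.

    Returns a list of (start, stop) half-open row ranges, one per rank.
    The first (n_rows % n_ranks) ranks receive one extra row.
    """
    if n_ranks <= 0:
        raise ValueError("n_ranks must be positive")
    base, extra = divmod(n_rows, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges
```

In a real stencil solver each block would also exchange boundary (halo) rows with its neighbours at every time step; that communication is what a library such as MPI, or a skeleton layer on top of it, would handle.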

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

HAL Descartes

Impacts of aerosols and clouds on photolysis frequencies and photochemistry during TRACE-P: 2. Three-dimensional study using a regional chemical transport model

Author: Anderson BE
Avery MA
Blake DR
Carmichael GR
Clarke AD
Huang H
Kurata G
Lefer B
Shetter RE
Tang YH
Uno I
Woo JH
Publication venue: eScholarship, University of California
Publication date: 01/01/2003

eScholarship - University of California

NEMO-Med: Optimization and Improvement of Scalability

Author: ALOISIO Giovanni
EPICOCO Italo
Silvia Mocavero
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

The NEMO oceanic model is widely used in the climate community. It is used with different configurations in more than 50 research projects for both long- and short-term simulations. The computational requirements of the model and its implementation limit the exploitation of emerging computational infrastructure at peta- and exascale, so a deep revision and analysis of the model and its implementation were needed. The paper describes the performance evaluation of the model (v3.2), based on MPI parallelization, on the MareNostrum platform at the Barcelona Supercomputing Centre. The scalability analysis takes into account different factors, such as the I/O system available on the platform, the domain decomposition of the model and the level of parallelism. The analysis highlighted several bottlenecks due to communication overhead. The code has been optimized by reducing the communication weight within some frequently called functions, and the parallelization has been improved by introducing a second level of parallelism based on the OpenMP shared memory paradigm.

Archivio Istituzionale della Ricerca- Università del Salento