Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation
Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent
a powerful and affordable tool for scientists seeking to speed up simulations
of complex systems. However, porting code to such devices requires a detailed
understanding of heterogeneous programming tools and effective strategies for
parallelization. In this paper we present a source-to-source compilation
approach with whole-program analysis to automatically transform single-threaded
FORTRAN 77 legacy code into OpenCL-accelerated programs with parallelized
kernels.
The main contributions of our work are: (1) whole-source refactoring to allow
any subroutine in the code to be offloaded to an accelerator; (2) minimization
of the data transfer between the host and the accelerator by eliminating
redundant transfers; and (3) pragmatic auto-parallelization of the code to be
offloaded to the accelerator through identification of parallelizable maps and
reductions.
We have validated the code transformation performance of the compiler on the
NIST FORTRAN 78 test suite and several real-world codes: the Large Eddy
Simulator for Urban Flows, a high-resolution turbulent flow model; the shallow
water component of the ocean model Gmodel; the Linear Baroclinic Model, an
atmospheric climate model; and Flexpart-WRF, a particle dispersion simulator.
The automatic parallelization component has been tested on a 2-D Shallow
Water model (2DSW) and on the Large Eddy Simulator for Urban Flows (UFLES),
and produces a complete OpenCL-enabled code base. The fully OpenCL-accelerated
versions of the 2DSW and the UFLES are respectively 9x and 20x faster on GPU
than the original code on CPU; in both cases this matches the performance of
manually ported code.
Comment: 12 pages, 5 figures, submitted to "Computers and Fluids" as full
paper from ParCFD conference entr
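The map/reduction classification named in contribution (3) can be illustrated with a minimal sketch (Python here for brevity; the compiler itself operates on FORTRAN 77 source, and the function names below are illustrative, not from the paper). A loop whose iterations write independent elements is a map and can run fully in parallel; a loop that accumulates into a single value through an associative operator is a reduction, parallelizable by combining partial results.

```python
from functools import reduce
from operator import add

# "Map" pattern: each output element depends only on its own index,
# so all iterations are independent and can execute in parallel.
def map_kernel(u, dt):
    return [ui + dt * (2.0 - ui) for ui in u]

# "Reduction" pattern: accumulation into one scalar. Addition is
# associative, so partial sums from parallel chunks can be combined.
def reduction_kernel(u):
    return reduce(add, u, 0.0)

u = [0.0, 1.0, 2.0, 3.0]
v = map_kernel(u, 0.5)    # -> [1.0, 1.5, 2.0, 2.5]
s = reduction_kernel(u)   # -> 6.0
```

An auto-parallelizer that proves a loop fits one of these two shapes can emit a parallel OpenCL kernel for it without further dependence analysis.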
3D cut-cell modelling for high-resolution atmospheric simulations
Owing to the recent, rapid development of computer technology, the resolution
of atmospheric numerical models has increased substantially. With the use of
next-generation supercomputers, atmospheric simulations using horizontal grid
intervals of O(100) m or less will gain popularity. At such high resolution,
more of the steep gradients in mountainous terrain will be resolved, which may
result in large truncation errors in those models using terrain-following
coordinates. In this study, a new 3D Cartesian coordinate non-hydrostatic
atmospheric model is developed. A cut-cell representation of topography based
on finite-volume discretization is combined with a cell-merging approach, in
which small cut-cells are merged with neighboring cells either vertically or
horizontally. In addition, a block-structured mesh-refinement technique is
introduced to achieve a variable resolution on the model grid with the finest
resolution occurring close to the terrain surface. The model successfully
reproduces the flow over a 3D bell-shaped hill, in good agreement with
the flow predicted by linear theory. The ability of the model to simulate
flows over steep terrain is demonstrated using a hemisphere-shaped hill where
the maximum slope angle is resolved at 71 degrees. The advantage of a locally
refined grid around a 3D hill, with cut-cells at the terrain surface, is also
demonstrated using the hemisphere-shaped hill. The model reproduces smooth
mountain waves propagating over varying grid resolution without introducing
large errors associated with the change of mesh resolution. At the same time,
the model shows a good scalability on a locally refined grid with the use of
OpenMP.
Comment: 19 pages, 16 figures. Revised version, accepted for publication in
QJRM
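The cell-merging step described above can be sketched in one dimension (an illustrative simplification of the paper's 3D vertical/horizontal merging; the function name and threshold are assumptions, not from the paper). Cut-cells whose open-volume fraction falls below a threshold are absorbed into a neighboring cell, so no merged control volume is small enough to cripple the explicit time-step limit.

```python
# 1-D sketch of cut-cell merging: greedily attach each small cell
# (open-volume fraction below `threshold`) to the group on its left,
# so every resulting merged cell has at least one full-size member.
def merge_small_cells(fractions, threshold=0.5):
    """Return groups of cell indices after merging small cut-cells."""
    groups = [[0]]
    for i in range(1, len(fractions)):
        if fractions[i] < threshold:
            groups[-1].append(i)   # merge small cell into previous group
        else:
            groups.append([i])     # full cell starts a new group
    return groups

# e.g. merge_small_cells([1.0, 0.2, 1.0, 0.3, 0.4])
# groups cells 1 with 0, and cells 3, 4 with 2
```

The real scheme must also choose between vertical and horizontal merging and rebuild the finite-volume face areas of the merged cell, which this sketch omits.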
Non-Local Compressive Sensing Based SAR Tomography
Tomographic SAR (TomoSAR) inversion of urban areas is an inherently sparse
reconstruction problem and, hence, can be solved using compressive sensing (CS)
algorithms. This paper proposes solutions for two notorious problems in this
field: 1) TomoSAR requires a high number of data sets, which makes the
technique expensive. However, it can be shown that the number of acquisitions
and the signal-to-noise ratio (SNR) can be traded off against each other,
because it is asymptotically only the product of the number of acquisitions and
SNR that determines the reconstruction quality. We propose to increase SNR by
integrating non-local estimation into the inversion and show that a reasonable
reconstruction of buildings from only seven interferograms is feasible. 2)
CS-based inversion is computationally expensive and therefore barely suitable
for large-scale applications. We introduce a new fast and accurate algorithm
for solving the non-local L1-L2-minimization problem, central to CS-based
reconstruction algorithms. The applicability of the algorithm is demonstrated
using simulated data and TerraSAR-X high-resolution spotlight images over an
area in Munich, Germany.
Comment: 10 page
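The L1-L2 minimization at the core of CS-based reconstruction can be sketched with generic iterative soft-thresholding (ISTA). This is a textbook solver shown only to make the problem concrete; it is not the fast non-local algorithm the paper introduces, and all names below are illustrative.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the L1 norm (element-wise shrinkage).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1 by iterative
    soft-thresholding: gradient step on the L2 term, shrinkage
    step on the L1 term."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

In TomoSAR terms, y would be the stack of interferometric measurements at one azimuth-range pixel, A the steering matrix over elevation, and the sparse x the reflectivity profile; the paper's contribution is making this inversion fast enough, with non-local estimates boosting the effective SNR, for large-scale use.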
FullSWOF_Paral: Comparison of two parallelization strategies (MPI and SKELGIS) on a software designed for hydrology applications
In this paper, we perform a comparison of two approaches for the
parallelization of an existing, free software, FullSWOF 2D
(http://www.univ-orleans.fr/mapmo/soft/FullSWOF/, which solves the shallow
water equations for applications in hydrology), based on a domain
decomposition strategy. The first
approach is based on the classical MPI library while the second approach uses
Parallel Algorithmic Skeletons and more precisely a library named SkelGIS
(Skeletons for Geographical Information Systems). The first results presented
in this article show that the two approaches are similar in terms of
performance and scalability. The two implementation strategies are however very
different, and we discuss the advantages of each.
Comment: 27 page
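Both implementations rest on the same domain-decomposition idea: split the grid into contiguous subdomains, let each process update its own piece, and exchange one-cell halos at the internal boundaries. A minimal serial sketch (plain Python functions standing in for MPI ranks or SkelGIS skeletons; names are illustrative):

```python
def split(grid, nparts):
    """Split a 1-D grid into contiguous subdomains, one per 'process'."""
    n = len(grid)
    bounds = [n * p // nparts for p in range(nparts + 1)]
    return [grid[bounds[p]:bounds[p + 1]] for p in range(nparts)]

def step_with_halos(parts):
    """One 3-point averaging step; each subdomain reads a one-cell
    halo from its neighbours (copied cells at the global edges)."""
    new_parts = []
    for p, part in enumerate(parts):
        left = parts[p - 1][-1] if p > 0 else part[0]
        right = parts[p + 1][0] if p < len(parts) - 1 else part[-1]
        padded = [left] + part + [right]
        new_parts.append([(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
                          for i in range(1, len(padded) - 1)])
    return new_parts
```

In the MPI version the halo copies become point-to-point messages, while SkelGIS hides the same exchange behind its skeleton abstraction, which is why the two approaches end up with comparable performance and scalability.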
Impacts of aerosols and clouds on photolysis frequencies and photochemistry during TRACE-P: 2. Three-dimensional study using a regional chemical transport model
NEMO-Med: Optimization and Improvement of Scalability
The NEMO oceanic model is widely used in the climate community. It is run in different configurations in more than 50 research projects, for both long- and short-term simulations. The computational requirements of the model and its implementation limit the exploitation of emerging computational infrastructure at peta- and exascale, so a deep revision and analysis of the model and its implementation were needed. This paper describes the performance evaluation of the model (v3.2), based on MPI parallelization, on the MareNostrum platform at the Barcelona Supercomputing Centre. The scalability analysis takes into account different factors, such as the I/O system available on the platform, the domain decomposition of the model and the level of parallelism. The analysis highlighted several bottlenecks due to communication overhead. The code has been optimized by reducing the communication weight within some frequently called functions, and the parallelization has been improved by introducing a second level of parallelism based on the OpenMP shared-memory paradigm.
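The two-level parallelism described above can be sketched schematically (Python threads standing in for both levels; in NEMO the outer level is one MPI rank per subdomain and the inner level is OpenMP threads within the rank, and every name below is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def update_row(row):
    # Stand-in for the per-row physics kernel.
    return [2.0 * v for v in row]

def update_subdomain(subdomain, pool):
    # Inner level: rows of one subdomain updated concurrently
    # (OpenMP's role in the hybrid scheme).
    return list(pool.map(update_row, subdomain))

def update_domain(subdomains):
    # Outer level: iterate over subdomains; in the real code each
    # subdomain lives on its own MPI rank with its own thread team.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return [update_subdomain(s, pool) for s in subdomains]
```

The benefit of the second level is that threads inside a rank share memory, so adding them increases parallelism without adding to the inter-rank communication volume that the analysis identified as the bottleneck.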