FAMOUS, faster: using parallel computing techniques to accelerate the FAMOUS/HadCM3 climate model with a focus on the radiative transfer algorithm
We have optimised the atmospheric radiation algorithm
of the FAMOUS climate model on several hardware
platforms. The optimisation involved translating the Fortran
code to C and restructuring the algorithm around the
computation of a single air column. Instead of the existing
MPI-based domain decomposition, we used a task queue
and a thread pool to schedule the computation of individual
columns on the available processors. Finally, four air
columns are packed together in a single data structure and
computed simultaneously using Single Instruction Multiple
Data operations.
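
The scheduling and packing described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `ColumnPack`, `TaskQueue`, `compute_pack` and `run_pool`, the number of levels, and the per-level attenuation kernel are all hypothetical stand-ins for the real radiation scheme. The sketch shows three things: four columns stored lane by lane in one structure, a shared queue from which idle threads take the next pack, and an inner loop over the four lanes that a compiler can map to SIMD instructions.

```c
#include <pthread.h>
#include <stddef.h>

#define LANES 4   /* four air columns per packed structure, as in the text */
#define LEVELS 8  /* hypothetical number of vertical levels */

/* Four columns packed together: lane i of every array belongs to column i. */
typedef struct {
    float absorb[LEVELS][LANES]; /* placeholder absorptivity per level */
    float flux_top[LANES];       /* flux entering at the top of each column */
    float flux_surface[LANES];   /* result: flux reaching the surface */
} ColumnPack;

/* Placeholder kernel (NOT the FAMOUS radiation scheme): attenuate the flux
 * level by level. The innermost loop runs over the four lanes, so the same
 * operation is applied to all four columns at once. */
static void compute_pack(ColumnPack *p)
{
    float flux[LANES];
    for (int i = 0; i < LANES; i++)
        flux[i] = p->flux_top[i];
    for (int k = 0; k < LEVELS; k++)
        for (int i = 0; i < LANES; i++)
            flux[i] *= 1.0f - p->absorb[k][i];
    for (int i = 0; i < LANES; i++)
        p->flux_surface[i] = flux[i];
}

/* Task queue: a mutex-protected index into the array of packs. */
typedef struct {
    ColumnPack *packs;
    size_t count;
    size_t next;
    pthread_mutex_t lock;
} TaskQueue;

/* Each pool thread repeatedly takes the next pack until the queue is empty. */
static void *worker(void *arg)
{
    TaskQueue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        size_t idx = q->next < q->count ? q->next++ : q->count;
        pthread_mutex_unlock(&q->lock);
        if (idx == q->count)
            return NULL;
        compute_pack(&q->packs[idx]);
    }
}

/* Process all packs with a pool of nthreads threads.
 * Returns 0 on success, -1 if no worker thread could be started. */
int run_pool(ColumnPack *packs, size_t count, int nthreads)
{
    enum { MAX_THREADS = 64 };
    pthread_t tid[MAX_THREADS];
    TaskQueue q;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    q.packs = packs;
    q.count = count;
    q.next = 0;
    if (pthread_mutex_init(&q.lock, NULL) != 0)
        return -1;

    while (started < nthreads &&
           pthread_create(&tid[started], NULL, worker, &q) == 0)
        started++;
    /* Any thread that started keeps draining the queue until it is empty. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    pthread_mutex_destroy(&q.lock);
    return started > 0 ? 0 : -1;
}
```

A caller would fill an array of `ColumnPack` values, call `run_pool(packs, count, nthreads)`, and read `flux_surface` afterwards. Because threads pull work from the queue as they become free, columns that take longer to compute do not leave other processors idle, which is one plausible motivation for preferring this scheme to a static domain decomposition.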
The modified algorithm runs more than 50 times faster on
the CELL’s Synergistic Processing Elements than on its main
PowerPC processing element. On Intel-compatible processors,
the new radiation code runs 4 times faster. On the tested
graphics processor, using OpenCL, we find a speed-up of
more than 2.5 times as compared to the original code on the
main CPU. Because the radiation code takes more than 60%
of the total CPU time, FAMOUS executes more than twice as
fast. Our version of the algorithm returns bit-wise identical
results, which demonstrates the robustness of our approach.
We estimate that this project required around two and a half
man-years of work.
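
The link between the radiation speed-up and the whole-model speed-up reported above can be illustrated with Amdahl's law. The abstract does not apply this formula itself, so the following is only a sketch: $f$ (the fraction of run time spent in radiation) and $s$ (the speed-up of the radiation code) are symbols introduced here, and the values used are the rounded lower bounds quoted in the abstract, not measured figures.

```latex
S(f, s) = \frac{1}{(1 - f) + f/s},
\qquad
\lim_{s \to \infty} S(f, s) = \frac{1}{1 - f}

% Illustrative values, using the lower bound f = 0.6 from the abstract:
S(0.6,\, 4) = \frac{1}{0.4 + 0.15} \approx 1.82,
\qquad
\lim_{s \to \infty} S(0.6,\, s) = \frac{1}{0.4} = 2.5

% Condition for an overall speed-up above 2:
S(f, s) > 2 \iff f \left( 1 - \frac{1}{s} \right) > \frac{1}{2}
```

Under these assumptions, an overall speed-up above 2 requires a radiation speed-up above 6 when $f = 0.6$, or a radiation fraction above two thirds when $s = 4$. This is consistent with the abstract stating both figures as lower bounds ("more than 60%").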