Improving the performance of physics applications in atom-based clusters with rCUDA

Baydal Cardona, María Elvira; Prades, Javier; Reaño, Carlos; Silla, Federico

Authors: María Elvira Baydal Cardona, Javier Prades, Carlos Reaño, Federico Silla
Publication date: 1 March 2020
Publisher: Elsevier BV
DOI: https://doi.org/10.1016/j.jpdc.2019.11.007

Abstract

Traditionally, High-Performance Computing (HPC) has been associated with large power requirements. The reason is that the makers of the processors typically employed in HPC deployments have always focused on getting the highest performance from their designs, regardless of the energy those processors consume. In fact, for many years heat dissipation was the only real barrier to achieving higher performance at the cost of higher energy consumption. However, a new trend has recently appeared, consisting of the use of low-power processors for HPC purposes. The Mont-Blanc and Isambard projects are good examples of this trend. These proposals, however, do not consider the use of GPUs. In this paper we propose to use GPUs in this kind of HPC deployment based on low-power processors by means of the remote GPU virtualization mechanism. To that end, we leverage the rCUDA middleware in a hybrid cluster composed of low-power Atom-based nodes and regular Xeon-based nodes equipped with GPUs. Our experiments show that, by making use of rCUDA, the execution time of applications belonging to the physics domain is noticeably reduced, achieving a speedup of up to 140x with just one remote NVIDIA V100 GPU with respect to the execution of the same applications using 8 Atom-based nodes. Additionally, a rough energy consumption estimation reports improvements in energy demands of up to 37x.

© 2019 Elsevier Inc. All rights reserved.

This work was funded by the Generalitat Valenciana, Spain, under Grant PROMETEO/2017/077. The authors are also grateful for the generous support provided by Mellanox Technologies Inc.

Silla, F.; Prades, J.; Baydal Cardona, ME.; Reaño, C. (2020). Improving the performance of physics applications in atom-based clusters with rCUDA. Journal of Parallel and Distributed Computing. 137:160-178. https://doi.org/10.1016/j.jpdc.2019.11.007
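
The abstract quotes two headline figures: a speedup of up to 140x and an energy improvement of up to 37x. Because energy is average power multiplied by execution time, the two figures together imply a ratio between the average power drawn by the two setups. The sketch below is not taken from the paper; it only works through that arithmetic. It assumes that both "up to" figures refer to the same application and configuration, which the abstract does not state, and the function and constant names are hypothetical.

```python
def implied_power_ratio(speedup: float, energy_gain: float) -> float:
    """Return P_gpu / P_atom implied by E = P * t.

    E_atom / E_gpu = (P_atom * t_atom) / (P_gpu * t_gpu)
                   = speedup / (P_gpu / P_atom)
    so P_gpu / P_atom = speedup / energy_gain.
    """
    if speedup <= 0 or energy_gain <= 0:
        raise ValueError("ratios must be positive")
    return speedup / energy_gain


# Headline "up to" figures quoted in the abstract.
# ASSUMPTION: both belong to the same run; the abstract does not say so.
SPEEDUP = 140.0
ENERGY_GAIN = 37.0

if __name__ == "__main__":
    ratio = implied_power_ratio(SPEEDUP, ENERGY_GAIN)
    print(f"Implied average power ratio (GPU setup / Atom setup): {ratio:.2f}")
```

Under that assumption, the rCUDA configuration would draw roughly 3.8 times the average power of the 8 Atom-based nodes while finishing 140 times sooner, which is how a higher-power setup can still come out ahead in total energy.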

Available Versions

RiuNet

oai:riunet.upv.es:10251/161398

Last updated on 08/04/2021