POWAR: Power-Aware Routing in HPC Networks with On/Off Links

Abstract

To save energy in HPC interconnection networks, a common proposal is to switch idle links into a low-power mode after a certain time without any transmission, as the IEEE Energy Efficient Ethernet standard proposes. Extending the low-power mode mechanism, we propose POWer-Aware Routing (POWAR), a simple power-aware routing and selection function for fat-tree and torus networks. POWAR adapts the number of network links that can be used to the network load, achieving substantial energy savings in the network (55%-65%) and in the entire system (9%-10%) with negligible performance overhead.

This work has been supported by the Spanish MINECO and European Commission (FEDER funds) under project TIN2015-66972-C5-1-R. Francisco J. Andujar has been partially funded by the Spanish MICINN and by the ERDF program of the European Union: PCAS Project (TIN2017-88614-R), CAPAP-H6 (TIN2016-81840-REDT), and Junta de Castilla y Leon FEDER Grant VA082P17 (PROPHET Project).

Andújar-Muñoz, FJ.; Coll, S.; Alonso Díaz, M.; López Rodríguez, PJ.; Martínez-Rubio, J. (2019). POWAR: Power-Aware Routing in HPC Networks with On/Off Links. ACM Transactions on Architecture and Code Optimization. 15(4):1-22. https://doi.org/10.1145/3293445
