951 research outputs found
Direct -body code on low-power embedded ARM GPUs
This work arises on the environment of the ExaNeSt project aiming at design
and development of an exascale ready supercomputer with low energy consumption
profile but able to support the most demanding scientific and technical
applications. The ExaNeSt compute unit consists of densely-packed low-power
64-bit ARM processors, embedded within Xilinx FPGA SoCs. SoC boards are
heterogeneous architecture where computing power is supplied both by CPUs and
GPUs, and are emerging as a possible low-power and low-cost alternative to
clusters based on traditional CPUs. A state-of-the-art direct -body code
suitable for astrophysical simulations has been re-engineered in order to
exploit SoC heterogeneous platforms based on ARM CPUs and embedded GPUs.
Performance tests show that embedded GPUs can be effectively used to accelerate
real-life scientific calculations, and that are promising also because of their
energy efficiency, which is a crucial design in future exascale platforms.Comment: 16 pages, 7 figures, 1 table, accepted for publication in the
Computing Conference 2019 proceeding
Towards Loosely-Coupled Programming on Petascale Systems
We have extended the Falkon lightweight task execution framework to make
loosely coupled programming on petascale systems a practical and useful
programming model. This work studies and measures the performance factors
involved in applying this approach to enable the use of petascale systems by a
broader user community, and with greater ease. Our work enables the execution
of highly parallel computations composed of loosely coupled serial jobs with no
modifications to the respective applications. This approach allows a new-and
potentially far larger-class of applications to leverage petascale systems,
such as the IBM Blue Gene/P supercomputer. We present the challenges of I/O
performance encountered in making this model practical, and show results using
both microbenchmarks and real applications from two domains: economic energy
modeling and molecular dynamics. Our benchmarks show that we can scale up to
160K processor-cores with high efficiency, and can achieve sustained execution
rates of thousands of tasks per second.Comment: IEEE/ACM International Conference for High Performance Computing,
Networking, Storage and Analysis (SuperComputing/SC) 200
PerfBound: Conserving Energy with Bounded Overheads in On/Off-Based HPC Interconnects
Energy and power are key challenges in high-performance computing. System energy efficiency must be significantly improved, and this requires greater efficiency in all subcomponents. An important target of optimization is the interconnect, since network links are always on, consuming power even during idle periods. A large number of HPC machines have a primary interconnect based on Ethernet (about 40 percent of TOP500 machines), which, since 2010, has included support for saving power via Energy Efficient Ethernet (EEE). Nevertheless, it is unlikely that HPC interconnects would use these energy saving modes unless the performance overhead is known and small. This paper presents PerfBound, a self-contained technique to manage on/off-based networks such as EEE, minimizing interconnect link energy consumption subject to a bound on the performance degradation. PerfBound does not require changes to the applications and it uses only local information already available at switches and NICs without introducing additional communication messages, and is also compatible with multi-hop networks. PerfBound is evaluated using traces from a production supercomputer. For twelve out of fourteen applications, PerfBound has high energy savings, up to 70 percent for only 1 percent performance degradation. This paper also presents DynamicFastwake, which extends PerfBound to exploit multiple low-power states. DynamicFastwake achieves an energy-delay product 10 percent lower than the original PerfBound techniqueThis research was supported by European Union’s 7th Framework Programme [FP7/2007-2013] under the Mont-Blanc-3 (FP7-ICT-671697) and EUROSERVER (FP7-ICT-610456) projects, the Ministry of Economy and Competitiveness of Spain (TIN2012-34557 and TIN2015-65316), Generalitat de Catalunya (FI-AGAUR 2012 FI B 00644, 2014-SGR-1051 and 2014-SGR-1272), the European Union’s Horizon2020 research and innovation programme under the HiPEAC-3 Network of Excellence (ICT-287759), and the
Severo Ochoa Program (SEV-2011-00067) of the Spanish Government.Peer ReviewedPostprint (author's final draft
Iso-energy-efficiency: An approach to power-constrained parallel computation
Future large scale high performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate and predict energy-performance of data intensive parallel applications with various execution patterns running on large scale power-aware clusters. Our analytical model can help users explore the effects of machine and application dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g. processor count, CPU power/frequency, workload size and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model for various application contexts and in scalability decision-making
Analysis and evaluation of MapReduce solutions on an HPC cluster
This is a post-peer-review, pre-copyedit version of an article published in Computers & Electrical Engineering. The final authenticated version is available online at: https://doi.org/10.1016/j.compeleceng.2015.11.021[Abstract] The ever growing needs of Big Data applications are demanding challenging capabilities which cannot be handled easily by traditional systems, and thus more and more organizations are adopting High Performance Computing (HPC) to improve scalability and efficiency. Moreover, Big Data frameworks like Hadoop need to be adapted to leverage the available resources in HPC environments. This situation has caused the emergence of several HPC-oriented MapReduce frameworks, which benefit from different technologies traditionally oriented to supercomputing, such as high-performance interconnects or the message-passing interface. This work aims to establish a taxonomy of these frameworks together with a thorough evaluation, which has been carried out in terms of performance and energy efficiency metrics. Furthermore, the adaptability to emerging disks technologies, such as solid state drives, has been assessed. The results have shown that new frameworks like DataMPI can outperform Hadoop, although using IP over InfiniBand also provides significant benefits without code modifications.Ministerio de EconomĂa y Competitividad; TIN2013-42148-
- …