1,308 research outputs found

    A Runtime Framework for Energy Efficient HPC Systems Without a Priori Knowledge of Applications

    Get PDF
    International audienceThe rising computing demands of scientific endeavours often require the creation and management of High Performance Computing (HPC) systems for running experiments and processing vast amounts of data. These HPC systems generally operate at peak performance, consuming a large quantity of electricity, even though their workload varies over time. Understanding the behavioural patterns i.e., phases) of HPC systems during their use is key to adjust performance to resource demand and hence improve the energy efficiency. In this paper, we describe (i) a method to detect phases of an HPC system based on its workload, and (ii) a partial phase recognition technique that works cooperatively with on-the-fly dynamic management. We implement a prototype that guides the use of energy saving capabilities to demonstrate the benefits of our approach. Experimental results reveal the effectiveness of the phase detection method under real-life workload and benchmarks. A comparison with baseline unmanaged execution shows that the partial phase recognition technique saves up to 15% of energy with less than 1% performance degradation

    Hierarchical Parallelisation of Functional Renormalisation Group Calculations -- hp-fRG

    Get PDF
    The functional renormalisation group (fRG) has evolved into a versatile tool in condensed matter theory for studying important aspects of correlated electron systems. Practical applications of the method often involve a high numerical effort, motivating the question in how far High Performance Computing (HPC) can leverage the approach. In this work we report on a multi-level parallelisation of the underlying computational machinery and show that this can speed up the code by several orders of magnitude. This in turn can extend the applicability of the method to otherwise inaccessible cases. We exploit three levels of parallelisation: Distributed computing by means of Message Passing (MPI), shared-memory computing using OpenMP, and vectorisation by means of SIMD units (single-instruction-multiple-data). Results are provided for two distinct High Performance Computing (HPC) platforms, namely the IBM-based BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster. We discuss how certain issues and obstacles were overcome in the course of adapting the code. Most importantly, we conclude that this vast improvement can actually be accomplished by introducing only moderate changes to the code, such that this strategy may serve as a guideline for other researcher to likewise improve the efficiency of their codes

    An Efficient Monte Carlo-based Probabilistic Time-Dependent Routing Calculation Targeting a Server-Side Car Navigation System

    Full text link
    Incorporating speed probability distribution to the computation of the route planning in car navigation systems guarantees more accurate and precise responses. In this paper, we propose a novel approach for dynamically selecting the number of samples used for the Monte Carlo simulation to solve the Probabilistic Time-Dependent Routing (PTDR) problem, thus improving the computation efficiency. The proposed method is used to determine in a proactive manner the number of simulations to be done to extract the travel-time estimation for each specific request while respecting an error threshold as output quality level. The methodology requires a reduced effort on the application development side. We adopted an aspect-oriented programming language (LARA) together with a flexible dynamic autotuning library (mARGOt) respectively to instrument the code and to take tuning decisions on the number of samples improving the execution efficiency. Experimental results demonstrate that the proposed adaptive approach saves a large fraction of simulations (between 36% and 81%) with respect to a static approach while considering different traffic situations, paths and error requirements. Given the negligible runtime overhead of the proposed approach, it results in an execution-time speedup between 1.5x and 5.1x. This speedup is reflected at infrastructure-level in terms of a reduction of around 36% of the computing resources needed to support the whole navigation pipeline

    Energy efficiency in HPC with and without knowledge of applications and services

    Get PDF
    International audienceThe constant demand of raw performance in high performance computing often leads to high performance systems' over-provisioning which in turn can result in a colossal energy waste due to workload/application variation over time. Proposing energy efficient solutions in the context of large scale HPC is a real unavoidable challenge. This paper explores two alternative approaches (with or without knowledge of applications and services) dealing with the same goal: reducing the energy usage of large scale infrastructures which support HPC applications. This article describes the first approach "with knowledge of applications and services'' which enables users to choose the less consuming implementation of services. Based on the energy consumption estimation of the different implementations (protocols) for each service, this approach is validated on the case of fault tolerance service in HPC. The approach "without knowledge'' allows some intelligent framework to observe the life of HPC systems and proposes some energy reduction schemes. This framework automatically estimates the energy consumption of the HPC system in order to apply power saving schemes. Both approaches are experimentally evaluated and analyzed in terms of energy efficiency

    Runtime-guided mitigation of manufacturing variability in power-constrained multi-socket NUMA nodes

    Get PDF
    This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493, SEV-2011-00067), by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the European HiPEAC Network of Excellence. M. MoretĂł has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243). This work was also partially performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-689878). Finally, the authors are grateful to the reviewers for their valuable comments, to the RoMoL team, to Xavier Teruel and Kallia Chronaki from the Programming Models group of BSC and the Computation Department of LLNL for their technical support and useful feedback.Peer ReviewedPostprint (published version
    • …
    corecore