595 research outputs found

    Enabling fair pricing on HPC systems with node sharing

    Full text link
    Abstract not provide

    Memory demands in disaggregated HPC: How accurate do we need to be?

    Get PDF
    Disaggregated memory has recently been proposed as a way to allow flexible and fine-grained allocation of memory capacity, mitigating the mismatch between fixed per-node resource provisioning and the needs of the submitted jobs. By allowing the sharing of memory capacity among cluster nodes, overall HPC system throughput can be improved, due to the reduction of stranded and underutilized resources. A key parameter that is generally expected to be provided by the user at submission time is the job's memory capacity demand. It is unrealistic to expect this number to be precise. This paper makes an important step towards understanding the effect of overestimating the job memory requirements. We analyse the implications on overall system throughput and job response time. We leverage a disaggregated simulation infrastructure implemented on the popular Slurm resource manager. Our results show that even when the cost of a 60% increase in memory demands only increases a single job's user response time by 8%, the aggregate result of everybody doing so can be a 25% reduction in throughput and a 5 times increase in response time. These results show that GB-hours should be explicitly allocated in addition to core-hours.This work is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 754337 (EuroEXA); it has been supported by the Spanish Ministry of Science and Innovation (project TIN2015-65316-P and Ramon y Cajal fellowship RYC2018-025628-I), Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), and the Severo Ochoa Programme (SEV-2015-0493).Peer ReviewedPostprint (author's final draft

    Supercomputing futures : the next sharing paradigm for HPC resources : economic model, market analysis and consequences for the Grid

    Get PDF
    À la croisée des chemins du génie informatique, de la finance et de l'économétrie, cette thèse se veut fondamentalement un exercice en ingénierie économique dont l' objectif est de contribuer un système novateur, durable et adaptatif pour le partage de resources de calcul haute-performance. Empruntant à la finance fondamentale et à l'analyse technique, le modèle proposé construit des ratios et des indices de marché à partir de statistiques transactionnelles. Cette approche, encourageant les comportements stratégiques, pave la voie à une métaphore de partage plus efficace pour la Grid, où l'échange de ressources se voit maintenant pondéré. Le concept de monnaie de Grid, un instrument beaucoup plus liquide et utilisable que le troc de resources comme telles est proposé: les Grid Credits. Bien que les indices proposés ne doivent pas être considérés comme des indicateurs absolus et contraignants, ils permettent néanmoins aux négociants de se faire une idée de la valeur au marché des différentes resources avant de se positionner. Semblable sur de multiples facettes aux bourses de commodités, le Grid Exchange, tel que présenté, permet l'échange de resources via un mécanisme de double-encan. Néanmoins, comme les resources de super-calculateurs n'ont rien de standardisé, la plate-forme permet l'échange d'ensemble de commodités, appelés requirement sets, pour les clients, et component sets, pour les fournisseurs. Formellement, ce modèle économique n'est qu'une autre instance de la théorie des jeux non-coopératifs, qui atteint éventuellement ses points d'équilibre. Suivant les règles du "libre-marché", les utilisateurs sont encouragés à spéculer, achetant, ou vendant, à leur bon vouloir, l'utilisation des différentes composantes de superordinateurs. En fin de compte, ce nouveau paradigme de partage de resources pour la Grid dresse la table à une nouvelle économie et une foule de possibilités. Investissement et positionnement stratégique, courtiers, spéculateurs et même la couverture de risque technologique sont autant d'avenues qui s'ouvrent à l'horizon de la recherche dans le domaine

    RFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing

    Full text link
    The rigid MPI programming model and batch scheduling dominate high-performance computing. While clouds brought new levels of elasticity into the world of computing, supercomputers still suffer from low resource utilization rates. To enhance supercomputing clusters with the benefits of serverless computing, a modern cloud programming paradigm for pay-as-you-go execution of stateless functions, we present rFaaS, the first RDMA-aware Function-as-a-Service (FaaS) platform. With hot invocations and decentralized function placement, we overcome the major performance limitations of FaaS systems and provide low-latency remote invocations in multi-tenant environments. We evaluate the new serverless system through a series of microbenchmarks and show that remote functions execute with negligible performance overheads. We demonstrate how serverless computing can bring elastic resource management into MPI-based high-performance applications. Overall, our results show that MPI applications can benefit from modern cloud programming paradigms to guarantee high performance at lower resource costs

    Energy Demand Response for High-Performance Computing Systems

    Get PDF
    The growing computational demand of scientific applications has greatly motivated the development of large-scale high-performance computing (HPC) systems in the past decade. To accommodate the increasing demand of applications, HPC systems have been going through dramatic architectural changes (e.g., introduction of many-core and multi-core systems, rapid growth of complex interconnection network for efficient communication between thousands of nodes), as well as significant increase in size (e.g., modern supercomputers consist of hundreds of thousands of nodes). With such changes in architecture and size, the energy consumption by these systems has increased significantly. With the advent of exascale supercomputers in the next few years, power consumption of the HPC systems will surely increase; some systems may even consume hundreds of megawatts of electricity. Demand response programs are designed to help the energy service providers to stabilize the power system by reducing the energy consumption of participating systems during the time periods of high demand power usage or temporary shortage in power supply. This dissertation focuses on developing energy-efficient demand-response models and algorithms to enable HPC system\u27s demand response participation. In the first part, we present interconnection network models for performance prediction of large-scale HPC applications. They are based on interconnected topologies widely used in HPC systems: dragonfly, torus, and fat-tree. Our interconnect models are fully integrated with an implementation of message-passing interface (MPI) that can mimic most of its functions with packet-level accuracy. Extensive experiments show that our integrated models provide good accuracy for predicting the network behavior, while at the same time allowing for good parallel scaling performance. In the second part, we present an energy-efficient demand-response model to reduce HPC systems\u27 energy consumption during demand response periods. We propose HPC job scheduling and resource provisioning schemes to enable HPC system\u27s emergency demand response participation. In the final part, we propose an economic demand-response model to allow both HPC operator and HPC users to jointly reduce HPC system\u27s energy cost. Our proposed model allows the participation of HPC systems in economic demand-response programs through a contract-based rewarding scheme that can incentivize HPC users to participate in demand response
    • …
    corecore