29 research outputs found

    NASA Center for Climate Simulation (NCCS) Advanced Technology AT5 Virtualized Infiniband Report

    Get PDF
    The NCCS is part of the Computational and Information Sciences and Technology Office (CISTO) of Goddard Space Flight Center's (GSFC) Sciences and Exploration Directorate. The NCCS's mission is to enable scientists to increase their understanding of the Earth, the solar system, and the universe by supplying state-of-the-art high performance computing (HPC) solutions. To accomplish this mission, the NCCS (https://www.nccs.nasa.gov) provides high performance compute engines, mass storage, and network solutions to meet the specialized needs of the Earth and space science user communities.

    Virtual InfiniBand Clusters for HPC Clouds

    Get PDF
    High Performance Computing (HPC) employs fast interconnect technologies to provide low communication and synchronization latencies for tightly coupled parallel compute jobs. Contemporary HPC clusters have a fixed capacity and static runtime environments; they cannot elastically adapt to dynamic workloads, and they provide a limited selection of applications, libraries, and system software. In contrast, a cloud model for HPC clusters promises more flexibility, as it provides elastic virtual clusters on demand. This is not possible with physically owned clusters. In this paper, we present an approach that makes it possible to use InfiniBand clusters for HPC cloud computing. We propose a performance-driven design of an HPC IaaS layer for InfiniBand, which provides throughput- and latency-aware virtualization of nodes, networks, and network topologies, as well as an approach to an HPC-aware, multi-tenant cloud management system for elastic virtualized HPC compute clusters.

    RFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing

    Full text link
    The rigid MPI programming model and batch scheduling dominate high-performance computing. While clouds brought new levels of elasticity into the world of computing, supercomputers still suffer from low resource utilization rates. To enhance supercomputing clusters with the benefits of serverless computing, a modern cloud programming paradigm for pay-as-you-go execution of stateless functions, we present rFaaS, the first RDMA-aware Function-as-a-Service (FaaS) platform. With hot invocations and decentralized function placement, we overcome the major performance limitations of FaaS systems and provide low-latency remote invocations in multi-tenant environments. We evaluate the new serverless system through a series of microbenchmarks and show that remote functions execute with negligible performance overheads. We demonstrate how serverless computing can bring elastic resource management into MPI-based high-performance applications. Overall, our results show that MPI applications can benefit from modern cloud programming paradigms to guarantee high performance at lower resource costs.
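
    As a conceptual illustration of why hot invocations matter (this is not the rFaaS API; the workload, worker count, and iteration count are arbitrary placeholders), the following Python sketch contrasts a cold path that spawns a fresh worker process per call with a hot path that reuses an already-running worker:

```python
# Conceptual illustration (not the rFaaS API): cold vs. hot function invocation.
# Cold path: spawn a fresh worker process for every call.
# Hot path: reuse an already-running worker, the idea behind "hot" invocations.
import time
from multiprocessing import Process, Pool

def fn(x):
    # Placeholder stateless function standing in for a serverless handler.
    return x * x

def cold_invoke(x):
    # Every call pays process creation and teardown cost.
    p = Process(target=fn, args=(x,))
    p.start()
    p.join()

if __name__ == "__main__":
    t0 = time.perf_counter()
    for i in range(20):
        cold_invoke(i)
    cold = time.perf_counter() - t0

    with Pool(processes=1) as pool:   # warm worker kept alive between calls
        pool.apply(fn, (0,))          # warm it up once
        t0 = time.perf_counter()
        for i in range(20):
            pool.apply(fn, (i,))
        hot = time.perf_counter() - t0

    print(f"cold: {cold*1e3:.1f} ms total, hot: {hot*1e3:.1f} ms total")
```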

    Evaluation of the network performance in a high performance computing cloud

    Get PDF
    Cloud services enable flexible use of resources. Especially in so-called Infrastructure-as-a-Service cloud services, users can run their own applications in their own virtual machines and thereby customize the whole execution environment as needed. However, the virtualization introduces an overhead which decreases the performance of computation and I/O device access. This work contains a network performance evaluation of such a cloud service. The service uses InfiniBand as its network interconnect, a technology often used in high performance computing clusters. The evaluation methods study the network latency and throughput in different scenarios, with and without virtualization, in order to identify the major sources of the introduced overhead. This work also contains an evaluation of SR-IOV, a technology that allows a physical device to be presented as multiple virtual functions which can be assigned directly to virtual machines, and which can thus improve the I/O performance of virtual machines. In this work, SR-IOV is studied with InfiniBand devices that, at the time of the evaluation, had experimental SR-IOV support. The evaluation results show that the tunneling protocol used and the lack of hardware support for virtualized I/O cause the biggest performance losses in the evaluated scenarios. Based on the results, adopting the evaluated SR-IOV technology is recommended in all cases to improve performance.
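
    For context on the SR-IOV mechanism evaluated above, the sketch below uses the standard Linux sysfs interface to create virtual functions on an SR-IOV-capable device; the PCI address and VF count are assumptions, and root privileges are required:

```python
# Minimal sketch: enabling SR-IOV virtual functions via the standard Linux
# sysfs interface. On SR-IOV-capable InfiniBand HCAs, each virtual function
# can then be passed through to a virtual machine.
from pathlib import Path

PCI_ADDR = "0000:03:00.0"   # assumption: PCI address of the HCA on this host
NUM_VFS = 4                 # assumption: number of virtual functions to create

dev = Path("/sys/bus/pci/devices") / PCI_ADDR
total = int((dev / "sriov_totalvfs").read_text())
if NUM_VFS > total:
    raise SystemExit(f"device only supports {total} VFs")

# Writing 0 first resets the VF count before changing it.
(dev / "sriov_numvfs").write_text("0")
(dev / "sriov_numvfs").write_text(str(NUM_VFS))
print(f"enabled {NUM_VFS} virtual functions on {PCI_ADDR}")
```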

    Performance characterization of containerization for HPC workloads on InfiniBand clusters: an empirical study

    Get PDF
    Containerization technology offers an appealing alternative for encapsulating and operating applications (and all their dependencies) without the performance penalties of Virtual Machines and, as a result, has attracted the interest of the High-Performance Computing (HPC) community as a way to obtain fast, customized, portable, flexible, and reproducible deployments of their workloads. Previous work in this area has demonstrated that containerized HPC applications can exploit InfiniBand networks, but it has ignored the potential of multi-container deployments, which partition the processes of each application into multiple containers on each host. Partitioning HPC applications has proven useful with virtual machines, where it constrains each VM to a single NUMA (Non-Uniform Memory Access) domain. This paper conducts a systematic study of the performance of multi-container deployments with different network fabrics and protocols, focusing especially on InfiniBand networks. We analyze the impact of container granularity and its potential to exploit processor and memory affinity to improve application performance. Our results show that default Singularity can achieve near bare-metal performance but does not support fine-grain multi-container deployments. Docker and Singularity-instance behave similarly in terms of the performance of deployment schemes with different container granularity and affinity. This behavior differs across network fabrics and protocols, and depends as well on the application's communication patterns and the message size. Moreover, deployments on InfiniBand are also more affected by computation and memory allocation, and because of that, they can exploit affinity better. We thank Lenovo for providing the testbed to run the experiments in this paper. This work was partially supported by Lenovo as part of the Lenovo-BSC collaboration agreement, by the Spanish Government under contract PID2019-107255GB-C22, and by the Generalitat de Catalunya under contract 2017-SGR-1414 and under Grant No. 2020 FI-B 00257.
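
    A minimal sketch of the kind of fine-grained, affinity-aware multi-container deployment the study examines, expressed here with Docker's standard --cpuset-cpus/--cpuset-mems flags; the image name, core ranges, and two-socket layout are placeholders, not the paper's configuration:

```python
# Illustrative sketch: one container per NUMA domain, each pinned to its
# local cores and memory, mirroring a fine-grained multi-container layout.
import subprocess

NUMA_DOMAINS = {          # assumption: 2-socket node, 16 cores per socket
    0: {"cpus": "0-15",  "mems": "0"},
    1: {"cpus": "16-31", "mems": "1"},
}
IMAGE = "hpc-app:latest"  # placeholder image name

for node, aff in NUMA_DOMAINS.items():
    cmd = [
        "docker", "run", "-d",
        "--name", f"rank-group-{node}",
        "--cpuset-cpus", aff["cpus"],   # pin container to this socket's cores
        "--cpuset-mems", aff["mems"],   # restrict allocations to local memory
        IMAGE,
    ]
    subprocess.run(cmd, check=True)
```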

    A SOFTWARE DEFINED NETWORKING ARCHITECTURE FOR HIGH PERFORMANCE CLOUDS

    Get PDF
    Multi-tenant clouds with resource virtualization offer elasticity of resources and eliminate the initial cluster setup cost and time for applications. However, poor network performance, performance variation, and noisy neighbors are some of the challenges for executing high performance applications on public clouds. Utilizing these virtualized resources for scientific applications, which have complex communication patterns, requires low-latency communication mechanisms and a rich set of communication constructs. To minimize the virtualization overhead, a novel approach for low-latency networking for HPC clouds is proposed and implemented over a multi-technology software defined network. The efficiency of the proposed low-latency SDN is analyzed and evaluated for high performance applications. The results of the experiments show that the latest Mellanox FDR InfiniBand interconnect and the Mellanox OpenStack plugin give the best performance for implementing virtual-machine-based high performance clouds with large message sizes.
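
    A minimal ping-pong sketch (assuming mpi4py and NumPy are available) of the kind of latency measurement commonly used to compare interconnects across message sizes; run it with two ranks, e.g. mpirun -np 2 python pingpong.py:

```python
# Two-rank ping-pong: report one-way latency per message size.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
ITER = 100

for size in (8, 1024, 64 * 1024, 1024 * 1024):      # message sizes in bytes
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(ITER):
        if rank == 0:
            comm.Send(buf, dest=1)
            comm.Recv(buf, source=1)
        else:
            comm.Recv(buf, source=0)
            comm.Send(buf, dest=0)
    t1 = MPI.Wtime()
    if rank == 0:
        one_way_us = (t1 - t0) / (2 * ITER) * 1e6
        print(f"{size:>8} B : {one_way_us:8.2f} us one-way")
```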

    Cloud-efficient modelling and simulation of magnetic nano materials

    Get PDF
    Scientific simulations are rarely attempted in a cloud due to the substantial performance costs of virtualization. Considerable communication overheads, intolerable latencies, and inefficient hardware emulation are the main reasons why this emerging technology has not been fully exploited. On the other hand, the progress of computing infrastructure nowadays strongly depends on the development of prospective storage media, where efficient micromagnetic simulations play a vital role in future memory design. This thesis addresses both of these topics by merging micromagnetic simulations with the latest OpenStack cloud implementation, providing a time- and cost-effective alternative to expensive computing centers. However, many challenges have to be addressed before a high-performance cloud platform emerges as a solution for problems in the micromagnetic research community. First, the best solver candidate has to be selected and further improved, particularly in the parallelization and process communication domain. Second, a 3-level cloud communication hierarchy needs to be recognized and each segment adequately addressed. The required steps include breaking VM isolation to enable use of the host's shared memory, tuning and optimizing the cloud network stack, and integrating efficient communication hardware. The work concludes with practical measurements confirming that the simulation was successfully implemented in an open-source cloud environment. As a result, the renewed Magpar solver runs for the first time in the OpenStack cloud, using ivshmem for shared-memory communication. Extensive measurements also proved the effectiveness of our solutions, yielding results from sixty percent to over ten times better than those achieved in the standard cloud.
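
    As an illustration of the ivshmem mechanism mentioned above, the sketch below shows a host-side QEMU invocation that exposes a host shared-memory region to a guest as an ivshmem-plain PCI device; the disk image, backing path, and sizes are placeholders, not the thesis configuration:

```python
# Illustrative host-side QEMU launch: a host memory-backend-file shared
# across co-located guests is exposed to the VM as an ivshmem-plain device,
# allowing shared-memory communication instead of going through the network.
import subprocess

SHM_PATH = "/dev/shm/ivshmem0"   # placeholder shared-memory backing file
SHM_SIZE = "1G"                  # placeholder region size
IMAGE = "guest.qcow2"            # placeholder guest disk image

cmd = [
    "qemu-system-x86_64",
    "-m", "4G", "-enable-kvm",
    "-drive", f"file={IMAGE},format=qcow2",
    # host memory object backed by a shared file ...
    "-object", f"memory-backend-file,id=hostmem,size={SHM_SIZE},"
               f"mem-path={SHM_PATH},share=on",
    # ... exposed to the guest as an ivshmem PCI device
    "-device", "ivshmem-plain,memdev=hostmem",
]
subprocess.run(cmd, check=True)
```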

    Impact of network interconnection in cloud computing environments for high-performance computing applications

    Get PDF
    The availability of computational resources has changed significantly due to the use of the cloud computing paradigm. Aiming at potential advantages, such as cost savings through the pay-per-use model and scalable/elastic resource allocation, we have witnessed efforts to execute high-performance computing (HPC) applications in the cloud. Due to the distributed nature of these environments, performance is highly dependent on two primary components of the system: processing power and network interconnection. While allocating more powerful hardware theoretically increases performance, it also increases the allocation cost. Allocation exclusivity guarantees space for memory, storage, and CPU. This is not the case for the network interconnection, since several simultaneous instances (multi-tenants) share the same communication channel, making the network a bottleneck. Therefore, this dissertation aims to analyze the impact of the network interconnection on the execution of workloads from the HPC domain. We carried out two different assessments. The first concentrates on different network interconnections (GbE and InfiniBand) in the Microsoft Azure public cloud and the costs related to their use. The second focuses on different network configurations using NIC aggregation methodologies in a controlled private cloud environment. The results obtained show that the network interconnection is a crucial aspect and can significantly impact the performance of HPC applications executed in the cloud. In the Azure public cloud, the accelerated networking approach, which allows an instance to have a high-performance interconnection without additional charges, enables significant performance improvements for HPC applications with better cost efficiency. Finally, in the private cloud environment, the NIC aggregation approach outperformed the baseline in up to ≈98% of the executions with applications that make intensive use of the network. Also, the Balance Round-Robin aggregation mode performed better than the 802.3ad aggregation mode in the majority of the executions.
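
    For reference, the sketch below configures a Linux bond with standard iproute2 commands in either of the two aggregation modes compared above; the member interface names are assumptions and root privileges are required:

```python
# Sketch of the two Linux bonding modes compared in the study.
# balance-rr stripes packets across both NICs round-robin, while
# 802.3ad (LACP) hashes flows onto members and needs switch support.
import subprocess

SLAVES = ["eth1", "eth2"]        # assumption: the two NICs to aggregate
MODE = "balance-rr"              # or "802.3ad"

def run(cmd):
    subprocess.run(cmd.split(), check=True)

run(f"ip link add bond0 type bond mode {MODE}")
for nic in SLAVES:
    run(f"ip link set {nic} down")          # members must be down to enslave
    run(f"ip link set {nic} master bond0")
run("ip link set bond0 up")
for nic in SLAVES:
    run(f"ip link set {nic} up")
```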

    Process migration in a parallel environment

    Get PDF
    To satisfy the ever-increasing demand for computational resources, high performance computing systems are becoming larger and larger. Unfortunately, the tools supporting system management tasks are only slowly adapting to the increase in components in computational clusters. Virtualization provides concepts which make system management tasks easier to implement by giving system administrators more flexibility. With the help of virtual machine migration, the point in time for certain system management tasks like hardware or software upgrades no longer depends on the usage of the physical hardware. The flexibility to migrate a running virtual machine without significant interruption to the provided service makes it possible to perform system management tasks at the optimal point in time. In most high performance computing systems, however, virtualization is still not employed. The reason for avoiding virtualization in high performance computing is that there is still an overhead when accessing the CPU and I/O devices. This overhead continually decreases, and there are different kinds of virtualization techniques, like para-virtualization and container-based virtualization, which minimize it further. With the CPU being one of the primary resources in high performance computing, this work proposes to migrate processes instead of virtual machines, thus avoiding any overhead. Process migration can either be seen as an extension of pre-emptive multitasking across system boundaries or as a special form of checkpointing and restarting. In the scope of this work, process migration is based on checkpointing and restarting, as it is an already established technique in the field of fault tolerance. From the existing checkpointing and restarting implementations, the one best suited for process migration was selected. One of the important requirements of the checkpointing and restarting implementation is transparency. Providing transparent process migration is important to enable the migration of any process without prerequisites like re-compilation or running in a specially prepared environment. With process migration based on checkpointing and restarting, the next step towards providing process migration in a high performance computing environment is to support the migration of parallel processes. Using MPI is a common method of parallelizing applications, and therefore process migration has to be integrated with an MPI implementation. The previously selected checkpointing and restarting implementation was integrated into an MPI implementation, thus enabling the migration of parallel processes. With the help of different test cases, the implemented process migration was analyzed, especially with regard to the time required to migrate a process and the benefits of optimizations that reduce the process's downtime during migration.
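
    As a generic illustration of checkpoint/restart-based migration (using CRIU rather than the specific implementation selected in the thesis), the sketch below dumps a running process, copies the image directory to another node, and restores it there; PID, paths, and host name are placeholders, and root privileges plus an installed CRIU are required on both nodes:

```python
# Generic checkpoint/restart migration sketch with CRIU.
import subprocess

PID = 12345                      # placeholder: PID of the process to migrate
IMG_DIR = "/tmp/ckpt"            # placeholder: checkpoint image directory
DEST = "node02"                  # placeholder: destination host

subprocess.run(["mkdir", "-p", IMG_DIR], check=True)
# Checkpoint: freeze the process and write its state to IMG_DIR.
subprocess.run(["criu", "dump", "-t", str(PID), "-D", IMG_DIR,
                "--shell-job", "--leave-stopped"], check=True)
# Transfer the image to the destination node (rsync assumed available).
subprocess.run(["rsync", "-a", IMG_DIR + "/", f"{DEST}:{IMG_DIR}/"], check=True)
# Restore on the destination; the process resumes where it was checkpointed.
subprocess.run(["ssh", DEST, "criu", "restore", "-D", IMG_DIR,
                "--shell-job", "-d"], check=True)
```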