Routing on the Channel Dependency Graph: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks
In the pursuit of ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing started to scale out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology-aware routing algorithms become increasingly inapplicable due to irregular topologies, which are either irregular by design or, most often, a result of hardware failures. Exchanging faulty network components potentially requires whole-system downtime, further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today's networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, both in terms of hardware and software management, are necessary to mitigate negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables.
Using the fail-in-place strategy, a well-established method for storage systems to repair only critical component failures, is a feasible solution for current and future HPC interconnects as well as other large-scale installations such as data center networks. However, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system. This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while it is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent property-preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to solve the routing-deadlock problem. Therefore, this thesis further advances the state-of-the-art by introducing a novel concept of routing on the channel dependency graph, which allows the design of a universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock-avoidance during path calculation, instead of solving both problems separately as all previous solutions
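As background for the channel dependency graph mentioned in the abstract above, the following minimal Python sketch illustrates the classical check that the abstract describes as a separate step in earlier approaches: build the channel dependency graph from a set of already computed routes and test it for cycles. It is not the thesis's algorithm. The function names `channel_dependency_graph` and `has_cycle`, the representation of a route as a list of switch identifiers, and the four-switch ring are assumptions made purely for illustration.

```python
from collections import defaultdict


def channel_dependency_graph(routes):
    """Build a channel dependency graph (CDG) from routes.

    Each route is a list of node ids (an assumed representation). A channel
    is a directed link (u, v). The CDG has an edge c1 -> c2 whenever some
    route uses channel c2 immediately after channel c1.
    """
    cdg = defaultdict(set)
    for path in routes:
        channels = list(zip(path, path[1:]))
        for c1, c2 in zip(channels, channels[1:]):
            cdg[c1].add(c2)
        for c in channels:
            cdg.setdefault(c, set())  # make sure every channel is a node
    return dict(cdg)


def has_cycle(graph):
    """Iterative depth-first search; True if the directed graph has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}
    for start in graph:
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(graph[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if colour[nxt] == GREY:  # back edge: cycle found
                    return True
                if colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                colour[node] = BLACK  # all successors explored
                stack.pop()
    return False


# Hypothetical example: clockwise two-hop routes on a ring of four switches.
ring_routes = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]
# Hypothetical example: two routes along a line of four switches.
line_routes = [[0, 1, 2], [1, 2, 3]]

ring_is_cyclic = has_cycle(channel_dependency_graph(ring_routes))
line_is_cyclic = has_cycle(channel_dependency_graph(line_routes))
```

For deterministic routing, an acyclic channel dependency graph is the standard sufficient condition for deadlock freedom. When a cycle is found, the usual remedy is to move some routes onto another virtual channel, which is where the limit on available virtual channels becomes relevant.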
Hot-Spot Avoidance With Multi-Pathing Over Infiniband: An MPI Perspective
Large-scale InfiniBand clusters are becoming increasingly popular, as reflected by the TOP 500 Supercomputer rankings. At the same time, fat tree has become a popular interconnection topology for these clusters, since it makes multiple paths available between a pair of nodes. However, even with fat tree, hot-spots may occur in the network depending upon the route configuration between end nodes and the communication pattern(s) in the application. To make matters worse, the deterministic routing nature of InfiniBand prevents the application from transparently using multiple paths effectively and avoiding hot-spots in the network. Simulation-based studies for switches and adapters to implement congestion control have been proposed in the literature. However, these studies have focused on providing congestion control for the communication path, and not on utilizing multiple paths in the network for hot-spot avoidance. In this paper, we design an MPI functionality which provides hot-spot avoidance for different communications, without a priori knowledge of the pattern. We leverage the LMC (LID Mask Count) mechanism of InfiniBand to create multiple paths in the network and present the design issues (scheduling policies, selecting number of paths, scalability aspects) of our design. We implement our design and evaluate it with Pallas collective communication and MPI applications. On an InfiniBand cluster with 48 processes, collective operations like MPI All-to-all Personalized and MPI Reduce Scatter show an improvement of 27% and 19% respectively. Our evaluation with MPI applications like NAS Parallel Benchmarks and PSTSWM on 64 processes shows significant improvement in execution time with this functionality
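To make the LMC mechanism in the abstract above more concrete, here is a minimal Python sketch under stated assumptions. It relies on the general InfiniBand rule that a port with LID Mask Count n answers to 2^n consecutive LIDs starting at its base LID, each of which the subnet manager may route differently. The round-robin policy and the names `lids_for_port` and `round_robin_lids` are illustrative assumptions, not the scheduling policies or API of the paper.

```python
from itertools import cycle


def lids_for_port(base_lid, lmc):
    """Return the LIDs a port answers to for a given LID Mask Count.

    Assumes the usual InfiniBand convention: LMC is a 3-bit value (0..7),
    and the low `lmc` bits of the base LID are zero.
    """
    if not 0 <= lmc <= 7:
        raise ValueError("LMC must be in the range 0..7")
    if base_lid % (1 << lmc) != 0:
        raise ValueError("base LID must be aligned to 2**lmc")
    return [base_lid + offset for offset in range(1 << lmc)]


def round_robin_lids(base_lid, lmc):
    """Hypothetical scheduling policy: cycle through the destination LIDs
    so that successive messages can take different paths."""
    return cycle(lids_for_port(base_lid, lmc))


# Example: a destination port with base LID 16 and LMC 2 has four LIDs.
paths = lids_for_port(16, 2)
picker = round_robin_lids(16, 2)
first_five = [next(picker) for _ in range(5)]
```

Round-robin is only one possible policy. A real implementation would also have to decide how many of the available paths to use and how to keep per-path state scalable, which are the design issues the abstract lists.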
A Framework for Cyber Vulnerability Assessments of InfiniBand Networks
InfiniBand is a popular Input/Output interconnect technology used in High Performance Computing clusters. It is employed in over a quarter of the world’s 500 fastest computer systems. Although it was created to provide extremely low network latency with a high Quality of Service, the cybersecurity aspects of InfiniBand have yet to be thoroughly investigated. The InfiniBand Architecture was designed as a data center technology, logically separated from the Internet, so defensive mechanisms such as packet encryption were not implemented. Cyber communities do not appear to have taken an interest in InfiniBand, but that is likely to change as attackers branch out from traditional computing devices. This thesis considers the security implications of InfiniBand features and constructs a framework for conducting Cyber Vulnerability Assessments. Several attack primitives are tested and analyzed. Finally, new cyber tools and security devices for InfiniBand are proposed, and changes to existing products are recommended
netloc: Towards a Comprehensive View of the HPC System Topology
The increasing complexity of High Performance Computing (HPC) server architectures and networks has made topology- and affinity-awareness a critical component of HPC application optimization. Although there is a portable mechanism for accessing the server-internal topology, there is no such mechanism for accessing the network topology of modern HPC systems in an equally portable manner. The Network Locality (netloc) project provides mechanisms for portably discovering and abstractly representing the network topology of modern HPC systems. Additionally, netloc provides the ability to merge the network topology with the server-internal topologies, resulting in a comprehensive map of the HPC system topology. Using a modular infrastructure, netloc provides support for a variety of network types and discovery techniques. By representing the network topology as a graph, netloc supports any network topology configuration. The netloc architecture hides the topology discovery mechanism from the application developer, thus allowing them to focus on traversing and analyzing the resulting map of the HPC system topology
Efficient mechanisms to provide fault tolerance in interconnection networks for PC clusters
Currently, PC clusters are a cost-effective alternative to parallel computers.
In these systems, thousands of components (processors and/or hard disks) are connected through high-performance interconnection networks.
Among the network technologies currently available for building clusters, InfiniBand (IBA) has emerged as a new interconnect standard for clusters.
In fact, it has been adopted by many of the most powerful systems currently built (top500 list).
As the number of nodes in these systems increases, the interconnection network also grows.
Along with the growing number of components, the probability of failures increases dramatically, and so fault tolerance in the system as a whole, and in the interconnection network in particular, becomes a necessity.
Unfortunately, most of the fault-tolerant routing strategies proposed for massively parallel computers cannot be applied, because routing and virtual channel transitions are deterministic in IBA, which prevents packets from avoiding faults.
Therefore, new strategies for tolerating faults are needed.
Accordingly, this thesis focuses on providing adequate levels of fault tolerance for PC clusters, and in particular for IBA networks.
In this thesis we propose and evaluate several mechanisms suitable for cluster interconnection networks.
The first mechanism for providing fault tolerance in IBA (which we refer to as transition-based fault-tolerant routing; TFTR) consists of using several disjoint routes between each source-destination pair of nodes and selecting the appropriate route at the source node using the APM mechanism provided by IBA.
It consists of migrating the routes affected by the fault to the fault-free alternative routes.
However, to this end, an efficient routing algorithm is needed that is capable of providing sufficient
Montañana Aliaga, JM. (2008). Efficient mechanisms to provide fault tolerance in interconnection networks for pc clusters [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2603
Performance modelling and evaluation of heterogeneous wired / wireless networks under Bursty Traffic. Analytical models for performance analysis of communication networks in multi-computer systems, multi-cluster systems, and integrated wireless systems.
Computer networks can be classified into two broad categories: wired networks and
wireless networks, according to the hardware and software technologies used to
interconnect the individual devices. Wired interconnection networks are hardware
fabrics supporting communications between individual processors in high-performance
computing systems (e.g., multi-computer systems and cluster systems).
On the other hand, due to the rapid development of wireless technologies, wireless
networks have emerged and become an indispensable part of people's lives. The
integration of different wireless technologies is an effective approach to
accommodate the increasing demand of the users to communicate with each other
and access the Internet.
This thesis aims to investigate the performance of wired interconnection
networks and integrated wireless networks under realistic working conditions.
Traffic patterns have a significant impact on network performance. A number of
recent measurement studies have convincingly demonstrated that the traffic
generated by many real-world applications in communication networks exhibits
a bursty arrival nature and that message destinations are non-uniformly distributed.
Analytical models for the performance evaluation of wired interconnection networks
and integrated wireless networks have been widely reported. However, most of these
models are developed under the simplified assumption of a non-bursty Poisson process
with uniformly distributed message destinations.
To fill this gap, this thesis first presents an analytical model to investigate the
performance of wired interconnection networks in multi-computer systems. Secondly,
the analytical models for wired interconnection networks in multi-cluster systems are
developed. Finally, this thesis proposes analytical models to evaluate the end-to-end
delay and throughput of integrated wireless local area networks and wireless mesh
networks. These models are derived when the networks are subject to bursty traffic
with non-uniformly distributed message destinations which can capture the
burstiness of real-world network traffic in both the temporal domain and the spatial
domain. Extensive simulation experiments are conducted to validate the accuracy of
the analytical models. The models are then used as practical and cost-effective tools
to investigate the performance of heterogeneous wired or wireless networks under
the traffic patterns exhibited by real-world applications
Performance Modeling of Subnet Management on Fat Tree InfiniBand Networks using OpenSM. In Workshop on System Management Tools on Large Scale Parallel Systems
InfiniBand is becoming increasingly popular in the area of cluster computing due to its open standard and high performance. Fat Tree is a primary interconnection topology for building large-scale InfiniBand clusters. Instead of using a shared bus approach, InfiniBand employs an arbitrary switched point-to-point topology. In order to manage the subnet, InfiniBand specifies a basic management infrastructure responsible for discovering, configuring and maintaining the active state of the network. In the literature, simulation studies have been done on irregular topologies to characterize the subnet management mechanism. However, there is no study that models the subnet management mechanism on regular topologies using actual implementations. In this paper, we take up the challenge of modeling the subnet management mechanism for Fat Tree InfiniBand networks using OpenSM, a popular subnet manager. We present the timings for the various subnet management phases, namely topology discovery, path computation and path distribution, for large-scale fat tree InfiniBand subnets and present a basic performance evaluation on a small-scale InfiniBand cluster. We verify our model with the basic set of results obtained, and present the results for the model by varying different parameters on Fat Trees.