
    High-Quality Fault-Resiliency in Fat-Tree Networks (Extended Abstract)

    Coupling regular topologies with optimized routing algorithms is key to pushing the performance of interconnection networks in HPC systems. In this paper we present Dmodc, a fast deterministic routing algorithm for Parallel Generalized Fat-Trees (PGFTs) that minimizes congestion risk even under massive topology degradation caused by equipment failure. It applies a modulo-based computation of forwarding tables among switches closer to the destination, using only knowledge of subtrees for pre-modulo division. Dmodc allows complete rerouting of topologies with tens of thousands of nodes in less than a second, which greatly helps centralized fabric management react to faults with high-quality routing tables and no impact on running applications in current and future very large-scale HPC clusters. We compare Dmodc against routing algorithms available in the InfiniBand control software (OpenSM), first for routing execution time to show feasibility at scale, and then for congestion risk under degradation to demonstrate robustness. The latter comparison uses static analysis of routing tables under random permutation (RP), shift permutation (SP) and all-to-all (A2A) traffic patterns. Under heavy degradation, Dmodc shows A2A and RP congestion risks similar to those of the most stable algorithms compared, and near-optimal SP congestion risk up to 1% random degradation.
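The abstract describes Dmodc only at a high level. To illustrate what modulo-based forwarding means in a fat-tree, the sketch below implements the classic D-mod-k rule on an idealized k-ary n-tree; it is not Dmodc itself. A switch at level `level` (0 = leaf) forwards traffic for host `dest` upward through port `(dest // k**level) % k`. The function names and the k-ary n-tree assumption are illustrative only; Dmodc targets PGFTs and uses subtree knowledge for the pre-modulo division, which this sketch does not model.

```python
from collections import Counter


def dmodk_up_port(dest: int, level: int, k: int) -> int:
    """Classic D-mod-k: up-port used at `level` (0 = leaf) for traffic to host `dest`."""
    return (dest // k ** level) % k


def up_port_load(k: int, n: int, level: int) -> Counter:
    """Count how many of the k**n destinations map to each up-port at `level`."""
    if not 0 <= level < n - 1:
        raise ValueError("only switches below the top level have up-ports")
    return Counter(dmodk_up_port(dest, level, k) for dest in range(k ** n))


if __name__ == "__main__":
    # 4-ary 2-tree: 16 hosts, 4 leaf switches, each with 4 up-ports.
    print(up_port_load(k=4, n=2, level=0))
```

For k = 4 and n = 2 (16 hosts), each of the 4 up-ports of a leaf switch carries traffic for exactly 4 destinations, which is the even spreading that modulo-based routing aims for on a healthy tree.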

    Dynamic routing balancing on InfiniBand network

    InfiniBand (IBA) technology was developed to address the performance issues associated with message movement among end nodes and computer I/O devices. However, InfiniBand is also widely deployed within high performance computing (HPC) clusters because of the high bandwidth and low message latency it offers to inter-processor communication systems. An efficient interconnection-network design is mandatory because of its great impact on parallel computer performance. Therefore, a high-speed routing scheme that minimizes congestion and avoids hot-spot areas should be included in the network components. We have developed Dynamic Routing Balancing (DRB), an adaptive routing mechanism that balances communication traffic over the interconnection network. It is based on limited and load-controlled multipath expansion in order to maintain low and bounded network latency. In this work, we propose using DRB as the congestion control mechanism for InfiniBand networks. Experimentation shows that our method achieves significant performance improvement over the original InfiniBand technique, which is based on message throttling: improvements of up to 66% in latency and 35% in throughput are achieved for the networks under analysis. Finally, the proposed mechanism uses the management model defined in the InfiniBand specification, so full compatibility is preserved.
    Facultad de Informática
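One way to picture "limited and load-controlled multipath expansion" is the hypothetical sketch below; it is not the DRB algorithm from the paper. The class `MultipathState`, the latency thresholds, and the inverse-latency weighting are all assumptions made for illustration. A source-destination pair starts with one open path, opens an extra candidate path (up to a fixed limit) when the mean latency of its open paths exceeds a high threshold, closes one when latency falls below a low threshold, and splits traffic across open paths in inverse proportion to their latency.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MultipathState:
    """Hypothetical per source-destination state for load-controlled multipath."""
    candidates: list[str]  # candidate paths, in order of preference
    max_paths: int         # upper bound on simultaneously open paths
    high_latency: float    # expand when mean latency of open paths exceeds this
    low_latency: float     # contract when mean latency drops below this
    open_count: int = 1    # start with a single open path

    def open_paths(self) -> list[str]:
        return self.candidates[: self.open_count]

    def update(self, latencies: dict[str, float]) -> None:
        """Expand or contract the set of open paths from measured latencies."""
        paths = self.open_paths()
        mean = sum(latencies[p] for p in paths) / len(paths)
        limit = min(self.max_paths, len(self.candidates))
        if mean > self.high_latency and self.open_count < limit:
            self.open_count += 1
        elif mean < self.low_latency and self.open_count > 1:
            self.open_count -= 1

    def weights(self, latencies: dict[str, float]) -> dict[str, float]:
        """Split traffic across open paths in inverse proportion to latency."""
        inverse = {p: 1.0 / latencies[p] for p in self.open_paths()}
        total = sum(inverse.values())
        return {p: w / total for p, w in inverse.items()}


if __name__ == "__main__":
    state = MultipathState(["A", "B", "C"], max_paths=2, high_latency=10.0, low_latency=2.0)
    state.update({"A": 15.0, "B": 5.0, "C": 5.0})  # congested: opens path B
    print(state.open_paths(), state.weights({"A": 15.0, "B": 5.0}))
```

The `max_paths` cap is what keeps the expansion "limited": even under sustained congestion, a pair never spreads over more than a bounded number of paths.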

    Capturing the impact of external interference on HPC application performance

    HPC applications are large software packages with high computation and storage requirements. To meet these requirements, the architectures of supercomputers are continuously evolving and their capabilities are continuously increasing. Present-day supercomputers have achieved petaflops of computational power by utilizing thousands to millions of compute cores, connected through specialized communication networks, and are equipped with petabytes of storage using a centralized I/O subsystem. While fulfilling the high resource demands of HPC applications, such a design also entails its own challenges. Applications running on these systems own the computation resources exclusively, but share the communication interconnect and the I/O subsystem with other concurrently running applications. Simultaneous access to these shared resources causes contention and inter-application interference, leading to degraded application performance. Inter-application interference is one of the sources of run-to-run variation. While other sources of variation, such as operating system jitter, have been investigated before, this doctoral thesis specifically focuses on inter-application interference and studies it from the perspective of an application. Variation in execution time not only causes uncertainty and affects user expectations (especially during performance analysis), but also causes suboptimal usage of HPC resources. Therefore, this thesis aims to evaluate inter-application interference, establish trends among applications under contention, and approximate the impact of external influences on the runtime of an application. To this end, this thesis first presents a method to correlate the performance of applications running side-by-side. The method divides the runtime of a system into globally synchronized, fine-grained time slices for which application performance data is recorded separately. 
The evaluation of the method demonstrates that correlating application performance data can identify inter-application interference. The thesis further uses the method to study I/O interference and shows that file access patterns are a significant factor in determining the interference potential of an application. This thesis also presents a technique to estimate the impact of external influences on an application run. The technique introduces the concept of intrinsic performance characteristics to cluster similar application execution segments. Anomalies in the cluster are the result of external interference. An evaluation with several benchmarks shows high accuracy in estimating the impact of interference from a single application run. The contributions of this thesis will help establish interference trends and devise interference mitigation techniques. Similarly, estimating the impact of external interference will restore user expectations and help performance analysts separate application performance from external influence.
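The time-slice correlation method can be illustrated with a minimal sketch, assuming each application reports timestamped samples of a single performance metric (for example, bytes transferred) against a shared clock. The helper names `bucket` and `pearson` are hypothetical, and the thesis's actual metrics and correlation procedure may differ. Samples are summed into fixed-width, globally aligned time slices, and the per-slice series of two concurrently running applications are compared with the Pearson correlation coefficient.

```python
from __future__ import annotations

import math


def bucket(samples: list[tuple[float, float]], t0: float, width: float, n: int) -> list[float]:
    """Sum (timestamp, value) samples into n slices [t0 + i*width, t0 + (i+1)*width)."""
    slices = [0.0] * n
    for t, value in samples:
        i = int((t - t0) // width)
        if 0 <= i < n:  # samples outside the observation window are dropped
            slices[i] += value
    return slices


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length per-slice series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two slices")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical samples from two applications sharing the same clock.
    app_a = [(0.2, 90.0), (1.4, 30.0), (2.1, 85.0), (3.7, 25.0)]
    app_b = [(0.5, 5.0), (1.1, 60.0), (2.8, 10.0), (3.3, 70.0)]
    a = bucket(app_a, t0=0.0, width=1.0, n=4)
    b = bucket(app_b, t0=0.0, width=1.0, n=4)
    print(pearson(a, b))
```

A strongly negative coefficient between one application's throughput and another's I/O volume over the same slices would hint at interference, although correlation alone does not establish causation.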

    Distributed Computing Framework Based on Software Containers for Heterogeneous Embedded Devices

    The Internet of Things (IoT) consists of millions of everyday objects enhanced with sensing and actuation capabilities and connected to the Internet. Traditional approaches for IoT applications involve sending data to cloud servers for processing and storage, and then relaying commands back to devices. However, this approach is no longer feasible due to the rapid growth of IoT in the network: the vast number of devices causes congestion; latency and security requirements demand that data be processed close to the devices that produce and consume it; and the processing and storage resources of devices remain underutilized. Fog Computing has emerged as a new paradigm in which multiple end devices form a shared pool of resources on which distributed applications are deployed, taking advantage of local capabilities. These devices are highly heterogeneous, with varying hardware and software platforms. They are also resource-constrained, with limited processing and storage resources. Realizing the Fog requires a software framework that simplifies the deployment of distributed applications while overcoming these constraints. In Cloud-based deployments, software containers provide a lightweight solution to simplify the deployment of distributed applications. However, Cloud hardware is mostly homogeneous and abundant in resources. This work establishes the feasibility of using Docker Swarm, an existing container-based software framework, for the deployment of distributed applications on IoT devices. This is realized with custom tools that enable minimal-size applications compatible with heterogeneous devices, automatic configuration and formation of a device Fog, and remote management and provisioning of devices. The proposed framework has significant advantages over the state of the art: it supports Fog-based distributed applications, it overcomes device heterogeneity, and it simplifies device initialization.

    Dependability analysis of parallel systems using a simulation-based approach

    This thesis examines the analysis of dependability in large, complex, parallel systems executing real applications or workloads. To demonstrate the wide range of dependability problems that can be analyzed through simulation, three case studies are presented. For each case, the organization of the simulation model is outlined and the results of simulated fault injection experiments are explained, showing the usefulness of this method for dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is that an application workload executes in the simulation while faults are injected. This adds a completely new dimension to this type of study, one that is not possible to model accurately with analytical approaches.

    2015 Summer Research Symposium Abstract Book

    2015 Summer volume of abstracts for science research projects conducted by students at Trinity College.

    A framework for Traffic Engineering in software-defined networks with advance reservation capabilities

    298 p.
    This doctoral thesis presents a software architecture that facilitates the introduction of traffic engineering techniques into software-defined networks. The architecture has been designed in a modular way so that it supports multiple use cases, including its application in academic networks. Notably, academic networks are characterized by providing high-availability services, so the use of traffic engineering techniques is of vital importance to guarantee service delivery under the agreed terms. One of the services typically provided by academic networks is the establishment of end-to-end circuits of a fixed duration during which a set of network resources is guaranteed, known as bandwidth on demand, which constitutes one of the most challenging traffic engineering use cases. Consequently, and given that this doctoral thesis was co-funded by the GÉANT academic network, the architecture includes support for advance reservation services. The solution manages network resources as a function of time and, through specifically designed data structures and algorithms, aims to improve network resource utilization when providing this type of service. The solution has been validated against the functional and performance requirements set by the GÉANT network. Furthermore, the solution will be used in the pilot deployment of the new GÉANT bandwidth-on-demand service at the end of 2017.
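Time-based management of network resources for advance reservation can be sketched as a per-link calendar of bandwidth reservations. This is a minimal illustration under stated assumptions (hypothetical class names, a single link, a simple linear scan), not the data structures designed in the thesis. Because reserved bandwidth changes only at reservation boundaries, checking a request over [start, end) only requires evaluating usage at `start` and at each boundary inside the interval.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    start: float      # inclusive
    end: float        # exclusive
    bandwidth: float  # e.g. Gb/s (unit is an assumption)


class LinkCalendar:
    """Hypothetical time-indexed bandwidth ledger for a single link."""

    def __init__(self, capacity: float) -> None:
        self.capacity = capacity
        self.reservations: list[Reservation] = []

    def usage_at(self, t: float) -> float:
        return sum(r.bandwidth for r in self.reservations if r.start <= t < r.end)

    def peak_usage(self, start: float, end: float) -> float:
        # Usage is piecewise constant and only changes at reservation boundaries,
        # so its maximum over [start, end) is attained at one of these points.
        points = {start}
        for r in self.reservations:
            points.update(t for t in (r.start, r.end) if start <= t < end)
        return max(self.usage_at(t) for t in points)

    def try_reserve(self, start: float, end: float, bandwidth: float) -> bool:
        """Admit the request only if capacity holds over the whole interval."""
        if end <= start:
            raise ValueError("reservation must have positive duration")
        if self.peak_usage(start, end) + bandwidth > self.capacity:
            return False
        self.reservations.append(Reservation(start, end, bandwidth))
        return True


if __name__ == "__main__":
    link = LinkCalendar(capacity=10.0)
    print(link.try_reserve(0, 10, 6))   # fits on an empty link
    print(link.try_reserve(5, 15, 5))   # rejected: 6 + 5 exceeds capacity during [5, 10)
    print(link.try_reserve(10, 20, 5))  # accepted: the first reservation has ended
```

A production system would likely index reservations by time (for example with an interval tree) instead of scanning the full list, and would check every link on a path before committing a circuit.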

    Fuzzy Logic

    This book introduces the capability of Fuzzy Logic in the development of emerging technologies. It consists of sixteen chapters showing various applications in the fields of Bioinformatics, Health, Security, Communications, Transportation, Financial Management, and Energy and Environment Systems. This book is a major reference source for all those concerned with applied intelligent systems. The intended readers are researchers, engineers, medical practitioners, and graduate students interested in fuzzy logic systems.

    Exploring Wireless Data Center Networks: Can They Reduce Energy Consumption While Providing Secure Connections?

    Data centers have become the digital backbone of the modern world. To support growing bandwidth demands, data centers consume an increasing amount of power. A significant portion of that power is consumed by information technology (IT) equipment, including servers and networking components. Additionally, the complex cabling in traditional data centers poses design and maintenance challenges and increases the energy cost of the cooling infrastructure by obstructing the flow of chilled air. Hence, to reduce data center power consumption, we proposed a wireless server-to-server data center network architecture that uses millimeter-wave links to eliminate the power-hungry switching fabric of traditional fat-tree-based data center networks. The server-to-server wireless data center network (S2S-WiDCN) architecture requires Line-of-Sight (LoS) between servers to establish direct communication links. However, interference from internal or external sources, or an obstruction such as an IT technician, may block the LoS. To address this issue, we also propose a novel obstruction-aware adaptive routing algorithm for S2S-WiDCN. S2S-WiDCN can reduce the power consumption of the network portion of the data center, but it does not affect the power consumption of the servers, which contribute significantly to total data center power consumption. Moreover, servers in data centers are almost always underutilized due to over-provisioning, which contributes heavily to the high power consumption of data centers. To address server power consumption, we proposed a network-aware, bandwidth-constrained server consolidation algorithm called Network-Aware Server Consolidation (NASCon) for wireless data centers, which can reduce power consumption by up to 37% while improving network performance.
However, as new tasks arrive and existing tasks complete, the consolidated utilization profile of the servers changes, which may adversely affect overall power consumption over time. To overcome this, the NASCon algorithm needs to be executed periodically. We have proposed a mathematical model to estimate the optimal inter-consolidation time, which the data center resource management unit can use to schedule NASCon consolidation operations in real time and leverage the benefits of server consolidation. Ensuring security, however, is one of the highest design priorities in any data center environment. Hence, for S2S-WiDCN to become a practical and viable solution for data center network design, the security of the network has to be ensured. An S2S-WiDCN data center can be vulnerable to a variety of attacks because it communicates over wireless links on an unguided channel. As a wireless system, the network has to be secured against threats common to all wireless networks, such as eavesdropping, denial-of-service, and jamming attacks. Other threats, such as attacks on the control plane and side-channel attacks through traffic analysis, are also possible. We have conducted an extensive study to elaborate the scope of these attacks and to explore probable solutions. We also proposed viable solutions against eavesdropping, denial-of-service, jamming, and control-plane attacks. To address traffic analysis attacks, we proposed a simulated annealing-based random routing mechanism that can be adopted instead of default routing in the wireless data center.
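Bandwidth-constrained server consolidation can be framed as two-dimensional bin packing. The sketch below uses plain first-fit decreasing over CPU and network-bandwidth demands; it is a generic baseline for illustration, not NASCon, and the names `Task`, `Server`, and `consolidate` as well as the capacities normalized to 1.0 are assumptions. Tasks are sorted by their larger normalized demand and placed on the first server that can hold both their CPU and bandwidth needs; a new server is powered on only when none fits.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    name: str
    cpu: float  # CPU demand as a fraction of one server (assumption)
    bw: float   # bandwidth demand as a fraction of one server's link (assumption)


@dataclass
class Server:
    cpu_cap: float = 1.0
    bw_cap: float = 1.0
    tasks: list[Task] = field(default_factory=list)

    def fits(self, task: Task) -> bool:
        cpu_used = sum(t.cpu for t in self.tasks)
        bw_used = sum(t.bw for t in self.tasks)
        return cpu_used + task.cpu <= self.cpu_cap and bw_used + task.bw <= self.bw_cap


def consolidate(tasks: list[Task]) -> list[Server]:
    """First-fit decreasing on the larger of each task's CPU and bandwidth demand."""
    servers: list[Server] = []
    for task in sorted(tasks, key=lambda t: max(t.cpu, t.bw), reverse=True):
        for server in servers:
            if server.fits(task):
                server.tasks.append(task)
                break
        else:  # no active server can host the task: power on a new one
            server = Server()
            if not server.fits(task):
                raise ValueError(f"task {task.name} exceeds a single server's capacity")
            server.tasks.append(task)
            servers.append(server)
    return servers


if __name__ == "__main__":
    demo = [Task("a", 0.6, 0.2), Task("b", 0.5, 0.1), Task("c", 0.4, 0.3), Task("d", 0.3, 0.7)]
    for i, s in enumerate(consolidate(demo)):
        print(i, [t.name for t in s.tasks])
```

Fewer active servers means more machines can be put into a low-power state; a network-aware scheme would additionally account for link placement and wireless interference, which this sketch ignores. Re-running such a routine as tasks arrive and complete is the periodic re-consolidation the abstract refers to.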