31 research outputs found

    Virtual InfiniBand Clusters for HPC Clouds

    High Performance Computing (HPC) employs fast interconnect technologies to provide low communication and synchronization latencies for tightly coupled parallel compute jobs. Contemporary HPC clusters have a fixed capacity and static runtime environments; they cannot elastically adapt to dynamic workloads, and provide a limited selection of applications, libraries, and system software. In contrast, a cloud model for HPC clusters promises more flexibility, as it provides elastic virtual clusters to be available on-demand. This is not possible with physically owned clusters. In this paper, we present an approach that makes it possible to use InfiniBand clusters for HPC cloud computing. We propose a performance-driven design of an HPC IaaS layer for InfiniBand, which provides throughput and latency-aware virtualization of nodes, networks, and network topologies, as well as an approach to an HPC-aware, multi-tenant cloud management system for elastic virtualized HPC compute clusters.

    Towards Low-Latency Byzantine Agreement Protocols Using RDMA

    Byzantine fault tolerance (BFT) protocols can mitigate attacks and errors and are increasingly investigated as consensus protocols in blockchains. However, they are traditionally considered costly in terms of message complexity and latency due to the required multiple rounds of message exchanges. With the availability of Remote Direct Memory Access (RDMA) in data centers, message exchange latency can be reduced compared to TCP, as RDMA enables kernel bypassing and thereby avoids intermediate data copying. Retaining the performance benefits for RDMA during its integration, however, is non-trivial and error-prone. While the use of RDMA has previously been explored for key/value stores, databases and distributed file systems, agreement protocols especially for BFT have so far been neglected. We investigate the usage of RDMA in the Reptor BFT protocol for low-latency agreement and show first steps towards an RDMA-enabled consensus protocol. For this, we present RUBIN, a framework offering similar functionality to the Java NIO selector, which can handle multiple network connections efficiently with a single thread and is employed in several BFT protocol implementations such as BFT-SMART and UpRight

    Researching methods for efficient hardware specification, design and implementation of a next generation communication architecture

    The objective of this work is to create and implement a System Area Network (SAN) architecture called EXTOLL, embedded in the current world of systems, software and standards, based on the experience gained during the development and testing of the ATOLL project. The topics of this work also cover system design methodology and educational issues in order to provide appropriate human resources and work premises. The scope of this work in the EXTOLL SAN project was:
    • the Xbar architecture and routing (multi-layer routing, virtual channels and their arbitration, routing formats, deadlock avoidance, debug features, automation of reuse)
    • the on-chip module communication architecture and parts of the host communication
    • the network processor architecture and integration
    • the development of the design methodology and the creation of the design flow
    • the team education and work structure.
    In order to successfully leverage student know-how and workflow methodology for this research project, the SEED curricula changes have been governed by the Hochschul Didaktik Zentrum, resulting in a certificate for "Hochschuldidaktik" and excellence in university education. The complexity of the target system required new approaches in concurrent hardware/software codesign. The concept of virtual hardware prototypes has been established and used extensively during design space exploration and software interface design.

    Offloading Safety- and Mission-Critical Tasks via Unreliable Connections

    For many cyber-physical systems, e.g., IoT systems and autonomous vehicles, offloading workload to auxiliary processing units has become crucial. However, since this approach depends heavily on network connectivity and responsiveness, typically only non-critical tasks are offloaded, which have less strict timing requirements than critical tasks. In this work, we provide two protocols that allow critical and non-critical tasks alike to be offloaded, while providing different service levels for non-critical tasks in the event of an unsuccessful offloading operation, depending on the respective system requirements. We analyze the worst-case timing behavior of the local cyber-physical system and, based on these analyses, we provide a sufficient schedulability test for each of the proposed protocols. In comprehensive experiments, we show that our protocols have reasonable acceptance ratios under the provided schedulability tests. Moreover, we demonstrate that the system behavior under our proposed protocols depends strongly on the probability of unsuccessful offloading operations, the percentage of critical tasks in the system, and the amount of offloaded workload.

    Implementation and comparison of iSCSI over RDMA

    iSCSI is an emerging storage network technology that allows for block-level access to disk drives over a computer network. Since iSCSI runs over the ubiquitous TCP/IP protocol, it has many advantages over its more proprietary alternatives. Due to the recent movement toward 10 gigabit Ethernet, storage vendors are interested to see how this large increase in network bandwidth could benefit the iSCSI protocol. In order to make full use of the bandwidth provided by a 10 gigabit Ethernet link, specialized Remote Direct Memory Access hardware is being developed to offload processing and reduce the data-copy overhead found in a standard TCP/IP network stack. This thesis focuses on the development of an iSCSI implementation that is capable of supporting this new hardware and the evaluation of its performance. This thesis describes the approach used to implement the iSCSI Extensions for Remote Direct Memory Access (iSER) with the UNH iSCSI reference implementation. This approach involves a three-step process: moving UNH-iSCSI from the Linux kernel to the Linux user-space, adding support for the iSER extensions to our user-space iSCSI, and finally moving everything back into the Linux kernel. In addition to a description of the implementation, results are given that demonstrate the performance of the completed iSER-assisted iSCSI implementation.

    Networking research in front ending and intelligent terminals : management plan

    Prepared for the Command and Control Technical Center, WWMCCS ADP Directorate, Defense Communication Agency, under contract DCA 100-76-C-0088. CCTC-WAD document no. 6509.

    Cutting Wi-Fi Scan Tax for Smart Devices

    Today most popular mobile apps and location-based services require near always-on Wi-Fi connectivity (e.g., Skype, Viber, Wi-Fi Finder). The Wi-Fi power drain resulting from frequent Wi-Fi active scans is undermining the battery performance of smart devices and causing users to remove apps or disable important services. We collectively call this the scan tax problem. The main reason for this problem is that the main processor has to be active during Wi-Fi active scans and hence consumes a significant and disproportionate amount of energy during scan periods. We propose a simple and effective architectural change, where the main processor periodically computes an SSID list and scan parameters (i.e., scan interval, timeout), taking into account user mobility and behavior (e.g., walking), so that scanning can be offloaded to the Wi-Fi radio. We design WiScan, a complete system to realize scan offloading, and implement our system on the Nexus 5. Both our prototype experiments and trace-driven emulations demonstrate that WiScan achieves 90%+ of the maximal connectivity (connectivity that the existing Wi-Fi scan mechanism could achieve with a 5-second scan interval), while saving 50-62% energy for seeking connectivity (the ratio between the Wi-Fi connected duration and total time duration) compared to existing active scan implementations. We argue that our proposed shift not only significantly reduces the scan tax paid by users, but also ultimately leads to ultra-low power, always-on Wi-Fi connectivity, enabling a new class of context-aware apps to emerge.

    Communication Awareness


    A SOFTWARE DEFINED NETWORKING ARCHITECTURE FOR HIGH PERFORMANCE CLOUDS

    Multi-tenant clouds with resource virtualization offer elasticity of resources and eliminate the initial cluster setup cost and time for applications. However, poor network performance, performance variation and noisy neighbors are some of the challenges for the execution of high performance applications on public clouds. Utilizing these virtualized resources for scientific applications, which have complex communication patterns, requires low-latency communication mechanisms and a rich set of communication constructs. To minimize the virtualization overhead, a novel approach for low-latency networking for HPC Clouds is proposed and implemented over a multi-technology software defined network. The efficiency of the proposed low-latency SDN is analyzed and evaluated for high performance applications. The results of the experiments show that the latest Mellanox FDR InfiniBand interconnect and Mellanox OpenStack plugin give the best performance for implementing virtual machine based high performance clouds with large message sizes.