91 research outputs found

    Revisiting the high-performance reconfigurable computing for future datacenters

    Modern datacenters are reinforcing their computational power and energy efficiency by integrating field-programmable gate arrays (FPGAs). The sustainability of this large-scale integration depends on enabling multi-tenant FPGAs. This requirement amplifies the importance of the communication architecture and of virtualization methods that provide the features needed to meet these high-end objectives. Consequently, over the last decade, academia and industry have proposed several virtualization techniques and hardware architectures addressing resource management, scheduling, adaptability, isolation, scalability, performance overhead, availability, programmability, time-to-market, security, and, above all, multi-tenancy. This paper provides an extensive survey covering three important aspects: non-standard terms used in the existing literature, network-on-chip evaluation choices as a means to explore the communication architecture, and virtualization methods under the latest classification. The purpose is to emphasize the importance of choosing an appropriate communication architecture, virtualization technique, and standard language to evolve multi-tenant FPGAs in datacenters. No previous survey encapsulates all of these aspects in a single work. Open problems for the scientific community are indicated as well.

    Mars: Near-Optimal Throughput with Shallow Buffers in Reconfigurable Datacenter Networks

    The performance of large-scale computing systems often critically depends on high-performance communication networks. Dynamically reconfigurable topologies, e.g., based on optical circuit switches, are emerging as an innovative technology to deal with the explosive growth of datacenter traffic. Specifically, periodic reconfigurable datacenter networks (RDCNs) such as RotorNet (SIGCOMM 2017), Opera (NSDI 2020) and Sirius (SIGCOMM 2020) have been shown to provide high throughput by emulating a complete graph through fast periodic circuit-switch scheduling. However, to achieve such high throughput, existing reconfigurable network designs pay a high price: potentially high delays and, as we show as a first contribution of this paper, high buffer requirements. In particular, we show that under buffer constraints, emulating the high-throughput complete graph is infeasible at scale, and we uncover a spectrum of unexplored and attractive alternative RDCNs that emulate regular graphs of lower node degree. We present Mars, a periodic reconfigurable topology which emulates a d-regular graph with near-optimal throughput. In particular, we systematically analyze how the degree d can be optimized for throughput given the available buffer and delay tolerance of the datacenter.
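
    To make the degree-throughput trade-off concrete, here is a minimal Python sketch under two illustrative assumptions of ours (not the paper's exact model): buffering grows linearly with the emulated degree d, and throughput scales inversely with the average hop count, which grows roughly like log_d(n) in a d-regular graph on n nodes.

```python
import math

def avg_hops(n: int, d: int) -> float:
    """Rough proxy for the average path length in a d-regular graph
    on n nodes: paths grow like log_d(n) (illustrative assumption)."""
    return max(1.0, math.log(n) / math.log(d))

def buffer_needed(d: int, cells_per_matching: int = 1) -> int:
    """Assume per-port buffering grows linearly with the emulated
    degree d: one period's worth of cells per circuit matching."""
    return d * cells_per_matching

def best_degree(n: int, buffer_budget: int) -> tuple[int, float]:
    """Pick the degree d maximizing the throughput proxy
    1 / avg_hops(n, d), subject to the per-port buffer budget."""
    best = (2, 0.0)
    for d in range(2, n):
        if buffer_needed(d) > buffer_budget:
            break
        throughput = 1.0 / avg_hops(n, d)
        if throughput > best[1]:
            best = (d, throughput)
    return best

if __name__ == "__main__":
    # With a generous buffer the complete graph (d = n-1) wins;
    # with shallow buffers a lower-degree emulation is optimal.
    for budget in (4, 16, 256):
        d, tp = best_degree(n=256, buffer_budget=budget)
        print(f"buffer={budget:3d}: best degree d={d}, throughput proxy={tp:.2f}")
```

    With a large buffer the search recovers the complete-graph emulation; shallow buffers push the optimum toward lower-degree regular graphs, matching the paper's qualitative message.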

    Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics

    The diversity of workload requirements and increasing hardware heterogeneity in emerging high performance computing (HPC) systems motivate resource disaggregation. Resource disaggregation allows compute and memory resources to be allocated individually as required to each workload. However, it is unclear how to efficiently realize this capability and cost-effectively meet the stringent bandwidth and latency requirements of HPC applications. To that end, we describe how modern photonics can be co-designed with modern HPC racks to implement flexible intra-rack resource disaggregation and fully meet the bit error rate (BER) and high escape-bandwidth requirements of all chip types in modern HPC racks. Our photonics-based disaggregated rack provides an average application speedup of 11% (46% maximum) for 25 CPU benchmarks and 61% for 24 GPU benchmarks compared to a similar system that instead uses modern electronic switches for disaggregation. Using observed resource usage from a production system, we estimate that an iso-performance intra-rack disaggregated HPC system using photonics would require 4x fewer memory modules and 2x fewer NICs than a non-disaggregated baseline.
    Comment: 15 pages, 12 figures, 4 tables. Published in IEEE Cluster 202
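
    The memory-module estimate rests on statistical multiplexing: a rack-wide pool only has to cover the peak of the aggregate demand, while per-node provisioning must cover every node's individual peak. A small Python sketch with synthetic utilization traces illustrates the effect (all numbers are our illustrative assumptions, not the paper's measurements).

```python
import random

random.seed(0)

NODES = 32          # nodes per rack (illustrative)
MODULE_GB = 64      # capacity of one memory module (illustrative)
SAMPLES = 1000      # observed utilization samples per node

# Synthetic "observed" per-node memory demand in GB; real inputs would
# come from production telemetry, as in the paper's methodology.
demand = [[random.uniform(8, 120) for _ in range(SAMPLES)]
          for _ in range(NODES)]

def modules(gb: float) -> int:
    """Memory modules needed to cover a demand (ceiling division)."""
    return -(-int(gb) // MODULE_GB)

# Non-disaggregated: every node is provisioned for its own peak demand.
per_node = sum(modules(max(node)) for node in demand)

# Intra-rack disaggregation: the rack pool only needs to cover the peak
# of the *aggregate* demand, which is smaller because node peaks rarely
# coincide (statistical multiplexing).
pooled_peak = max(sum(node[t] for node in demand) for t in range(SAMPLES))
pooled = modules(pooled_peak)

print(f"per-node provisioning:   {per_node} modules")
print(f"rack-pooled provisioning: {pooled} modules "
      f"({per_node / pooled:.1f}x fewer)")
```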

    Container-based load balancing for energy efficiency in software-defined edge computing environment

    The workload generated by Internet of Things (IoT)-based infrastructure is often handled by cloud data centers (DCs). However, in recent years, an exponential increase in the deployment of IoT-based infrastructure has escalated the workload on these DCs, which are no longer fully capable of meeting the strict low-latency and high-data-rate demands of IoT devices when provisioning IoT workloads. Therefore, to reinforce latency-sensitive workloads, an intermediate layer known as edge computing has successfully rebalanced the entire service-provisioning landscape. In this IoT-edge-cloud ecosystem, the large number of interactions and data transmissions among the different layers can increase the load on the underlying network infrastructure, and software-defined edge computing has emerged as a viable solution to these latency-sensitive workload issues. Additionally, energy consumption is a major challenge in resource-constrained edge systems, and existing solutions for handling IoT workloads in a software-defined edge ecosystem do not achieve an optimal trade-off between energy efficiency and latency. Hence, this article proposes a lightweight and energy-efficient container-as-a-service (CaaS) approach based on software-defined edge computing to provision workloads generated by latency-sensitive IoT applications. A Stackelberg game is formulated for a two-period resource allocation between end-user/IoT devices and edge devices that takes the service-level agreement into account. Furthermore, an energy-efficient ensemble for container allocation, consolidation, and migration is designed for load balancing in the software-defined edge computing environment. The proposed approach is validated in a simulated environment with respect to CPU service time, network service time, overall delay, and energy consumption, and the results show its superiority over existing variants.
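
    As a rough illustration of the Stackelberg formulation, the following single-round Python toy (a simplification of the paper's two-period game; the utility forms, valuations, and capacity are our assumptions) has the edge provider lead by setting a unit resource price while IoT devices follow with their utility-maximizing demands.

```python
import numpy as np

# Illustrative Stackelberg pricing game between an edge provider
# (leader, sets a unit resource price) and IoT devices (followers,
# choose how much resource to request). Utility forms and parameters
# are our assumptions, not the paper's exact model.

rng = np.random.default_rng(1)
valuations = rng.uniform(2.0, 6.0, size=20)   # per-device valuation v_i
ENERGY_COST = 0.5                              # leader's cost per resource unit
CAPACITY = 40.0                                # edge resource capacity

def follower_demand(v: np.ndarray, price: float) -> np.ndarray:
    """Each device maximizes v_i*log(1+d) - price*d, giving the
    best response d_i = max(0, v_i/price - 1)."""
    return np.maximum(0.0, v / price - 1.0)

def leader_profit(price: float) -> float:
    d = follower_demand(valuations, price)
    total = min(d.sum(), CAPACITY)             # cannot exceed capacity
    return (price - ENERGY_COST) * total

# The leader anticipates the followers' best response and grid-searches
# for the profit-maximizing price (the Stackelberg equilibrium).
prices = np.linspace(0.6, 6.0, 500)
best = max(prices, key=leader_profit)
alloc = min(follower_demand(valuations, best).sum(), CAPACITY)
print(f"equilibrium price: {best:.2f}, allocated: {alloc:.1f}, "
      f"profit: {leader_profit(best):.1f}")
```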

    Dataplane Specialization for High-performance OpenFlow Software Switching

    OpenFlow is an amazingly expressive dataplane programming language, but this expressiveness comes at a severe performance price, as switches must do excessive packet classification in the fast path. The prevalent OpenFlow software switch architecture is therefore built on flow caching, but this imposes intricate limitations on the workloads that can be supported efficiently and may even open the door to malicious cache-overflow attacks. In this paper we argue that instead of enforcing the same universal flow-cache semantics on all OpenFlow applications and optimizing for the common case, a switch should rather automatically specialize its dataplane piecemeal with respect to the configured workload. We introduce ESwitch, a novel switch architecture that uses on-the-fly template-based code generation to compile any OpenFlow pipeline into efficient machine code, which can then be readily used as the fast path. We present a proof-of-concept prototype and demonstrate on illustrative use cases that ESwitch yields a simpler architecture, superior packet processing speed, improved latency and CPU scalability, and predictable performance. Our prototype can easily scale beyond 100 Gbps on a single Intel blade even with complex OpenFlow pipelines.
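
    The specialization idea can be illustrated with a small Python sketch: rather than interpreting a generic flow table for every packet, the switch compiles the configured pipeline into a dedicated fast-path function. Here Python closures stand in for ESwitch's template-based machine-code generation, and the rule and field names are hypothetical.

```python
from typing import Callable

Packet = dict           # e.g. {"ip_dst": "10.0.0.1", "tcp_dport": 80}

def compile_pipeline(rules: list[tuple[dict, str]]) -> Callable[[Packet], str]:
    """Specialize the pipeline: if every rule matches on the same single
    field, emit an exact-match hash lookup instead of a linear scan."""
    fields = {tuple(match) for match, _ in rules}
    if len(fields) == 1 and len(next(iter(fields))) == 1:
        (field,) = next(iter(fields))
        table = {match[field]: action for match, action in rules}
        def fast_path(pkt: Packet) -> str:       # O(1) specialized path
            return table.get(pkt.get(field), "drop")
        return fast_path
    def generic_path(pkt: Packet) -> str:        # fallback: linear scan
        for match, action in rules:
            if all(pkt.get(k) == v for k, v in match.items()):
                return action
        return "drop"
    return generic_path

rules = [({"ip_dst": "10.0.0.1"}, "output:1"),
         ({"ip_dst": "10.0.0.2"}, "output:2")]
datapath = compile_pipeline(rules)
print(datapath({"ip_dst": "10.0.0.2"}))   # -> output:2
```

    The point of the specialization is that the workload, not a universal cache design, determines the fast-path structure: here the compiler picks a hash lookup only because the configured rules allow it.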