91 research outputs found
Revisiting the high-performance reconfigurable computing for future datacenters
Modern datacenters are reinforcing the computational power and energy efficiency by assimilating field programmable gate arrays (FPGAs). The sustainability of this large-scale integration depends on enabling multi-tenant FPGAs. This requisite amplifies the importance of communication architecture and virtualization method with the required features in order to meet the high-end objective. Consequently, in the last decade, academia and industry proposed several virtualization techniques and hardware architectures for addressing resource management, scheduling, adoptability, segregation, scalability, performance-overhead, availability, programmability, time-to-market, security, and mainly, multitenancy. This paper provides an extensive survey covering three important aspects-discussion on non-standard terms used in existing literature, network-on-chip evaluation choices as a mean to explore the communication architecture, and virtualization methods under latest classification. The purpose is to emphasize the importance of choosing appropriate communication architecture, virtualization technique and standard language to evolve the multi-tenant FPGAs in datacenters. None of the previous surveys encapsulated these aspects in one writing. Open problems are indicated for scientific community as well
Mars: Near-Optimal Throughput with Shallow Buffers in Reconfigurable Datacenter Networks
The performance of large-scale computing systems often critically depends on
high-performance communication networks. Dynamically reconfigurable topologies,
e.g., based on optical circuit switches, are emerging as an innovative new
technology to deal with the explosive growth of datacenter traffic.
Specifically, periodic reconfigurable datacenter networks (RDCNs) such as
RotorNet (SIGCOMM 2017), Opera (NSDI 2020) and Sirius (SIGCOMM 2020) have been
shown to provide high throughput, by emulating a complete graph through fast
periodic circuit switch scheduling.
However, to achieve such a high throughput, existing reconfigurable network
designs pay a high price: in terms of potentially high delays, but also, as we
show as a first contribution in this paper, in terms of the high buffer
requirements. In particular, we show that under buffer constraints, emulating
the high-throughput complete-graph is infeasible at scale, and we uncover a
spectrum of unvisited and attractive alternative RDCNs, which emulate regular
graphs of lower node degree.
We present Mars, a periodic reconfigurable topology which emulates a
-regular graph with near-optimal throughput. In particular, we
systematically analyze how the degree can be optimized for throughput given
the available buffer and delay tolerance of the datacenter
Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics
The diversity of workload requirements and increasing hardware heterogeneity
in emerging high performance computing (HPC) systems motivate resource
disaggregation. Resource disaggregation allows compute and memory resources to
be allocated individually as required to each workload. However, it is unclear
how to efficiently realize this capability and cost-effectively meet the
stringent bandwidth and latency requirements of HPC applications. To that end,
we describe how modern photonics can be co-designed with modern HPC racks to
implement flexible intra-rack resource disaggregation and fully meet the bit
error rate (BER) and high escape bandwidth of all chip types in modern HPC
racks. Our photonic-based disaggregated rack provides an average application
speedup of 11% (46% maximum) for 25 CPU and 61% for 24 GPU benchmarks compared
to a similar system that instead uses modern electronic switches for
disaggregation. Using observed resource usage from a production system, we
estimate that an iso-performance intra-rack disaggregated HPC system using
photonics would require 4x fewer memory modules and 2x fewer NICs than a
non-disaggregated baseline.Comment: 15 pages, 12 figures, 4 tables. Published in IEEE Cluster 202
Container-based load balancing for energy efficiency in software-defined edge computing environment
The workload generated by the Internet of Things (IoT)-based infrastructure is often handled by the cloud data centers (DCs). However, in recent time, an exponential increase in the deployment of the IoT-based infrastructure has escalated the workload on the DCs. So, these DCs are not fully capable to meet the strict demand of IoT devices in regard to the lower latency as well as high data rate while provisioning IoT workloads. Therefore, to reinforce the latency-sensitive workloads, an intersection layer known as edge computing has successfully balanced the entire service provisioning landscape. In this IoT-edge-cloud ecosystem, large number of interactions and data transmissions among different layer can increase the load on underlying network infrastructure. So, software-defined edge computing has emerged as a viable solution to resolve these latency-sensitive workload issues. Additionally, energy consumption has been witnessed as a major challenge in resource-constrained edge systems. The existing solutions are not fully compatible in Software-defined Edge ecosystem for handling IoT workloads with an optimal trade-off between energy-efficiency and latency. Hence, this article proposes a lightweight and energy-efficient container-as-a-service (CaaS) approach based on the software-define edge computing to provision the workloads generated from the latency-sensitive IoT applications. A Stackelberg game is formulated for a two-period resource allocation between end-user/IoT devices and Edge devices considering the service level agreement. Furthermore, an energy-efficient ensemble for container allocation, consolidation and migration is also designed for load balancing in software-defined edge computing environment. The proposed approach is validated through a simulated environment with respect to CPU serve time, network serve time, overall delay, lastly energy consumption. The results obtained show the superiority of the proposed in comparison to the existing variants
Dataplane Specialization for High-performance OpenFlow Software Switching
OpenFlow is an amazingly expressive dataplane program-
ming language, but this expressiveness comes at a severe
performance price as switches must do excessive packet clas-
sification in the fast path. The prevalent OpenFlow software
switch architecture is therefore built on flow caching, but
this imposes intricate limitations on the workloads that can
be supported efficiently and may even open the door to mali-
cious cache overflow attacks. In this paper we argue that in-
stead of enforcing the same universal flow cache semantics
to all OpenFlow applications and optimize for the common
case, a switch should rather automatically specialize its dat-
aplane piecemeal with respect to the configured workload.
We introduce ES WITCH , a novel switch architecture that
uses on-the-fly template-based code generation to compile
any OpenFlow pipeline into efficient machine code, which
can then be readily used as fast path. We present a proof-
of-concept prototype and we demonstrate on illustrative use
cases that ES WITCH yields a simpler architecture, superior
packet processing speed, improved latency and CPU scala-
bility, and predictable performance. Our prototype can eas-
ily scale beyond 100 Gbps on a single Intel blade even with
complex OpenFlow pipelines
- …