17 research outputs found
Scalable electro-optical solutions for data center networks
Switching gears towards efficient datacenters with photonic
Recommended from our members
High Performance Silicon Photonic Interconnected Systems
Advances in data-driven applications, particularly artificial intelligence and deep learning, are driving the explosive growth of computation and communication in today’s data centers and high-performance computing (HPC) systems. Increasingly, system performance is not constrained by the compute speed at individual nodes, but by the data movement between them. This calls for innovative architectures, smart connectivity, and extreme bandwidth densities in interconnect designs. Silicon photonics technology leverages mature complementary metal-oxide-semiconductor (CMOS) manufacturing infrastructure and is promising for low cost, high-bandwidth, and reconfigurable interconnects. Flexible and high-performance photonic switched architectures are capable of improving the system performance. The work in this dissertation explores various photonic interconnected systems and the associated optical switching functionalities, hardware platforms, and novel architectures. It demonstrates the capabilities of silicon photonics to enable efficient deep learning training.
We first present field programmable gate array (FPGA) based open-loop and closed-loop control for optical spectral-and-spatial switching of silicon photonic cascaded micro-ring resonator (MRR) switches. Our control achieves wavelength locking at the user-defined resonance of the MRR for optical unicast, multicast, and multiwavelength-select functionalities. Digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) are necessary for the control of the switch. We experimentally demonstrate the optical switching functionalities using an FPGA-based switch controller through both traditional multi-bit DAC/ADC and novel single-wired DAC/ADC circuits. For system-level integration, interfaces to the switch controller in a network control plane are developed. The successful control and the switching functionalitiesachieved are essential for system-level architectural innovations as presented in the following sections.
Next, this thesis presents two novel photonic switched architectures using the MRR-based switches. First, a photonic switched memory system architecture was designed to address memory challenges in deep learning. The reconfigurable photonic interconnects provide scalable solutions and enable efficient use of disaggregated memory resources for deep learning training. An experimental testbed was built with a processing system and two remote memory nodes using silicon photonic switch fabrics and system performance improvements were demonstrated. The collective results and existing high-bandwidth optical I/Os show the potential of integrating the photonic switched memory to state-of-the-art processing systems. Second, the scaling trends of deep learning models and distributed training workloads are challenging network capacities in today’s data centers and HPCs. A system architecture that leverages SiP switch-enabled server regrouping is proposed to tackle the challenges and accelerate distributed deep learning training. An experimental testbed with a SiP switch-enabled reconfigurable fat tree topology was built to evaluate the network performance of distributed ring all-reduce and parameter server workloads. We also present system-scale simulations. Server regrouping and bandwidth steering were performed on a large-scale tapered fat tree with 1024 compute nodes to show the benefits of using photonic switched architectures in systems at scale.
Finally, this dissertation explores high-bandwidth photonic interconnect designs for disaggregated systems. We first introduce and discuss two disaggregated architectures leveraging extreme high bandwidth interconnects with optically interconnected computing resources. We present the concept of rack-scale graphics processing unit (GPU) disaggregation with optical circuit switches and electrical aggregator switches. The architecture can leverage the flexibility of high bandwidth optical switches to increase hardware utilization and reduce application runtimes. A testbed was built to demonstrate resource disaggregation and defragmentation. In addition, we also present an extreme high-bandwidth optical interconnect accelerated low-latency communication architecture for deep learning training. The disaggregated architecture utilizes comb laser sources and MRR-based cross-bar switching fabrics to enable an all-to-all high bandwidth communication with a constant latency cost for distributed deep learning training. We discuss emerging technologies in the silicon photonics platform, including light source, transceivers, and switch architectures, to accommodate extreme high bandwidth requirements in HPC and data center environments. A prototype hardware innovation - Optical Network Interface Cards (comprised of FPGA, photonic integrated circuits (PIC), electronic integrated circuits (EIC), interposer, and high-speed printed circuit board (PCB)) is presented to show the path toward fast lanes for expedited execution at 10 terabits.
Taken together, the work in this dissertation demonstrates the capabilities of high-bandwidth silicon photonic interconnects and innovative architectural designs to accelerate deep learning training in optically connected data center and HPC systems
Evaluation of data centre networks and future directions
Traffic forecasts predict a more than threefold increase in the global datacentre workload in coming years, caused by the increasing adoption of cloud and data-intensive applications. Consequently, there has been an unprecedented need for ultra-high throughput and minimal latency. Currently deployed hierarchical architectures using electronic packet switching technologies are costly and energy-inefficient. Very high capacity switches are required to satisfy the enormous bandwidth requirements of cloud datacentres and this limits the overall network scalability. With the maturity of photonic components, turning to optical switching in data centres is a viable option to accommodate greater bandwidth and network flexibility while potentially minimising the latency, cost and power consumption.
Various DCN architectures have been proposed to date and this thesis includes a comparative analysis of such electronic and optical topologies to judge their suitability based on network performance parameters and cost/energy effectiveness, while identifying the challenges faced by recent DCN infrastructures. An analytical Layer 2 switching model is introduced that can alleviate the simulation scalability problem and evaluate the performance of the underlying DCN architecture. This model is also used to judge the variation in traffic arrival/offloading at the intermediate queueing stages and the findings are used to derive closed form expressions for traffic arrival rates and delay. The results from the simulated network demonstrate the impact of buffering and versubscription and reveal the potential bottlenecks and network design tradeoffs. TCP traffic forms the bulk of current DCN workload and so the designed network is further modified to include TCP flows generated from a realistic traffic generator for assessing the impact of Layer 4 congestion control on the DCN performance with standard TCP and datacentre specific TCP protocols (DCTCP). Optical DCN architectures mostly concentrate on core-tier switching. However, substantial energy saving is possible by introducing optics in the edge tiers. Hence, a new approach to optical switching is introduced using Optical ToR switches which can offer better delay performance than commodity switches of similiar size, while having far less power dissipation. An all-optical topology has been further outlined for the efficient implementation of the optical switch meeting the future scalability demands
Wireless Communication in Data Centers: A Survey
Data centers (DCs) is becoming increasingly an integral part of the computing infrastructures of most enterprises. Therefore, the concept of DC networks (DCNs) is receiving an increased attention in the network research community. Most DCNs deployed today can be classified as wired DCNs as copper and optical fiber cables are used for intra- and inter-rack connections in the network. Despite recent advances, wired DCNs face two inevitable problems; cabling complexity and hotspots. To address these problems, recent research works suggest the incorporation of wireless communication technology into DCNs. Wireless links can be used to either augment conventional wired DCNs, or to realize a pure wireless DCN. As the design spectrum of DCs broadens, so does the need for a clear classification to differentiate various design options. In this paper, we analyze the free space optical (FSO) communication and the 60 GHz radio frequency (RF), the two key candidate technologies for implementing wireless links in DCNs. We present a generic classification scheme that can be used to classify current and future DCNs based on the communication technology used in the network. The proposed classification is then used to review and summarize major research in this area. We also discuss open questions and future research directions in the area of wireless DCs