Scaling PULSE Data Center Network Architecture and Scheduling Optical Circuits in Sub-Microseconds
PULSE, an optical circuit switched data center network, employs custom ASIC schedulers to reconfigure circuits in 240 ns. The revised PULSE architecture scales to 10,000s of blades and achieves >95% sustained throughput with low median (1.23 µs) and tail (145 µs) latencies, while consuming 115 pJ/bit and costing $9.04/Gbps
A Hybrid Beam Steering Free-Space and Fiber Based Optical Data Center Network
Wireless data center networks (DCNs) are promising solutions to mitigate the cabling complexity in traditional wired DCNs and potentially reduce the end-to-end latency with faster propagation speed in free space. Yet, physical architectures in wireless DCNs must be carefully designed regarding wireless link blockage, obstacle bypassing, path loss, interference and spatial efficiency in a dense deployment. This paper presents the physical layer design of a hybrid FSO/in-fiber DCN while guaranteeing an all-optical, single hop, non-oversubscribed and full-bisection bandwidth network. We propose two layouts and analyze their scalability: (1) A static network utilizing only tunable sources which can scale up to 43 racks, 15,609 nodes and 15,609 channels; and (2) a re-configurable network with both tunable sources and piezoelectric actuator (PZT) based beam-steering which can scale up to 8 racks, 2,904 nodes and 185,856 channels at millisecond PZT switching time. Based on a traffic generation framework and a dynamic wavelength-timeslot scheduling algorithm, the system-level network performance is simulated for a 363-node subnet, reaching >99% throughput and 1.23 μs average scheduler latency at 90% load
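The scaling figures quoted in this abstract can be cross-checked with simple division. The sketch below is only an arithmetic illustration: reading the quotients as "nodes per rack" and "channels per node" is an assumption made here, not something the abstract states, and the function name `per_unit` is hypothetical.

```python
# Figures quoted in the abstract above.
static = {"racks": 43, "nodes": 15_609, "channels": 15_609}
reconfigurable = {"racks": 8, "nodes": 2_904, "channels": 185_856}


def per_unit(layout):
    """Return (nodes per rack, channels per node) as exact integer ratios.

    Interpreting the ratios this way is an assumption for illustration.
    """
    nodes_per_rack, rem_nodes = divmod(layout["nodes"], layout["racks"])
    channels_per_node, rem_channels = divmod(layout["channels"], layout["nodes"])
    if rem_nodes or rem_channels:
        raise ValueError("quoted figures do not divide evenly")
    return nodes_per_rack, channels_per_node


for name, layout in (("static", static), ("reconfigurable", reconfigurable)):
    print(name, per_unit(layout))
```

Both layouts come out at 363 nodes per rack, which matches the size of the simulated 363-node subnet. The reconfigurable layout works out to 64 channels per node, against 1 for the static layout.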
RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems
Distributed deep learning (DDL) systems strongly depend on network performance. Current electronic packet switched (EPS) network architectures and technologies suffer from variable diameter topologies, low-bisection bandwidth and over-subscription affecting completion time of communication and collective operations. We introduce a near-exascale, full-bisection bandwidth, all-to-all, single-hop, all-optical network architecture with nanosecond reconfiguration called RAMP, which supports large-scale distributed and parallel computing systems (12.8 Tbps per node for up to 65,536 nodes). For the first time, a custom RAMP-x MPI strategy and a network transcoder are proposed to run MPI collective operations across the optical circuit switched (OCS) network in a schedule-less and contention-less manner. RAMP achieves 7.6-171× speed-up in completion time across all MPI operations compared to realistic EPS and OCS counterparts. It can also deliver a 1.3-16× and 7.8-58× reduction in Megatron and DLRM training time respectively, while offering 42-53× and 3.3-12.4× improvement in energy consumption and cost respectively
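As a rough sanity check on the "near-exascale" description, the per-node bandwidth and node count quoted above can be multiplied out. This is a sketch under the assumption that the aggregate is simply per-node bandwidth times node count; the constant and function names are illustrative, not taken from the paper.

```python
NODE_BANDWIDTH_TBPS = 12.8  # per-node bandwidth quoted in the abstract
MAX_NODES = 65_536          # maximum node count quoted in the abstract


def aggregate_bandwidth_ebps(per_node_tbps, nodes):
    """Aggregate injection bandwidth in Eb/s (1 Eb/s = 1e6 Tb/s).

    Assumes every node drives its full quoted bandwidth at the same time.
    """
    return per_node_tbps * nodes / 1e6


total = aggregate_bandwidth_ebps(NODE_BANDWIDTH_TBPS, MAX_NODES)
print(f"{total:.3f} Eb/s aggregate")
```

Under that assumption the total is a little under 1 Eb/s, consistent with the "near-exascale" wording.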
Control Plane Hardware Design for Optical Packet Switched Data Centre Networks
Optical packet switching for intra-data centre networks is key to addressing traffic requirements. Photonic integration and wavelength division multiplexing (WDM) can overcome bandwidth limits in switching systems. A promising technology to build a nanosecond-reconfigurable photonic-integrated switch, compatible with WDM, is the semiconductor optical amplifier (SOA). SOAs are typically used as gating elements in a broadcast-and-select (B&S) configuration, to build an optical crossbar switch. For larger-size switching, a three-stage Clos network, based on crossbar nodes, is a viable architecture. However, the design of the switch control plane is one of the barriers to packet switching; it should run on packet timescales, which becomes increasingly challenging as line rates get higher. The scheduler, used for the allocation of switch paths, limits control clock speed. To this end, the research contribution was the design of highly parallel hardware schedulers for crossbar and Clos network switches. On a field-programmable gate array (FPGA), the minimum scheduler clock period achieved was 5.0 ns and 5.4 ns, for a 32-port crossbar and Clos switch, respectively. By using parallel path allocation modules, one per Clos node, a minimum clock period of 7.0 ns was achieved, for a 256-port switch. For scheduler application-specific integrated circuit (ASIC) synthesis, this reduces to 2.0 ns; a record result enabling scalable packet switching. Furthermore, the control plane was demonstrated experimentally. Moreover, a cycle-accurate network emulator was developed to evaluate switch performance. Results showed a switch saturation throughput at a traffic load 60% of capacity, with sub-microsecond packet latency, for a 256-port Clos switch, outperforming state-of-the-art optical packet switches
Hardware-Software Integrated Silicon Photonic Systems
Fabrication of integrated photonic devices and circuits in a CMOS-compatible process or foundry is the essence of the silicon photonic platform. Optical devices in this platform are enabled by the high index contrast between silicon and silicon on insulator. These devices offer potential benefits when integrated with existing and emerging high performance microelectronics. Integration of silicon photonics with small footprints and power-efficient and high-bandwidth operation has long been cited as a solution to existing issues in high performance interconnects for telecommunications and data communication. Stemming from this historic application in communications, new applications in sensing arrays, biochemistry, and even entertainment continue to grow. However, for many technologies to successfully adopt silicon photonics and reap the perceived benefits, the silicon photonic platform must extend toward development of a full ecosystem. Such extension includes implementation of low cost and robust electronic-photonic packaging techniques for all applications. In an ecosystem implemented with services ranging from device fabrication all the way to packaged products, ease-of-use and ease-of-deployment in systems that require many hardware and software components becomes possible.
With the onset of the Internet of Things (IoT), nearly all technologies—sensors, compute, communication devices, etc.—persist in systems with some level of localized or distributed software interaction. These interactions often require a level of networked communications. For silicon photonics to penetrate technologies comprising IoT, it is advantageous to implement such devices in a hardware-software integrated way. Meaning, all functionalities and interactions related to the silicon photonic devices are well defined in terms of the physicality of the hardware. This hardware is then abstracted into various levels of software as needed in the system. The power of hardware-software integration allows many of the piece-wise demonstrated functionalities of silicon photonics to easily translate to commercial implementation.
This work begins by briefly highlighting the challenges and solutions for transforming existing silicon photonic platforms to a full-fledged silicon photonic ecosystem. The highlighted solutions in development consist of tools for fabrication, testing, subsystem packaging, and system validation. Building off the knowledge of a silicon photonic ecosystem in development, this work continues by demonstrating various levels of hardware-software integration. These are primarily focused on silicon photonic interconnects.
The first hardware-software integration-focused portion of this work explores silicon microring-based devices as a key building block for greater silicon photonic subsystems. The microring’s sensitivity to thermal fluctuations is identified not as a flaw, but as a tool for functionalization. A logical control system is implemented to mitigate thermal effects that would normally render a microring resonator inoperable. The mechanism to control the microring is extended and abstracted with software programmability to offer wavelength routing as a network primitive. This functionality, available through hardware-software integration, offers the possibility for ubiquitous deployment of such microring devices in future photonic interconnection networks.
The second hardware-software integration-focused portion of this work explores dynamic silicon photonic switching devices and circuits. Specifically, interactions with and implications of high-speed data propagation and link layer control are demonstrated. The characteristics of photonic link setup include transients due to physical layer optical effects, latencies involved with initializing burst mode links, and optical link quality. The impacts on the functionalities and performance offered by photonic devices are explored. An optical network interface platform is devised using FPGAs to encapsulate hardware and software for controlling these characteristics using custom hardware description language, firmware, and software. A basic version of a silicon photonic network controller using FPGAs is used as a tool to demonstrate a highly scalable switch architecture using microring resonators. This architecture would not be possible without some semblance of this controller, combined with advanced electronic-photonic packaging. A more advanced deployment of the network interface platform is used to demonstrate a method for accelerating photonic links using out-of-band arbitration. A first demonstration of this platform is performed on a silicon photonic microring router network. A second demonstration is used to further explore the feasibility of full hardware-software integrated photonic device actuation, link layer control, and out-of-band arbitration. The demonstration is performed on a complete silicon photonic network with both spatial switching and wavelength routing functionalities.
The aforementioned hardware-software integration mechanisms are rigorously tested for data communications applications. Capabilities are shown for very reliable, low latency, and dynamic high-speed data delivery using silicon photonic devices. Applying these mechanisms to complete electronic-photonic packaged subsystems provides a strong path to commercial manifestations of functional silicon photonic devices
Clock Synchronisation Assisted Clock and Data Recovery for Sub-Nanosecond Data Centre Optical Switching
In current 'Cloud' data centres, switching of data between servers is performed using deep hierarchies of interconnected electronic packet switches. Demand for network bandwidth from emerging data centre workloads, combined with the slowing of silicon transistor scaling, is leading to a widening gap between data centre traffic demand and electronically-switched data centre network capacity. All-optical switches could offer a future-proof alternative, with potentially under a third of the power consumption and cost of electronically-switched networks. However, the effective bandwidth of optical switches depends on their overall switching time. This is dominated by the clock and data recovery (CDR) locking time, which takes hundreds of nanoseconds in commercial receivers. Current data centre traffic is dominated by small packets that transmit in tens of nanoseconds, leading to low effective bandwidth, as a high proportion of receiver time is spent performing CDR locking instead of receiving data, removing the benefits of optical switching. High-performance optical switching requires sub-nanosecond CDR locking time to overcome this limitation. This thesis proposes, models, and demonstrates clock synchronisation assisted CDR, which can achieve this. This approach uses clock synchronisation to simplify the complexity of CDR versus previous asynchronous approaches. An analytical model of the technique is first derived that establishes its potential viability. Following this, two approaches to clock synchronisation assisted CDR are investigated: (1) clock phase caching, which uses clock phase storage and regular updates in a 2 km intra-building scale data centre network interconnected by single-mode optical fibre; and (2) single calibration clock synchronisation assisted CDR, which leverages the 20 times lower thermal sensitivity of hollow core optical fibre versus single-mode fibre to synchronise a 100 m cluster scale data centre network, with a single initial phase calibration step. Using a real-time FPGA-based optical switch testbed, sub-nanosecond CDR locking time was demonstrated for both approaches
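The link between CDR locking time and effective bandwidth described above can be illustrated with a simple duty-cycle model. This is a minimal sketch and not the analytical model derived in the thesis: it assumes each packet pays one full locking period before any data is received, and the packet and locking durations used are hypothetical values chosen to fall in the "tens" and "hundreds" of nanoseconds ranges the abstract mentions.

```python
def effective_bandwidth_fraction(packet_ns, lock_ns):
    """Fraction of receiver time spent receiving data rather than locking.

    Simplified model (an assumption): one CDR lock of lock_ns precedes
    every packet of duration packet_ns.
    """
    if packet_ns <= 0 or lock_ns < 0:
        raise ValueError("packet_ns must be > 0 and lock_ns must be >= 0")
    return packet_ns / (packet_ns + lock_ns)


# Hypothetical example values, not measurements from the thesis.
PACKET_NS = 50.0
for lock_ns in (300.0, 0.5):
    share = effective_bandwidth_fraction(PACKET_NS, lock_ns)
    print(f"lock {lock_ns:>5} ns -> {share:.1%} of line rate usable")
```

With these illustrative numbers, a 300 ns lock leaves only a small share of the line rate for data, while a sub-nanosecond lock recovers almost all of it. That is the qualitative point the abstract makes.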
Optical Switching for Scalable Data Centre Networks
This thesis explores the use of wavelength tuneable transmitters and control systems within the context of scalable, optically switched data centre networks. Modern data centres require innovative networking solutions to meet their growing power, bandwidth, and scalability requirements. Wavelength routed optical burst switching (WROBS) can meet these demands by applying agile wavelength tuneable transmitters at the edge of a passive network fabric. Through experimental investigation of an example WROBS network, the transmitter is shown to determine system performance, and must support ultra-fast switching as well as power efficient transmission. This thesis describes an intelligent optical transmitter capable of wideband sub-nanosecond wavelength switching and low-loss modulation. A regression optimiser is introduced that applies frequency-domain feedback to automatically enable fast tuneable laser reconfiguration. Through simulation and experiment, the optimised laser is shown to support 122×50 GHz channels, switching in less than 10 ns. The laser is deployed as a component within a new wavelength tuneable source (WTS) composed of two time-interleaved tuneable lasers and two semiconductor optical amplifiers. Switching over 6.05 THz is demonstrated, with stable switch times of 547 ps, a record result. The WTS scales well in terms of chip-space and bandwidth, constituting the first demonstration of scalable, sub-nanosecond optical switching. The power efficiency of the intelligent optical transmitter is further improved by introduction of a novel low-loss split-carrier modulator. The design is evaluated using 112 Gb/s/λ intensity modulated, direct-detection signals and a single-ended photodiode receiver. The split-carrier transmitter is shown to achieve hard decision forward error correction ready performance after 2 km of transmission using a laser output power of just 0 dBm; a 5.2 dB improvement over the conventional transmitter. 
The results achieved in the course of this research allow for ultra-fast, wideband, intelligent optical transmitters that can be applied in the design of all-optical data centres for power efficient, scalable networking
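The channel count and switching range quoted in this abstract appear mutually consistent, as a short calculation shows. The sketch assumes an evenly spaced grid whose span is measured between the first and last channel centres. The abstract does not say that the 6.05 THz figure was derived this way, so treat this as an observation rather than the author's definition.

```python
def grid_span_thz(channels, spacing_ghz):
    """Frequency span between the first and last channel of a uniform grid.

    Assumes evenly spaced channels; N channels give N - 1 spacings.
    """
    if channels < 1:
        raise ValueError("need at least one channel")
    return (channels - 1) * spacing_ghz / 1000


print(grid_span_thz(122, 50), "THz")
```

For 122 channels on a 50 GHz grid, there are 121 spacings between the outermost channels, which lines up with the 6.05 THz switching range reported for the wavelength tuneable source.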
Design of a fault tolerant airborne digital computer. Volume 2: Computational requirements and technology
This final report summarizes the work on the design of a fault tolerant digital computer for aircraft. Volume 2 is composed of two parts. Part 1 is concerned with the computational requirements associated with an advanced commercial aircraft. Part 2 reviews the technology that will be available for the implementation of the computer in the 1975-1985 period. With regard to the computational task, 26 computations have been categorized according to computational load, memory requirements, criticality, permitted down-time, and the need to save data in order to effect a roll-back. The technology part stresses the impact of large scale integration (LSI) on the realization of logic and memory. Also considered were module interconnection possibilities that minimize fault propagation
A Cost and Power Feasibility Analysis of Quantum Annealing for NextG Cellular Wireless Networks
In order to meet mobile cellular users' ever-increasing data demands, today's 4G and 5G wireless networks are designed mainly with the goal of maximizing spectral efficiency. While they have made progress in this regard, controlling the carbon footprint and operational costs of such networks remains a long-standing problem among network designers. This paper takes a long view on this problem, envisioning a NextG scenario where the network leverages quantum annealing for cellular baseband processing. We gather and synthesize insights on power consumption, computational throughput and latency, spectral efficiency, operational cost, and feasibility timelines surrounding quantum annealing technology. Armed with these data, we project the quantitative performance targets future quantum annealing hardware must meet in order to provide a computational and power advantage over CMOS hardware, while matching its whole-network spectral efficiency. Our quantitative analysis predicts that with 82.32 μs problem latency and 2.68 M qubits, quantum annealing will achieve a spectral efficiency equal to CMOS while reducing power consumption by 41 kW (45% lower) in a Large MIMO base station with 400 MHz bandwidth and 64 antennas, and a 160 kW power reduction (55% lower) using 8.04 M qubits in a CRAN setting with three Large MIMO base stations
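The absolute and relative power savings quoted above imply a baseline power draw for the CMOS system. The sketch below back-computes it, assuming each percentage is measured against the CMOS baseline and that the quoted figures are rounded; the resulting baselines are therefore approximate inferences, not values reported in the abstract.

```python
def implied_baseline_kw(saving_kw, saving_fraction):
    """Baseline power implied by an absolute saving and its relative size.

    Assumes saving_fraction is expressed relative to the baseline.
    """
    if not 0 < saving_fraction <= 1:
        raise ValueError("saving_fraction must be in (0, 1]")
    return saving_kw / saving_fraction


# (scenario, saving in kW, saving as a fraction) from the abstract.
scenarios = (("Large MIMO base station", 41, 0.45), ("CRAN, three base stations", 160, 0.55))
for name, saving_kw, fraction in scenarios:
    print(f"{name}: about {implied_baseline_kw(saving_kw, fraction):.0f} kW baseline")
```

Under these assumptions the single base station baseline is on the order of 90 kW and the three-station CRAN baseline is on the order of 290 kW.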