389 research outputs found

    Scaling PULSE Data Center Network Architecture and Scheduling Optical Circuits in Sub-Microseconds

    PULSE, an optical circuit switched data center network, employs custom ASIC schedulers to reconfigure circuits in 240 ns. The revised PULSE architecture scales to tens of thousands of blades and achieves >95% sustained throughput with low median (1.23 µs) and tail (145 µs) latencies, while consuming 115 pJ/bit and costing $9.04/Gbps.

    A Hybrid Beam Steering Free-Space and Fiber Based Optical Data Center Network

    Wireless data center networks (DCNs) are promising solutions to mitigate the cabling complexity in traditional wired DCNs and potentially reduce the end-to-end latency with faster propagation speed in free space. Yet, physical architectures in wireless DCNs must be carefully designed regarding wireless link blockage, obstacle bypassing, path loss, interference and spatial efficiency in a dense deployment. This paper presents the physical layer design of a hybrid FSO/in-fiber DCN while guaranteeing an all-optical, single hop, non-oversubscribed and full-bisection bandwidth network. We propose two layouts and analyze their scalability: (1) A static network utilizing only tunable sources which can scale up to 43 racks, 15,609 nodes and 15,609 channels; and (2) a re-configurable network with both tunable sources and piezoelectric actuator (PZT) based beam-steering which can scale up to 8 racks, 2,904 nodes and 185,856 channels at millisecond PZT switching time. Based on a traffic generation framework and a dynamic wavelength-timeslot scheduling algorithm, the system-level network performance is simulated for a 363-node subnet, reaching >99% throughput and 1.23 μs average scheduler latency at 90% load.

    RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems

    Distributed deep learning (DDL) systems strongly depend on network performance. Current electronic packet switched (EPS) network architectures and technologies suffer from variable diameter topologies, low-bisection bandwidth and over-subscription, which affect the completion time of communication and collective operations. We introduce a near-exascale, full-bisection bandwidth, all-to-all, single-hop, all-optical network architecture with nanosecond reconfiguration called RAMP, which supports large-scale distributed and parallel computing systems (12.8 Tbps per node for up to 65,536 nodes). For the first time, a custom RAMP-x MPI strategy and a network transcoder are proposed to run MPI collective operations across the optical circuit switched (OCS) network in a schedule-less and contention-less manner. RAMP achieves a 7.6-171× speed-up in completion time across all MPI operations compared to realistic EPS and OCS counterparts. It can also deliver a 1.3-16× and 7.8-58× reduction in Megatron and DLRM training time, respectively, while offering 42-53× and 3.3-12.4× improvement in energy consumption and cost, respectively.

    Control Plane Hardware Design for Optical Packet Switched Data Centre Networks

    Optical packet switching for intra-data centre networks is key to addressing traffic requirements. Photonic integration and wavelength division multiplexing (WDM) can overcome bandwidth limits in switching systems. A promising technology to build a nanosecond-reconfigurable photonic-integrated switch, compatible with WDM, is the semiconductor optical amplifier (SOA). SOAs are typically used as gating elements in a broadcast-and-select (B&S) configuration, to build an optical crossbar switch. For larger-size switching, a three-stage Clos network, based on crossbar nodes, is a viable architecture. However, the design of the switch control plane is one of the barriers to packet switching; it should run on packet timescales, which becomes increasingly challenging as line rates get higher. The scheduler, used for the allocation of switch paths, limits control clock speed. To this end, the research contribution was the design of highly parallel hardware schedulers for crossbar and Clos network switches. On a field-programmable gate array (FPGA), the minimum scheduler clock period achieved was 5.0 ns and 5.4 ns, for a 32-port crossbar and Clos switch, respectively. By using parallel path allocation modules, one per Clos node, a minimum clock period of 7.0 ns was achieved, for a 256-port switch. For scheduler application-specific integrated circuit (ASIC) synthesis, this reduces to 2.0 ns; a record result enabling scalable packet switching. Furthermore, the control plane was demonstrated experimentally. Moreover, a cycle-accurate network emulator was developed to evaluate switch performance. Results showed a switch saturation throughput at a traffic load 60% of capacity, with sub-microsecond packet latency, for a 256-port Clos switch, outperforming state-of-the-art optical packet switches.

    Clock Synchronisation Assisted Clock and Data Recovery for Sub-Nanosecond Data Centre Optical Switching

    In current 'Cloud' data centres, switching of data between servers is performed using deep hierarchies of interconnected electronic packet switches. Demand for network bandwidth from emerging data centre workloads, combined with the slowing of silicon transistor scaling, is leading to a widening gap between data centre traffic demand and electronically-switched data centre network capacity. All-optical switches could offer a future-proof alternative, with potentially under a third of the power consumption and cost of electronically-switched networks. However, the effective bandwidth of optical switches depends on their overall switching time. This is dominated by the clock and data recovery (CDR) locking time, which takes hundreds of nanoseconds in commercial receivers. Current data centre traffic is dominated by small packets that transmit in tens of nanoseconds, leading to low effective bandwidth, as a high proportion of receiver time is spent performing CDR locking instead of receiving data, removing the benefits of optical switching. High-performance optical switching requires sub-nanosecond CDR locking time to overcome this limitation. This thesis proposes, models, and demonstrates clock synchronisation assisted CDR, which can achieve this. This approach uses clock synchronisation to simplify the complexity of CDR versus previous asynchronous approaches. An analytical model of the technique is first derived that establishes its potential viability. Following this, two approaches to clock synchronisation assisted CDR are investigated: (1) clock phase caching, which uses clock phase storage and regular updates in a 2 km intra-building scale data centre network interconnected by single-mode optical fibre; and (2) single calibration clock synchronisation assisted CDR, which leverages the 20 times lower thermal sensitivity of hollow core optical fibre versus single-mode fibre to synchronise a 100 m cluster scale data centre network, with a single initial phase calibration step. Using a real-time FPGA-based optical switch testbed, sub-nanosecond CDR locking time was demonstrated for both approaches.

    Optical Switching for Scalable Data Centre Networks

    This thesis explores the use of wavelength tuneable transmitters and control systems within the context of scalable, optically switched data centre networks. Modern data centres require innovative networking solutions to meet their growing power, bandwidth, and scalability requirements. Wavelength routed optical burst switching (WROBS) can meet these demands by applying agile wavelength tuneable transmitters at the edge of a passive network fabric. Through experimental investigation of an example WROBS network, the transmitter is shown to determine system performance, and must support ultra-fast switching as well as power efficient transmission. This thesis describes an intelligent optical transmitter capable of wideband sub-nanosecond wavelength switching and low-loss modulation. A regression optimiser is introduced that applies frequency-domain feedback to automatically enable fast tuneable laser reconfiguration. Through simulation and experiment, the optimised laser is shown to support 122×50 GHz channels, switching in less than 10 ns. The laser is deployed as a component within a new wavelength tuneable source (WTS) composed of two time-interleaved tuneable lasers and two semiconductor optical amplifiers. Switching over 6.05 THz is demonstrated, with stable switch times of 547 ps, a record result. The WTS scales well in terms of chip-space and bandwidth, constituting the first demonstration of scalable, sub-nanosecond optical switching. The power efficiency of the intelligent optical transmitter is further improved by introduction of a novel low-loss split-carrier modulator. The design is evaluated using 112 Gb/s/λ intensity modulated, direct-detection signals and a single-ended photodiode receiver. The split-carrier transmitter is shown to achieve hard decision forward error correction ready performance after 2 km of transmission using a laser output power of just 0 dBm; a 5.2 dB improvement over the conventional transmitter. The results achieved in the course of this research allow for ultra-fast, wideband, intelligent optical transmitters that can be applied in the design of all-optical data centres for power efficient, scalable networking.

    Design of a fault tolerant airborne digital computer. Volume 2: Computational requirements and technology

    This final report summarizes the work on the design of a fault tolerant digital computer for aircraft. Volume 2 is composed of two parts. Part 1 is concerned with the computational requirements associated with an advanced commercial aircraft. Part 2 reviews the technology that will be available for the implementation of the computer in the 1975-1985 period. With regard to the computation task, 26 computations have been categorized according to computational load, memory requirements, criticality, permitted down-time, and the need to save data in order to effect a roll-back. The technology part stresses the impact of large scale integration (LSI) on the realization of logic and memory. Module interconnection possibilities were also considered, so as to minimize fault propagation.

    A Cost and Power Feasibility Analysis of Quantum Annealing for NextG Cellular Wireless Networks

    In order to meet mobile cellular users' ever-increasing data demands, today's 4G and 5G wireless networks are designed mainly with the goal of maximizing spectral efficiency. While they have made progress in this regard, controlling the carbon footprint and operational costs of such networks remains a long-standing problem among network designers. This paper takes a long view on this problem, envisioning a NextG scenario where the network leverages quantum annealing for cellular baseband processing. We gather and synthesize insights on power consumption, computational throughput and latency, spectral efficiency, operational cost, and feasibility timelines surrounding quantum annealing technology. Armed with these data, we project the quantitative performance targets future quantum annealing hardware must meet in order to provide a computational and power advantage over CMOS hardware, while matching its whole-network spectral efficiency. Our quantitative analysis predicts that with 82.32 μs problem latency and 2.68 M qubits, quantum annealing will achieve a spectral efficiency equal to CMOS while reducing power consumption by 41 kW (45% lower) in a Large MIMO base station with 400 MHz bandwidth and 64 antennas, and a 160 kW power reduction (55% lower) using 8.04 M qubits in a CRAN setting with three Large MIMO base stations.