44 research outputs found

    A 40 Gb/s chip-to-chip interconnect for 8-socket direct connectivity using integrated photonics

    We present an O-band any-to-any chip-to-chip (C2C) interconnection at 40 Gb/s suitable for up to 8-socket direct connectivity in multi-socket server boards, utilizing integrated low-energy photonics for the transceiver and routing functions. The C2C interconnect exploits an Si-based ring modulator as its transmitter and a co-packaged photodiode/transimpedance amplifier enabled receiver interconnected over an 8 x 8 Si-based arrayed waveguide grating router, allowing for a single-hop flat-topology interconnection between eight nodes. A proof-of-concept demonstration of the C2C interconnect is presented at 25 and 40 Gb/s for eight possible routing scenarios, revealing clear eye diagrams at both data rates with extinction ratios of 4.8 +/- 0.3 and 4.38 +/- 0.31 dB, respectively, among the eight routed signals

    LIONS: An AWGR-Based Low-Latency Optical Switch for High-Performance Computing and Data Centers

    This paper discusses the architecture of an arrayed waveguide grating router (AWGR)-based low-latency interconnect optical network switch called LIONS, and its different loopback buffering schemes. A proof of concept is demonstrated with a 4 x 4 experimental testbed. A simulator was developed to model the LIONS architecture and was validated by comparing experimentally obtained statistics such as average end-to-end latency with the results produced by the simulator. Considering the complexity and cost in implementing loopback buffers in LIONS, we propose an all-optical negative acknowledgement (AO-NACK) architecture in order to remove the need for loopback buffers. Simulation results for LIONS with AO-NACK architecture and distributed loopback buffer architecture are compared with the performance of the flattened butterfly electrical switching network

    Multi-FSR Silicon Photonic Flex-LIONS Module for Bandwidth-Reconfigurable All-to-All Optical Interconnects

    This article proposes and experimentally demonstrates the first bandwidth-reconfigurable all-to-all optical interconnects using a multi-Free-Spectral-Ranges (FSR) integrated 8 × 8 SiPh Flex-LIONS module. The multi-FSR operation utilizes the first FSR (FSR1) to steer the bandwidth between selected node pairs and the zeroth FSR (FSR0) to guarantee a minimum diameter all-to-all topology among the interconnected nodes after reconfiguration. Successful Flex-LIONS design, fabrication, packaging, and system testing demonstrate error-free all-to-all interconnects for both FSR0 and FSR1 with a 5.3-dB power penalty induced by AWGR intra-band crosstalk under the worst-case polarization scenario. After reconfiguration in FSR1, the bandwidth between the selected pair of nodes is increased from 50 to 125 Gb/s while maintaining a 25 Gb/s/λ all-to-all interconnectivity in FSR0

    Chip-to-chip interconnect for 8-socket direct connectivity using 25Gb/s O-band integrated transceiver and routing circuits

    We present an O-band Chip-to-Chip Interconnect for 8-socket direct connectivity exploiting a Si-based Ring Modulator and a packaged PD-TIA connected over a Si-based 8×8 AWGR routing module. Eight routing scenarios are experimentally demonstrated at 25Gb/s revealing error-free operation

    HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads

    We propose a new architecture called HTA for high throughput irregular HPC applications with little data reuse. HTA reduces the contention within the memory system with the help of a partitioned memory controller that is amenable for 2.5D implementation using Silicon Photonics. In terms of scalability, HTA supports 4 × higher number of compute units compared to the state-of-the-art GPU systems. Our simulation-based evaluation on a representative set of HPC benchmarks shows that the proposed design reduces the queuing latency by 10% to 30%, and improves the variability in memory access latency by 10% to 60%. Our results show that the HTA improves the L1 miss penalty by 2.3 × to 5 × over GPUs. When compared to a multi-GPU system with the same number of compute units, our simulation results show that the HTA can provide up to 2 × speedup

    Silicon Photonic Flex-LIONS for Bandwidth-Reconfigurable Optical Interconnects

    This paper reports the first experimental demonstration of silicon photonic (SiPh) Flex-LIONS, a bandwidth-reconfigurable SiPh switching fabric based on wavelength routing in arrayed waveguide grating routers (AWGRs) and space switching. Compared with the state-of-the-art bandwidth-reconfigurable switching fabrics, Flex-LIONS architecture exhibits 21× less number of switching elements and 2.9× lower on-chip loss for 64 ports, which indicates significant improvements in scalability and energy efficiency. System experimental results carried out with an 8-port SiPh Flex-LIONS prototype demonstrate error-free one-to-eight multicast interconnection at 25 Gb/s and bandwidth reconfiguration from 25 Gb/s to 100 Gb/s between selected input and output ports. Besides, benchmarking simulation results show that Flex-LIONS can provide a 1.33× reduction in packet latency and >1.5× improvements in energy efficiency when replacing the core layer switches of Fat-Tree topologies with Flex-LIONS. Finally, we discuss the possibility of scaling Flex-LIONS up to N = 1024 ports (N = M × W) by arranging M^2 W-port Flex-LIONS in a Thin-CLOS architecture using W wavelengths
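The scaling relation quoted in this abstract (N = M × W ports built from M^2 W-port modules) can be tabulated with a short script. This is only an arithmetic sketch: the function name `thin_clos_options` is hypothetical, and the abstract does not say which (M, W) pair the authors consider, so every factorization of N is listed.

```python
def thin_clos_options(n_ports: int) -> list[tuple[int, int, int]]:
    """Enumerate (M, W, module_count) with N = M * W and module_count = M**2.

    Assumption: any integer factorization of N is a candidate; the source
    abstract only states the relation, not a preferred M or W.
    """
    options = []
    for w in range(2, n_ports):
        if n_ports % w == 0:
            m = n_ports // w
            options.append((m, w, m * m))
    return options


if __name__ == "__main__":
    for m, w, modules in thin_clos_options(1024):
        print(f"M={m:4d}  W={w:4d}  W-port modules needed: {modules}")
```

Larger W (more wavelengths per module) reduces the module count quadratically, which is the trade-off the relation implies.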

    A scalable silicon photonic chip-scale optical switch for high performance computing systems

    This paper discusses the architecture and provides performance studies of a silicon photonic chip-scale optical switch for scalable interconnect network in high performance computing systems. The proposed switch exploits optical wavelength parallelism and wavelength routing characteristics of an Arrayed Waveguide Grating Router (AWGR) to allow contention resolution in the wavelength domain. Simulation results from a cycle-accurate network simulator indicate that, even with only two transmitter/receiver pairs per node, the switch exhibits lower end-to-end latency and higher throughput at high (> 90%) input loads compared with electronic switches. On the device integration level, we propose to integrate all the components (ring modulators, photodetectors and AWGR) on a CMOS-compatible silicon photonic platform to ensure a compact, energy efficient and cost-effective device. We successfully demonstrate proof-of-concept routing functions on an 8 x 8 prototype fabricated using foundry services provided by OpSIS-IME. (C) 2013 Optical Society of America
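Several of the abstracts listed here rely on the cyclic wavelength-routing property of an N × N AWGR. A minimal sketch of that property follows, assuming the common idealized model in which output port = (input port + wavelength index) mod N. The sign convention and the function names are assumptions for illustration and are not taken from any of the listed papers.

```python
def awgr_output_port(input_port: int, wavelength_index: int, n_ports: int) -> int:
    """Idealized cyclic AWGR: the wavelength alone selects the output port."""
    return (input_port + wavelength_index) % n_ports


def wavelength_for(input_port: int, output_port: int, n_ports: int) -> int:
    """Wavelength index a transmitter must use to reach a given output."""
    return (output_port - input_port) % n_ports


if __name__ == "__main__":
    n = 8  # 8 x 8 router, as in the prototypes described in this listing
    for src in range(n):
        row = [wavelength_for(src, dst, n) for dst in range(n)]
        print(f"input {src}: wavelength index per output -> {row}")
```

In this model, signals arriving at one output from different inputs always sit on different wavelengths, which is why contention can be handled in the wavelength domain.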

    Experimental Demonstration of Flexible Bandwidth Optical Data Center Core Network With All-to-All Interconnectivity

    This paper proposes and demonstrates a flexible-bandwidth optical interconnect architecture for data centers exploiting wavelength routing in arrayed waveguide grating routers and fast tunable lasers. The proposed architecture provides hierarchical all-to-all connectivity with low contention and dynamic interconnection reconfiguration for higher bandwidth provisioning between hot spots. An eight-cluster core network experiment testbed with hierarchical all-to-all interconnection shows 1.77x throughput increase and 1.19x network energy efficiency improvement in the case of intercluster hot-spot traffic, while guaranteeing more than 97% throughput for the portion of the traffic with uniform random distribution

    Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics

    The diversity of workload requirements and increasing hardware heterogeneity in emerging high performance computing (HPC) systems motivate resource disaggregation. Resource disaggregation allows compute and memory resources to be allocated individually as required to each workload. However, it is unclear how to efficiently realize this capability and cost-effectively meet the stringent bandwidth and latency requirements of HPC applications. To that end, we describe how modern photonics can be co-designed with modern HPC racks to implement flexible intra-rack resource disaggregation and fully meet the bit error rate (BER) and high escape bandwidth of all chip types in modern HPC racks. Our photonic-based disaggregated rack provides an average application speedup of 11% (46% maximum) for 25 CPU and 61% for 24 GPU benchmarks compared to a similar system that instead uses modern electronic switches for disaggregation. Using observed resource usage from a production system, we estimate that an iso-performance intra-rack disaggregated HPC system using photonics would require 4x fewer memory modules and 2x fewer NICs than a non-disaggregated baseline. Comment: 15 pages, 12 figures, 4 tables. Published in IEEE Cluster 202