Abstract: Monolithically integrated dense WDM photonic network topologies, optimized for the loss and power footprint of the optical components, can achieve up to 4x better energy efficiency and throughput than electrical interconnects in core-to-core networks, and up to 10x in core-to-DRAM networks.
Introduction
This paper reviews recent advances [1, 2] in building high-throughput, energy-efficient photonic networks for core-to-core and core-to-DRAM communication in manycore processor systems. To sustain performance scaling in these systems, increases in core count must be matched by corresponding improvements in the energy efficiency of the cores and the interconnect, and in bandwidth density [3, 4]. Because of pin-density, wire-bandwidth and power-dissipation limits, electrical DRAM interfaces are not expected to supply sufficient bandwidth at reasonable power consumption and packaging cost, and similar issues limit the energy efficiency and bandwidth density of global on-chip wires. With its potential for energy-efficient modulation and detection and for dense wavelength division multiplexing (DWDM), silicon-photonic interconnect technology is well suited to alleviate this bottleneck; however, its application must be carefully tailored to both the underlying process technology and the desired network topology.
Monolithic CMOS photonic network design
Recently developed infrastructure for photonic chip design and a post-fabrication processing methodology [5, 6] enabled, for the first time, monolithic integration of polysilicon and silicon-based photonic devices in the standard bulk CMOS and thin-BOX SOI fabrication flows commonly used for processors. Building on this technology, and on the tight interaction between the design of the photonic interconnect components (waveguides, ring resonators, modulators, photodetectors, waveguide crossings) and the network topology, in [1] we proposed an efficient hybrid electro-optical core-to-DRAM shared-memory network (local mesh, global switch: LMGS), shown in Fig. 1. Projected to a 22 nm process node and a 256-core processor, it provides a nearly ten-fold throughput improvement over optimized electrical networks. To balance bandwidth against latency and link utilization, the traffic from several tiles is aggregated through a local electrical mesh onto a point-to-point dense WDM interconnect, with wavelength-based addressing of a portion of the DRAM space. An external buffer chip receives the optical signals and arbitrates requests from several core groups to the same DRAM module. This relatively simple interconnect uses significantly fewer optical components than, for example, a high-radix optical crossbar [7], which minimizes both the thermal tuning cost and the losses along the optical path. To relax the loss specifications on the integrated photonic devices within the required optical power envelope (Fig. 3a), the physical layout of the network follows a U-shape. Balancing the mesh bandwidth against the degree of tile aggregation and the optical bandwidth makes efficient use of the raw energy-efficiency advantage of photonic over electrical interconnect, both across the die and die-to-die, as shown in Fig. 4.
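The optical power envelope mentioned above can be made concrete with a simple link-budget calculation: losses along a wavelength's path add in dB, and the laser must supply enough power that what reaches the photodetector still meets its sensitivity. The Python below is a minimal sketch under assumed values; the component list, every numeric value, and the names `path_loss_db` and `laser_power_mw` are hypothetical and are not taken from [1].

```python
# Minimal optical link-budget sketch. All component losses, the receiver
# sensitivity, and the wavelength count are hypothetical example values.


def path_loss_db(coupler_db: float, wg_db_per_cm: float, wg_len_cm: float,
                 ring_through_db: float, n_rings: int,
                 crossing_db: float, n_crossings: int, drop_db: float) -> float:
    """Total loss along one wavelength's path in dB (losses in dB simply add)."""
    return (coupler_db
            + wg_db_per_cm * wg_len_cm      # waveguide propagation loss
            + ring_through_db * n_rings     # passing off-resonance rings
            + crossing_db * n_crossings     # waveguide crossings
            + drop_db)                      # ring drop loss at the receiver


def laser_power_mw(rx_sensitivity_dbm: float, loss_db: float,
                   n_wavelengths: int, wall_plug_eff: float = 1.0) -> float:
    """Laser power (mW) needed so each wavelength reaches the detector at its
    sensitivity; a wall_plug_eff below 1 turns this into electrical power."""
    per_lambda_mw = 10 ** ((rx_sensitivity_dbm + loss_db) / 10)  # dBm -> mW
    return n_wavelengths * per_lambda_mw / wall_plug_eff


if __name__ == "__main__":
    # Two hypothetical paths: a shorter one with no crossings and a longer one
    # with 20 crossings. Values are for illustration only.
    short = path_loss_db(1.0, 1.0, 2.0, 0.01, 100, 0.05, 0, 1.0)
    long_ = path_loss_db(1.0, 1.0, 4.0, 0.01, 100, 0.05, 20, 1.0)
    for name, loss in (("short", short), ("long", long_)):
        p = laser_power_mw(-20.0, loss, n_wavelengths=64)
        print(f"{name}: loss = {loss:.1f} dB, laser power = {p:.2f} mW")
```

Every additional 3 dB of path loss roughly doubles the required laser power, which is why layout choices that reduce path loss, such as the U-shaped layout described above, relax the loss specifications placed on individual devices.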
A similar methodology was used to optimize the core-to-core network for more uniform access across a variety of traffic patterns: dense, energy-efficient photonic interconnects make it practical to realize a non-blocking Clos network (Fig. 2) that would otherwise be expensive to build. Again, aggregation is used to decrease the radix of the network and to balance its electrical and optical power. The Clos photonic layout also follows a U-shape to relax the photonic device loss requirements (Fig. 3b). Across a variety of traffic patterns, the Clos network achieves significantly better latency and throughput uniformity than a concentrated mesh network (Fig. 5).
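How aggregation reduces the network radix, and what "non-blocking" means for a Clos network, can be illustrated with a short sketch. For a symmetric three-stage Clos(m, n, r) network (r ingress and r egress routers with n terminals each, and m middle routers), the classical conditions are m ≥ 2n − 1 for strict-sense non-blocking and m ≥ n for rearrangeably non-blocking operation. The Python below checks these conditions after concentrating several cores onto each network terminal; the names `Clos` and `clos_for_cores` and the example parameters (256 cores, concentration 4, n = m = 8) are hypothetical and are not claimed to be the configuration evaluated in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clos:
    """Symmetric three-stage Clos(m, n, r): r ingress routers with n inputs each,
    m middle routers, and r egress routers with n outputs each."""
    m: int
    n: int
    r: int

    @property
    def terminals(self) -> int:
        return self.n * self.r

    @property
    def ingress_radix(self) -> tuple[int, int]:
        # Each ingress router has n inputs and one output per middle router.
        return (self.n, self.m)

    @property
    def middle_radix(self) -> tuple[int, int]:
        # Each middle router links to every ingress and every egress router.
        return (self.r, self.r)

    def strictly_nonblocking(self) -> bool:
        return self.m >= 2 * self.n - 1   # Clos (1953) condition

    def rearrangeably_nonblocking(self) -> bool:
        return self.m >= self.n           # Slepian-Duguid condition


def clos_for_cores(cores: int, concentration: int, n: int, m: int) -> Clos:
    """Hypothetical helper: aggregate `concentration` cores per terminal, then
    group the terminals n at a time onto each ingress router."""
    if cores % concentration or (cores // concentration) % n:
        raise ValueError("cores must divide evenly into terminals and routers")
    return Clos(m=m, n=n, r=cores // concentration // n)


if __name__ == "__main__":
    net = clos_for_cores(cores=256, concentration=4, n=8, m=8)
    print(net, net.terminals, net.ingress_radix, net.middle_radix)
    print("strict:", net.strictly_nonblocking(),
          "rearrangeable:", net.rearrangeably_nonblocking())
```

Without concentration (concentration = 1), the same n = 8 grouping would need r = 32 ingress routers and 32×32 middle routers; aggregating four cores per terminal shrinks this to r = 8 and 8×8 middle routers, which is the kind of radix reduction the paragraph above refers to.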
