
    Photonic Interconnects Beyond High Bandwidth

    The extraordinary growth of parallelism in high-performance computing requires efficient data communication to scale compute performance. High-performance computing systems have relied on photonic links for high bandwidth-distance-product communication over the last decade. Photonic interconnection networks, however, should not be a wire-for-wire replacement of their conventional electrical counterparts. Features of photonics beyond high bandwidth, such as transparent bandwidth steering, can implement important functionalities needed by applications; conversely, application characteristics can be exploited to design better photonic interconnects. This thesis therefore explores codesign opportunities at the intersection of photonic interconnect architectures and high-performance computing applications. The key accomplishments of this thesis, ranging from the system level to the node level, are as follows. Chapter 2 presents a system-level architecture that leverages photonic switching to enable a reconfigurable interconnect. The architecture, called Flexfly, reconfigures the inter-group level of the widely used Dragonfly topology using information about the application’s communication pattern, stealing additional direct bandwidth for communication-intensive group pairs. Simulations with applications such as GTC, Nekbone and LULESH show up to 1.8x speedup over Dragonfly paired with UGAL routing, along with halved hop count and latency for cross-group messages. To demonstrate the effectiveness of our approach, we built a 32-node Flexfly prototype using a silicon photonic switch connecting four groups and demonstrated an 820 ns interconnect reconfiguration time. This is the first demonstration of silicon photonic switching and bandwidth steering in a high-performance computing cluster. Chapter 3 extends photonic switching to the node level and presents a reconfigurable silicon photonic memory interconnect for many-core architectures. The interconnect targets important memory access issues such as network-on-chip hot-spots and non-uniform memory access. Integrated with the processor through 2.5D/3D stacking, a fast-tunable silicon photonic memory tunnel can transparently direct traffic from any off-chip memory to any on-chip interface, alleviating the hot-spot and non-uniform access effects. We demonstrated the operation of the proposed architecture using a tunable laser, a 4-port silicon photonic switch (four wavelength-routed memory channels) and a 4x4 mesh network-on-chip synthesized on an FPGA. The emulated system achieves a 15 ns channel switching time. Simulations based on a 12-core, 4-memory model show that at such switching speeds the interconnect can realize a 2x speedup for the STREAM benchmark in the hot-spot scenario and reduce the execution time of data-intensive applications such as 3D stencil and K-means clustering by 23% and 17%, respectively. Chapter 4 explores application-level characteristics that can be exploited to hide photonic path setup delays. Given the frequent reuse of optical circuits by many applications, we proposed a circuit-caching scheme that amortizes the setup overhead by maximizing circuit reuse. To improve circuit “hit” rates, we developed a reuse-distance-based replacement policy called “Farthest Next Use” and further investigated the tradeoffs between the realized hit rate and energy consumption. Finally, we experimentally demonstrated the feasibility of the proposed concept using silicon photonic devices in an FPGA-controlled network testbed. Chapter 5 proceeds to develop an application-guided circuit-prefetch scheme. By learning temporal locality and communication patterns from upper-layer applications, the scheme not only caches a set of circuits for reuse but also proactively prefetches circuits based on predictions. We applied this technique to communication patterns from a spectrum of science and engineering applications. The results show that setup delays caused by circuit misses are significantly reduced, demonstrating how the proposed technique can improve circuit switching in photonic interconnects.
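
    The “Farthest Next Use” policy described above is reuse-distance based. As a rough illustration only (the actual policy, data structures, and trace format in the thesis may differ), a circuit cache that evicts the cached circuit whose next use lies farthest in the future could be sketched as follows:

```python
# Hypothetical sketch of a "Farthest Next Use" circuit-cache replacement policy,
# assuming the communication trace (sequence of source-destination circuits) is
# known or predicted in advance. Names and structure are illustrative only.

def simulate_circuit_cache(trace, capacity):
    """Return the circuit hit rate for a cache that evicts the cached
    circuit whose next use lies farthest in the future."""
    cached = set()
    hits = 0
    for i, circuit in enumerate(trace):
        if circuit in cached:
            hits += 1                      # circuit already set up: reuse it
            continue
        if len(cached) >= capacity:
            # Evict the cached circuit reused farthest in the future (or never).
            def next_use(c):
                for j in range(i + 1, len(trace)):
                    if trace[j] == c:
                        return j
                return float("inf")
            cached.remove(max(cached, key=next_use))
        cached.add(circuit)                # pay the setup cost, then cache it
    return hits / len(trace) if trace else 0.0

# Example: repeated group-pair circuits with a 2-entry circuit cache.
trace = [("A", "B"), ("A", "C"), ("A", "B"), ("B", "C"), ("A", "B")]
print(simulate_circuit_cache(trace, capacity=2))   # 0.4 (2 hits out of 5)
```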

    Hardware-Software Integrated Silicon Photonic Systems

    Fabrication of integrated photonic devices and circuits in a CMOS-compatible process or foundry is the essence of the silicon photonic platform. Optical devices in this platform are enabled by the high index contrast between silicon and the insulator of silicon-on-insulator wafers. These devices offer potential benefits when integrated with existing and emerging high-performance microelectronics. Integrated silicon photonics, with its small footprint and power-efficient, high-bandwidth operation, has long been cited as a solution to existing issues in high-performance interconnects for telecommunications and data communication. Stemming from this historic application in communications, new applications in sensing arrays, biochemistry, and even entertainment continue to grow. However, for many technologies to successfully adopt silicon photonics and reap the perceived benefits, the silicon photonic platform must extend toward a full ecosystem. Such an extension includes the implementation of low-cost and robust electronic-photonic packaging techniques for all applications. In an ecosystem offering services ranging from device fabrication all the way to packaged products, ease of use and ease of deployment become possible in systems that require many hardware and software components. With the onset of the Internet of Things (IoT), nearly all technologies (sensors, compute, communication devices, etc.) operate in systems with some level of localized or distributed software interaction, and these interactions often require networked communication. For silicon photonics to penetrate the technologies comprising the IoT, it is advantageous to implement such devices in a hardware-software integrated way, meaning that all functionalities and interactions related to the silicon photonic devices are well defined in terms of the physical hardware, which is then abstracted into various levels of software as needed in the system. Hardware-software integration allows many of the individually demonstrated functionalities of silicon photonics to translate readily to commercial implementation. This work begins by briefly highlighting the challenges and solutions for transforming existing silicon photonic platforms into a full-fledged silicon photonic ecosystem. The highlighted solutions in development consist of tools for fabrication, testing, subsystem packaging, and system validation. Building on the knowledge of a silicon photonic ecosystem in development, this work continues by demonstrating various levels of hardware-software integration, primarily focused on silicon photonic interconnects. The first hardware-software integration-focused portion of this work explores silicon microring-based devices as a key building block for larger silicon photonic subsystems. The microring’s sensitivity to thermal fluctuations is identified not as a flaw but as a tool for functionalization. A logical control system is implemented to mitigate thermal effects that would normally render a microring resonator inoperable. The mechanism to control the microring is extended and abstracted with software programmability to offer wavelength routing as a network primitive. This functionality, available through hardware-software integration, opens the possibility of ubiquitous deployment of such microring devices in future photonic interconnection networks. The second hardware-software integration-focused portion of this work explores dynamic silicon photonic switching devices and circuits. Specifically, interactions with and implications of high-speed data propagation and link-layer control are demonstrated. The characteristics of photonic link setup include transients due to physical-layer optical effects, latencies involved in initializing burst-mode links, and optical link quality. Their impacts on the functionality and performance offered by photonic devices are explored. An optical network interface platform is devised using FPGAs to encapsulate the hardware and software that control these characteristics, using custom hardware description language code, firmware, and software. A basic version of an FPGA-based silicon photonic network controller is used as a tool to demonstrate a highly scalable switch architecture using microring resonators. This architecture would not be possible without some semblance of this controller combined with advanced electronic-photonic packaging. A more advanced deployment of the network interface platform is used to demonstrate a method for accelerating photonic links using out-of-band arbitration. A first demonstration of this platform is performed on a silicon photonic microring router network. A second demonstration further explores the feasibility of fully hardware-software integrated photonic device actuation, link-layer control, and out-of-band arbitration, and is performed on a complete silicon photonic network with both spatial switching and wavelength routing functionalities. The aforementioned hardware-software integration mechanisms are rigorously tested for data communication applications. Capabilities are shown for highly reliable, low-latency, dynamic high-speed data delivery using silicon photonic devices. Applying these mechanisms to complete electronic-photonic packaged subsystems provides a strong path to commercial manifestations of functional silicon photonic devices.
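
    As an illustration of the kind of hardware-software abstraction discussed above, the sketch below shows a simple hill-climbing thermal lock for a microring resonator. The device model and interface names are hypothetical; the thesis’s actual control system is not reproduced here.

```python
class SimulatedRing:
    """Toy microring model: drop-port power is a Lorentzian in heater detuning."""
    def __init__(self, resonance_heater=0.37, linewidth=0.05):
        self.resonance_heater = resonance_heater
        self.linewidth = linewidth
        self.heater = 0.0

    def set_heater(self, value):
        self.heater = value

    def get_heater(self):
        return self.heater

    def read_drop_power(self):
        d = (self.heater - self.resonance_heater) / self.linewidth
        return 1.0 / (1.0 + d * d)

def lock_to_peak(ring, step=0.01, iterations=300):
    """Hill-climb the heater setting to maximize drop-port power (thermal lock)."""
    heater = ring.get_heater()
    best = ring.read_drop_power()
    direction = 1
    for _ in range(iterations):
        heater += direction * step
        ring.set_heater(heater)
        power = ring.read_drop_power()
        if power < best:
            direction = -direction   # overshot the resonance: reverse the search
        best = max(best, power)
    return heater, best

ring = SimulatedRing()
heater, power = lock_to_peak(ring)
print(f"locked heater setting {heater:.2f}, drop power {power:.2f}")
```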

    Design space exploration of photonic interconnects

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 109-113). As processors scale deep into the multi-core and many-core regimes, bandwidth and energy efficiency of the on-die interconnect network have become paramount design issues. Recognizing the potential limits of electrical interconnects, emerging nanophotonic integration has recently been proposed as a technology option for both on-chip and chip-to-chip applications. As optical links avoid the capacitive, resistive, and signal integrity limits imposed upon electrical interconnects, the introduction of integrated photonics allows for the efficient realization of physical connectivity that is costly to accomplish electrically. While many recent works have cited the potential benefits of optics, the inherent design tradeoffs of photonic datapath and backend components remain relatively unknown at the system level. This thesis develops insights into the behavior of electrical and hybrid opto-electrical networks and systems. We present power and area models that capture the behavior of electrical interface circuits and their interactions with optical devices. To animate these models in the context of a full system, we contribute DSENT, a novel physical modeling framework capable of estimating the costs of generalized digital electronics, mixed-signal interface circuitry, and optical links. With DSENT, we enable fast power and area evaluation of entire networks, connecting the dynamics of an underlying photonic interconnect to those of an otherwise electrical system. Using our methodology, we perform a technology-driven design space exploration of intra-chip networks and highlight the importance of thermal tuning and parasitic receiver capacitances in network power consumption. We show that the performance gains enabled by photonics-inspired architectures can yield savings in total system energy even if the network itself is more costly. Finally, we propose a photonically interconnected DRAM system as a solution to the core-to-DRAM bandwidth bottleneck. By attacking energy consumption at the DRAM channel, chip, and bank level with integrated photonics, we cut the power consumption of the DRAM system by 10x while remaining area neutral compared to a projected electrical baseline. by Chen Sun. S.M.
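
    To make the bookkeeping concrete, the toy model below adds up per-bit energy contributions (laser, modulator, receiver, thermal tuning) for a single photonic link. It only illustrates the style of analysis a framework like DSENT automates; the parameter values and simplified equations are assumptions, not DSENT’s.

```python
# Back-of-the-envelope photonic link power model, illustrating the kind of
# bookkeeping a framework like DSENT automates. The values and the model
# itself are illustrative assumptions, not DSENT's actual equations.

def link_energy_per_bit(
    data_rate_gbps=10.0,          # per-wavelength data rate
    laser_power_mw=1.0,           # optical power required by the link budget
    laser_efficiency=0.1,         # laser wall-plug efficiency
    modulator_fj_per_bit=50.0,    # dynamic modulator energy
    rx_cap_ff=20.0,               # parasitic receiver input capacitance
    rx_swing_v=0.8,               # receiver input voltage swing
    tuning_mw_per_ring=1.0,       # static thermal tuning power per ring
    rings_per_link=2,             # e.g. one modulator ring + one filter ring
):
    """Return energy per bit (fJ/bit) split into its contributors."""
    bits_per_s = data_rate_gbps * 1e9
    laser = (laser_power_mw * 1e-3 / laser_efficiency) / bits_per_s * 1e15
    receiver = rx_cap_ff * 1e-15 * rx_swing_v ** 2 * 1e15   # simplified C*V^2
    tuning = (tuning_mw_per_ring * 1e-3 * rings_per_link) / bits_per_s * 1e15
    total = laser + modulator_fj_per_bit + receiver + tuning
    return {"laser": laser, "modulator": modulator_fj_per_bit,
            "receiver": receiver, "tuning": tuning, "total": total}

print(link_energy_per_bit())
```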

    The effect of an optical network on-chip on the performance of chip multiprocessors

    Optical networks-on-chip (ONoCs) have been proposed to reduce power consumption and increase bandwidth density in high-performance chip multiprocessors (CMPs) compared to electrical NoCs. However, as buffering in an ONoC is not viable, the end-to-end message path needs to be acquired in advance, during which time the message is buffered at the network ingress. This waiting latency is therefore a combination of path setup latency and contention, and it forms a significant part of the total message latency. Many proposed ONoCs, such as Single Writer, Multiple Reader (SWMR) designs, avoid path setup latency at the expense of additional optical components. In contrast, this thesis investigates a simple circuit-switched ONoC with a lower component count, in which nodes need to request a channel before transmission. To hide the path setup latency, a coherence-based message predictor is proposed to set up circuits before message arrival. First, the effect of latency and bandwidth on application performance is thoroughly investigated using full-system simulations of shared-memory CMPs. It is shown that the latency of an ideal NoC affects CMP performance more than the NoC bandwidth. Increasing the number of wavelengths per channel decreases the serialisation latency and improves the performance of both ONoC types. With 2 or more wavelengths modulating at 25 Gbit/s, the ONoCs outperform a conventional electrical mesh (maximal speedup of 20%), and the SWMR ONoC outperforms the circuit-switched ONoC. Next, coherence-based prediction techniques are proposed to reduce the waiting latency. The ideal coherence-based predictor reduces the waiting latency by 42%, while a more streamlined predictor (smaller than an L1 cache) reduces it by 31%. Without prediction, the message latency in the circuit-switched ONoC is 11% larger than in the SWMR ONoC. Applying the realistic predictor reverses this: the message latency in the SWMR ONoC is then 18% larger than in the predictive circuit-switched ONoC.
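
    A minimal sketch of a coherence-based circuit predictor is shown below, under the assumption (consistent with the summary above) that observing a coherence request is enough to predict the source of the data reply and pre-establish its circuit. The table organization and interface names are illustrative, not those of the thesis.

```python
# Minimal sketch of a coherence-based circuit predictor. The prediction table
# and API names are hypothetical; they only illustrate the idea of setting up
# the reply circuit before the data message arrives at the network.

class CircuitPredictor:
    def __init__(self):
        self.last_supplier = {}     # cache-line address -> node that supplied it

    def on_request(self, requester, home_node, line_addr, setup_circuit):
        """Called when a read/ownership request is seen at the network ingress."""
        supplier = self.last_supplier.get(line_addr, home_node)
        # Predict the reply path (supplier -> requester) and set it up early,
        # hiding the path-setup latency behind the directory/cache lookup.
        setup_circuit(src=supplier, dst=requester)

    def on_reply(self, supplier, line_addr):
        """Train the predictor with the node that actually supplied the data."""
        self.last_supplier[line_addr] = supplier

# Toy usage: circuits are "set up" by recording them in a set.
established = set()
predictor = CircuitPredictor()
predictor.on_request(requester=3, home_node=0, line_addr=0x40,
                     setup_circuit=lambda src, dst: established.add((src, dst)))
predictor.on_reply(supplier=5, line_addr=0x40)   # owner node 5 forwarded the data
predictor.on_request(requester=7, home_node=0, line_addr=0x40,
                     setup_circuit=lambda src, dst: established.add((src, dst)))
print(established)   # two circuits pre-established: (0, 3) and (5, 7)
```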

    Network-on-Chip

    The limitations of bus-based interconnects in scalability, latency, bandwidth, and power consumption when supporting the huge number of on-chip resources result in a communication bottleneck. These challenges can be efficiently addressed by implementing a network-on-chip (NoC) system. This book gives a detailed analysis of various on-chip communication architectures and covers different areas of NoCs such as potentials, architecture, technical challenges, optimization, design exploration, and research directions. In addition, it discusses current and future trends that could make an impactful and meaningful contribution to the research and design of on-chip communications and NoC systems.

    Experimental Evaluation and Comparison of Time-Multiplexed Multi-FPGA Routing Architectures

    Emulating large, complex designs requires multi-FPGA systems (MFSs). However, inter-FPGA communication is confronted by a lack of interconnect capacity due to the limited number of FPGA input/output (I/O) pins. Serializing parallel signals onto a single trace effectively addresses the limited-I/O-pin obstacle. Besides the multiplexing scheme and multiplexing ratio (the number of inter-FPGA signals per trace), the choice of MFS routing architecture also affects the critical path latency. The routing architecture of an MFS is the interconnection pattern of FPGAs, fixed wires, and/or programmable interconnect chips. The performance of existing MFS routing architectures is also limited by the choice of off-chip interface. In this dissertation we proposed novel 2D and 3D latency-optimized time-multiplexed MFS routing architectures. We used a rigorous experimental approach and real sequential benchmark circuits to evaluate and compare the proposed and existing MFS routing architectures. This research provides new insight into the encouraging effects of using an off-chip optical interface and three-dimensional MFS routing architectures. Vertical stacking results in shorter off-chip links, improving the overall system frequency with the additional advantage of a smaller footprint area. The proposed 3D architectures employed serialized interconnect between intra-plane and inter-plane FPGAs to address the pin limitation problem. Additionally, all off-chip links were replaced by optical fibers, which exhibited latency improvements and resulted in a faster MFS. Results indicated that exploiting the third dimension provided latency and area improvements compared to 2D MFSs. We also proposed latency-optimized planar 2D MFS architectures in which electrical interconnections are replaced by optical interfaces in the same spatial distribution. Performance evaluation and comparison showed that the proposed architectures have reduced critical path delay and improved system frequency compared to conventional MFSs. We also experimentally evaluated and compared the system performance of three inter-FPGA communication schemes, i.e., Logic Multiplexing, SERDES, and MGT, in conjunction with two routing architectures, i.e., Completely Connected Graph (CCG) and TORUS. Experimental results showed that SERDES attained a higher maximum frequency than the other two schemes; however, for very high multiplexing ratios, the performance of SERDES and MGT became comparable.
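
    The multiplexing-ratio tradeoff can be illustrated with a simple latency model: fewer traces are needed as the ratio grows, but each transfer takes longer to serialize. The numbers below are illustrative assumptions rather than measurements from the dissertation.

```python
# Rough latency model for a time-multiplexed inter-FPGA link, illustrating the
# multiplexing-ratio / pin-count tradeoff. All figures are illustrative.

def inter_fpga_latency_ns(signals, mux_ratio, serial_rate_gbps, trace_ns):
    """Time to move `signals` parallel inter-FPGA signals across one hop."""
    traces_needed = -(-signals // mux_ratio)          # ceiling division
    bit_period_ns = 1.0 / serial_rate_gbps            # ns per serialized bit
    serialization_ns = mux_ratio * bit_period_ns      # one frame per trace
    return traces_needed, serialization_ns + trace_ns

for ratio in (4, 16, 64):
    traces, latency = inter_fpga_latency_ns(
        signals=256, mux_ratio=ratio,
        serial_rate_gbps=10.0,     # e.g. an MGT/SERDES-class serial link
        trace_ns=2.0)              # shorter for stacked or optical links
    print(f"ratio {ratio:3d}: {traces:3d} traces, {latency:.1f} ns per transfer")
```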

    Exploiting Properties of CMP Cache Traffic in Designing Hybrid Packet/Circuit Switched NoCs

    Chip multiprocessors with a few to tens of processing cores are already commercially available, and continued technology scaling is making it feasible to integrate even more cores on a single chip. Providing the cores with fast access to data is vital to overall system performance. When a core requires access to a piece of data, the core's private cache memory is searched first. If a miss occurs, the data is looked up in the next level(s) of the memory hierarchy, where often one or more levels of cache are shared between two or more cores. Communication between the cores and the slices of the on-chip shared cache is carried over the network-on-chip (NoC). Interestingly, the cache and NoC mutually affect each other's operation; communication over the NoC affects the access latency of cache data, while the cache organization generates the coherence and data messages, thus affecting the communication patterns and latency over the NoC. This thesis considers hybrid packet/circuit-switched NoCs, i.e., packet-switched NoCs enhanced with the ability to configure circuits. The communication and performance benefits that come from using circuits are predicated on amortizing the time cost incurred in configuring the circuits. To address this challenge, NoC designs are proposed that take advantage of properties of the cache traffic, namely temporal locality and predictability, to amortize or hide the circuit configuration time cost. First, a coarse-grained circuit configuration policy is proposed that exploits the temporal locality in the cache traffic to periodically configure circuits for the heavily communicating nodes. This enables the design of a locality-aware cache that promotes temporal communication locality through data placement, together with suitable data replacement and migration policies. Next, a fine-grained configuration policy, called Déjà Vu switching, is proposed for leveraging the predictability of data messages by initiating a circuit configuration as soon as a cache hit is detected and before the data becomes available. Its benefit is demonstrated for saving interconnect energy in multi-plane NoCs. Finally, a more proactive configuration policy is proposed for fast caches, where circuit reservations are initiated by request messages, which can greatly improve communication latency and system performance.
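
    The timing argument behind Déjà Vu switching can be sketched with a small calculation: starting the circuit configuration at tag-hit time lets it overlap the data-array access. The latency values below are illustrative, not taken from the thesis.

```python
# Small timing sketch of the Déjà Vu idea: begin configuring the reply circuit
# when the cache hit is detected, so the configuration overlaps the data-array
# access. All latencies (in cycles) are illustrative assumptions.

def reply_latency(tag_latency, data_latency, circuit_setup, deja_vu=True):
    """Cycles until the data message can leave on a configured circuit."""
    hit_detected = tag_latency
    data_ready = tag_latency + data_latency
    setup_start = hit_detected if deja_vu else data_ready
    circuit_ready = setup_start + circuit_setup
    return max(data_ready, circuit_ready)

baseline = reply_latency(tag_latency=2, data_latency=6, circuit_setup=5, deja_vu=False)
overlapped = reply_latency(tag_latency=2, data_latency=6, circuit_setup=5, deja_vu=True)
print(baseline, overlapped)   # 13 vs 8: the setup is fully hidden by the access
```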

    Process-based cost modeling of emerging optoelectronic interconnects: implications for material platform choice

    Thesis (S.M. in Technology and Policy)--Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program, 2008. Includes bibliographical references (p. 93-96). Continuously increasing demand for processing power, storage capacity, and I/O capacity in personal computing, data networks, and display interfaces suggests that optical interconnects may soon supplant copper not only for long-distance telecommunication but also for short-reach connection needs. In the search for a standard, the current debate in the optoelectronics industry is focused on the technical and economic challenges of the next-generation interconnect. Technological advances over the past few years have given new strength to a silicon-technology platform for optoelectronics. The possibility of extending the mature, high-yield Si CMOS manufacturing platform of the electronics industry into the optical domain is an area of intensive interest. Introducing new photonic materials and processes into the mature electronics industry involves a convergence of knowledge between optoelectronics and semiconductor IC manufacturers. To address some of the technical, market, and organizational uncertainties of the Si platform, this research explores the economic viability and operational hurdles of manufacturing a 1310 nm, 100G Ethernet LAN transceiver. The analysis is carried out using the process-based cost modeling method. Four transceiver designs, ranging from the most discrete to a high level of integration, are considered on both InP and Si platforms. At the macro level, this research also explores possible electronic-photonic convergence across industries through a multi-organization, exploratory roadmapping effort. Results have shown that 1) integration provides a cost advantage within each material platform; this economic competitiveness is due to cost savings associated with the elimination of discrete components and assembly steps; and 2) a total cost comparison across material platforms indicates that at low volume (less than 1.1 million annual units) the InP material platform is preferred, while at high volume (greater than 3 million annual units) the Si material platform is preferred. Furthermore, this study maps out the production cost at each technology and volume projection and then compares this cost with price expectations to determine the viability of the transceiver market in the datacom and computing industry. Results indicate that annual production volumes must be in the tens of millions of units to provide the minimum economies of scale necessary for the designs to meet the trigger price. These results highlight that standards and a common language are essential to enable converging technology markets. by Shan Liu. S.M.
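
    The volume-dependent crossover between platforms follows from amortizing fixed (tooling and line) costs over annual volume. The toy model below uses made-up cost figures chosen only to produce a qualitatively similar crossover region; it is not the thesis’s process-based cost model.

```python
# Toy cost comparison illustrating the kind of InP-versus-Si crossover the
# thesis reports. All cost figures are placeholders, not values from the thesis.

def unit_cost(annual_volume, fixed_cost, variable_cost, capacity_per_line):
    """Per-unit cost = amortized fixed (line/tooling) cost + variable cost."""
    lines = -(-annual_volume // capacity_per_line)    # ceiling: lines needed
    return (fixed_cost * lines) / annual_volume + variable_cost

for volume in (500_000, 1_000_000, 3_000_000, 10_000_000):
    inp = unit_cost(volume, fixed_cost=2_000_000, variable_cost=30.0,
                    capacity_per_line=2_000_000)
    si = unit_cost(volume, fixed_cost=20_000_000, variable_cost=20.0,
                   capacity_per_line=10_000_000)
    better = "InP" if inp < si else "Si"
    print(f"{volume:>10,} units/yr: InP ${inp:6.2f}  Si ${si:6.2f}  -> {better}")
```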