4 research outputs found

    On-Chip Optical Interconnection Networks for Multi/Manycore Architectures

    Get PDF
    The rapid development of multi/manycore technologies offers the opportunity for highly parallel architectures implemented on a single chip. While the first, low-parallelism multicore products have been based on simple interconnection structures (a single bus or a very simple crossbar), emerging highly parallel architectures will require complex, limited-degree interconnection networks. This thesis studies this trend in light of the general theory of interconnection structures for parallel machines, and investigates several solutions in terms of performance, cost, fault tolerance, and run-time support for shared-memory and/or message-passing programming mechanisms.
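    For a rough sense of why the shift away from crossbars toward limited-degree networks matters as core counts grow, the sketch below compares a full crossbar against a 2D mesh using textbook cost and diameter formulas; the formulas and numbers are illustrative only and are not taken from the thesis.

    # Illustrative comparison (not from the thesis): crossbar vs. limited-degree 2D mesh.
    import math

    def crossbar(n):
        # n x n crossbar: quadratic number of crosspoints, but constant hop distance.
        return {"crosspoints": n * n, "node degree": n, "diameter (hops)": 1}

    def mesh_2d(n):
        # sqrt(n) x sqrt(n) mesh: linear number of links, node degree at most 4.
        side = math.isqrt(n)
        links = 2 * side * (side - 1)
        return {"links": links, "node degree": 4, "diameter (hops)": 2 * (side - 1)}

    for cores in (4, 16, 64, 256):
        print(cores, "cores:", crossbar(cores), mesh_2d(cores))

    The crossbar's crosspoint count grows quadratically while the mesh's link count grows only linearly, at the price of a growing hop count, which is the cost/performance trade-off the thesis examines.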

    Exploiting Properties of CMP Cache Traffic in Designing Hybrid Packet/Circuit Switched NoCs

    Get PDF
    Chip multiprocessors with a few to tens of processing cores are already commercially available. Continued technology scaling is making it feasible to integrate even more cores on a single chip. Providing the cores with fast access to data is vital to overall system performance. When a core requires access to a piece of data, the core's private cache memory is searched first. If a miss occurs, the data is looked up in the next level(s) of the memory hierarchy, where one or more levels of cache are often shared between two or more cores. Communication between the cores and the slices of the on-chip shared cache is carried over the network-on-chip (NoC). Interestingly, the cache and the NoC mutually affect each other's operation: communication over the NoC affects the access latency of cache data, while the cache organization generates the coherence and data messages, thus affecting the communication patterns and latency over the NoC. This thesis considers hybrid packet/circuit switched NoCs, i.e., packet switched NoCs enhanced with the ability to configure circuits. The communication and performance benefits that come from using circuits are predicated on amortizing the time cost incurred for configuring the circuits. To address this challenge, NoC designs are proposed that take advantage of properties of the cache traffic, namely temporal locality and predictability, to amortize or hide the circuit configuration time cost. First, a coarse-grained circuit configuration policy is proposed that exploits the temporal locality in the cache traffic to periodically configure circuits for the heavily communicating nodes. This enables the design of a locality-aware cache that promotes temporal communication locality through data placement and through suitable data replacement and migration policies. Next, a fine-grained configuration policy, called Déjà Vu switching, is proposed that leverages the predictability of data messages by initiating a circuit configuration as soon as a cache hit is detected, before the data becomes available; its benefit is demonstrated for saving interconnect energy in multi-plane NoCs. Finally, a more proactive configuration policy is proposed for fast caches, where circuit reservations are initiated by request messages, which can greatly improve communication latency and system performance.
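    As an illustration of the coarse-grained policy described above, the following sketch ranks node pairs by the traffic observed in the previous epoch and reserves circuits for the heaviest communicators. The class, epoch length, and circuit budget are hypothetical placeholders that only approximate the idea; they are not the thesis's actual mechanism.

    # Hypothetical sketch of a coarse-grained circuit-configuration policy.
    from collections import Counter

    class CoarseGrainedConfigurator:
        def __init__(self, max_circuits=4, epoch_cycles=10_000):
            self.max_circuits = max_circuits   # circuit-switched paths available
            self.epoch_cycles = epoch_cycles   # cycles between reconfigurations
            self.traffic = Counter()           # (src, dst) -> packets seen this epoch
            self.circuits = set()              # currently configured circuits

        def record_packet(self, src, dst):
            self.traffic[(src, dst)] += 1

        def reconfigure(self):
            # Exploit temporal locality: pairs that communicated heavily in the last
            # epoch are likely to keep communicating in the next one.
            heaviest = [pair for pair, _ in self.traffic.most_common(self.max_circuits)]
            self.circuits = set(heaviest)
            self.traffic.clear()

        def route(self, src, dst):
            # Use a pre-configured circuit when one exists; otherwise fall back to
            # ordinary packet switching and pay the per-hop router latency.
            return "circuit" if (src, dst) in self.circuits else "packet"

    Because circuits are set up once per epoch for many subsequent messages, the configuration cost is amortized over all the traffic that flows between the chosen node pairs.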

    Design of Optical Interconnect Transceiver Circuits and Network-on-chip Architectures for Inter- and Intra-chip Communication

    Get PDF
    The rapid expansion in data communication due to the growth of multimedia applications and cloud computing services necessitates improvements in optical transceiver circuitry power efficiency as these systems scale well past 10 Gb/s. In order to meet these requirements, a 26 GHz transimpedance amplifier (TIA) is presented in a 0.25-µm SiGe BiCMOS technology. It employs a transformer-based regulated cascode (RGC) input stage which provides passive negative-feedback gain that enhances the effective transconductance of the TIA's input common-base transistor, reducing the input resistance and providing considerable bandwidth extension without significant noise degradation or power consumption. The TIA achieves a 53 dBΩ single-ended transimpedance gain with a 26 GHz bandwidth and 21.3 pA/√Hz average input-referred noise current spectral density. Total chip power, including output buffering, is 28.2 mW from a 2.5 V supply, with the core TIA consuming 8.2 mW, and the chip area including pads is 960 µm × 780 µm. With the advance of photonic devices, optical interconnects become a promising technology to replace conventional electrical channels for high-bandwidth and power-efficient inter/intra-chip interconnects. Second, a silicon photonic transceiver is presented for a silicon ring resonator-based optical interconnect architecture in a 1 V standard 65 nm CMOS technology. The transmitter circuits incorporate high-swing drivers with non-linear pre-emphasis and automatic bias-based tuning for resonance wavelength stabilization. An optical forwarded-clock adaptive inverter-based transimpedance amplifier (TIA) receiver trades off power for varying link budgets by employing an on-die eye monitor and scaling the TIA supply for the required sensitivity. At 5 Gb/s operation, the ring modulator under a 4 Vpp driver achieves a 12.7 dB extinction ratio with 4.04 mW power consumption, while a 0.28 nm tuning range is obtained at 6.8 µW/GHz efficiency with the bias-based tuning scheme implemented with the 2 Vpp transmitter. When tested with a wire-bonded 150 fF p-i-n photodetector, the receiver achieves -12.7 dBm sensitivity at a BER of 10⁻¹⁵ and consumes 2.2 mW at 8 Gb/s. Third, a novel Nano-Photonic Network-on-Chip (NoC) architecture, called LumiNoC, is proposed for high-performance and power-efficient interconnects for chip multiprocessors (CMPs). A 64-node LumiNoC under synthetic traffic enjoys 50% less latency at low loads versus other reported photonic NoCs, and ∼25% less latency versus electrical 2D mesh NoCs on realistic workloads. Under the same ideal throughput, LumiNoC achieves a laser power reduction of 78% and an overall power reduction of 44% versus competing designs.
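    A few back-of-the-envelope checks on the figures quoted above, assuming the usual definitions (transimpedance gain in dBΩ is 20·log10 of the gain in ohms, and energy per bit is power divided by data rate); the script is illustrative only and not part of the work.

    # Sanity checks on the quoted transceiver figures (illustrative only).

    def db_ohm_to_ohm(gain_dbohm):
        # Transimpedance gain quoted in dB-Ohm: Z = 10 ** (dBOhm / 20)
        return 10 ** (gain_dbohm / 20)

    def energy_per_bit_pj(power_mw, rate_gbps):
        # 1 mW divided by 1 Gb/s equals 1 pJ/bit
        return power_mw / rate_gbps

    print(f"53 dBOhm  -> {db_ohm_to_ohm(53):.0f} Ohm transimpedance")        # ~447 Ohm
    print(f"RX        -> {energy_per_bit_pj(2.2, 8):.3f} pJ/bit at 8 Gb/s")  # 0.275 pJ/bit
    print(f"Modulator -> {energy_per_bit_pj(4.04, 5):.3f} pJ/bit at 5 Gb/s") # 0.808 pJ/bit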

    Energy-efficient electrical and silicon-photonic networks in many core systems

    Full text link
    Thesis (Ph.D.)--Boston University. During the past decade, the very large scale integration (VLSI) community has migrated towards incorporating multiple cores on a single chip to sustain the historic performance improvement in computing systems. As the core count continuously increases, the performance of the network-on-chip (NoC), which is responsible for the communication between cores, caches and memory controllers, is becoming increasingly critical to sustaining this performance improvement. In this dissertation, we propose several methods to improve the energy efficiency of both electrical and silicon-photonic NoCs. Firstly, for the electrical NoC, we propose a flow control technique, Express Virtual Channel with Taps (EVC-T), to transmit both broadcast and data packets efficiently in a mesh network. A low-latency notification tree network is included to maintain the order of broadcast packets. The EVC-T technique improves the NoC latency by 24% and the system energy efficiency in terms of energy-delay product (EDP) by 13%. In the near future, silicon-photonic links are projected to replace electrical links for global on-chip communication due to their lower data-dependent power and higher bandwidth density, but the high laser power can more than offset these advantages. Therefore, we propose a silicon-photonic multi-bus NoC architecture and a methodology that can reduce the laser power by 49% on average through bandwidth reconfiguration at runtime based on the variations in bandwidth requirements of applications. We also propose a technique to reduce the laser power by dynamically activating/deactivating the L2 cache banks and switching ON/OFF the corresponding silicon-photonic links in a crossbar NoC. This cache-reconfiguration-based technique saves 23.8% of laser power and improves system EDP by 5.52% on average. In addition, we propose a methodology for placing and sharing on-chip laser sources that jointly considers the bandwidth requirements, thermal constraints and physical layout constraints, thereby further reducing laser power. Beyond reducing the laser power to improve the energy efficiency of silicon-photonic NoCs, we propose to leverage the large bandwidth provided by the silicon-photonic NoC to share computing resources. The global sharing of floating-point units can save system area by 13.75% and system power by 10%.
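    A minimal sketch of the runtime bandwidth-reconfiguration idea described above, assuming a per-epoch bandwidth estimate for each photonic bus and a fixed data rate per wavelength; the function name, data-rate constant, and policy are hypothetical stand-ins, not the dissertation's actual mechanism.

    # Hypothetical per-epoch bandwidth reconfiguration for a photonic multi-bus NoC.
    import math

    def reconfigure_buses(demand_gbps, wavelengths_per_bus, gbps_per_wavelength=10.0):
        """Choose how many wavelengths (and hence how much laser power) each bus keeps on."""
        active = {}
        for bus, demand in demand_gbps.items():
            needed = math.ceil(demand / gbps_per_wavelength)      # round demand up
            active[bus] = min(wavelengths_per_bus[bus], max(1, needed))
        return active

    demand = {"bus0": 35.0, "bus1": 8.0, "bus2": 62.0}   # measured over the last epoch
    capacity = {"bus0": 8, "bus1": 8, "bus2": 8}          # wavelengths physically available
    print(reconfigure_buses(demand, capacity))            # {'bus0': 4, 'bus1': 1, 'bus2': 7}

    Wavelengths that are switched off need no laser power, so matching the active wavelength count to measured demand is what yields the laser power savings reported above.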