
    Reinventing Integrated Photonic Devices and Circuits for High Performance Communication and Computing Applications

    The long-standing technological pillars of computing systems evolution, namely Moore's law and the Von Neumann architecture, are breaking down under the pressure of meeting the capacity and energy efficiency demands of computing and communication architectures that are designed to process modern data-centric applications related to Artificial Intelligence (AI), Big Data, and Internet-of-Things (IoT). In response, both industry and academia have turned to 'more-than-Moore' technologies for realizing hardware architectures for communication and computing. Fortunately, Silicon Photonics (SiPh) has emerged as one highly promising 'more-than-Moore' technology. Recent progress has enabled SiPh-based interconnects to outperform traditional electrical interconnects, offering advantages like high bandwidth density, near-light speed data transfer, distance-independent bitrate, and low energy consumption. Furthermore, SiPh-based electro-optic (E-O) computing circuits have exhibited up to two orders of magnitude improvements in performance and energy efficiency compared to their electronic counterparts. Thus, SiPh stands out as a compelling solution for creating high-performance and energy-efficient hardware for communication and computing applications. Despite their advantages, SiPh-based interconnects face various design challenges that hamper their reliability, scalability, performance, and energy efficiency. These include limited optical power budget (OPB), high static power dissipation, crosstalk noise, fabrication and on-chip temperature variations, and limited spectral bandwidth for multiplexing. Similarly, SiPh-based E-O computing circuits also face several challenges. Firstly, the E-O circuits for simple logic functions lack all-electrical input handling, which increases hardware area and complexity. 
Secondly, compared with CMOS implementations, the E-O arithmetic circuits occupy vast areas (at least 100x) while hardly achieving more than 60% hardware utilization, leading to high idle times and non-amortizable area and static power overheads. Thirdly, the high area overhead of E-O circuits hinders them from achieving high spatial parallelism on-chip, because it limits the number of E-O circuits that can be implemented on a reticle-size-limited chip. My research offers significant contributions to address the aforementioned challenges. For SiPh-based interconnects, my contributions focus on enhancing OPB by mitigating crosstalk noise, addressing the optical non-linearity-related issues through the development of Silicon-on-Sapphire-based photonic interconnects, exploring multi-level signaling, and evaluating various device-level design pathways. This enables the design of high-throughput (>1 Tbps) and energy-efficient (<1 pJ/bit) SiPh interconnects. In the context of SiPh-based E-O circuits, my contributions include the design of a microring-based polymorphic E-O logic gate, a hybrid time-amplitude analog optical modulator, and an indium tin oxide-based silicon nitride microring modulator and a weight bank for neural network computations. These designs significantly reduce the area overhead of current E-O computing circuits while enhancing energy efficiency and hardware utilization.

    Technology Implications for Large Last-Level Caches

    A large last-level cache (L3C) is effective at bridging the performance and power gap between processor and memory. Several memory technologies, including SRAM, STT-RAM (MRAM), and embedded DRAM (eDRAM), have been used or considered as the technology to implement L3Cs. However, each of them has inherent weaknesses: SRAM has relatively low density and high leakage; STT-RAM has long write latency and requires high write energy; eDRAM requires refresh. As future processors are expected to have larger last-level caches, the objective of this dissertation is to study the tradeoffs associated with using each of these technologies to implement L3Cs. In order to make useful comparisons between L3Cs built with SRAM, STT-RAM, and eDRAM, we consider and implement several levels of detail. First, to obtain unbiased cache performance and power properties (i.e., read/write access latency, read/write access energy, leakage power, refresh power, area), we prototype caches based on realistic memory and device models. Second, we present simple analytical models that enable us to quickly examine different memory technologies under various scenarios. Third, we review power-optimization techniques for each of the technologies, and propose using a low-cost dead-line prediction scheme for eDRAM-based L3Cs to eliminate unnecessary refreshes. Finally, the highlight of this dissertation is the comparison and analysis of low-leakage SRAM, low write-energy STT-RAM, and refresh-optimized eDRAM. We report system performance, last-level cache energy breakdown, and memory hierarchy energy breakdown, using an augmented full-system simulator executing a range of workloads and input sets. The simulation results indicate that STT-RAM has the highest potential to save energy in future L3C designs. For contemporary processors, an SRAM-based L3C yields the fastest system performance, whereas eDRAM consumes the lowest energy.

    On-Chip Optical Interconnection Networks for Multi/Manycore Architectures

    The rapid development of multi/manycore technologies offers the opportunity for highly parallel architectures implemented on a single chip. While the first, low-parallelism multicore products have been based on simple interconnection structures (single bus, very simple crossbar), the emerging highly parallel architectures will require complex, limited-degree interconnection networks. This thesis studies this trend according to the general theory of interconnection structures for parallel machines, and investigates some solutions in terms of performance, cost, fault-tolerance, and run-time support to shared-memory and/or message passing programming mechanisms

    Microarchitectural Low-Power Design Techniques for Embedded Microprocessors

    With the omnipresence of embedded processing in all forms of electronics today, there is a strong trend towards wireless, battery-powered, portable embedded systems which have to operate under stringent energy constraints. Consequently, low power consumption and high energy efficiency have emerged as the two key criteria for embedded microprocessor design. In this thesis we present a range of microarchitectural low-power design techniques which enable the increase of performance for embedded microprocessors and/or the reduction of energy consumption, e.g., through voltage scaling. In the context of cryptographic applications, we explore the effectiveness of instruction set extensions (ISEs) for a range of different cryptographic hash functions (SHA-3 candidates) on a 16-bit microcontroller architecture (PIC24). Specifically, we demonstrate the effectiveness of light-weight ISEs based on lookup table integration and microcoded instructions using finite state machines for operand and address generation. On-node processing in autonomous wireless sensor node devices requires deeply embedded cores with extremely low power consumption. To address this need, we present TamaRISC, a custom-designed ISA with a corresponding ultra-low-power microarchitecture implementation. The TamaRISC architecture is employed in conjunction with an ISE and standard cell memories to design a sub-threshold capable processor system targeted at compressed sensing applications. We furthermore employ TamaRISC in a hybrid SIMD/MIMD multi-core architecture targeted at moderate to high processing requirements (> 1 MOPS). A range of different microarchitectural techniques for efficient memory organization are presented. Specifically, we introduce a configurable data memory mapping technique for private and shared access, as well as instruction broadcast together with synchronized code execution based on checkpointing. 
We then study an inherent suboptimality due to the worst-case design principle in synchronous circuits, and introduce the concept of dynamic timing margins. We show that dynamic timing margins exist in microprocessor circuits, and that these margins are to a large extent state-dependent and that they are correlated to the sequences of instruction types which are executed within the processor pipeline. To perform this analysis we propose a circuit/processor characterization flow and tool called dynamic timing analysis. Moreover, this flow is employed in order to devise a high-level instruction set simulation environment for impact-evaluation of timing errors on application performance. The presented approach improves the state of the art significantly in terms of simulation accuracy through the use of statistical fault injection. The dynamic timing margins in microprocessors are then systematically exploited for throughput improvements or energy reductions via our proposed instruction-based dynamic clock adjustment (DCA) technique. To this end, we introduce a 6-stage 32-bit microprocessor with cycle-by-cycle DCA. Besides a comprehensive design flow and simulation environment for evaluation of the DCA approach, we additionally present a silicon prototype of a DCA-enabled OpenRISC microarchitecture fabricated in 28 nm FD-SOI CMOS. The test chip includes a suitable clock generation unit which allows for cycle-by-cycle DCA over a wide range with fine granularity at frequencies exceeding 1 GHz. Measurement results of speedups and power reductions are provided
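
The idea behind instruction-based dynamic clock adjustment can be illustrated with a minimal sketch, assuming each instruction class has a known minimum safe clock period and that the clock for a cycle is set by the slowest instruction currently in flight. The delay table, the class names, and the functions `dca_periods` and `speedup` are hypothetical and are not taken from the thesis.

```python
# Hypothetical minimum safe clock periods (ns) per instruction class.
# Real values would come from a timing characterization of the circuit.
REQUIRED_PERIOD_NS = {"nop": 0.6, "alu": 0.7, "load": 0.85, "mul": 1.0}

# A conventional worst-case design clocks every cycle at the slowest class.
STATIC_PERIOD_NS = max(REQUIRED_PERIOD_NS.values())


def dca_periods(trace, depth=6):
    """Return the clock period chosen for each cycle of an instruction trace.

    The pipeline is modeled as a shift register of instruction classes;
    each cycle runs at the period of the slowest class in any stage.
    """
    pipeline = ["nop"] * depth
    periods = []
    for instr in trace:
        # A new instruction enters stage 0; the oldest one retires.
        pipeline = [instr] + pipeline[:-1]
        periods.append(max(REQUIRED_PERIOD_NS[i] for i in pipeline))
    return periods


def speedup(trace, depth=6):
    """Execution-time ratio of the static worst-case clock to the adjusted clock."""
    periods = dca_periods(trace, depth)
    return STATIC_PERIOD_NS * len(periods) / sum(periods)
```

In this toy model a trace made only of the slowest class gains nothing, while traces dominated by fast classes run proportionally faster; it ignores clock-generation limits and the state dependence that the abstract mentions.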

    Temperature Evaluation of NoC Architectures and Dynamically Reconfigurable NoC

    Advancements in the field of chip fabrication led to the integration of a large number of transistors in a small area, giving rise to the multi–core processor era. Massive multi–core processors facilitate innovation and research in the fields of healthcare, defense, entertainment, meteorology and many others. Reduction in chip area and increase in the number of on–chip cores are accompanied by power and temperature issues. In high performance multi–core chips, power and heat are predominant constraints. High performance massive multicore systems suffer from thermal hotspots, exacerbating the problem of reliability in deep submicron technologies. High power consumption not only increases the chip temperature but also jeopardizes the integrity of the system. Hence, there is a need to explore holistic power and thermal optimization and management strategies for massive on–chip multi–core environments. In multi–core environments, the communication fabric plays a major role in deciding the efficiency of the system. In multi–core processor chips this communication infrastructure is predominantly a Network–on–Chip (NoC). Traditional NoC designs incorporate planar interconnects; as a result, these NoCs have long, multi–hop wireline links for data exchange. Due to the presence of multi–hop planar links, such NoC architectures fall prey to high latency, significant power dissipation and temperature hotspots. Networks inspired by nature are envisioned as an enabling technology to achieve highly efficient and low power NoC designs. Adopting wireless technology in such architectures enhances their performance. Placement of wireless interconnects (WIs) alters the behavior of the network, and hence a random deployment of WIs may not result in a thermally optimal solution. In such scenarios, the WIs, being highly efficient, would attract high traffic densities, resulting in thermal hotspots. 
Hence, the location and utilization of the wireless links are key factors in obtaining a thermally optimal, highly efficient Network–on–Chip. Optimization of the NoC framework alone is incapable of addressing the effects of the runtime dynamics of the system. Minimal paths optimized solely for performance may lead to excessive utilization of certain NoC components, leading to thermal hotspots. Hence, architectural innovation in conjunction with suitable power and thermal management strategies is the key to designing high performance and energy–efficient multicore systems. This work contributes by exploring various wired and wireless NoC architectures that achieve the best trade–offs between temperature, performance and energy–efficiency. It further proposes an adaptive routing scheme which factors in the thermal profile of the chip. The proposed routing mechanism dynamically reacts to the thermal profile of the chip and takes measures to avoid thermal hotspots, achieving a thermally efficient, dynamically reconfigurable Network–on–Chip architecture.
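
One way to picture a routing scheme that factors in the thermal profile is a shortest-path search whose hop cost is inflated for routers above a temperature limit. The sketch below is only an illustration under that assumption: the function `thermal_route`, the threshold `t_limit`, and the `penalty` weight are hypothetical, and the thesis may use a different mechanism.

```python
import heapq


def thermal_route(links, temps, src, dst, t_limit=80.0, penalty=10.0):
    """Find a low-cost path that steers around hot routers.

    links: dict mapping each router to a list of neighbouring routers.
    temps: dict mapping each router to its current temperature (deg C).
    Entering a router costs 1 hop, plus `penalty` if it is above `t_limit`.
    Returns the path as a list of routers, or None if dst is unreachable.
    """
    def hop_cost(node):
        return 1.0 + (penalty if temps[node] > t_limit else 0.0)

    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == dst:
            break
        if dist > best[node]:
            continue  # stale heap entry
        for nxt in links[node]:
            cand = dist + hop_cost(nxt)
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (cand, nxt))

    if dst not in best:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# 2x2 mesh: A-B on the top row, C-D on the bottom row.
mesh = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
```

Because the temperatures are read on every call, re-running the search as the thermal profile changes moves traffic away from a router once it crosses the limit, which is the reactive behaviour the abstract describes in general terms.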

    Automated wavelength recovery for silicon photonics

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references. For 2020, Moore's law predicts 1 Tb/s on-/off-chip communication bandwidth and ~100 fJ/bit total energy in a point-to-point link for high performance computing applications. These requirements are pushing the limits of on-chip silicon CMOS transistors and off-chip VCSEL technology. The major limitation of current systems is the inability to carry more than a single channel on a single wire/fiber. Silicon photonics, offering a solution on the same platform as CMOS technology, can enable Wavelength Division Multiplexed (WDM) systems. However, silicon photonics, with its low-energy high-speed resonators, has to overcome wafer-level fabrication variations and dynamic temperature fluctuations induced by processor cores. In this work, we offer a solution to these limitations, called Automated Wavelength Recovery (AWR). In order to demonstrate AWR, we design and demonstrate high performance active silicon resonators. A microdisk modulator achieved open eye-diagrams at a data rate of 25 Gb/s and error-free operation up to 20 Gb/s. We also achieved a thermo-optically tunable microdisk modulator with low-power modulation (1 fJ/bit) at a data rate of 13 Gb/s, a 5.8 dB extinction ratio, a 1.22 dB insertion loss, and record-low thermal tuning power (4.9 μW/GHz) for a high-speed modulator. We demonstrated a new L-shaped resonant microring (LRM) modulator that achieves 30 Gb/s error-free operation in a compact (< 20 μm²) structure while maintaining single-mode operation, enabling direct WDM across an uncorrupted 5.3 THz FSR. We have successfully introduced heater elements inside a new single-mode filter, an LRM filter. 
The LRM filter achieved high-efficiency (3.3 μW/GHz) and high-speed (τf ~1.6 μs) thermal tuning and maintained signal integrity with a record-low thru-to-drop power penalty (<1.1 dB) over the 4 THz FSR and <0.5 dB insertion loss. We have integrated a heater driver and an adiabatic resonant microring (ARM) filter in a commercial bulk CMOS deep-trench process for the first time. The proposed AWR algorithm is implemented with an ARM multiplexer. An advanced method for AWR is also introduced and demonstrated with passive resonators. By Erman Timurdogan. S.M.

    Dynamic Voltage and Frequency Scaling for Wireless Network-on-Chip

    Previously, research and design of Network-on-Chip (NoC) paradigms were mainly focused on improving the performance of the interconnection networks. With the emergence of a wide range of low-power applications and energy-constrained high-performance applications, it is highly desirable to have NoCs that are highly energy efficient without incurring a performance penalty. In the design of high-performance massive multi-core chips, power and heat have become dominant constraints. Increased power consumption can raise chip temperature, which in turn can decrease chip reliability and performance and increase cooling costs. It has been shown that the Small-world Wireless Network-on-Chip (SWNoC) architecture, which replaces multi-hop wire-line paths in a NoC with high-bandwidth, single-hop, long-range wireless links, reduces the overall energy dissipation compared to a wire-line mesh-based NoC architecture. However, the overall energy dissipation of the wireless NoC is still dominated by wire-line links and switches (buffers). Dynamic Voltage and Frequency Scaling (DVFS) is an efficient technique for significant power savings in microprocessors. It has been proposed and deployed in modern microprocessors by exploiting the variance in processor utilization. In a Network-on-Chip, the wire-line links and buffers are likely not to be fully utilized at all times, even across different applications. Hence, by exploiting these characteristics of the links and buffers under different traffic, DVFS can be applied to these switches and wire-line links for large power savings. In this thesis, a history-based DVFS mechanism is proposed. This mechanism uses the past utilization of the wire-line links and buffers to predict the future traffic and accordingly tunes the voltage and frequency of the links and buffers dynamically for each time window. This mechanism dynamically minimizes the power consumption while largely maintaining high system performance. 
Performance analysis of these DVFS-enabled wireless NoCs shows that the overall energy dissipation improves by around 40% compared with Small-world Wireless NoCs.
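
A history-based controller of this kind can be sketched as follows, assuming the predictor is a simple moving average of past per-window utilization and that the link supports a small set of discrete operating points. The class `HistoryDvfs`, the operating points in `LEVELS`, and the window count are hypothetical placeholders, not values from the thesis.

```python
from collections import deque

# Hypothetical (voltage in V, frequency in GHz) operating points,
# ordered from lowest to highest performance.
LEVELS = [(0.6, 1.0), (0.8, 1.75), (1.0, 2.5)]


class HistoryDvfs:
    """Pick a voltage/frequency pair per time window from past utilization."""

    def __init__(self, history=4, levels=LEVELS):
        self.samples = deque(maxlen=history)  # most recent window utilizations
        self.levels = levels

    def predict(self):
        """Predict next-window utilization as the mean of the stored history."""
        if not self.samples:
            return 1.0  # no history yet: assume full load to protect performance
        return sum(self.samples) / len(self.samples)

    def next_level(self, utilization):
        """Record the utilization just observed (0.0 to 1.0, relative to the
        top frequency) and return the operating point for the next window."""
        self.samples.append(utilization)
        predicted = self.predict()
        f_max = self.levels[-1][1]
        for volts, ghz in self.levels:
            # Choose the slowest point whose capacity covers the prediction.
            if ghz / f_max >= predicted:
                return volts, ghz
        return self.levels[-1]
```

A short history reacts quickly to bursts but switches levels more often; a longer one is smoother but slower to scale up, which is the performance trade-off such a mechanism has to balance.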

    Cache memory design in the FinFET era

    The major problem in future technology scaling is variation in process parameters, which arises from imperfections in the fabrication process. Moreover, devices are more sensitive to environmental changes in temperature and supply voltage, as well as to ageing. All these influences are manifested in integrated circuits as increased power consumption, reduced maximum operating frequency and an increased number of failures. These effects have been partially overcome with the introduction of FinFET technology, which has solved the problem of variability caused by Random Dopant Fluctuations. However, in the next ten years channel length is projected to shrink to 10nm, where the variability generated by Line Edge Roughness will dominate and its effect on threshold voltage variations will become critical. Embedded memories, with their cells as the basic building unit, are the most prone to these effects because they have the smallest dimensions. Because of that, memories should be designed with particular care in order to make further technology scaling possible. This thesis explores upcoming 10nm FinFETs and the existing issues in cache memory design with this technology. Moreover, it presents original techniques at different levels of design abstraction for mitigating the effects of process and environmental variability. First, an original method for simulating the variability of Tri-Gate FinFETs is presented, using a conventional HSPICE simulation environment and BSIM-CMG model cards. Next, a thorough characterisation of traditional SRAM cell circuits (6T and 8T) is performed. The possibility of using Independent Gate FinFETs to increase cell stability is also explored. Gain cells have recently emerged as an attractive alternative in cache memory design. 
This thesis partially explores this idea by presenting a detailed circuit analysis of the dynamic 3T gain cell for 10nm FinFETs. Finally, the thesis presents a micro-architecture optimisation of a high-speed cache implemented with 3T gain cells. We show how the cache coherency states can be used to reduce the refresh energy of the memory as well as memory ageing.