221 research outputs found
Resource and thermal management in 3D-stacked multi-/many-core systems
Continuous semiconductor technology scaling and the rapid increase in computational needs have stimulated the emergence of multi-/many-core processors. While up to hundreds of cores can be placed on a single chip, the performance capacity of the cores cannot be fully exploited due to high latencies of interconnects and memory, high power consumption, and low manufacturing yield in traditional (2D) chips. 3D stacking is an emerging technology that aims to overcome these limitations of 2D designs by stacking processor dies over each other and using through-silicon-vias (TSVs) for on-chip communication, and thus, provides a large amount of on-chip resources and shortens communication latency. These benefits, however, are limited by challenges in high power densities and temperatures.
3D stacking also enables integrating heterogeneous technologies into a single chip. One example of heterogeneous integration is building many-core systems with silicon-photonic network-on-chip (PNoC), which reduces on-chip communication latency significantly and provides higher bandwidth compared to electrical links. However, silicon-photonic links are vulnerable to on-chip thermal and process variations. These variations can be countered by actively tuning the temperatures of optical devices through micro-heaters, but at the cost of substantial power overhead.
This thesis claims that unearthing the energy efficiency potential of 3D-stacked systems requires intelligent and application-aware resource management. Specifically, the thesis improves energy efficiency of 3D-stacked systems via three major components of computing systems: cache, memory, and on-chip communication. We analyze characteristics of workloads in computation, memory usage, and communication, and present techniques that leverage these characteristics for energy-efficient computing.
This thesis introduces 3D cache resource pooling, a cache design that allows for flexible heterogeneity in cache configuration across a 3D-stacked system and improves cache utilization and system energy efficiency. We also demonstrate the impact of resource pooling on a real prototype 3D system with scratchpad memory.
At the main memory level, we claim that utilizing heterogeneous memory modules and memory object level management significantly helps with energy efficiency. This thesis proposes a memory management scheme at a finer granularity: memory object level, and a page allocation policy to leverage the heterogeneity of available memory modules and cater to the diverse memory requirements of workloads.
On the on-chip communication side, we introduce an approach to limit the power overhead of PNoC in (3D) many-core systems through cross-layer thermal management. Our proposed thermally-aware workload allocation policies coupled with an adaptive thermal tuning policy minimize the required thermal tuning power for PNoC, and in this way, help broader integration of PNoC. The thesis also introduces techniques in placement and floorplanning of optical devices to reduce optical loss and, thus, laser source power consumption.2018-03-09T00:00:00
Radiation Hardened by Design Methodologies for Soft-Error Mitigated Digital Architectures
abstract: Digital architectures for data encryption, processing, clock synthesis, data transfer, etc. are susceptible to radiation induced soft errors due to charge collection in complementary metal oxide semiconductor (CMOS) integrated circuits (ICs). Radiation hardening by design (RHBD) techniques such as double modular redundancy (DMR) and triple modular redundancy (TMR) are used for error detection and correction respectively in such architectures. Multiple node charge collection (MNCC) causes domain crossing errors (DCE) which can render the redundancy ineffectual. This dissertation describes techniques to ensure DCE mitigation with statistical confidence for various designs. Both sequential and combinatorial logic are separated using these custom and computer aided design (CAD) methodologies.
Radiation vulnerability and design overhead are studied on VLSI sub-systems including an advanced encryption standard (AES) which is DCE mitigated using module level coarse separation on a 90-nm process with 99.999% DCE mitigation. A radiation hardened microprocessor (HERMES2) is implemented in both 90-nm and 55-nm technologies with an interleaved separation methodology with 99.99% DCE mitigation while achieving 4.9% increased cell density, 28.5 % reduced routing and 5.6% reduced power dissipation over the module fences implementation. A DMR register-file (RF) is implemented in 55 nm process and used in the HERMES2 microprocessor. The RF array custom design and the decoders APR designed are explored with a focus on design cycle time. Quality of results (QOR) is studied from power, performance, area and reliability (PPAR) perspective to ascertain the improvement over other design techniques.
A radiation hardened all-digital multiplying pulsed digital delay line (DDL) is designed for double data rate (DDR2/3) applications for data eye centering during high speed off-chip data transfer. The effect of noise, radiation particle strikes and statistical variation on the designed DDL are studied in detail. The design achieves the best in class 22.4 ps peak-to-peak jitter, 100-850 MHz range at 14 pJ/cycle energy consumption. Vulnerability of the non-hardened design is characterized and portions of the redundant DDL are separated in custom and auto-place and route (APR). Thus, a range of designs for mission critical applications are implemented using methodologies proposed in this work and their potential PPAR benefits explored in detail.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201
Constraint-Aware, Scalable, and Efficient Algorithms for Multi-Chip Power Module Layout Optimization
Moving towards an electrified world requires ultra high-density power converters. Electric vehicles, electrified aerospace, data centers, etc. are just a few fields among wide application areas of power electronic systems, where high-density power converters are essential. As a critical part of these power converters, power semiconductor modules and their layout optimization has been identified as a crucial step in achieving the maximum performance and density for wide bandgap technologies (i.e., GaN and SiC). New packaging technologies are also introduced to produce reliable and efficient multichip power module (MCPM) designs to push the current limits. The complexity of the emerging MCPM layouts is surpassing the capability of a manual, iterative design process to produce an optimum design with agile development requirements. An electronic design automation tool called PowerSynth has been introduced with ongoing research toward enhanced capabilities to speed up the optimized MCPM layout design process. This dissertation presents the PowerSynth progression timeline with the methodology updates and corresponding critical results compared to v1.1. The first released version (v1.1) of PowerSynth demonstrated the benefits of layout abstraction, and reduced-order modeling techniques to perform rapid optimization of the MCPM module compared to the traditional, manual, and iterative design approach. However, that version is limited by several key factors: layout representation technique, layout generation algorithms, iterative design-rule-checking (DRC), optimization algorithm candidates, etc. To address these limitations, and enhance PowerSynth’s capabilities, constraint-aware, scalable, and efficient algorithms have been developed and implemented. PowerSynth layout engine has evolved from v1.3 to v2.0 throughout the last five years to incorporate the algorithm updates and generate all 2D/2.5D/3D Manhattan layout solutions. These fundamental changes in the layout generation methodology have also called for updates in the performance modeling techniques and enabled exploring different optimization algorithms. The latest PowerSynth 2 architecture has been implemented to enable electro-thermo-mechanical and reliability optimization on 2D/2.5D/3D MCPM layouts, and set up a path toward cabinet-level optimization. PowerSynth v2.0 computer-aided design (CAD) flow has been hardware-validated through manufacturing and testing of an optimized novel 3D MCPM layout. The flow has shown significant speedup compared to the manual design flow with a comparable optimization result
Architectural-Physical Co-Design of 3D CPUs with Micro-Fluidic Cooling
The performance, energy efficiency and cost improvements due to traditional technology scaling have begun to slow down and present diminishing returns. Underlying reasons for this trend include fundamental physical limits of transistor scaling, the growing significance of quantum effects as transistors shrink, and a growing mismatch between transistors and interconnects regarding size, speed and power. Continued Moore's Law scaling will not come from technology scaling alone, and must involve improvements to design tools and development of new disruptive technologies such as 3D integration. 3D integration presents potential improvements to interconnect power and delay by translating the routing problem into a third dimension, and facilitates transistor density scaling independent of technology node.
Furthermore, 3D IC technology opens up a new architectural design space of heterogeneously-integrated high-bandwidth CPUs. Vertical integration promises to provide the CPU architectures of the future by integrating high performance processors with on-chip high-bandwidth memory systems and highly connected network-on-chip structures. Such techniques can overcome the well-known CPU performance bottlenecks referred to as memory and communication wall.
However the promising improvements to performance and energy efficiency offered by 3D CPUs does not come without cost, both in the financial investments to develop the technology, and the increased complexity of design. Two main limitations to 3D IC technology have been heat removal and TSV reliability. Transistor stacking creates increases in power density, current density and thermal resistance in air cooled packages. Furthermore the technology introduces vertical through silicon vias (TSVs) that create new points of failure in the chip and require development of new BEOL technologies. Although these issues can be controlled to some extent using thermal-reliability aware physical and architectural 3D design techniques, high performance embedded cooling schemes, such as micro-fluidic (MF) cooling, are fundamentally necessary to unlock the true potential of 3D ICs.
A new paradigm is being put forth which integrates the computational, electrical, physical, thermal and reliability views of a system. The unification of these diverse aspects of integrated circuits is called Co-Design. Independent design and optimization of each aspect leads to sub-optimal designs due to a lack of understanding of cross-domain interactions and their impacts on the feasibility region of the architectural design space. Co-Design enables optimization across layers with a multi-domain view and thus unlocks new high-performance and energy efficient configurations. Although the co-design paradigm is becoming increasingly necessary in all fields of IC design, it is even more critical in 3D ICs where, as we show, the inter-layer coupling and higher degree of connectivity between components exacerbates the interdependence between architectural parameters, physical design parameters and the multitude of metrics of interest to the designer (i.e. power, performance, temperature and reliability). In this dissertation we present a framework for multi-domain co-simulation and co-optimization of 3D CPU architectures with both air and MF cooling solutions. Finally we propose an approach for design space exploration and modeling within the new Co-Design paradigm, and discuss the possible avenues for improvement of this work in the future
Recommended from our members
Cross-Layer Pathfinding for Off-Chip Interconnects
Off-chip interconnects for integrated circuits (ICs) today induce a diverse design space, spanning many different applications that require transmission of data at various bandwidths, latencies and link lengths. Off-chip interconnect design solutions are also variously sensitive to system performance, power and cost metrics, while also having a strong impact on these metrics. The costs associated with off-chip interconnects include die area, package (PKG) and printed circuit board (PCB) area, technology and bill of materials (BOM). Choices made regarding off-chip interconnects are fundamental to product definition, architecture, design implementation and technology enablement. Given their cross-layer impact, it is imperative that a cross-layer approach be employed to architect and analyze off-chip interconnects up front, so that a top-down design flow can comprehend the cross-layer impacts and correctly assess the system performance, power and cost tradeoffs for off-chip interconnects. Chip architects are not exposed to all the tradeoffs at the physical and circuit implementation or technology layers, and often lack the tools to accurately assess off-chip interconnects. Furthermore, the collaterals needed for a detailed analysis are often lacking when the chip is architected; these include circuit design and layout, PKG and PCB layout, and physical floorplan and implementation. To address the need for a framework that enables architects to assess the system-level impact of off-chip interconnects, this thesis presents power-area-timing (PAT) models for off-chip interconnects, optimization and planning tools with the appropriate abstraction using these PAT models, and die/PKG/PCB co-design methods that help expose the off-chip interconnect cross-layer metrics to the die/PKG/PCB design flows. Together, these models, tools and methods enable cross-layer optimization that allows for a top-down definition and exploration of the design space and helps converge on the correct off-chip interconnect implementation and technology choice. The tools presented cover off-chip memory interfaces for mobile and server products, silicon photonic interfaces, 2.5D silicon interposers and 3D through-silicon vias (TSVs). The goal of the cross-layer framework is to assess the key metrics of the interconnect (such as timing, latency, active/idle/sleep power, and area/cost) at an appropriate level of abstraction by being able to do this across layers of the design flow. In additional to signal interconnect, this thesis also explores the need for such cross-layer pathfinding for power distribution networks (PDN), where the system-on-chip (SoC) floorplan and pinmap must be optimized before the collateral layouts for PDN analysis are ready. Altogether, the developed cross-layer pathfinding methodology for off-chip interconnects enables more rapid and thorough exploration of a vast design space of off-chip parallel and serial links, inter-die and inter-chiplet links and silicon photonics. Such exploration will pave the way for off-chip interconnect technology enablement that is optimized for system needs. The basis of the framework can be extended to cover other interconnect technology as well, since it fundamentally relates to system-level metrics that are common to all off-chip interconnects
Leveraging 3D technology for improved reliability
Journal ArticleAggressive technology scaling over the years has helped improve processor performance but has caused a reduction in processor reliability. Shrinking transistor sizes and lower supply voltages have increased the vulnerability of computer systems towards transient faults. An increase in within-die and die-to-die parameter variations has also led to a greater number of dynamic timing errors. A potential solution to mitigate the impact of such errors is redundancy via an in-order checker processor. Emerging 3D performance as well as reduced power consumption because of shorter on-chip wires. In this paper, we leverage the "snap-on" functionality provided by 3D integration and propose implementing the redundant checker processor on a second die. This allows manufacturers to easily create a family of "reliable processors" without significantly impacting the cost or performance for customers that care less about reliability. We comprehensively evaluate design choices for this second die, including the effects of L2 cache organization, deep pipelining, and frequency. An interesting feature made possible by 3D integration is the incorporation of heterogeneous process technologies within a single chip
HeurĂsticas bioinspiradas para el problema de Floorplanning 3D tĂ©rmico de dispositivos MPSoCs
Tesis inĂ©dita de la Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, leĂda el 20-06-2013Depto. de Arquitectura de Computadores y AutomáticaFac. de InformáticaTRUEunpu
Méthodologies de conception ASIC pour des systèmes sur puce 3D hétérogènes à base de réseaux sur puce 3D
Dans cette thèse, nous étudions les architectures 3D NoC grâce à des implémentations de conception physiques en utilisant la technologie 3D réel mis en oeuvre dans l'industrie. Sur la base des listes d'interconnexions en déroute, nous procédons à l'analyse des performances d'évaluer le bénéfice de l'architecture 3D par rapport à sa mise en oeuvre 2D. Sur la base du flot de conception 3D proposé en se concentrant sur la vérification temporelle tirant parti de l'avantage du retard négligeable de la structure de microbilles pour les connexions verticales, nous avons mené techniques de partitionnement de NoC 3D basé sur l'architecture MPSoC y compris empilement homogène et hétérogène en utilisant Tezzaron 3D IC technlogy. Conception et mise en oeuvre de compromis dans les deux méthodes de partitionnement est étudiée pour avoir un meilleur aperçu sur l'architecture 3D de sorte qu'il peut être exploitée pour des performances optimales. En utilisant l'approche 3D homogène empilage, NoC topologies est explorée afin d'identifier la meilleure topologie entre la topologie 2D et 3D pour la mise en œuvre MPSoC 3D sous l'hypothèse que les chemins critiques est fondée sur les liens inter-routeur. Les explorations architecturales ont également examiné les différentes technologies de traitement. mettant en évidence l'effet de la technologie des procédés à la performance d'architecture 3D en particulier pour l'interconnexion dominant du design. En outre, nous avons effectué hétérogène 3D d'empilage pour la mise en oeuvre MPSoC avec l'approche GALS de style et présenté plusieurs analyses de conception physiques connexes concernant la conception 3D et la mise en œuvre MPSoC utilisant des outils de CAO 2D. Une analyse plus approfondie de l'effet microbilles pas à la performance de l'architecture 3D à l'aide face-à -face d'empilement est également signalé l'identification des problèmes et des limitations à prendre en considération pendant le processus de conception.In this thesis, we study the exploration 3D NoC architectures through physical design implementations using real 3D technology used in the industry. Based on the proposed 3D design flow focusing on timing verification by leveraging the benefit of negligible delay of microbumps structure for vertical connections, we have conducted partitioning techniques for 3D NoC-based MPSoC architecture including homogeneous and heterogeneous stacking using Tezzaron 3D IC technlogy. Design and implementation trade-off in both partitioning methods is investigated to have better insight about 3D architecture so that it can be exploited for optimal performance. Using homogeneous 3D stacking approach, NoC architectures are explored to identify the best topology between 2D and 3D topology for 3D MPSoC implementation. The architectural explorations have also considered different process technologies highlighting the wire delay effect to the 3D architecture performance especially for interconnect-dominated design. Additionally, we performed heterogeneous 3D stacking of NoC-based MPSoC implementation with GALS style approach and presented several physical designs related analyses regarding 3D MPSoC design and implementation using 2D EDA tools. Finally we conducted an exploration of 2D EDA tool on different 3D architecture to evaluate the impact of 2D EDA tools on the 3D architecture performance. Since there is no commercialize 3D design tool until now, the experiment is important on the basis that designing 3D architecture using 2D EDA tools does not have a strong and direct impact to the 3D architecture performance mainly because the tools is dedicated for 2D architecture design.SAVOIE-SCD - Bib.électronique (730659901) / SudocGRENOBLE1/INP-Bib.électronique (384210012) / SudocGRENOBLE2/3-Bib.électronique (384219901) / SudocSudocFranceF
Practical Techniques for Improving Performance and Evaluating Security on Circuit Designs
As the modern semiconductor technology approaches to nanometer era, integrated circuits (ICs) are facing more and more challenges in meeting performance demand and security. With the expansion of markets in mobile and consumer electronics, the increasing demands require much faster delivery of reliable and secure IC products. In order to improve the performance and evaluate the security of emerging circuits, we present three practical techniques on approximate computing, split manufacturing and analog layout automation. Approximate computing is a promising approach for low-power IC design. Although a few accuracy-configurable adder (ACA) designs have been developed in the past, these designs tend to incur large area overheads as they rely on either redundant computing or complicated carry prediction. We investigate a simple ACA design that contains no redundancy or error detection/correction circuitry and uses very simple carry prediction. The simulation results show that our design dominates the latest previous work on accuracy-delay-power tradeoff while using 39% less area. One variant of this design provides finer-grained and larger tunability than that of the previous works. Moreover, we propose a delay-adaptive self-configuration technique to further improve the accuracy-delay-power tradeoff. Split manufacturing prevents attacks from an untrusted foundry. The untrusted foundry has front-end-of-line (FEOL) layout and the original circuit netlist and attempts to identify critical components on the layout for Trojan insertion. Although defense methods for this scenario have been developed, the corresponding attack technique is not well explored. Hence, the defense methods are mostly evaluated with the k-security metric without actual attacks. We develop a new attack technique based on structural pattern matching. Experimental comparison with existing attack shows that the new attack technique achieves about the same success rate with much faster speed for cases without the k-security defense, and has a much better success rate at the same runtime for cases with the k-security defense. The results offer an alternative and practical interpretation for k-security in split manufacturing.
Analog layout automation is still far behind its digital counterpart. We develop the layout automation framework for analog/mixed-signal ICs. A hierarchical layout synthesis flow which works in bottom-up manner is presented. To ensure the qualified layouts for better circuit performance, we use the constraint-driven placement and routing methodology which employs the expert knowledge via design constraints. The constraint-driven placement uses simulated annealing process to find the optimal solution. The packing represented by sequence pairs and constraint graphs can simultaneously handle different kinds of placement constraints. The constraint-driven routing consists of two stages, integer linear programming (ILP) based global routing and sequential detailed routing. The experiment results demonstrate that our flow can handle complicated hierarchical designs with multiple design constraints. Furthermore, the placement performance can be further improved by using mixed-size block placement which works on large blocks in priority
Recommended from our members
Automatic synthesis of analog layout : a survey
A review of recent research in the automatic synthesis of physical geometry for analog integrated circuits is presented. On introduction, an explanation of the difficulties involved in analog layout as opposed to digital layout is covered. Review of the literature then follows. Emphasis is placed on the exposition of general methods for addressing problems specific to analog layout, with the details of specific systems only being given when they surve to illustrate these methods well. The conclusion discusses problems remaining and offers a prediction as to how technology will evolve to solve them. It is argued that although progress has been and will continue to be made in the automation of analog IC layout, due to fundamental differences in the nature of analog IC design as opposed to digital design, it should not be expected that the level of automation of the former will reach that of the latter any time soon
- …