6 research outputs found

    Thermal Issues in Testing of Advanced Systems on Chip

    Optimizing the integration and energy efficiency of through silicon via-based 3D interconnects

    The aggressive scaling of CMOS process technology has been driving the rapid growth of the semiconductor industry for more than three decades. In recent years, the performance gains enabled by CMOS scaling have been increasingly challenged by highly parasitic on-chip interconnects, as wire parasitics do not scale at the same pace. Emerging 3D integration technologies based on vertical through-silicon vias (TSVs) promise a solution to the interconnect performance bottleneck, along with reduced fabrication cost and heterogeneous integration. As TSVs are a relatively recent interconnect technology, innovative test structures are required to evaluate and optimise the process, as well as to extract parameters for the generation of design rules and models. From the circuit designer's perspective, the critical TSV characteristics are its parasitic capacitance and its thermomechanical stress distribution. This work proposes new test structures for extracting these characteristics. The structures were fabricated on a 65 nm 3D process and used for the evaluation of that technology. Furthermore, as TSVs are implemented in large, densely interconnected 3D systems-on-chip (SoCs), the TSV parasitic capacitance may become an important source of energy dissipation. Typical low-power techniques based on voltage scaling can be used, though this represents a technical challenge in modern technology nodes. In this work, a novel TSV interconnection scheme is proposed based on reversible computing, which shows frequency-dependent energy dissipation. The scheme is analysed using theoretical modelling, while a demonstrator IC was designed based on the developed theory and fabricated on a 130 nm 3D process.
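
    The abstract does not spell out the energy model behind the frequency-dependent dissipation, so the following is only a minimal sketch assuming the standard adiabatic-charging approximation (not necessarily the scheme developed in the thesis): driving a TSV capacitance C through a series resistance R with a voltage ramp of duration T dissipates roughly (RC/T)·CV² per transition, versus ½CV² for conventional abrupt switching, which is why slower (lower-frequency) signalling dissipates less energy. All component values in the snippet are illustrative assumptions.

# Hedged sketch: compare conventional vs adiabatic (reversible-style) switching
# energy for a single TSV. All numbers below are illustrative assumptions,
# not values from the thesis.

C_TSV = 40e-15      # assumed TSV parasitic capacitance: 40 fF
R_DRV = 1e3         # assumed effective driver/series resistance: 1 kOhm
VDD   = 1.2         # assumed supply voltage: 1.2 V

def conventional_energy(c, v):
    """Energy dissipated per 0->1 transition with abrupt CMOS switching."""
    return 0.5 * c * v * v

def adiabatic_energy(c, r, v, t_ramp):
    """Standard adiabatic-charging estimate: dissipation ~ (RC/T) * C * V^2."""
    return (r * c / t_ramp) * c * v * v

for freq in (100e6, 1e9, 10e9):          # toggle frequencies to compare
    t_ramp = 0.5 / freq                   # assume the ramp spans half a period
    e_conv = conventional_energy(C_TSV, VDD)
    e_adia = adiabatic_energy(C_TSV, R_DRV, VDD, t_ramp)
    print(f"{freq/1e9:5.1f} GHz: conventional {e_conv*1e15:6.2f} fJ, "
          f"adiabatic {e_adia*1e15:6.2f} fJ")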

    On-chip toggle generators to provide realistic conditions during test of digital 2D-SoCs and 3D-SICs

    In digital logic circuits, unconstrained scan tests are known to evoke much higher switching activity than functional modes. State-of-the-art ATPG tools have knobs to constrain the switching activity of the generated test to a user-defined functional level. Two-dimensional Systems-on-Chip (2D-SoCs) and three-dimensional stacked ICs (3D-SICs) are typically tested in a modular fashion, i.e., per embedded core or stacked die. At any moment during the test, one or more modules are tested (the 'module-under-test', MUT). In this work, we present the impact of the switching activity in the currently untested modules (which we refer to as 'neighbors' of the MUT) on overall IR-drop, and propose a method to provide realistic conditions during modular test of digital 2D-SoCs and 3D-SICs using on-chip programmable toggle generators.
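
    The abstract does not describe the generator hardware itself; as a rough software sketch of the idea (not the authors' design), the snippet below models a programmable toggle generator: a 16-bit LFSR supplies pseudo-random numbers, and the output is flipped whenever the LFSR state falls below a programmed threshold, so idle 'neighbor' modules can be exercised at a functional-like toggle rate while the MUT is tested. Class names, the seed, and the LFSR polynomial are assumptions.

# Hedged sketch (not the paper's circuit): a software model of a programmable
# toggle generator driven by a 16-bit Fibonacci LFSR.

class ToggleGenerator:
    """Software model of a programmable pseudo-random toggle source."""

    def __init__(self, toggle_rate, seed=0xACE1):
        # toggle_rate: target fraction of clock cycles in which the output flips
        self.threshold = int(toggle_rate * 0xFFFF)
        self.lfsr = (seed & 0xFFFF) or 1   # 16-bit LFSR state, must be non-zero
        self.out = 0

    def _step_lfsr(self):
        # Fibonacci LFSR with taps 16,14,13,11 (a maximal-length polynomial)
        bit = ((self.lfsr >> 0) ^ (self.lfsr >> 2) ^
               (self.lfsr >> 3) ^ (self.lfsr >> 5)) & 1
        self.lfsr = (self.lfsr >> 1) | (bit << 15)

    def step(self):
        # One clock cycle: flip the output with probability ~= toggle_rate,
        # using the (roughly uniform) 16-bit LFSR state as the random number.
        self._step_lfsr()
        if self.lfsr < self.threshold:
            self.out ^= 1
        return self.out

# Drive a "neighbor" module model for 10,000 cycles at 20% toggle activity.
gen = ToggleGenerator(toggle_rate=0.20)
trace = [gen.step() for _ in range(10_000)]
toggles = sum(a != b for a, b in zip(trace, trace[1:]))
print(f"observed toggle rate: {toggles / (len(trace) - 1):.3f}")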

    Memory Hierarchy Design for Next Generation Scalable Many-core Platforms

    Performance and energy consumption in modern computing platforms are largely dominated by the memory hierarchy. The increasing computational power of multiprocessors and accelerators, and the emergence of data-intensive workloads (e.g. large-scale graph traversal and scientific algorithms) requiring fast transfer of large volumes of data, are two main trends that intensify this problem by putting even higher pressure on the memory hierarchy. This increasing gap between computation speed and data-transfer speed is commonly referred to as the “memory wall” problem. With the emergence of heterogeneous three-dimensional (3D) integration based on through-silicon vias (TSVs), this situation has started to improve in recent years. On one hand, it is now possible to improve memory access bandwidth and/or latency by either stacking memories directly on top of processors or through abstracted memory interfaces such as Micron’s Hybrid Memory Cube (HMC). On the other hand, near-memory computation has become worth revisiting due to the cost-effective integration of logic and memory in 3D stacks. These two directions bring about several interesting opportunities, including performance improvement, energy and cost reduction, product miniaturization, and modular design for improved time to market. In this research, we study the effectiveness of 3D integration technology and the optimization opportunities it can provide in the different layers of the memory hierarchy in cluster-based many-core platforms, ranging from intra-cluster L1 to inter-cluster L2 scratchpad memories (SPMs), as well as the main memory. In addition, by moving a part of the computation to where the data resides, in the 3D-stacked memory context, we demonstrate further energy and performance improvement opportunities.
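
    As a back-of-envelope illustration of the trade-off described above (every bandwidth and energy constant below is an illustrative assumption, not a result from the thesis), moving a bandwidth-bound kernel next to a 3D-stacked memory can be compared against running it on the host by costing the data movement in both time and energy:

# Hedged back-of-envelope model of host-side vs near-memory execution of a
# bandwidth-bound kernel (e.g. a graph-traversal-style scan over N bytes).
# Every constant is an illustrative assumption, not a measured value.

N_BYTES        = 256 * 2**20     # data touched by the kernel: 256 MiB
E_OFFCHIP_PJ_B = 10.0            # assumed off-chip DRAM access energy, pJ/byte
E_STACK_PJ_B   = 2.0             # assumed 3D-stacked (TSV) access energy, pJ/byte
BW_OFFCHIP     = 25e9            # assumed off-chip bandwidth, bytes/s
BW_STACK       = 160e9           # assumed intra-stack bandwidth, bytes/s
E_COMPUTE_PJ_B = 0.5             # assumed compute energy per byte processed

def cost(bw, e_mem_pj_b):
    """Return (runtime in s, energy in mJ) for streaming N_BYTES once."""
    time_s    = N_BYTES / bw
    energy_mj = N_BYTES * (e_mem_pj_b + E_COMPUTE_PJ_B) * 1e-12 * 1e3
    return time_s, energy_mj

for label, bw, e in (("host over off-chip link", BW_OFFCHIP, E_OFFCHIP_PJ_B),
                     ("near-memory in the stack", BW_STACK, E_STACK_PJ_B)):
    t, e_mj = cost(bw, e)
    print(f"{label:26s}: {t*1e3:7.2f} ms, {e_mj:7.2f} mJ")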

    Heterogeneous parallel virtual machine: A portable program representation and compiler for performance and energy optimizations on heterogeneous parallel systems

    Programming heterogeneous parallel systems, such as the Systems-on-Chip (SoCs) on mobile and edge devices, is extremely difficult; the diverse parallel hardware they contain exposes vastly different hardware instruction sets, parallelism models, and memory systems. Moreover, a wide range of diverse hardware and software approximation techniques is available for applications targeting heterogeneous SoCs, further exacerbating the programmability challenges. In this thesis, we alleviate the programmability challenges of such systems using flexible compiler intermediate-representation solutions, in order to benefit from the performance and superior energy efficiency of heterogeneous systems. First, we develop the Heterogeneous Parallel Virtual Machine (HPVM), a parallel program representation for heterogeneous systems, designed to enable functional and performance portability across popular parallel hardware. HPVM is based on a hierarchical dataflow graph with side effects. HPVM successfully supports three important capabilities for programming heterogeneous systems: a compiler intermediate representation (IR), a virtual instruction set (ISA), and a basis for runtime scheduling. We use the HPVM representation to implement an HPVM prototype, defining the HPVM IR as an extension of the Low Level Virtual Machine (LLVM) IR. Our results show comparable performance with optimized OpenCL kernels for the target hardware from a single HPVM representation, using translators from the HPVM virtual ISA to native code, IR optimizations operating directly on the HPVM representation, and the capability to support flexible runtime scheduling schemes from a single HPVM representation. We extend HPVM to ApproxHPVM, introducing hardware-independent approximation metrics in the IR to maintain accuracy information at the IR level and to map application-level end-to-end quality metrics to system-level "knobs". The approximation metrics quantify the acceptable accuracy loss for individual computations. Application programmers only need to specify high-level, end-to-end quality metrics, instead of detailed parameters for individual approximation methods. The ApproxHPVM system then automatically tunes the accuracy requirements of individual computations and maps them to approximate hardware when possible. ApproxHPVM results show significant performance and energy improvements for popular deep-learning benchmarks. Finally, we extend ApproxHPVM to ApproxTuner, a compiler and runtime system for approximation. ApproxTuner extends ApproxHPVM with a wide range of hardware and software approximation techniques. It uses a three-step approximation tuning strategy: a combination of development-time, install-time, and dynamic tuning. Our strategy ensures software portability, even though approximations have highly hardware-dependent performance, and enables efficient dynamic approximation tuning despite the expensive offline steps. ApproxTuner results show significant performance and energy improvements across 7 deep neural networks and 3 image-processing benchmarks, and ensures that high-level end-to-end quality specifications are satisfied during adaptive approximation tuning.
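
    HPVM's real IR is an extension of LLVM IR and is not reproduced in this abstract; purely as a conceptual sketch, the toy snippet below mimics the two ideas highlighted above: a hierarchical dataflow graph (nodes are either leaf computations or contain child subgraphs) and ApproxHPVM-style per-node accuracy metadata that a tuner could map onto approximation 'knobs'. All class, field, and target names are hypothetical, not HPVM's actual API.

# Conceptual sketch only: a toy hierarchical dataflow-graph IR with
# ApproxHPVM-style accuracy metadata. This mimics the ideas described in the
# abstract; it is not HPVM's real IR, API, or data structures.

from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class DFNode:
    name: str
    func: Optional[Callable] = None          # leaf computation, if any
    children: list["DFNode"] = field(default_factory=list)      # child subgraph
    edges: list[tuple[str, str]] = field(default_factory=list)  # dataflow edges
    max_rel_error: Optional[float] = None    # acceptable accuracy-loss "knob"

    def is_leaf(self):
        return not self.children

def schedule(node):
    """Toy 'scheduling' pass: walk the hierarchy and pick a target per leaf.
    Leaves that tolerate some relative error go to a hypothetical approximate
    accelerator; everything else is mapped to the GPU."""
    if node.is_leaf():
        tolerant = node.max_rel_error is not None and node.max_rel_error > 0.0
        return {node.name: "approx-accel" if tolerant else "gpu"}
    mapping = {}
    for child in node.children:
        mapping.update(schedule(child))
    return mapping

# A tiny two-level graph: a convolution that may be approximated, feeding an
# exact reduction, both nested inside one internal node.
conv = DFNode("conv2d", func=lambda x: x, max_rel_error=0.02)
reduce_ = DFNode("reduce_sum", func=sum)
root = DFNode("pipeline", children=[conv, reduce_],
              edges=[("conv2d", "reduce_sum")])

print(schedule(root))   # e.g. {'conv2d': 'approx-accel', 'reduce_sum': 'gpu'}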