Search CORE

55 research outputs found

A thermal simulation process based on electrical modeling for complex interconnect, packaging, and 3DI structures

Author: Banerjee K
Caron A
Deutsch A
Jiang L
Rubin BJ
Smith H
Weger AJ
Xu C
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

To reduce the product development time and achieve first-pass silicon success, fast and accurate estimation of very-large-scale integration (VLSI) interconnect, packaging and 3DI (3D integrated circuits) thermal profiles has become important. Present commercial thermal analysis tools are incapable of handling very complex structures and have integration difficulties with existing design flows. Many analytical thermal models, which could provide fast estimates, are either too specific or oversimplified. This paper highlights a methodology, which exploits electrical resistance solvers for thermal simulation, to allow acquisition of thermal profiles of complex structures with good accuracy and reasonable computation cost. Moreover, a novel accurate closed-form thermal model is developed. The model allows an isotropic or anisotropic equivalent medium to replace the noncritical back-end-of-line (BEOL) regions so that the simulation complexity is dramatically reduced. Using these techniques, this paper introduces the thermal modeling of practical complex VLSI structures to facilitate thermal guideline generation. It also demonstrates the benefits of the proposed anisotropic equivalent medium approximation for real VLSI structures in terms of the accuracy and computational cost. © 2006 IEEE.published_or_final_versio

HKU Scholars Hub

Simulation of the UKQCD computer

Author: Alam Sadaf
Publication venue: The University of Edinburgh
Publication date: 01/01/2004
Field of study

Edinburgh Research Archive

Evolutionary Memory: Unified Random Access Memory (URAM)

Author: Jin-Woo Han
Yang-Kyu Choi
Publication venue: 'IntechOpen'
Publication date: 01/04/2010
Field of study

IntechOpen

RePAST: A ReRAM-based PIM Accelerator for Second-order Training of DNN

Author: Gao Mingyu
Gu Chengyang
Jiang Li
Jing Naifeng
Liang Xiaoyao
Liu Fangxin
Tang Qidong
Yang Tao
Zhao Yilong
Publication venue
Publication date: 27/10/2022
Field of study

The second-order training methods can converge much faster than first-order optimizers in DNN training. This is because the second-order training utilizes the inversion of the second-order information (SOI) matrix to find a more accurate descent direction and step size. However, the huge SOI matrices bring significant computational and memory overheads in the traditional architectures like GPU and CPU. On the other side, the ReRAM-based process-in-memory (PIM) technology is suitable for the second-order training because of the following three reasons: First, PIM's computation happens in memory, which reduces data movement overheads; Second, ReRAM crossbars can compute SOI's inversion in

O\left(1\right)

time; Third, if architected properly, ReRAM crossbars can perform matrix inversion and vector-matrix multiplications which are important to the second-order training algorithms. Nevertheless, current ReRAM-based PIM techniques still face a key challenge for accelerating the second-order training. The existing ReRAM-based matrix inversion circuitry can only support 8-bit accuracy matrix inversion and the computational precision is not sufficient for the second-order training that needs at least 16-bit accurate matrix inversion. In this work, we propose a method to achieve high-precision matrix inversion based on a proven 8-bit matrix inversion (INV) circuitry and vector-matrix multiplication (VMM) circuitry. We design \archname{}, a ReRAM-based PIM accelerator architecture for the second-order training. Moreover, we propose a software mapping scheme for \archname{} to further optimize the performance by fusing VMM and INV crossbar. Experiment shows that \archname{} can achieve an average of 115.8

\times

/11.4

\times

speedup and 41.9

\times

/12.8

\times

energy saving compared to a GPU counterpart and PipeLayer on large-scale DNNs.Comment: 13pages, 13 figure

arXiv.org e-Print Archive

A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

Author: Azarkhish Erfan
Benini Luca
Bonetti Andrea
Emery Stephane
Jokic Petar
Pons Marc
Publication venue
Publication date: 24/06/2021
Field of study

Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each utilized optimization technique. This complicates the evaluation of optimizations for new accelerator designs, slowing-down the research progress. This work provides a survey of neural network accelerator optimization approaches that have been used in recent works and reports their individual effects on edge processing performance. It presents the list of optimizations and their quantitative effects as a construction kit, allowing to assess the design choices for each building block separately. Reported optimizations range from up to 10'000x memory savings to 33x energy reductions, providing chip designers an overview of design choices for implementing efficient low power neural network accelerators

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Low-Power Design of Digital VLSI Circuits around the Point of First Failure

Author: Bonetti Andrea
Publication venue: Lausanne, EPFL
Publication date: 17/01/2019
Field of study

As an increase of intelligent and self-powered devices is forecasted for our future everyday life, the implementation of energy-autonomous devices that can wirelessly communicate data from sensors is crucial. Even though techniques such as voltage scaling proved to effectively reduce the energy consumption of digital circuits, additional energy savings are still required for a longer battery life. One of the main limitations of essentially any low-energy technique is the potential degradation of the quality of service (QoS). Thus, a thorough understanding of how circuits behave when operated around the point of first failure (PoFF) is key for the effective application of conventional energy-efficient methods as well as for the development of future low-energy techniques. In this thesis, a variety of circuits, techniques, and tools is described to reduce the energy consumption in digital systems when operated either in the safe and conservative exact region, close to the PoFF, or even inside the inexact region. A straightforward approach to reduce the power consumed by clock distribution while safely operating in the exact region is dual-edge-triggered (DET) clocking. However, the DET approach is rarely taken, primarily due to the perceived complexity of its integration. In this thesis, a fully automated design flow is introduced for applying DET clocking to a conventional single-edge-triggered (SET) design. In addition, the first static true-single-phase-clock DET flip-flop (DET-FF) that completely avoids clock-overlap hazards of DET registers is proposed. Even though the correct timing of synchronous circuits is ensured in worst-case conditions, the critical path might not always be excited. Thus, dynamic clock adjustment (DCA) has been proposed to trim any available dynamic timing margin by changing the operating clock frequency at runtime. This thesis describes a dynamically-adjustable clock generator (DCG) capable of modifying the period of the produced clock signal on a cycle-by-cycle basis that enables the DCA technique. In addition, a timing-monitoring sequential (TMS) that detects input transitions on either one of the clock phases to enable the selection of the best timing-monitoring strategy at runtime is proposed. Energy-quality scaling techniques aimat trading lower energy consumption for a small degradation on the QoS whenever approximations can be tolerated. In this thesis, a low-power methodology for the perturbation of baseline coefficients in reconfigurable finite impulse response (FIR) filters is proposed. The baseline coefficients are optimized to reduce the switching activity of the multipliers in the FIR filter, enabling the possibility of scaling the power consumption of the filter at runtime. The area as well as the leakage power of many system-on-chips is often dominated by embedded memories. Gain-cell embedded DRAM (GC-eDRAM) is a compact, low-power and CMOS-compatible alternative to the conventional static random-access memory (SRAM) when a higher memory density is desired. However, due to GC-eDRAMs relying on many interdependent variables, the adaptation of existing memories and the design of future GCeDRAMs prove to be highly complex tasks. Thus, the first modeling tool that estimates timing, memory availability, bandwidth, and area of GC-eDRAMs for a fast exploration of their design space is proposed in this thesis

Infoscience - École polytechnique fédérale de Lausanne

Study and development of innovative strategies for energy-efficient cross-layer design of digital VLSI systems based on Approximate Computing

Author: STAZI GIULIA
Publication venue
Publication date: 18/02/2020
Field of study

The increasing demand on requirements for high performance and energy efficiency in modern digital systems has led to the research of new design approaches that are able to go beyond the established energy-performance tradeoff. Looking at scientific literature, the Approximate Computing paradigm has been particularly prolific. Many applications in the domain of signal processing, multimedia, computer vision, machine learning are known to be particularly resilient to errors occurring on their input data and during computation, producing outputs that, although degraded, are still largely acceptable from the point of view of quality. The Approximate Computing design paradigm leverages the characteristics of this group of applications to develop circuits, architectures, algorithms that, by relaxing design constraints, perform their computations in an approximate or inexact manner reducing energy consumption. This PhD research aims to explore the design of hardware/software architectures based on Approximate Computing techniques, filling the gap in literature regarding effective applicability and deriving a systematic methodology to characterize its benefits and tradeoffs. The main contributions of this work are: -the introduction of approximate memory management inside the Linux OS, allowing dynamic allocation and de-allocation of approximate memory at user level, as for normal exact memory; - the development of an emulation environment for platforms with approximate memory units, where faults are injected during the simulation based on models that reproduce the effects on memory cells of circuital and architectural techniques for approximate memories; -the implementation and analysis of the impact of approximate memory hardware on real applications: the H.264 video encoder, internally modified to allocate selected data buffers in approximate memory, and signal processing applications (digital filter) using approximate memory for input/output buffers and tap registers; -the development of a fully reconfigurable and combinatorial floating point unit, which can work with reduced precision formats

Archivio della ricerca- Università di Roma La Sapienza

コウユウデンリツゲートゼツエンマクノハクマクカトデンカイコウカトランジスタノキャリアユソウヘノエイキョウ

Author: Ando Takashi
アンドウタカシ
安藤崇志
Publication venue
Publication date
Field of study

Osaka University Knowledge Archive

Reliability in the face of variability in nanometer embedded memories

Author: Ganapathy Shrikanth
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2014
Field of study

In this thesis, we have investigated the impact of parametric variations on the behaviour of one performance-critical processor structure - embedded memories. As variations manifest as a spread in power and performance, as a first step, we propose a novel modeling methodology that helps evaluate the impact of circuit-level optimizations on architecture-level design choices. Choices made at the design-stage ensure conflicting requirements from higher-levels are decoupled. We then complement such design-time optimizations with a runtime mechanism that takes advantage of adaptive body-biasing to lower power whilst improving performance in the presence of variability. Our proposal uses a novel fully-digital variation tracking hardware using embedded DRAM (eDRAM) cells to monitor run-time changes in cache latency and leakage. A special fine-grain body-bias generator uses the measurements to generate an optimal body-bias that is needed to meet the required yield targets. A novel variation-tolerant and soft-error hardened eDRAM cell is also proposed as an alternate candidate for replacing existing SRAM-based designs in latency critical memory structures. In the ultra low-power domain where reliable operation is limited by the minimum voltage of operation (Vddmin), we analyse the impact of failures on cache functional margin and functional yield. Towards this end, we have developed a fully automated tool (INFORMER) capable of estimating memory-wide metrics such as power, performance and yield accurately and rapidly. Using the developed tool, we then evaluate the #effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Having a holistic perspective of memory-wide metrics helps us arrive at design-choices optimized simultaneously for multiple metrics needed for maintaining lifetime requirements

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Modeling and Fabrication of Low Power Devices and Circuits Using Low-Dimensional Materials

Author: Kshirsagar Chaitanya
Publication venue
Publication date: 01/07/2016
Field of study

University of Minnesota Ph.D. dissertation.July 2016. Major: Electrical Engineering. Advisor: Steven Koester. 1 computer file (PDF); x, 112 pages.As silicon approaches its ultimate scaling limit as a channel material for conventional semiconductor devices, alternate mechanisms and materials are emerging rapidly to replace or complement conventional silicon based devices. Attractive semiconducting properties such as high mobility, excellent interface quality, and better scalability are the properties desired for materials to be explored for electronic and photonic device applications. Hybrid III-V semiconductor based tunneling field effect transistors (TFETs) can provide a strong alternative due to their attractive properties such as subthreshold slopes less than 60 mV/decade, which can lead to aggressive power supply scaling. Here, InAs-SiGe-Si based TFETs are studied in detail. Simulations predict that subthreshold slopes as low as 18 mV/decade and on currents as high as 50 µA/µm can be achieved using such a device. However, the simulations also show that the device performance is limited by (1) the low density of states in the source which induces a trade-off between the source doping and the subthreshold slope, limiting power supply scaling, and (2) direct source-to-drain tunneling which limits gate length scaling. Another approach to explore low power alternatives to conventional semiconductor device can be to use emerging two-dimensional (2D) materials. In particular, the transition metal dichalcogenides (TMDs) are promising material group that, like graphene, these material exhibit 2D nature, but unlike graphene, have a finite band gap. In this work, the off-state characteristics are modelled for MoS2 MOSFETs (metal–oxide–semiconductor field-effect transistors), and their circuit performance is predicted. MoS2 Due to its higher effective masses and large band gap compared to silicon it is shown that MoS2 MOSFETs are well suited for dynamic memory applications. Two of such circuits, one transistor one capacitor (1TIC) and two transistor (2T) dynamic memory cells have been fabricated for the first time. Retention times as high as 0.25 second and 1.3 second for the 1T1C and 2T cell, respectively, are demonstrated. Moreover, ultra-low leakage currents less than femto-ampere per micron are extracted based on the retention time measurements. These results establish the potential of 2D MoS2 as an attractive material for low power device and circuit applications

University of Minnesota Digital Conservancy