308 research outputs found

    2D-TCAD Simulation on Retention Time of Z2FET for DRAM Application

    Get PDF
    Traditional memory devices are facing more challenges due to continuous down-scaling. 6T-SRAM suffers from variability [1-2] and reliability [3-4] issues, which introduce cell stability problems. DRAM cells with one transistor, one capacitor (1T1C) struggle to maintain refresh time [5-6]. Efforts have been made to find new memory solutions, such as one transistor (1T) solutions [7-9]. Floating body based memory structures are among the potential candidates, but impact ionization or band-to-band tunnelling (B2BT) limits their refresh time [10]. A recently proposed zero impact ionization and zero subthreshold swing device named Z2FET [9, 11-12] has been demonstrated and is a promising candidate for 1T DRAM memory cell due to technology advantages such as CMOS technology compatibility, novel capacitor-less structure and sharp switching characteristics. In the Z2FET memory operation, refresh frequency is determined by data retention time. Previous research [11-12] is lacking systematic simulation analysis and understanding on the underlying mechanisms. In this paper, we propose a new simulation methodology to accurately extract retention time in Z2FET devices and understand its dependency on applied biases, temperatures and relevant physical mechanisms. Since the stored ‘1’ state in Z2FET is an equilibrium state [9, 11-12] and there is no need to refresh, we will concentrate on state ‘0’ retention. Two types of ‘0’ retention time: HOLD ‘0’ and READ ‘0’ retention time will be discussed separately

    Evolutionary Memory: Unified Random Access Memory (URAM)

    Get PDF

    Cryogenic Memory Technologies

    Full text link
    The surging interest in quantum computing, space electronics, and superconducting circuits has led to new developments in cryogenic data storage technology. Quantum computers promise to far extend our processing capabilities and may allow solving currently intractable computational challenges. Even with the advent of the quantum computing era, ultra-fast and energy-efficient classical computing systems are still in high demand. One of the classical platforms that can achieve this dream combination is superconducting single flux quantum (SFQ) electronics. A major roadblock towards implementing scalable quantum computers and practical SFQ circuits is the lack of suitable and compatible cryogenic memory that can operate at 4 Kelvin (or lower) temperature. Cryogenic memory is also critically important in space-based applications. A multitude of device technologies have already been explored to find suitable candidates for cryogenic data storage. Here, we review the existing and emerging variants of cryogenic memory technologies. To ensure an organized discussion, we categorize the family of cryogenic memory platforms into three types: superconducting, non-superconducting, and hybrid. We scrutinize the challenges associated with these technologies and discuss their future prospects.Comment: 21 pages, 6 figures, 1 tabl

    Thorough understanding of retention time of Z2FET memory operation

    Get PDF
    A recently reported zero impact ionization and zero subthreshold swing device Z2FET is a promising candidate for capacitor-less dynamic random access memory (DRAM) memory cell. In the memory operation, data retention time determines refresh frequency and is one of the most important memory merits. In this paper, we have systematically investigated the Z2FET retention time based on a newly proposed characterization methodology. It is found that the degradation of HOLD ``0'' retention time originates from the gated-silicon on insulator (SOI) portion rather than the intrinsic-SOI region of the Z2FET. Electrons accumulate under front gate and finally collapse the potential barrier turning logic ``0''-``1.'' It appears that Shockley-Read-Hall (SRH) generation is the main source for electrons accumulation. Z2FET scalability has been investigated in terms of retention time. As the Z2FET is downscaled, the mechanism dominating electrons accumulation switches from SRH to parasitic injection of electrons from the cathode. The results show that the downscaling of Lg has little effect on data ``0'' retention, but Lin is limited to ~ 125 nm. An optimization method of the fabrication process is proposed based on this new understanding, and Lin can be further scaled down to 75 nm. We have demonstrated by 2-D TCAD simulation that Z2FET is a promising DRAM cells' candidate particularly for Internet-of-Things applications

    Advances in Solid State Circuit Technologies

    Get PDF
    This book brings together contributions from experts in the fields to describe the current status of important topics in solid-state circuit technologies. It consists of 20 chapters which are grouped under the following categories: general information, circuits and devices, materials, and characterization techniques. These chapters have been written by renowned experts in the respective fields making this book valuable to the integrated circuits and materials science communities. It is intended for a diverse readership including electrical engineers and material scientists in the industry and academic institutions. Readers will be able to familiarize themselves with the latest technologies in the various fields

    Sub-10nm Transistors for Low Power Computing: Tunnel FETs and Negative Capacitance FETs

    Get PDF
    One of the major roadblocks in the continued scaling of standard CMOS technology is its alarmingly high leakage power consumption. Although circuit and system level methods can be employed to reduce power, the fundamental limit in the overall energy efficiency of a system is still rooted in the MOSFET operating principle: an injection of thermally distributed carriers, which does not allow subthreshold swing (SS) lower than 60mV/dec at room temperature. Recently, a new class of steep-slope devices like Tunnel FETs (TFETs) and Negative-Capacitance FETs (NCFETs) have garnered intense interest due to their ability to surpass the 60mV/dec limit on SS at room temperature. The focus of this research is on the simulation and design of TFETs and NCFETs for ultra-low power logic and memory applications. Using full band quantum mechanical model within the Non-Equilibrium Greens Function (NEGF) formalism, source-underlapping has been proposed as an effective technique to lower the SS in GaSb-InAs TFETs. Band-tail states, associated with heavy source doping, are shown to significantly degrade the SS in TFETs from their ideal value. To solve this problem, undoped source GaSb-InAs TFET in an i-i-n configuration is proposed. A detailed circuit-to-system level evaluation is performed to investigate the circuit level metrics of the proposed devices. To demonstrate their potential in a memory application, a 4T gain cell (GC) is proposed, which utilizes the low-leakage and enhanced drain capacitance of TFETs to realize a robust and long retention time GC embedded-DRAMs. The device/circuit/system level evaluation of proposed TFETs demonstrates their potential for low power digital applications. The second part of the thesis focuses on the design space exploration of hysteresis-free Negative Capacitance FETs (NCFETs). A cross-architecture analysis using HfZrOx ferroelectric (FE-HZO) integrated on bulk MOSFET, fully-depleted SOI-FETs, and sub-10nm FinFETs shows that FDSOI and FinFET configurations greatly benefit the NCFET performance due to their undoped body and improved gate-control which enables better capacitance matching with the ferroelectric. A low voltage NC-FinFET operating down to 0.25V is predicted using ultra-thin 3nm FE-HZO. Next, we propose one-transistor ferroelectric NOR type (Fe-NOR) non-volatile memory based on HfZrOx ferroelectric FETs (FeFETs). The enhanced drain-channel coupling in ultrashort channel FeFETs is utilized to dynamically modulate memory window of storage cells thereby resulting in simple erase-, program-and read-operations. The simulation analysis predicts sub-1V program/erase voltages in the proposed Fe-NOR memory array and therefore presents a significantly lower power alternative to conventional FeRAM and NOR flash memories

    Variation Analysis, Fault Modeling and Yield Improvement of Emerging Spintronic Memories

    Get PDF

    Caractérisation, mécanismes et applications mémoire des transistors avancés sur SOI

    Get PDF
    Ce travail présente les principaux résultats obtenus avec une large gamme de dispositifs SOI avancés, candidats très prometteurs pour les futurs générations de transistors MOSFETs. Leurs propriétés électriques ont été analysées par des mesures systématiques, agrémentées par des modèles analytiques et/ou des simulations numériques. Nous avons également proposé une utilisation originale de dispositifs FinFETs fabriqués sur ONO enterré en fonctionnalisant le ONO à des fins d'application mémoire non volatile, volatile et unifiées. Après une introduction sur l'état de l'art des dispositifs avancés en technologie SOI, le deuxième chapitre a été consacré à la caractérisation détaillée des propriétés de dispositifs SOI planaires ultra- mince (épaisseur en dessous de 7 nm) et multi-grille. Nous avons montré l excellent contrôle électrostatique par la grille dans les transistors très courts ainsi que des effets intéressants de transport et de couplage. Une approche similaire a été utilisée pour étudier et comparer des dispositifs FinFETs à double grille et triple grille. Nous avons démontré que la configuration FinFET double grille améliore le couplage avec la grille arrière, phénomène important pour des applications à tension de seuil multiple. Nous avons proposé des modèles originaux expliquant l'effet de couplage 3D et le comportement de la mobilité dans des TFTs nanocristallin ZnO. Nos résultats ont souligné les similitudes et les différences entre les transistors SOI et à base de ZnO. Des mesures à basse température et de nouvelles méthodes d'extraction ont permis d'établir que la mobilité dans le ZnO et la qualité de l'interface ZnO/SiO2 sont remarquables. Cet état de fait ouvre des perspectives intéressantes pour l'utilisation de ce type de matériaux aux applications innovantes de l'électronique flexible. Dans le troisième chapitre, nous nous sommes concentrés sur le comportement de la mobilité dans les dispositifs SOI planaires et FinFET en effectuant des mesures de magnétorésistance à basse température. Nous avons mis en évidence expérimentalement un comportement de mobilité inhabituel (multi-branche) obtenu lorsque deux ou plusieurs canaux coexistent et interagissent. Un autre résultat original concerne l existence et l interprétation de la magnétorésistance géométrique dans les FinFETs.L'utilisation de FinFETs fabriqués sur ONO enterré en tant que mémoire non volatile flash a été proposée dans le quatrième chapitre. Deux mécanismes d'injection de charge ont été étudiés systématiquement. En plus de la démonstration de la pertinence de ce type mémoire en termes de performances (rétention, marge de détection), nous avons mis en évidence un comportement inattendu : l amélioration de la marge de détection pour des dispositifs à canaux courts. Notre concept innovant de FinFlash sur ONO enterré présente plusieurs avantages: (i) opération double-bit et (ii) séparation de la grille de stockage et de l'interface de lecture augmentant la fiabilité et autorisant une miniaturisation plus poussée que des Finflash conventionnels avec grille ONO.Dans le dernier chapitre, nous avons exploré le concept de mémoire unifiée, en combinant les opérations non volatiles et 1T-DRAM par le biais des FinFETs sur ONO enterré. Comme escompté pour les mémoires dites unifiées, le courant transitoire en mode 1T-DRAM dépend des charges non volatiles stockées dans le ONO. D'autre part, nous avons montré que les charges piégées dans le nitrure ne sont pas perturbées par les opérations de programmation et lecture de la 1T-DRAM. Les performances de cette mémoire unifiée multi-bits sont prometteuses et pourront être considérablement améliorées par optimisation technologique de ce dispositif.The evolution of electronic systems and portable devices requires innovation in both circuit design and transistor architecture. During last fifty years, the main issue in MOS transistor has been the gate length scaling down. The reduction of power consumption together with the co-integration of different functions is a more recent avenue. In bulk-Si planar technology, device shrinking seems to arrive at the end due to the multiplication of parasitic effects. The relay has been taken by novel SOI-like device architectures. In this perspective, this manuscript presents the main achievements of our work obtained with a variety of advanced fully depleted SOI MOSFETs, which are very promising candidates for next generation MOSFETs. Their electrical properties have been analyzed by systematic measurements and clarified by analytical models and/or simulations. Ultimately, appropriate applications have been proposed based on their beneficial features.In the first chapter, we briefly addressed the short-channel effects and the diverse technologies to improve device performance. The second chapter was dedicated to the detailed characterization and interesting properties of SOI devices. We have demonstrated excellent gate control and high performance in ultra-thin FD SOI MOSFET. The SCEs are efficiently suppressed by decreasing the body thickness below 7 nm. We have investigated the transport and electrostatic properties as well as the coupling mechanisms. The strong impact of body thickness and temperature range has been outlined. A similar approach was used to investigate and compare vertical double-gate and triple-gate FinFETs. DG FinFETs show enhanced coupling to back-gate bias which is applicable and suitable for dynamic threshold voltage tuning. We have proposed original models explaining the 3D coupling effect in FinFETs and the mobility behavior in ZnO TFTs. Our results pointed on the similarities and differences in SOI and ZnO transistors. According to our low-temperature measurements and new promoted extraction methods, the mobility in ZnO and the quality of ZnO/SiO2 interface are respectable, enabling innovating applications in flexible, transparent and power electronics. In the third chapter, we focused on the mobility behavior in planar SOI and FinFET devices by performing low-temperature magnetoresistance measurements. Unusual mobility curve with multi-branch aspect were obtained when two or more channels coexist and interplay. Another original result in the existence of the geometrical magnetoresistance in triple-gate and even double-gate FinFETs.The operation of a flash memory in FinFETs with ONO buried layer was explored in the forth chapter. Two charge injection mechanisms were proposed and systematically investigated. We have discussed the role of device geometry and temperature. Our novel ONO FinFlash concept has several distinct advantages: double-bit operation, separation of storage medium and reading interface, reliability and scalability. In the final chapter, we explored the avenue of unified memory, by combining nonvolatile and 1T-DRAM operations in a single transistor. The key result is that the transient current, relevant for 1T-DRAM operation, depends on the nonvolatile charges stored in the nitride buried layer. On the other hand, the trapped charges are not disturbed by the 1T-DRAM operation. Our experimental data offers the proof-of-concept for such advanced memory. The performance of the unified/multi-bit memory is already decent but will greatly improve in the coming years by processing dedicated devices.SAVOIE-SCD - Bib.électronique (730659901) / SudocGRENOBLE1/INP-Bib.électronique (384210012) / SudocGRENOBLE2/3-Bib.électronique (384219901) / SudocSudocFranceF

    이진 뉴럴 네트워크를 위한 DRAM 기반의 뉴럴 네트워크 가속기 구조

    Get PDF
    학위논문 (박사) -- 서울대학교 대학원 : 공과대학 컴퓨터공학부, 2021. 2. 유승주.In the convolutional neural network applications, most computations occurred by the multiplication and accumulation of the convolution and fully-connected layers. From the hardware perspective (i.e., in the gate-level circuits), these operations are performed by many dot-products between the feature map and kernel vectors. Since the feature map and kernel have the matrix form, the vector converted from 3D, or 4D matrices is reused many times for the matrix multiplications. As the throughput of the DNN increases, the power consumption and performance bottleneck due to the data movement become a more critical issue. More importantly, power consumption due to off-chip memory accesses dominates total power since off-chip memory access consumes several hundred times greater power than the computation. The accelerators' throughput is about several hundred GOPS~several TOPS, but Memory bandwidth is less than 25.6 or 34 GB/s (with DDR4 or LPDDR4). By reducing the network size and/or data movement size, both data movement power and performance bottleneck problems are improved. Among the algorithms, Quantization is widely used. Binary Neural Networks (BNNs) dramatically reduce precision down to 1 bit. The accuracy is much lower than that of the FP16, but the accuracy is continuously improving through various studies. With the data flow control, there is a method of reducing redundant data movement by increasing data reuse. The above two methods are widely applied in accelerators because they do not need additional computations in the inference computation. In this dissertation, I present 1) a DRAM-based accelerator architecture and 2) a DRAM refresh method to improve performance reduction due to DRAM refresh. Both methods are orthogonal, so can be integrated into the DRAM chip and operate independently. First, we proposed a DRAM-based accelerator architecture capable of massive and large vector dot product operation. In the field of CNN accelerators to which BNN can be applied, a computing-in-memory (CIM) structure that utilizes a cell-array structure of Memory for vector dot product operation is being actively studied. Since DRAM stores all the neural network data, it is advantageous to reduce the amount of data transfer. The proposed architecture operates by utilizing the basic operation of the DRAM. The second method is to reduce the performance degradation and power consumption caused by DRAM refresh. Since the DRAM cannot read and write data while performing a periodic refresh, system performance decreases. The proposed refresh method tests the refresh characteristics inside the DRAM chip during self-refresh and increases the refresh cycle according to the characteristics. Since it operates independently inside DRAM, it can be applied to all systems using DRAM and is the same for deep neural network accelerators. We surveyed system integration with a software stack to use the in-DRAM accelerator in the DL framework. As a result, it is expected to control in-DRAM accelerators with the memory controller implementation method verified in the previous experiment. Also, we have added the performance simulation function of in-DRAM accelerator to PyTorch. When running a neural network in PyTorch, it reports the computation latency and data movement latency occurring in the layer running in the in-DRAM accelerator. It is a significant advantage to predict the performance when running in hardware while co-designing the network.컨볼루셔널 뉴럴 네트워크 (CNN) 어플리케이션에서는, 대부분의 연산이 컨볼루션 레이어와 풀리-커넥티드 레이어에서 발생하는 곱셈과 누적 연산이다. 게이트-로직 레벨에서는, 대량의 벡터 내적으로 실행되며, 입력과 커널 벡터들을 반복해서 사용하여 연산한다. 딥 뉴럴 네트워크 연산에는 범용 연산 유닛보다, 단순한 연산이 가능한 작은 연산 유닛을 대량으로 사용하는 것이 적합하다. 가속기의 성능이 일정 이상 높아지면, 가속기의 성능은 연산에 필요한 데이터 전송에 의해 제한된다. 메모리에서 데이터를 오프-칩으로 전송할 때의 에너지 소모가, 연산 유닛에서 연산에 사용되는 에너지의 수백배로 크다. 또한 연산기의 성능은 초당 수백 기가~수 테라-연산이 가능하지만, 메모리의 데이터 전송은 초당 수십 기가 바이트이다. 데이터 전송에 의한 파워와 성능 문제를 동시에 해결하는 방법은, 전송되는 데이터 크기를 줄이는 것이다. 알고리즘 중에서는 네트워크의 데이터를 양자화하여, 낮은 정밀도로 데이터를 표현하는 방법이 널리 사용된다. 이진 뉴럴 네트워크(BNN)는 정밀도를 1비트까지 극단적으로 낮춘다. 16비트 정밀도보다 네트워크의 정확도가 낮아지는 문제가 있지만, 다양한 연구를 통해 정확도가 지속적으로 개선되고 있다. 또한 구조적으로는, 전송된 데이터를 재사용하여 동일한 데이터의 반복적인 전송을 줄이는 방법이 있다. 위의 두 가지 방법은 추론 과정에서 별도의 연산 없이 적용 가능하여 가속기에서 널리 적용되고 있다. 본 논문에서는, DRAM 기반의 가속기 구조를 제안하고, DRAM refresh에 의한 성능 감소를 개선하는 기술을 제안하였다. 두 방법은 하나의 DRAM 칩으로 집적 가능하며, 독립적으로 구동 가능하다. 첫번째는 대량의 벡터 내적 연산이 가능한 DRAM 기반 가속기에 대한 연구이다. BNN을 적용할 수 있는 CNN가속기 분야에서, 메모리의 셀-어레이 구조를 벡터 내적 연산에 활용하는 컴퓨팅-인-메모리(CIM) 구조가 활발히 연구되고 있다. 특히, DRAM에는 뉴럴 네트워크의 모든 데이터가 있기 때문에, 데이터 전송량의 감소에 유리하다. 우리는 DRAM 셀-어레이의 구조를 바꾸지 않고, DRAM의 기본 동작을 활용하여 연산하는 방법을 제안하였다. 두번째는 DRAM 리프레쉬 주기를 늘려서 성능 열화와 파워 소모를 개선하는 방법이다. DRAM이 리프레쉬를 실행할 때마다, 데이터를 읽고 쓸 수 없기 때문에 시스템 혹은 가속기의 성능 감소가 발생한다. DRAM 칩 내부에서 DRAM의 리프레쉬 특성을 테스트하고, 리프레쉬 주기를 늘리는 방법을 제안하였다. DRAM 내부에서 독립적으로 동작하기 때문에 DRAM을 사용하는 모든 시스템에 적용 가능하며, 딥 뉴럴 네트워크 가속기에서도 동일하다. 또한, 제안된 가속기를 PyTorch와 같이 널리 사용되는 딥러닝 프레임 워크에서도 쉽게 사용할 수 있도록, 소프트웨어 스택을 비롯한 system integration 방법을 조사하였다. 결과적으로, 기존의 TVM compiler와 FPGA로 구현하는 TVM/VTA 가속기에, DRAM refresh 실험에서 검증된 메모리 컨트롤러와 커스텀 컴파일러를 추가하면 in-DRAM 가속기를 제어할 수 있을 것으로 기대된다. 이에 더하여, in-DRAM 가속기와 뉴럴 네트워크의 설계 단계에서 성능을 예측할 수 있도록, 시뮬레이션 기능을 PyTorch에 추가하였다. PyTorch에서 신경망을 실행할 때, DRAM 가속기에서 실행되는 계층에서 발생하는 계산 대기 시간 및 데이터 이동 시간을 확인할 수 있다.Abstract i Contents viii List of Tables x List of Figures xiv Chapter 1 Introduction 1 Chapter 2 Background 6 2.1 Neural Network Operation . . . . . . . . . . . . . . . . 6 2.2 Data Movement Overhead . . . . . . . . . . . . . . . . 7 2.3 Binary Neural Networks . . . . . . . . . . . . . . . . . 10 2.4 Computing-in-Memory . . . . . . . . . . . . . . . . . . 11 2.5 Memory Bottleneck due to Refresh . . . . . . . . . . . . 13 Chapter 3 In-DRAM Neural Network Accelerator 16 3.1 Backgrounds . . . . . . . . . . . . . . . . . . . . . . . . 18 3.1.1 DRAM hierarchy . . . . . . . . . . . . . . . . . 18 3.1.2 DRAM Basic Operation . . . . . . . . . . . . . 21 3.1.3 DRAM Commands with Timing Parameters . . . 22 3.1.4 Bit-wise Operation in DRAM . . . . . . . . . . 25 3.2 Motivations . . . . . . . . . . . . . . . . . . . . . . . . 29 3.3 Proposed architecture . . . . . . . . . . . . . . . . . . . 30 3.3.1 Operation Examples of Row Operator . . . . . . 32 3.3.2 Convolutions on DRAM Chip . . . . . . . . . . 39 3.4 Data Flow . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.4.1 Input Broadcasting in DRAM . . . . . . . . . . 44 3.4.2 Input Data Movement With M2V . . . . . . . . . 47 3.4.3 Internal Data Movement With SiD . . . . . . . . 49 3.4.4 Data Partitioning for Parallel Operation . . . . . 52 3.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . 56 3.5.1 Performance Estimation . . . . . . . . . . . . . 56 3.5.2 Configuration of In-DRAM Accelerator . . . . . 58 3.5.3 Improving the Accuracy of BNN . . . . . . . . . 60 3.5.4 Comparison with the Existing Works . . . . . . . 62 3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 67 3.6.1 Performance Comparison with ASIC Accelerators 67 3.6.2 Challenges of The Proposed Architecture . . . . 70 3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 72 Chapter 4 Reducing DRAM Refresh Power Consumption by Runtime Profiling of Retention Time and Dualrow Activation 74 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 74 4.2 Background . . . . . . . . . . . . . . . . . . . . . . . . 77 4.3 Related Works . . . . . . . . . . . . . . . . . . . . . . . 78 4.4 Observations . . . . . . . . . . . . . . . . . . . . . . . . 84 4.5 Solution overview . . . . . . . . . . . . . . . . . . . . . 88 4.6 Runtime profiling . . . . . . . . . . . . . . . . . . . . . 93 4.6.1 Basic Operation . . . . . . . . . . . . . . . . . . 93 4.6.2 Profiling Multiple Rows in Parallel . . . . . . . . 96 4.6.3 Temperature, Data Backup and Error Check . . . 96 4.7 Dual-row Activation . . . . . . . . . . . . . . . . . . . . 98 4.8 Experiments . . . . . . . . . . . . . . . . . . . . . . . . 102 4.8.1 Experimental Setup . . . . . . . . . . . . . . . . 103 4.8.2 Refresh Period Improvement . . . . . . . . . . . 107 4.8.3 Power Reduction . . . . . . . . . . . . . . . . . 110 4.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 116 Chapter 5 System Integration 118 5.1 Integrate The Proposed Methods . . . . . . . . . . . . . 118 5.2 Software Stack . . . . . . . . . . . . . . . . . . . . . . 121 Chapter 6 Conclusion 129 Bibliography 131 국문초록 153Docto
    corecore