
DRAM Bender: An Extensible and Versatile FPGA-based Infrastructure to Easily Test State-of-the-art DRAM Chips

Authors: Ergin Oğuz, Hassan Hasan, Luo Haocong, Mutlu Onur, Olgun Ataberk, Orosa Lois, Patel Minesh, Tuğrul Yahya Can, Yağlıkçı A. Giray
Publication date: 02/06/2023

To understand and improve DRAM performance, reliability, security, and energy efficiency, prior works study characteristics of commodity DRAM chips. Unfortunately, state-of-the-art open-source infrastructures capable of conducting such studies are obsolete, poorly supported, or difficult to use, or their inflexibility limits the types of studies they can conduct. We propose DRAM Bender, a new FPGA-based infrastructure that enables experimental studies on state-of-the-art DRAM chips. DRAM Bender offers three key features at the same time. First, DRAM Bender enables directly interfacing with a DRAM chip through its low-level interface. This allows users to issue DRAM commands in arbitrary order and with finer-grained time intervals than other open-source infrastructures. Second, DRAM Bender exposes easy-to-use C++ and Python programming interfaces, allowing users to quickly and easily develop different types of DRAM experiments. Third, DRAM Bender is easily extensible. Its modular design allows extending it to (i) support existing and emerging DRAM interfaces and (ii) run on new commercial or custom FPGA boards with little effort. To demonstrate that DRAM Bender is a versatile infrastructure, we conduct three case studies, two of which lead to new observations about the DRAM RowHammer vulnerability. In particular, we show that data patterns supported by DRAM Bender uncover a larger set of bit flips in a victim row than the data patterns commonly used by prior work. We demonstrate the extensibility of DRAM Bender by implementing it on five different FPGAs with DDR4 and DDR3 support. DRAM Bender is freely and openly available at https://github.com/CMU-SAFARI/DRAM-Bender.

Comment: To appear in TCAD 202
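To make "issuing DRAM commands in arbitrary order" concrete, here is a minimal sketch of the kind of program such an infrastructure lets a user compose: initialize a victim row and its two neighbors with a data pattern, hammer the neighbors, then count bit flips on readback. The `Command` and `CommandProgram` classes, the `double_sided_hammer` helper, the omitted column addressing, and all timing values are hypothetical placeholders; this is not DRAM Bender's actual C++ or Python API.

```python
from dataclasses import dataclass, field
from typing import List

MASK64 = (1 << 64) - 1  # illustrative 64-bit data word

@dataclass
class Command:
    """One low-level DRAM command (hypothetical encoding, columns omitted)."""
    op: str          # "ACT", "PRE", "WR", or "WAIT"
    bank: int = 0
    row: int = 0
    data: int = 0    # write payload for "WR"
    cycles: int = 0  # idle time for "WAIT", in controller cycles

@dataclass
class CommandProgram:
    """Hypothetical ordered list of DRAM commands to be replayed on the FPGA."""
    commands: List[Command] = field(default_factory=list)

    def act(self, bank: int, row: int) -> None:
        self.commands.append(Command("ACT", bank, row))

    def pre(self, bank: int) -> None:
        self.commands.append(Command("PRE", bank))

    def wr(self, bank: int, row: int, data: int) -> None:
        self.commands.append(Command("WR", bank, row, data))

    def wait(self, cycles: int) -> None:
        self.commands.append(Command("WAIT", cycles=cycles))

def init_row(prog, bank, row, word, words_per_row=4, t_rcd=14, t_rp=14):
    """Open a row, fill it with one repeated pattern word, then close it."""
    prog.act(bank, row)
    prog.wait(t_rcd)
    for _ in range(words_per_row):
        prog.wr(bank, row, word)
    prog.pre(bank)
    prog.wait(t_rp)

def double_sided_hammer(victim_row, hammer_count, victim_pattern=0x5555555555555555,
                        bank=0, t_ras=32, t_rp=14):
    """Write the victim pattern, invert it in both aggressors, then hammer them."""
    prog = CommandProgram()
    aggressors = (victim_row - 1, victim_row + 1)
    init_row(prog, bank, victim_row, victim_pattern)
    for row in aggressors:
        init_row(prog, bank, row, ~victim_pattern & MASK64)
    for _ in range(hammer_count):
        for row in aggressors:
            prog.act(bank, row)
            prog.wait(t_ras)  # keep the aggressor row open (placeholder timing)
            prog.pre(bank)
            prog.wait(t_rp)   # precharge delay (placeholder timing)
    return prog

def count_bit_flips(expected_words, read_words):
    """Number of bits that differ between the written pattern and the readback."""
    return sum(bin(e ^ r).count("1") for e, r in zip(expected_words, read_words))
```

Keeping the program as plain data separates "what to issue" from "how the hardware replays it", which is what makes trying a different data pattern a one-argument change.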

arXiv.org e-Print Archive

Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

Author: Najafabadi Navid Khoshavi
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2017

While technology scaling enables increased memory cell density, the intrinsically high leakage power of conventional CMOS technology and the demand for reduced energy consumption motivate the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM), including STT-MRAM, PCM, and RRAM. Using emerging technologies in Last Level Cache (LLC) designs, which occupy a significant fraction of the total die area in Chip Multiprocessors (CMPs), introduces new dimensions of vulnerability, energy consumption, and performance delivery. Specifically, part of this research focuses on the eDRAM Bit Upset Vulnerability Factor (BUVF), which assesses the vulnerable portion of the eDRAM refresh cycle, where the critical charge varies with the write voltage and the storage and bit-line capacitances. This dissertation broadens the study of LLC vulnerability assessment by investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip caches and main memory. High-latency, power-hungry Sense Amplifiers (SAs) have been adopted in the past to combat PV. Herein, a novel approach is proposed that leverages PV in NVM arrays using a Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin, reducing latency and power consumption while maintaining an acceptable access time. In addition, this dissertation investigates a novel technique to prioritize service to 1) Extensive Read Reused Accessed (ERRA) blocks of the LLC that are silently dropped from higher cache levels, and 2) the portion of the working set that may exhibit a distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler that efficiently identifies ERRA blocks by aggregating LLC block addresses tagged with identical Most Significant Bits into a single entry.
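The address-aggregation idea behind the profiler can be sketched as a table keyed by the high-order bits of a block address, so that many neighboring blocks share one counter. This is a simplified, single-level illustration under stated assumptions: the class name `AccessHistoryProfiler`, the `low_bits` grouping width, and the `read_threshold` rule for flagging read-reused groups are all hypothetical, not the dissertation's design.

```python
from collections import defaultdict

class AccessHistoryProfiler:
    """Hypothetical, simplified profiler: one counter per group of block
    addresses that share the same most significant bits."""

    def __init__(self, low_bits: int = 12, read_threshold: int = 4):
        self.low_bits = low_bits              # low address bits ignored when grouping
        self.read_threshold = read_threshold  # reads needed before a group is flagged
        self.reads = defaultdict(int)         # group tag -> read count

    def _tag(self, block_addr: int) -> int:
        # Addresses with identical MSBs map to the same single entry.
        return block_addr >> self.low_bits

    def record_read(self, block_addr: int) -> None:
        self.reads[self._tag(block_addr)] += 1

    def is_read_reused(self, block_addr: int) -> bool:
        # .get avoids creating an entry just by querying.
        return self.reads.get(self._tag(block_addr), 0) >= self.read_threshold
```

Grouping trades precision for storage: one entry summarizes up to 2^`low_bits` addresses, which keeps the profiler lightweight at the cost of occasionally flagging cold blocks that share a hot prefix.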
Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation will broaden and apply advances in the theory of subspace recovery to pioneer computationally aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 will be developed, validated, and refined both theoretically and experimentally to realize a radically different approach to post-Moore's Law computing: it leverages features of low-rank matrices to reconstruct data instead of fetching it from main memory, reducing the energy and latency cost per data movement. We propose an LI2 enhancement that attains high performance in the post-Moore's Law era by equipping a contemporary microarchitecture with a customized memory controller, which directs memory requests for low-rank matrices to a customized Fine Grain Reconfigurable Accelerator (FGRA) for reconstruction while other memory requests are serviced as before. The goal of LI2 is to overcome the high latency and energy required to traverse main memory arrays on an LLC miss by reconstructing the requested low-rank data in situ. Thus, under specific conditions, LI2 replaces a high volume of data transfers with a novel lightweight reconstruction method using a cross-layer hardware/algorithm approach.
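The data-movement argument for low-rank reconstruction comes down to arithmetic: an m × n matrix of rank r can be stored as factors U (m × r) and V (r × n), i.e. r(m + n) values instead of mn, and any element can be rebuilt as a length-r dot product. The sketch below illustrates only this general idea; the function names are hypothetical and it does not model LI2's controller or FGRA hardware.

```python
def full_storage(m: int, n: int) -> int:
    """Values moved when the whole m x n matrix is fetched."""
    return m * n

def factor_storage(m: int, n: int, r: int) -> int:
    """Values moved when only the rank-r factors U (m x r) and V (r x n) are fetched."""
    return r * (m + n)

def reconstruct_element(U, V, i, j):
    """A[i][j] = sum_k U[i][k] * V[k][j], computed on demand instead of fetched."""
    return sum(U[i][k] * V[k][j] for k in range(len(V)))

def reconstruct_row(U, V, i):
    """Rebuild one full row of A from the factors."""
    return [reconstruct_element(U, V, i, j) for j in range(len(V[0]))]
```

For example, a 1024 × 1024 matrix of rank 16 needs 16 × 2048 = 32,768 values as factors versus 1,048,576 in full, a 32× reduction in data moved, paid for with 16 multiply-adds per reconstructed element.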

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Measuring the Energy Consumption of Software written in C on x86-64 Processors

Author: Strempel Tom
Publication date: 03/01/2022

In 2016, German data centers consumed 12.4 terawatt-hours of electrical energy, about 2% of Germany's total energy consumption that year. By 2020, this had risen to 16 terawatt-hours, or 2.9% of Germany's total energy consumption. This ever-increasing energy consumption of computers motivates efforts to reduce it in order to save energy and money and to protect the environment. This thesis aims to answer fundamental questions about the energy consumption of software, e.g., how, and how precisely, it can be measured, and whether CPU load and energy consumption are correlated. An overview of measurement methods and the related software tooling was compiled. The most promising approach, based on the software 'Scaphandre', was chosen as the main basis and developed further. Different sorting algorithms were benchmarked to study their energy consumption behavior. The resulting dataset was also used to answer the fundamental questions stated at the beginning. A replication and reproduction package is provided to enable reproducibility of the results.
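Two small computations sit underneath questions like these. Software energy meters on Linux commonly read cumulative, wrapping energy counters (for example the powercap RAPL files `energy_uj` and `max_energy_range_uj`), so the energy of a run is a wrap-aware difference of two readings. Whether CPU load and energy are correlated can then be checked with a Pearson coefficient. The sketch below assumes microjoule counters that wrap at a known range; the function names are hypothetical, and this is not Scaphandre's implementation.

```python
import math

def energy_delta_uj(before_uj: int, after_uj: int, max_range_uj: int) -> int:
    """Energy between two readings of a cumulative counter that wraps at max_range_uj."""
    delta = after_uj - before_uj
    if delta < 0:  # the counter wrapped around between the two readings
        delta += max_range_uj
    return delta

def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A benchmark run would take one reading before and one after each sort, convert with `energy_delta_uj(...) / 1e6` to joules, and feed per-run CPU load and energy into `pearson`. Because the counter is sampled rather than streamed, very short runs need repetition to rise above the counter's update granularity.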

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Qucosa - Publikationsserver der Universität Leipzig

Programmable built-in self-testing of embedded RAM clusters in system-on-chip architectures

Authors: Benso Alfredo, Di Carlo Stefano, Di Natale Giorgio, Lobetti Bodoni M., Prinetto Paolo Ernesto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Multiport memories are widely used as embedded cores in all communication system-on-chip devices. Due to their high complexity and very low accessibility, built-in self-test (BIST) is the most common solution for testing the different memories embedded in the system. This article presents a programmable BIST architecture based on a single microprogrammable BIST processor and a set of memory wrappers, designed to simplify testing a system that contains a large number of distributed multiport memories differing in size (number of bits, number of words), access protocol (asynchronous, synchronous), and timing.
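A programmable BIST processor executes test algorithms supplied as data rather than hard-wired logic. To illustrate that idea, the sketch below encodes March C-, a standard memory test algorithm, as a list of march elements and runs it on a simulated single-port memory with an optional stuck-at fault. The choice of March C-, the `Memory` model, and the `run_march` interpreter are assumptions for illustration; the article does not specify this algorithm or encoding.

```python
# March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
MARCH_C_MINUS = [
    ("any",  ["w0"]),
    ("up",   ["r0", "w1"]),
    ("up",   ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("any",  ["r0"]),
]

class Memory:
    """Bit-wide memory model; stuck_at maps address -> value the cell is stuck at."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}

    def write(self, addr, bit):
        self.cells[addr] = self.stuck_at.get(addr, bit)

    def read(self, addr):
        return self.cells[addr]

def run_march(memory, size, program):
    """Interpret a march program; return (address, operation) pairs that failed."""
    failures = []
    for direction, ops in program:
        # "down" walks addresses in descending order; "up" and "any" ascend.
        addrs = range(size - 1, -1, -1) if direction == "down" else range(size)
        for addr in addrs:
            for op in ops:
                kind, value = op[0], int(op[1])
                if kind == "w":
                    memory.write(addr, value)
                elif memory.read(addr) != value:
                    failures.append((addr, op))
    return failures
```

Swapping `MARCH_C_MINUS` for another list changes the test without touching the interpreter, which is the property that lets one processor serve memories of different sizes and protocols through per-memory wrappers.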

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Uniform resistive switching memory using localized charge trapping

Author: 권영재
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/08/2020

Doctoral dissertation, Seoul National University Graduate School: Department of Materials Science and Engineering (Hybrid Materials), College of Engineering, August 2020. Advisor: Cheol Seong Hwang (황철성).

The memristor was first introduced by Professor Chua in 1971 and has been studied by many groups since Hewlett-Packard (HP) announced its research and development in 2008. Resistive switching memory (ReRAM) has a simple metal-insulator-metal structure and is being researched for applications such as neuromorphic computing, synapse emulation, and logic. Thanks to this simple structure and fabrication process, it can be manufactured at low cost, and in a crossbar array it achieves a unit cell size of 4F², where F is the minimum feature size. By comparison, DRAM, NAND flash, and NOR flash have unit cell sizes of 6F², 5F², and 10F², respectively. Because the memristor has the smallest unit cell among these memories, it is well suited to high-density memory and has significant potential to replace NAND flash as next-generation storage memory. Vertical NAND flash has greatly increased integration density, but it faces limitations: although devices with more than 100 layers have been demonstrated, fabrication is becoming increasingly difficult and is expected to reach its limit within about ten years. In addition, the high operating voltage of flash memory requires thick insulating layers, so once a vertical stack reaches the maximum chip height allowed in a product, density can no longer be increased. ReRAM, in contrast, offers low operating voltage, high integration density, and potential for vertical device structures. Despite these advantages, the biggest obstacle to its commercialization is reliability: multiple conductive paths form simultaneously and are repeatedly created and ruptured during operation, and the resulting variation in operating parameters degrades reliability. For ReRAM to be used in the applications above, not only high integration but, most importantly, cycle-to-cycle stability and uniform device-to-device characteristics across the array must first be achieved. In the first part, Au nanodots were inserted into Pt/Ta2O5/HfO2/TiN devices to improve cell-to-cell and cycle-to-cycle variation.
In this device, electrons are trapped in and detrapped from shallow trap sites in the HfO2 layer: the device shows a low-resistance state when electrons are trapped and a high-resistance state when they are detrapped. In addition, the plasma applied to the HfO2 during Ta2O5 deposition creates deep trap sites that act as a conducting path and enable stable memory behavior; inserting Au nanodots in this region further stabilized switching through the electric field concentration effect. Devices without Au nanodots endured around 200 cycles, whereas devices with Au nanodots showed identical, stable behavior for more than 1000 cycles. By controlling the number of trapped electrons through the compliance current, the Au nanodot device also achieved multi-level operation with eight non-overlapping current levels in addition to the off state. In the second part, the electrical switching behavior was examined as a function of the location of the inserted Au nanodots, and the electric field concentration was analyzed with COMSOL simulation. Au nanodots were embedded at two locations, within the atomic-layer-deposited HfO2 film and within the Ta2O5 film, by depositing part of the film, forming the nanodots, and then depositing the rest of the film. Ta2O5 is known not to participate in switching in this device; instead, it forms a Schottky barrier with the high-work-function Pt top electrode, giving diode-like self-rectifying behavior. Nanodots in the Ta2O5 were therefore expected not to affect switching, yet they also greatly improved endurance. COMSOL simulation showed that when the nanodots are placed sufficiently far from the interface, the field concentration effect disappears and the endurance improvement fades with it. These results experimentally confirm that field concentration improves endurance and that switching occurs at the interface. In the third part, Au nanodots were formed in localized areas by electron-beam (e-beam) lithography instead of over the entire surface.
Common methods for forming nanodots, such as anodic aluminum oxide (AAO) templates or spherical nanostructures, cannot control the size or distribution of the nanodots. Such a nonuniform distribution can cause cell-to-cell variation, and in severe cases some devices may contain no nanodots at all. To control both factors, e-beam lithography was used to form Au nanodots of the desired size at the desired locations through e-beam exposure, deposition of an Au thin film, and a subsequent lift-off process. Achieving small nanodots required minimizing the physical stress on the Au film during lift-off and finely controlling the e-beam dose. The stress was reduced by controlling the sidewall slope of the resist: two layers of PMMA with different molecular weights, and hence different sensitivities, were deposited to create a clear undercut. The dose determines the number of electrons delivered to the resist; too low a dose does not fully expose the resist, so the pattern cannot form, whereas too high a dose broadens the pattern, so fine control is necessary. As a result, Au nanodots as small as 50 nm were fabricated. After devices containing these nanodots were fabricated, atomic force microscopy (AFM) surface analysis showed noticeably higher current at the nanodot locations, confirming that the electric field is concentrated at the nanodots and that this concentration improves the switching properties.

Table of contents:
1. Introduction
  1.1. Resistive switching Random Access Memory
  1.2. Critical factor for a high-density array
  1.3. Research scope and objective
2. Improvement of resistive switching uniformity by embedding Au nanodots in the Pt/Ta2O5/HfO2/TiN structure
  2.1. Introduction
  2.2. Experimental
  2.3. Results and Discussions
  2.4. Summary
3. Effect of electric field concentration depending on the location of Au nanodots in the device
  3.1. Introduction
  3.2. Experimental
  3.3. Results and Discussions
  3.4. Summary
4. Quantification of Au nanodots in the nanoscale devices
  4.1. Introduction
  4.2. Experimental
  4.3. Results and Discussion
  4.4. Summary
Conclusion
Bibliography
List of publications
Abstract (in Korean)
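The unit-cell areas quoted in this abstract translate directly into relative areal density, since cells per unit area scale as 1/(kF²). As a rough worked comparison, assuming one bit per cell, the same feature size F, and ignoring peripheral circuitry:

$$
\frac{\rho_{\text{ReRAM}}}{\rho_X} = \frac{k_X F^2}{4F^2} = \frac{k_X}{4}:
\qquad \text{DRAM: } \tfrac{6}{4} = 1.5, \quad \text{NAND: } \tfrac{5}{4} = 1.25, \quad \text{NOR: } \tfrac{10}{4} = 2.5
$$

Likewise, eight reliably distinguishable current levels could in principle encode up to $\log_2 8 = 3$ bits per cell; this is an illustrative upper bound, not an array-level result reported in the dissertation.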

SNU Open Repository and Archive

Energy Measurements of High Performance Computing Systems: From Instrumentation to Analysis

Author: Ilsche Thomas
Publication date: 31/07/2020

Energy efficiency is a major criterion for computing in general and High Performance Computing (HPC) in particular. When optimizing for energy efficiency, it is essential to measure the underlying metric: energy consumption. To fully leverage energy measurements, their quality needs to be well understood. To that end, this thesis provides a rigorous evaluation of various energy measurement techniques. I demonstrate how the deliberate selection of instrumentation points, sensors, and analog processing schemes can enhance the temporal and spatial resolution while preserving a well-known accuracy. Further, I evaluate a scalable energy measurement solution for production HPC systems and address its shortcomings. Such high-resolution and large-scale measurements present challenges regarding the management of large volumes of generated metric data. I address these challenges with a scalable infrastructure for collecting, storing, and analyzing metric data. With this infrastructure, I also introduce a novel persistent storage scheme for metric time series data, which allows efficient queries for aggregate timelines. To ensure that it satisfies the demanding requirements for scalable power measurements, I conduct an extensive performance evaluation and describe a productive deployment of the infrastructure. Finally, I describe different approaches and practical examples of analyses based on energy measurement data. In particular, I focus on the combination of energy measurements and application performance traces. However, interweaving fine-grained power recordings and application events requires accurately synchronized timestamps on both sides. To overcome this obstacle, I develop a resilient and automated technique for time synchronization, which utilizes cross-correlation of a specifically influenced power measurement signal.
Ultimately, this careful combination of sophisticated energy measurements and application performance traces yields detailed insight into application and system energy efficiency, from full-scale HPC systems down to millisecond-range regions.

Table of contents:
1 Introduction
2 Background and Related Work
  2.1 Basic Concepts of Energy Measurements
    2.1.1 Basics of Metrology
    2.1.2 Measuring Voltage, Current, and Power
    2.1.3 Measurement Signal Conditioning and Analog-to-Digital Conversion
  2.2 Power Measurements for Computing Systems
    2.2.1 Measuring Compute Nodes using External Power Meters
    2.2.2 Custom Solutions for Measuring Compute Node Power
    2.2.3 Measurement Solutions of System Integrators
    2.2.4 CPU Energy Counters
    2.2.5 Using Models to Determine Energy Consumption
  2.3 Processing of Power Measurement Data
    2.3.1 Time Series Databases
    2.3.2 Data Center Monitoring Systems
  2.4 Influences on the Energy Consumption of Computing Systems
    2.4.1 Processor Power Consumption Breakdown
    2.4.2 Energy-Efficient Hardware Configuration
  2.5 HPC Performance and Energy Analysis
    2.5.1 Performance Analysis Techniques
    2.5.2 HPC Performance Analysis Tools
    2.5.3 Combining Application and Power Measurements
  2.6 Conclusion
3 Evaluating and Improving Energy Measurements
  3.1 Description of the Systems Under Test
  3.2 Instrumentation Points and Measurement Sensors
    3.2.1 Analog Measurement at Voltage Regulators
    3.2.2 Instrumentation with Hall Effect Transducers
    3.2.3 Modular Instrumentation of DC Consumers
    3.2.4 Optimal Wiring for Shunt-Based Measurements
    3.2.5 Node-Level Instrumentation for HPC Systems
  3.3 Analog Signal Conditioning and Analog-to-Digital Conversion
    3.3.1 Signal Amplification
    3.3.2 Analog Filtering and Analog-to-Digital Conversion
    3.3.3 Integrated Solutions for High-Resolution Measurement
  3.4 Accuracy Evaluation and Calibration
    3.4.1 Synthetic Workloads for Evaluating Power Measurements
    3.4.2 Improving and Evaluating the Accuracy of a Single-Node Measuring System
    3.4.3 Absolute Accuracy Evaluation of a Many-Node Measuring System
  3.5 Evaluating Temporal Granularity and Energy Correctness
    3.5.1 Measurement Signal Bandwidth at Different Instrumentation Points
    3.5.2 Retaining Energy Correctness During Digital Processing
  3.6 Evaluating CPU Energy Counters
    3.6.1 Energy Readouts with RAPL
    3.6.2 Methodology
    3.6.3 RAPL on Intel Sandy Bridge-EP
    3.6.4 RAPL on Intel Haswell-EP and Skylake-SP
  3.7 Conclusion
4 A Scalable Infrastructure for Processing Power Measurement Data
  4.1 Requirements for Power Measurement Data Processing
  4.2 Concepts and Implementation of Measurement Data Management
    4.2.1 Message-Based Communication between Agents
    4.2.2 Protocols
    4.2.3 Application Programming Interfaces
    4.2.4 Efficient Metric Time Series Storage and Retrieval
    4.2.5 Hierarchical Timeline Aggregation
  4.3 Performance Evaluation
    4.3.1 Benchmark Hardware Specifications
    4.3.2 Throughput in Symmetric Configuration with Replication
    4.3.3 Throughput with Many Data Sources and Single Consumers
    4.3.4 Temporary Storage in Message Queues
    4.3.5 Persistent Metric Time Series Request Performance
    4.3.6 Performance Comparison with Contemporary Time Series Storage Solutions
    4.3.7 Practical Usage of MetricQ
  4.4 Conclusion
5 Energy Efficiency Analysis
  5.1 General Energy Efficiency Analysis Scenarios
    5.1.1 Live Visualization of Power Measurements
    5.1.2 Visualization of Long-Term Measurements
    5.1.3 Integration in Application Performance Traces
    5.1.4 Graphical Analysis of Application Power Traces
  5.2 Correlating Power Measurements with Application Events
    5.2.1 Challenges for Time Synchronization of Power Measurements
    5.2.2 Reliable Automatic Time Synchronization with Correlation Sequences
    5.2.3 Creating a Correlation Signal on a Power Measurement Channel
    5.2.4 Processing the Correlation Signal and Measured Power Values
    5.2.5 Common Oversampling of the Correlation Signals at Different Rates
    5.2.6 Evaluation of Correlation and Time Synchronization
  5.3 Use Cases for Application Power Traces
    5.3.1 Analyzing Complex Power Anomalies
    5.3.2 Quantifying C-State Transitions
    5.3.3 Measuring the Dynamic Power Consumption of HPC Applications
  5.4 Conclusion
6 Summary and Outlook
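The synchronization idea in this abstract, where a known pattern is imprinted on the power signal and then located by cross-correlation, can be sketched with plain, equally sampled signals. This is a simplified stand-in under stated assumptions: a binary load pattern, a single sampling rate, and brute-force search over lags. It does not reproduce the thesis's oversampling or signal-processing pipeline, and `synthetic_measurement` produces synthetic data for illustration only.

```python
import random

def cross_correlation_lag(reference, measured, max_lag):
    """Lag (in samples) at which `measured` best matches the known `reference` pattern."""
    ref_mean = sum(reference) / len(reference)
    ref = [r - ref_mean for r in reference]  # remove the DC offset of the pattern
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        window = measured[lag:lag + len(ref)]
        if len(window) < len(ref):
            break
        w_mean = sum(window) / len(window)
        score = sum(r * (w - w_mean) for r, w in zip(ref, window))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def synthetic_measurement(pattern, offset, idle_w=100.0, active_w=150.0,
                          noise_w=2.0, tail=50, seed=0):
    """Synthetic power trace: idle, then the load pattern starting at `offset`, plus noise."""
    rng = random.Random(seed)
    clean = ([idle_w] * offset
             + [active_w if bit else idle_w for bit in pattern]
             + [idle_w] * tail)
    return [v + rng.gauss(0.0, noise_w) for v in clean]
```

A pseudo-random on/off pattern is used because its autocorrelation has a single sharp peak, so the recovered lag is unambiguous; converting that lag to a time offset only requires the sampling interval of the power channel.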

Technische Universität Dresden: Qucosa

Design for Test and Hardware Security Utilizing Tester Authentication Techniques

Author: OUAHAB Yahia
Publication venue: 'University of Windsor Leddy Library'
Publication date: 05/10/2017

Design-for-Test (DFT) techniques have been developed to improve the testability of integrated circuits. Among the known DFT techniques, scan-based testing is considered an efficient solution for digital circuits. However, the scan architecture can be exploited to launch a side-channel attack: scan chains can be used to access a cryptographic core inside a system-on-chip and extract critical information such as a private encryption key. If an attacker has unlimited access to a scan-enabled chip, applying arbitrary inputs to the Circuit-Under-Test (CUT) and observing the outputs, the probability of gaining access to critical information increases. In this thesis, solutions are presented to improve hardware security and protect circuits against scan-based attacks. In a solution based on tester authentication, the CUT requests the tester to provide a secret code, so the tester authentication circuit limits access to the scan architecture to known testers. Moreover, the proposed solution limits the number of attempts to apply test vectors and observe the results through the scan architecture, making brute-force attacks practically impossible. A tester authentication scheme that uses a Phase-Locked Loop (PLL) to encrypt the operating frequency of both the DUT and the tester is also presented. In this method, access to critical security circuits such as crypto cores is not granted in test mode; instead, a built-in self-test method is used in test mode to protect the circuit against scan-based attacks. Security for the new generation of three-dimensional (3D) integrated circuits has been investigated through 3D simulations in the COMSOL Multiphysics environment. It is shown that the wafer thinning used for 3D stacked IC integration reduces the leakage current, which increases the chip's security against side-channel attacks.
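The access-control policy described above, where the tester must present a secret code and attempts are capped, can be modeled behaviorally in a few lines. The sketch below is a software model under stated assumptions, not the thesis's circuit: the `ScanAccessController` class, the byte-string secret, the lockout-after-N-failures rule, and the placeholder `scan_shift` are hypothetical. It uses Python's `hmac.compare_digest` for a constant-time comparison.

```python
import hmac

class ScanAccessController:
    """Hypothetical behavioral model of a tester-authentication gate on a scan chain."""

    def __init__(self, secret: bytes, max_attempts: int = 3):
        self._secret = secret
        self._attempts_left = max_attempts
        self.unlocked = False

    @property
    def locked_out(self) -> bool:
        # Once every attempt is spent without success, the gate stays closed.
        return self._attempts_left == 0 and not self.unlocked

    def authenticate(self, code: bytes) -> bool:
        if self.unlocked:
            return True
        if self._attempts_left == 0:
            return False
        self._attempts_left -= 1
        # Constant-time comparison so response timing does not reveal matching bytes.
        if hmac.compare_digest(code, self._secret):
            self.unlocked = True
        return self.unlocked

    def scan_shift(self, bits):
        """Placeholder for shifting test data through the chain; refused until authenticated."""
        if not self.unlocked:
            raise PermissionError("scan access denied: tester not authenticated")
        return list(bits)
```

With a k-bit secret and at most N attempts, a blind guesser succeeds with probability at most N / 2^k, which is why capping attempts makes brute force impractical even for moderate k.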

Scholarship at UWindsor