Search CORE

283 research outputs found

Understanding the propagation of hard errors to software and implications for resilient system design

Author: Chris
Daniel
David
Edward
Fred
Jayanth
Jun
Man-Lap Li
Milos
Nicholas
Pradeep Ramachandran
R. Rodriguez
Rajesh
Rotenberg Eric
Sarita V. Adve
Srinivasan M.
Swarup Kumar Sahoo
Todd
V. Reddy
Vikram S. Adve
Weining
Yuanyuan Zhou
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Further Specialization of Clustered VLIW Processors: A MAP Decoder for Software Defined Radio

Author: Ituero Herrero Pablo
López Vallejo Marisa
Publication venue: 'Wiley'
Publication date: 01/01/2008
Field of study

Turbo codes are extensively used in current communications standards and have a promising outlook for future generations. The advantages of software defined radio, especially dynamic reconfiguration, make it very attractive in this multi-standard scenario. However, the complex and power consuming implementation of the maximum a posteriori (MAP) algorithm, employed by turbo decoders, sets hurdles to this goal. This work introduces an ASIP architecture for the MAP algorithm, based on a dual-clustered VLIW processor. It displays the good performance of application specific designs along with the versatility of processors, which makes it compliant with leading edge standards. The machine deals with multi-operand instructions in an innovative way, the fetching and assertion of data is serialized and the addressing is automatized and transparent for the programmer. The performance-area trade-off of the proposed architecture achieves a throughput of 8 cycles per symbol with very low power dissipation

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

From FPGA to ASIC: A RISC-V processor experience

Author: Rojas Morales Carlos
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019
Field of study

This work document a correct design flow using these tools in the Lagarto RISC- V Processor and the RTL design considerations that must be taken into account, to move from a design for FPGA to design for ASIC

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Microarchitecture-level reliability assessment of multi-bit upsets in processors

Author: Gavanas Christos
Katsoridas Giorgos
Γαβάνας Χρήστος
Κατσορίδας Γιώργος
Publication venue
Publication date: 01/01/2019
Field of study

Η συνεχιζόμενη μείωση στις διαστάσεις των μοντέρνων Ολοκληρωμένων Κυκλωμάτων (Ο.Κ.) οδηγούν στον ολοένα και πιο σημαντικό ρόλο των αξιολογήσεων αξιοπιστίας και ευπάθειας στον επεξεργαστή, σε πρόωρα στάδια της σχεδίασης (pre-silicon validation). Με την εξέλιξη των τεχνολογικών κόμβων, τα αποτελέσματα της ακτινοβολίας παίζουν μεγαλύτερο ρόλο, οδηγώντας σε πιο σημαντικά αποτελέσματα στις συσκευές, με μια επιπρόσθετη αύξηση σε σφάλματα πολλαπλών bit. Συνεπώς, είναι καθοριστική η χρησιμοποίηση κάποιων κοινών μηχανισμών εισαγωγής σφαλμάτων για την αξιολόγηση κάθε σχεδίου, χρησιμοποιώντας προσομοιωτές μικρό-αρχιτεκτονικής, που μας παρέχουν ευελιξία και βελτιωμένη ταχύτητα, σε σύγκριση με τα σχέδια Επιπέδου Μεταφοράς Καταχωρητή. Αυτή η διπλωματική εργασία, εστιάζει στα σφάλματα πολλών bit, παρουσιάζοντας τα αποτελέσματα τους σε διαφορετικές δομές ενός μικρό-αρχιτεκτονικού μοντέλου του επεξεργαστή ARM Cortex-A9, που έχει υλοποιηθεί στον προσομοιωτή Gem5. Για αυτό τον λόγο χρησιμοποιείται για τις εκστρατείες εισαγωγής σφαλμάτων o GeFIN (Gem-5 based Fault INjector), με την προσθήκη μιας βελτιωμένης γεννήτριας σφαλμάτων, για τη δημιουργία μασκών σφαλμάτων με κάποια πολύ συγκεκριμένα χαρακτηριστικά. Η βελτιωμένη έκδοση της γεννήτριας, περιλαμβάνει την δυνατότητα για την εισαγωγή σφαλμάτων πολλών bit σε γειτονικές περιοχές κάθε δομής, μια πολύ συνηθισμένη περίπτωση σε πραγματικά περιβάλλοντα. Η γεννήτρια περιλαμβάνει επίσης της δυνατότητα για την εισαγωγή σφαλμάτων σε διεμπλεκόμενες (interleaved) μνήμες, ένας μηχανισμός που χρησιμοποιείται για το περιορισμό των αποτελεσμάτων των σφαλμάτων πολλών bit. Τα αποτελέσματα αυτής της διπλωματικής εργασίας, έδειξαν ότι κάποιες συγκεκριμένες δομές του επεξεργαστή-υπό-εξέταση (π.χ. ο Instruction Translation Lookaside Buffer) έδειξαν μεγάλη ευπάθεια στην εισαγωγή σφαλμάτων, με ποσοστά έως και 25% σωστών εκτελέσεων για 1000 πειράματα, ενώ άλλες δομές όπως οι Κρυφές Μνήμες Εντολών και Δεδομένων 1ου επιπέδου και η Κρυφή Μνήμη 2ου επιπέδου, έδειξαν μεγαλύτερη ευπάθεια στον αυξανόμενο αριθμό εισαγόμενων σφαλμάτων, με διακυμάνσεις μέχρι και 24% ανάμεσα στη εισαγωγή ενός και τριών σφαλμάτων στην κρυφή μνήμη 1ου επιπέδου. Αυτοί οι αριθμοί σχετιζόταν με τον θεωρητικό Architectural Vulnerability Factor (AVF) και ήταν ανεξάρτητοι από την τεχνολογία κατασκευής. Πραγματοποιήθηκε μια επέκταση στους υπολογισμούς για τον υπολογισμό των AVFs για κάθε τεχνολογικό κόμβο από 250 έως 22 nm, που έδειξε αυξημένα ποσοστά AVF όσο ο κόμβος μειωνόταν. Τέλος, πραγματοποιήθηκε μια ανάλυση αξιοπιστίας, χρησιμοποιώντας την μετρική Failures in Time (FIT), που έδειξε του υψηλότερους αριθμούς για την Κρυφή Μνήμη 2ου επιπέδου, κυρίως λόγω του μεγέθους της (4 MBits) με ένα FIT ίσο με 822.9 στα 130 nm. Ο FIT του επεξεργαστή είχε μέγιστο το 918 στον ίδιο κόμβο, ενώ παρατηρήσαμε ότι για κόμβους μικρότερους από 130 nm οι FIT μειώνονται, κυρίως επειδή υπάρχει μείωση στον παράγοντα raw FIT κάθε τεχνολογίας.The continuing decrease in feature sizes for modern Integrated Circuits (ICs) leads to an ever-important role of reliability and vulnerability assessments on the core in early stages of the design (pre-silicon validation). With the increase of the lithography resolution in recent technological nodes, the radiation effects play a bigger role, leading to more severe effects in the devices and increased numbers of multi-bit faults. Therefore, it is crucial to utilize some common fault injection mechanisms to evaluate each design, using micro-architectural simulators, which provide us with flexibility and improved latency, compared to RTL (Register Transfer Level) designs. This thesis focuses on the multi-bit faults, showing their effects on different components of a microarchitectural model of the ARM Cortex-A9 core, implemented on the Gem5 simulator. For that, the GeFIN (Gem-5 based Fault INjector) is used for the fault injection campaigns, with the addition of an improved fault mask generation tool for the creation of fault masks with some particular characteristics. The improved version of the fault mask generator includes the capability for the injection of multi-bit faults in adjacent areas of a structure, a case very common in real environments. The generator also includes the ability to insert faults in interleaved memories, a widely used technique to mitigate the effects of multiple bit upsets. The results of this study showed that some specific components of the core under test (e.g. the Instruction Translation Lookaside Buffer) showed significant vulnerability to fault injection, with rates as low as 25% correct executions for 1000 experiments, while others like the Level 1 Data/Instruction Caches and the Level 2 Cache showed bigger vulnerability to the increasing number of faults injected, with a variation of as high as 24% between single and triple bit fault injection for the L1 D-Cache. Those numbers were related to the “theoretical” Architectural Vulnerability Factor (AVF), independent of the fabrication technology node. An extension in the calculation was done to compute the AVFs for each technology node from 250 nm to 22 nm, showing increasing AVF rates as the node decreases. Lastly, a reliability assessment was done, using the Failures in Time (FIT) metric, which showed the highest numbers for the Level 2 Cache, primarily because of its size (4 MBits) with a FIT of 822.9 at the 130 nm. The FIT of the core showed a high of 918 at the same node, while we observed that for nodes smaller than 130 nm the FITs decreased primarily because of the decrease of the raw FIT factor of each technology

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens

Modular Verification of Interrupt-Driven Software

Author: Kusano Markus
Sung Chungha
Wang Chao
Publication venue
Publication date: 28/09/2017
Field of study

Interrupts have been widely used in safety-critical computer systems to handle outside stimuli and interact with the hardware, but reasoning about interrupt-driven software remains a difficult task. Although a number of static verification techniques have been proposed for interrupt-driven software, they often rely on constructing a monolithic verification model. Furthermore, they do not precisely capture the complete execution semantics of interrupts such as nested invocations of interrupt handlers. To overcome these limitations, we propose an abstract interpretation framework for static verification of interrupt-driven software that first analyzes each interrupt handler in isolation as if it were a sequential program, and then propagates the result to other interrupt handlers. This iterative process continues until results from all interrupt handlers reach a fixed point. Since our method never constructs the global model, it avoids the up-front blowup in model construction that hampers existing, non-modular, verification techniques. We have evaluated our method on 35 interrupt-driven applications with a total of 22,541 lines of code. Our results show the method is able to quickly and more accurately analyze the behavior of interrupts.Comment: preprint of the ASE 2017 pape

arXiv.org e-Print Archive

Crossref

Compiler-Aided Methodology for Low Overhead On-line Testing

Author: Gaydadjiev Georgi N.
Nazarian G.
Seepers R. M.
Strydis C.
Publication venue
Publication date: 01/01/2013
Field of study

Reliability is emerging as an important design criterion in modern systems due to increasing transient fault rates. Hardware fault-tolerance techniques, commonly used to address this, introduce high design costs. As alternative, software Signature-Monitoring (SM) schemes based on compiler assertions are an efficient method for control-flow-error detection. Existing SM techniques do not consider application-specific-information causing unnecessary overheads. In this paper, compile-time Control-Flow-Graph (CFG) topology analysis is used to place best-suited assertions at optimal locations of the assembly code to reduce overheads. Our evaluation with representative workloads shows fault-coverage increase with overheads close to Assertion- based Control-Flow Correction (ACFC), the method with lowest overhead. Compared to ACFC, our technique improves (on average) fault coverage by 17%, performance overhead by 5% and power-consumption by 3% with equal code-size overhead

EUR Research Repository

Chalmers Research

Chalmers Publication Library

Recommended from our members

Network-on-Chip Synchronization

Author: Buckler Mark
Publication venue: ScholarWorks@UMass Amherst
Publication date: 07/11/2014
Field of study

Technology scaling has enabled the number of cores within a System on Chip (SoC) to increase significantly. Globally Asynchronous Locally Synchronous (GALS) systems using Dynamic Voltage and Frequency Scaling (DVFS) operate each of these cores on distinct and dynamic clock domains. The main communication method between these cores is increasingly more likely to be a Network-on-Chip (NoC). Typically, the interfaces between these clock domains experience multi-cycle synchronization latencies due to their use of “brute-force” synchronizers. This dissertation aims to improve the performance of NoCs and thereby SoCs as a whole by reducing this synchronization latency. First, a survey of NoC improvement techniques is presented. One such improvement technique: a multi-layer NoC, has been successfully simulated. Given how one of the most commonly used techniques is DVFS, a thorough analysis and simulation of brute-force synchronizer circuits in both current and future process technologies is presented. Unfortunately, a multi-cycle latency is unavoidable when using brute-force synchronizers, so predictive synchronizers which require only a single cycle of latency have been proposed. To demonstrate the impact of these predictive synchronizer circuits at a high level, multi-core system simulations incorporating these circuits have been completed. Multiple forms of GALS NoC configurations have been simulated, including multi-synchronous, NoC-synchronous, and single-synchronizer. Speedup on the SPLASH benchmark suite was measured to directly quantify the performance benefit of predictive synchronizers in a full system. Additionally, Mean Time Between Failures (MTBF) has been calculated for each NoC synchronizer configuration to determine the reliability benefit possible when using predictive synchronizers

ScholarWorks@UMass Amherst

Recommended from our members

Quantum Vulnerability Analysis to Guide Robust Quantum Computing System Design

Author: Chong Frederic T.
LeCompte Travis
Lu Peng
Qi Fang
Smith Kaitlin N.
Tzeng Nian-feng
Yuan Xu
Publication venue
Publication date: 28/01/2024
Field of study

While quantum computers provide exciting opportunities for information processing, they currently suffer from noise during computation that is not fully understood. Incomplete noise models have led to discrepancies between quantum program success rate (SR) estimates and actual machine outcomes. For example, the estimated probability of success (ESP) is the state-of-the-art metric used to gauge quantum program performance. The ESP suffers poor prediction since it fails to account for the unique combination of circuit structure, quantum state, and quantum computer properties specific to each program execution. Thus, an urgent need exists for a systematic approach that can elucidate various noise impacts and accurately and robustly predict quantum computer success rates, emphasizing application and device scaling. In this article, we propose quantum vulnerability analysis (QVA) to systematically quantify the error impact on quantum applications and address the gap between current success rate (SR) estimators and real quantum computer results. The QVA determines the cumulative quantum vulnerability (CQV) of the target quantum computation, which quantifies the quantum error impact based on the entire algorithm applied to the target quantum machine. By evaluating the CQV with well-known benchmarks on three 27-qubit quantum computers, the CQV success estimation outperforms the estimated probability of success state-of-the-art prediction technique by achieving on average six times less relative prediction error, with best cases at 30 times, for benchmarks with a real SR rate above 0.1%. Direct application of QVA has been provided that helps researchers choose a promising compiling strategy at compile time

Knowledge UChicago

Timing speculation and adaptive reliable overclocking techniques for aggressive computer systems

Author: Subramanian Viswanathan
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2009
Field of study

Computers have changed our lives beyond our own imagination in the past several decades. The continued and progressive advancements in VLSI technology and numerous micro-architectural innovations have played a key role in the design of spectacular low-cost high performance computing systems that have become omnipresent in today\u27s technology driven world. Performance and dependability have become key concerns as these ubiquitous computing machines continue to drive our everyday life. Every application has unique demands, as they run in diverse operating environments. Dependable, aggressive and adaptive systems improve efficiency in terms of speed, reliability and energy consumption. Traditional computing systems run at a fixed clock frequency, which is determined by taking into account the worst-case timing paths, operating conditions, and process variations. Timing speculation based reliable overclocking advocates going beyond worst-case limits to achieve best performance while not avoiding, but detecting and correcting a modest number of timing errors. The success of this design methodology relies on the fact that timing critical paths are rarely exercised in a design, and typical execution happens much faster than the timing requirements dictated by worst-case design methodology. Better-than-worst-case design methodology is advocated by several recent research pursuits, which exploit dependability techniques to enhance computer system performance. In this dissertation, we address different aspects of timing speculation based adaptive reliable overclocking schemes, and evaluate their role in the design of low-cost, high performance, energy efficient and dependable systems. We visualize various control knobs in the design that can be favorably controlled to ensure different design targets. As part of this research, we extend the SPRIT3E, or Superscalar PeRformance Improvement Through Tolerating Timing Errors, framework, and characterize the extent of application dependent performance acceleration achievable in superscalar processors by scrutinizing the various parameters that impact the operation beyond worst-case limits. We study the limitations imposed by short-path constraints on our technique, and present ways to exploit them to maximize performance gains. We analyze the sensitivity of our technique\u27s adaptiveness by exploring the necessary hardware requirements for dynamic overclocking schemes. Experimental analysis based on SPEC2000 benchmarks running on a SimpleScalar Alpha processor simulator, augmented with error rate data obtained from hardware simulations of a superscalar processor, are presented. Even though reliable overclocking guarantees functional correctness, it leads to higher power consumption. As a consequence, reliable overclocking without considering on-chip temperatures will bring down the lifetime reliability of the chip. In this thesis, we analyze how reliable overclocking impacts the on-chip temperature of a microprocessor and evaluate the effects of overheating, due to such reliable dynamic frequency tuning mechanisms, on the lifetime reliability of these systems. We then evaluate the effect of performing thermal throttling, a technique that clamps the on-chip temperature below a predefined value, on system performance and reliability. Our study shows that a reliably overclocked system with dynamic thermal management achieves 25% performance improvement, while lasting for 14 years when being operated within 353K. Over the past five decades, technology scaling, as predicted by Moore\u27s law, has been the bedrock of semiconductor technology evolution. The continued downscaling of CMOS technology to deep sub-micron gate lengths has been the primary reason for its dominance in today\u27s omnipresent silicon microchips. Even as the transition to the next technology node is indispensable, the initial cost and time associated in doing so presents a non-level playing field for the competitors in the semiconductor business. As part of this thesis, we evaluate the capability of speculative reliable overclocking mechanisms to maximize performance at a given technology level. We evaluate its competitiveness when compared to technology scaling, in terms of performance, power consumption, energy and energy delay product. We present a comprehensive comparison for integer and floating point SPEC2000 benchmarks running on a simulated Alpha processor at three different technology nodes in normal and enhanced modes. Our results suggest that adopting reliable overclocking strategies will help skip a technology node altogether, or be competitive in the market, while porting to the next technology node. Reliability has become a serious concern as systems embrace nanometer technologies. In this dissertation, we propose a novel fault tolerant aggressive system that combines soft error protection and timing error tolerance. We replicate both the pipeline registers and the pipeline stage combinational logic. The replicated logic receives its inputs from the primary pipeline registers while writing its output to the replicated pipeline registers. The organization of redundancy in the proposed Conjoined Pipeline system supports overclocking, provides concurrent error detection and recovery capability for soft errors, intermittent faults and timing errors, and flags permanent silicon defects. The fast recovery process requires no checkpointing and takes three cycles. Back annotated post-layout gate-level timing simulations, using 45nm technology, of a conjoined two-stage arithmetic pipeline and a conjoined five-stage DLX pipeline processor, with forwarding logic, show that our approach, even under a severe fault injection campaign, achieves near 100% fault coverage and an average performance improvement of about 20%, when dynamically overclocked

Digital Repository @ Iowa State University (ISU)

상변화 메모리 시스템의 간섭 오류 완화 및 RMW 성능 향상 기법

Author: 이효근
Publication venue: 서울대학교 대학원
Publication date: 01/01/2021
Field of study

학위논문(박사) -- 서울대학교대학원 : 공과대학 전기·정보공학부, 2021.8. 이혁재.Phase-change memory (PCM) announces the beginning of the new era of memory systems, owing to attractive characteristics. Many memory product manufacturers (e.g., Intel, SK Hynix, and Samsung) are developing related products. PCM can be applied to various circumstances; it is not simply limited to an extra-scale database. For example, PCM has a low standby power due to its non-volatility; hence, computation-intensive applications or mobile applications (i.e., long memory idle time) are suitable to run on PCM-based computing systems. Despite these fascinating features of PCM, PCM is still far from the general commercial market due to low reliability and long latency problems. In particular, low reliability is a painful problem for PCM in past decades. As the semiconductor process technology rapidly scales down over the years, DRAM reaches 10 nm class process technology. In addition, it is reported that the write disturbance error (WDE) would be a serious issue for PCM if it scales down below 54 nm class process technology. Therefore, addressing the problem of WDEs becomes essential to make PCM competitive to DRAM. To overcome this problem, this dissertation proposes a novel approach that can restore meta-stable cells on demand by levering two-level SRAM-based tables, thereby significantly reducing the number WDEs. Furthermore, a novel randomized approach is proposed to implement a replacement policy that originally requires hundreds of read ports on SRAM. The second problem of PCM is a long-latency compared to that of DRAM. In particular, PCM tries to enhance its throughput by adopting a larger transaction unit; however, the different unit size from the general-purpose processor cache line further degrades the system performance due to the introduction of a read-modify-write (RMW) module. Since there has never been any research related to RMW in a PCM-based memory system, this dissertation proposes a novel architecture to enhance the overall system performance and reliability of a PCM-based memory system having an RMW module. The proposed architecture enhances data re-usability without introducing extra storage resources. Furthermore, a novel operation that merges commands regardless of command types is proposed to enhance performance notably. Another problem is the absence of a full simulation platform for PCM. While the announced features of the PCM-related product (i.e., Intel Optane) are scarce due to confidential issues, all priceless information can be integrated to develop an architecture simulator that resembles the available product. To this end, this dissertation tries to scrape up all available features of modules in a PCM controller and implement a dedicated simulator for future research purposes.상변화 메모리는(PCM) 매력적인 특성을 통해 메모리 시스템의 새로운 시대의 시작을 알렸다. 많은 메모리 관련 제품 제조업체(예 : 인텔, SK 하이닉스, 삼성)가 관련 제품 개발에 박차를 가하고 있다. PCM은 단순히 대규모 데이터베이스에만 국한되지 않고 다양한 상황에 적용될 수 있다. 예를 들어, PCM은 비휘발성으로 인해 대기 전력이 낮다. 따라서 계산 집약적인 애플리케이션 또는 모바일 애플리케이션은(즉, 긴 메모리 유휴 시간) PCM 기반 컴퓨팅 시스템에서 실행하기에 적합하다. PCM의 이러한 매력적인 특성에도 불구하고 PCM은 낮은 신뢰성과 긴 대기 시간으로 인해 여전히 일반 산업 시장에서는 DRAM과 다소 격차가 있다. 특히 낮은 신뢰성은 지난 수십 년 동안 PCM 기술의 발전을 저해하는 문제다. 반도체 공정 기술이 수년에 걸쳐 빠르게 축소됨에 따라 DRAM은 10nm 급 공정 기술에 도달하였다. 이어서, 쓰기 방해 오류 (WDE)가 54nm 등급 프로세스 기술 아래로 축소되면 PCM에 심각한 문제가 될 것으로 보고되었다. 따라서, WDE 문제를 해결하는 것은 PCM이 DRAM과 동등한 경쟁력을 갖추도록 하는 데 있어 필수적이다. 이 문제를 극복하기 위해 이 논문에서는 2-레벨 SRAM 기반 테이블을 활용하여 WDE 수를 크게 줄여 필요에 따라 준 안정 셀을 복원할 수 있는 새로운 접근 방식을 제안한다. 또한, 원래 SRAM에서 수백 개의 읽기 포트가 필요한 대체 정책을 구현하기 위해 새로운 랜덤 기반의 기법을 제안한다. PCM의 두 번째 문제는 DRAM에 비해 지연 시간이 길다는 것이다. 특히 PCM은 더 큰 트랜잭션 단위를 채택하여 단위시간 당 데이터 처리량 향상을 도모한다. 그러나 범용 프로세서 캐시 라인과 다른 유닛 크기는 읽기-수정-쓰기 (RMW) 모듈의 도입으로 인해 시스템 성능을 저하하게 된다. PCM 기반 메모리 시스템에서 RMW 관련 연구가 없었기 때문에 본 논문은 RMW 모듈을 탑재 한 PCM 기반 메모리 시스템의 전반적인 시스템 성능과 신뢰성을 향상하게 시킬 수 있는 새로운 아키텍처를 제안한다. 제안된 아키텍처는 추가 스토리지 리소스를 도입하지 않고도 데이터 재사용성을 향상시킨다. 또한, 성능 향상을 위해 명령 유형과 관계없이 명령을 병합하는 새로운 작업을 제안한다. 또 다른 문제는 PCM을 위한 완전한 시뮬레이션 플랫폼이 부재하다는 것이다. PCM 관련 제품(예 : Intel Optane)에 대해 발표된 정보는 대외비 문제로 인해 부족하다. 하지만 알려져 있는 정보를 적절히 취합하면 시중 제품과 유사한 아키텍처 시뮬레이터를 개발할 수 있다. 이를 위해 본 논문은 PCM 메모리 컨트롤러에 필요한 모든 모듈 정보를 활용하여 향후 이와 관련된 연구에서 충분히 사용 가능한 전용 시뮬레이터를 구현하였다.1 INTRODUCTION 1 1.1 Limitation of Traditional Main Memory Systems 1 1.2 Phase-Change Memory as Main Memory 3 1.2.1 Opportunities of PCM-based System 3 1.2.2 Challenges of PCM-based System 4 1.3 Dissertation Overview 7 2 BACKGROUND AND PREVIOUS WORK 8 2.1 Phase-Change Memory 8 2.2 Mitigation Schemes for Write Disturbance Errors 10 2.2.1 Write Disturbance Errors 10 2.2.2 Verification and Correction 12 2.2.3 Lazy Correction 13 2.2.4 Data Encoding-based Schemes 14 2.2.5 Sparse-Insertion Write Cache 16 2.3 Performance Enhancement for Read-Modify-Write 17 2.3.1 Traditional Read-Modify-Write 17 2.3.2 Write Coalescing for RMW 19 2.4 Architecture Simulators for PCM 21 2.4.1 NVMain 21 2.4.2 Ramulator 22 2.4.3 DRAMsim3 22 3 IN-MODULE DISTURBANCE BARRIER 24 3.1 Motivation 25 3.2 IMDB: In Module-Disturbance Barrier 29 3.2.1 Architectural Overview 29 3.2.2 Implementation of Data Structures 30 3.2.3 Modification of Media Controller 36 3.3 Replacement Policy 38 3.3.1 Replacement Policy for IMDB 38 3.3.2 Approximate Lowest Number Estimator 40 3.4 Putting All Together: Case Studies 43 3.5 Evaluation 45 3.5.1 Configuration 45 3.5.2 Architectural Exploration 47 3.5.3 Effectiveness of the Replacement Policy 48 3.5.4 Sensitivity to Main Table Configuration 49 3.5.5 Sensitivity to Barrier Buffer Size 51 3.5.6 Sensitivity to AppLE Group Size 52 3.5.7 Comparison with Other Studies 54 3.6 Discussion 59 3.7 Summary 63 4 INTEGRATION OF AN RMW MODULE IN A PCM-BASED SYSTEM 64 4.1 Motivation 65 4.2 Utilization of DRAM Cache for RMW 67 4.2.1 Architectural Design 67 4.2.2 Algorithm 70 4.3 Typeless Command Merging 73 4.3.1 Architectural Design 73 4.3.2 Algorithm 74 4.4 An Alternative Implementation: SRC-RMW 78 4.4.1 Implementation of SRC-RMW 78 4.4.2 Design Constraint 80 4.5 Case Study 82 4.6 Evaluation 85 4.6.1 Configuration 85 4.6.2 Speedup 88 4.6.3 Read Reliability 91 4.6.4 Energy Consumption: Selecting a Proper Page Size 93 4.6.5 Comparison with Other Studies 95 4.7 Discussion 97 4.8 Summary 99 5 AN ALL-INCLUSIVE SIMULATOR FOR A PCM CONTROLLER 100 5.1 Motivation 101 5.2 PCMCsim: PCM Controller Simulator 103 5.2.1 Architectural Overview 103 5.2.2 Underlying Classes of PCMCsim 104 5.2.3 Implementation of Contention Behavior 108 5.2.4 Modules of PCMCsim 109 5.3 Evaluation 116 5.3.1 Correctness of the Simulator 116 5.3.2 Comparison with Other Simulators 117 5.4 Summary 119 6 Conclusion 120 Abstract (In Korean) 141 Acknowledgment 143박

SNU Open Repository and Archive