
    Glider: A GPU Library Driver for Improved System Security

    Legacy device drivers implement both device resource management and isolation. The result is a large code base with a wide, high-level interface that leaves the driver vulnerable to security attacks. This is particularly problematic for increasingly popular accelerators such as GPUs, which have large, complex drivers. We solve this problem with library drivers, a new driver architecture. A library driver implements resource management as an untrusted library in the application's address space, and implements isolation as a kernel module that is smaller and has a narrower, lower-level interface (i.e., closer to the hardware) than a legacy driver. We articulate a set of device and platform hardware properties required to retrofit a legacy driver into a library driver. To demonstrate the feasibility and superiority of library drivers, we present Glider, a library driver implementation for two popular GPUs, a Radeon and an Intel GPU. Glider reduces the TCB size and attack surface by about 35% and 84%, respectively, for a Radeon HD 6450 GPU and by about 38% and 90%, respectively, for an Intel Ivy Bridge GPU. Moreover, it incurs no performance cost. Indeed, Glider outperforms a legacy driver for applications that interact intensively with the device driver, such as those using the OpenGL immediate mode API.
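    Glider itself is real kernel and user-space GPU driver code; purely as an illustration of the split described above, the following Python sketch models a small trusted isolation module that only binds pages and rings a doorbell, while an untrusted per-application library does all resource management. Every class and method name here is hypothetical, not Glider's actual interface.

```python
# Conceptual sketch of the library-driver split (illustrative only; these
# classes and methods are hypothetical, not Glider's real interface).

class IsolationKernelModule:
    """Small trusted component: enforces isolation, manages no resources."""

    def __init__(self, num_gpu_pages):
        self.owner = [None] * num_gpu_pages      # GPU page -> owning process id

    def map_page(self, pid, page):
        # Narrow, hardware-level check: refuse pages owned by someone else.
        if self.owner[page] not in (None, pid):
            raise PermissionError(f"page {page} belongs to another process")
        self.owner[page] = pid
        return page

    def ring_doorbell(self, pid, ring_page):
        # Only notifies the device that work is queued; no command parsing.
        assert self.owner[ring_page] == pid
        print(f"GPU notified: process {pid} submitted work via page {ring_page}")


class GpuLibraryDriver:
    """Untrusted library in the application's address space: does all
    resource management (buffer allocation, command construction)."""

    def __init__(self, pid, kmod):
        self.pid, self.kmod = pid, kmod
        self.ring, self.commands = None, []

    def allocate_ring(self, page):
        self.ring = self.kmod.map_page(self.pid, page)

    def draw(self, vertices):
        # The wide, complex, attack-prone logic stays out of the kernel.
        self.commands.append(("DRAW", vertices))

    def flush(self):
        self.kmod.ring_doorbell(self.pid, self.ring)
        self.commands.clear()


kmod = IsolationKernelModule(num_gpu_pages=16)
app = GpuLibraryDriver(pid=1, kmod=kmod)
app.allocate_ring(page=3)
app.draw([(0, 0), (1, 0), (0, 1)])
app.flush()
```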

    Reliable Software for Unreliable Hardware - A Cross-Layer Approach

    A novel cross-layer reliability analysis, modeling, and optimization approach is proposed in this thesis. It leverages multiple layers of the system design abstraction (i.e., hardware, compiler, system software, and application program) to exploit the reliability-enhancing potential available at each system layer and to exchange this information across layers.
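    The thesis itself spans hardware, compiler, system-software, and application techniques; the toy Python sketch below only illustrates the general idea of choosing mitigations across layers using information each layer exposes. The layers' mitigation options, their numbers, and the greedy selection policy are all invented for illustration.

```python
# Toy model of cross-layer reliability optimization (illustrative only; the
# layers' options, numbers, and the greedy policy are assumptions).

# Each layer exposes mitigation options: (name, failure-rate reduction, overhead %).
LAYERS = {
    "hardware":    [("ECC on register file", 0.60, 8.0)],
    "compiler":    [("instruction duplication", 0.45, 12.0),
                    ("selective duplication", 0.30, 5.0)],
    "system sw":   [("checkpoint/restart", 0.50, 6.0)],
    "application": [("algorithmic checks", 0.35, 3.0)],
}

def plan(target_reduction):
    """Greedily pick mitigations across layers, cheapest first, until the
    combined failure-rate reduction reaches the target."""
    options = [(cost, reduction, layer, name)
               for layer, opts in LAYERS.items()
               for name, reduction, cost in opts]
    options.sort()                        # lowest overhead first
    remaining, chosen, overhead = 1.0, [], 0.0
    for cost, reduction, layer, name in options:
        if 1.0 - remaining >= target_reduction:
            break
        remaining *= (1.0 - reduction)    # reductions compose multiplicatively
        overhead += cost
        chosen.append((layer, name))
    return chosen, 1.0 - remaining, overhead

chosen, achieved, overhead = plan(target_reduction=0.8)
print(chosen)
print(f"reduction={achieved:.2f}, overhead={overhead:.1f}%")
```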

    MGSim - Simulation tools for multi-core processor architectures

    MGSim is an open-source discrete-event simulator for on-chip hardware components, developed at the University of Amsterdam. It is intended as a research and teaching vehicle for studying fine-grained hardware/software interactions on many-core and hardware-multithreaded processors. It includes support for core models with different instruction sets, a configurable multi-core interconnect, multiple configurable cache and memory models, a dedicated I/O subsystem, and comprehensive monitoring and interaction facilities. The default model configuration shipped with MGSim implements Microgrids, a many-core architecture with hardware concurrency management. MGSim is written mostly in C++ and uses object classes to represent chip components. It is optimized for architecture models that can be described as process networks.
    Comment: 33 pages, 22 figures, 4 listings, 2 tables
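    MGSim itself is a C++ code base; the short Python sketch below only illustrates the discrete-event, component-as-object style the abstract describes: components schedule callbacks on a shared event queue and exchange requests with fixed latencies. The Core and Memory components and their timings are made up for the example.

```python
import heapq

# Minimal discrete-event skeleton in the spirit of the description above
# (MGSim itself is C++; the component names here are invented).

class Simulator:
    def __init__(self):
        self.now, self._queue, self._seq = 0, [], 0

    def schedule(self, delay, callback, *args):
        self._seq += 1                    # tie-breaker so callbacks never compare
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback, args))

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            self.now, _, callback, args = heapq.heappop(self._queue)
            callback(*args)

class Core:
    """A toy core that issues one memory request per cycle."""
    def __init__(self, sim, name, memory):
        self.sim, self.name, self.memory = sim, name, memory
        sim.schedule(1, self.tick)

    def tick(self):
        self.memory.request(self, addr=self.sim.now * 4)
        self.sim.schedule(1, self.tick)

    def reply(self, addr):
        print(f"[{self.sim.now}] {self.name}: data for {addr:#x}")

class Memory:
    """A toy memory with a fixed access latency."""
    LATENCY = 3
    def __init__(self, sim):
        self.sim = sim
    def request(self, core, addr):
        self.sim.schedule(self.LATENCY, core.reply, addr)

sim = Simulator()
mem = Memory(sim)
cores = [Core(sim, f"core{i}", mem) for i in range(2)]
sim.run(until=6)
```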

    C-MOS array design techniques: SUMC multiprocessor system study

    The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Three previous system studies for a space ultrareliable modular computer (SUMC) multiprocessing system are evaluated, and a new multiprocessing system is proposed that can be flexibly configured with up to four central processors, four I/O processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through the I/O processors, and multiport access to multiple and shared memory units.

    System configuration and executive requirements specifications for reusable shuttle and space station/base

    System configuration and executive requirements specifications for reusable shuttle and space station/base.

    GPU ์—๋Ÿฌ ์•ˆ์ •์„ฑ ๋ณด์žฅ์„ ์œ„ํ•œ ์ปดํŒŒ์ผ๋Ÿฌ ๊ธฐ๋ฒ•

    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, College of Engineering, August 2020. Advisor: Jaejin Lee.
    Due to semiconductor technology scaling and near-threshold voltage computing, soft error resilience has become increasingly important. GPUs are now widely used in high-performance computing (HPC) because of their efficient parallel processing, and modern GPUs designed for HPC use error correction codes (ECC) to protect their storage, including the register files. However, adopting ECC in the register file imposes high area and energy overhead. To replace the expensive hardware cost of ECC, we propose Penny, a lightweight compiler-directed resilience scheme for GPU register file protection that combines recent advances in idempotent recovery with low-cost error detection codes. Our approach focuses on two questions. 1. Can we guarantee correct error recovery using idempotent execution with an error detection code? We show that when an error detection code is used with idempotent recovery, certain restrictions required by previous idempotent recovery schemes are no longer needed. We also propose a software-based scheme that prevents a checkpointed value from being overwritten before the end of the region in which that value is required for correct recovery. 2. How do we reduce the execution overhead caused by checkpointing? On GPUs, the additional checkpointing store instructions inflict considerably higher overhead than on CPUs because of architectural characteristics such as the lack of store buffers; we propose a number of compiler optimization techniques that significantly reduce this overhead.
    Contents:
    1 Introduction
      1.1 Why is Soft Error Resilience Important in GPUs
      1.2 How can the ECC Overhead be Reduced
      1.3 What are the Challenges
      1.4 How do We Solve the Challenges
    2 Comparison of Error Detection and Correction Coding Schemes for Register File Protection
      2.1 Error Correction Codes and Error Detection Codes
      2.2 Cost of Coding Schemes
      2.3 Soft Error Frequency of GPUs
    3 Idempotent Recovery and Challenges
      3.1 Idempotent Execution
      3.2 Previous Idempotent Schemes
        3.2.1 De Kruijf's Idempotent Translation
        3.2.2 Bolt's Idempotent Recovery
        3.2.3 Comparison between Idempotent Schemes
      3.3 Idempotent Recovery Process
      3.4 Idempotent Recovery Challenges for GPUs
        3.4.1 Checkpoint Overwriting
        3.4.2 Performance Overhead
    4 Correctness of Recovery
      4.1 Proof of Safe Recovery
        4.1.1 Prevention of Error Propagation
        4.1.2 Proof of Correct State Recovery
        4.1.3 Correctness in Multi-Threaded Execution
      4.2 Preventing Checkpoint Overwriting
        4.2.1 Register Renaming
        4.2.2 Storage Alternation by Checkpoint Coloring
        4.2.3 Automatic Algorithm Selection
        4.2.4 Future Works
    5 Performance Optimizations
      5.1 Compilation Phases of Penny
        5.1.1 Region Formation
        5.1.2 Bimodal Checkpoint Placement
        5.1.3 Storage Alternation
        5.1.4 Checkpoint Pruning
        5.1.5 Storage Assignment
        5.1.6 Code Generation and Low-Level Optimizations
      5.2 Cost Estimation Model
      5.3 Region Formation
        5.3.1 De Kruijf's Heuristic Region Formation
        5.3.2 Region Splitting and Region Stitching
        5.3.3 Checkpoint-Cost Aware Optimal Region Formation
      5.4 Bimodal Checkpoint Placement
      5.5 Optimal Checkpoint Pruning
        5.5.1 Bolt's Naive Pruning Algorithm and Overview of Penny's Optimal Pruning Algorithm
        5.5.2 Phase 1: Collecting Global-Decision Independent Status
        5.5.3 Phase 2: Ordering and Finalizing Renaming Decisions
        5.5.4 Effectiveness of Eliminating the Checkpoints
      5.6 Automatic Checkpoint Storage Assignment
      5.7 Low-Level Optimizations and Code Generation
    6 Evaluation
      6.1 Test Environment
        6.1.1 GPU Architecture and Simulation Setup
        6.1.2 Tested Applications
        6.1.3 Register Assignment
      6.2 Performance Evaluation
        6.2.1 Overall Performance Overheads
        6.2.2 Impact of Penny's Optimizations
        6.2.3 Assigning Checkpoint Storage and Its Integrity
        6.2.4 Impact of Optimal Checkpoint Pruning
        6.2.5 Impact of Alias Analysis
      6.3 Repurposing the Saved ECC Area
      6.4 Energy Impact on Execution
      6.5 Performance Overhead on Volta Architecture
      6.6 Compilation Time
    7 Related Works
    8 Conclusion and Future Works
      8.1 Limitation and Future Work
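    Penny is implemented as compiler passes for GPU code, so the following Python sketch is only a conceptual illustration of the recovery idea described above: the live-in registers of an idempotent region are checkpointed, an error detection code (parity here) flags a corrupted register read, and the region is re-executed from the checkpoint. The register file model, the parity code, and the fault-injection helper are assumptions made for the demo.

```python
# Conceptual sketch (not Penny's implementation): an idempotent region is
# re-executed from checkpointed live-in registers when a low-cost error
# detection code (parity here) flags a corrupted register.

def parity(value):
    return bin(value & 0xFFFFFFFF).count("1") & 1

class SoftError(Exception):
    pass

class RegisterFile:
    """Registers protected by a 1-bit detection code instead of full ECC."""
    def __init__(self):
        self.values, self.parities = {}, {}

    def write(self, reg, value):
        self.values[reg] = value & 0xFFFFFFFF
        self.parities[reg] = parity(value)

    def read(self, reg):
        value = self.values[reg]
        if parity(value) != self.parities[reg]:
            raise SoftError(reg)                 # detection only, no correction
        return value

    def flip_bit(self, reg, bit):                # fault injection for the demo
        self.values[reg] ^= 1 << bit

def run_idempotent_region(rf, live_ins, body):
    """Checkpoint live-in registers, then re-execute the region on an error."""
    checkpoint = {reg: rf.read(reg) for reg in live_ins}   # verified values
    while True:
        try:
            return body(rf)
        except SoftError:
            for reg, value in checkpoint.items():          # restore live-ins
                rf.write(reg, value)

injected = []
def body(rf):
    if not injected:                     # inject one soft error, first run only
        injected.append(True)
        rf.flip_bit("r1", 3)
    a, b = rf.read("r1"), rf.read("r2")
    rf.write("r3", a + b)                # region output
    return rf.read("r3")

rf = RegisterFile()
rf.write("r1", 40)
rf.write("r2", 2)
print(run_idempotent_region(rf, live_ins=["r1", "r2"], body=body))  # prints 42
```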

    Implementing a Time Management Unit for the OR1200 Processor.

    This thesis presents a Time Management Unit (TMU) that assists the scheduler and the interrupt handling of a real-time operating system. The unit monitors task execution time and provides a mechanism for signalling when a task depletes its resources; this applies both to regular tasks and to the handling of sporadic events. Placing the TMU inside the processor core gives it a more predictable impact on the overhead of task switching. The implemented TMU is tested as a stand-alone unit with a hardware testbench and then integrated into the OpenRISC 1000 based OR1200 processor as special purpose registers. The behaviour of the OR1200 is verified through hardware simulation, using compiled software as input. The Or1ksim instruction-set simulator is modified to include the TMU functionality, providing a reference point for the behaviour of the altered processor. The real-time operating system FreeRTOS is adapted to utilize the TMU's functionality, and its behaviour is verified through simulation on Or1ksim, on simulated hardware, and through execution on a Cyclone V FPGA. Analysis of runtime statistics shows that the module works as expected through all phases of verification and that it can increase determinism, reliability, and user control. Tests have shown that the TMU can recover a faulting task from spin-locks and aid in fail-soft operations for software faults. By placing the TMU inside the processor core, a fixed overhead of 131 cycles is achieved during a context switch when no caches are used.
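    The TMU described above is hardware inside the OR1200 core, accessed through special purpose registers; the Python sketch below only models its central mechanism: the scheduler reloads a per-task cycle budget on every context switch, the unit counts the budget down, and it raises a signal when the budget is depleted so a stuck task can be preempted. Task names, budgets, and the round-robin loop are illustrative.

```python
# Toy software model of the Time Management Unit idea (the real TMU is
# hardware inside the OR1200 core; names and numbers here are illustrative).

class BudgetExceeded(Exception):
    pass

class TimeManagementUnit:
    """Counts down the running task's budget and signals when it reaches zero."""
    def __init__(self):
        self.remaining = 0

    def load(self, budget_cycles):        # done by the scheduler on a switch
        self.remaining = budget_cycles

    def tick(self):                       # one processor cycle
        if self.remaining == 0:
            raise BudgetExceeded()
        self.remaining -= 1

class Task:
    def __init__(self, name, budget, work_cycles):
        self.name, self.budget, self.work = name, budget, work_cycles

def run_round_robin(tasks, tmu):
    for task in tasks:
        tmu.load(task.budget)             # context switch: reload the TMU
        try:
            for _ in range(task.work):    # a spin-lock would loop forever here
                tmu.tick()
            print(f"{task.name}: finished within budget")
        except BudgetExceeded:
            print(f"{task.name}: budget depleted, preempted by TMU signal")

tmu = TimeManagementUnit()
run_round_robin(
    [Task("sensor", budget=100, work_cycles=80),
     Task("stuck",  budget=100, work_cycles=10_000)],   # models a spinning task
    tmu,
)
```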

    Analyzing and Predicting Processor Vulnerability to Soft Errors Using Statistical Techniques

    The shrinking processor feature size, lower threshold voltage, and increasing on-chip transistor density make current processors highly vulnerable to soft errors. The Architectural Vulnerability Factor (AVF) reflects the probability that a raw soft error eventually causes a visible error in the program output, indicating the processor's susceptibility to soft errors at the architectural level. Awareness of the AVF, both at the early design stage and during program runtime, is highly useful for designing reliable processors. However, measuring the AVF is extremely costly, resulting in large overheads in hardware, computation, and power. The situation is further exacerbated in a multi-threaded processor environment where resource contention and data sharing exist among different threads. Consequently, predicting the AVF from other easily measured metrics becomes extraordinarily attractive to computer designers. We propose a series of AVF modeling and prediction techniques based on advanced statistical methods. First, we use the Boosted Regression Trees (BRT) scheme to dynamically predict the AVF during program execution from a variety of performance metrics; the resulting model generalizes across different workloads, program phases, and processor configurations on a single-threaded superscalar processor. Second, we extend AVF prediction to multi-threaded processors, where inter-thread resource contention has significant and non-uniform impacts on different programs; we propose a two-level predictive mechanism that uses BRT as a building block to characterize the contention behavior. Finally, we employ a rule search strategy named the Patient Rule Induction Method (PRIM) to explore a large processor design space at the early design stage. We are able to generate selective rules on important configuration parameters; these rules quantify the design-space subregion yielding the lowest values of the response, thereby providing useful guidelines for designing reliable processors while achieving high performance.
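    As a rough illustration of the first step (predicting AVF from easily measured metrics with boosted regression trees), the sketch below trains scikit-learn's GradientBoostingRegressor on synthetic data. The feature set, the synthetic relationship, and the model settings are stand-ins, not the paper's actual workloads or configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Sketch of BRT-based AVF prediction on synthetic data (the paper's features,
# workloads, and model settings are not reproduced here).

rng = np.random.default_rng(0)
n = 2000
# Hypothetical easily measured metrics: IPC, ROB occupancy, L2 miss rate, IQ occupancy.
X = np.column_stack([
    rng.uniform(0.2, 2.0, n),      # IPC
    rng.uniform(0.0, 1.0, n),      # ROB occupancy
    rng.uniform(0.0, 0.3, n),      # L2 miss rate
    rng.uniform(0.0, 1.0, n),      # issue-queue occupancy
])
# A made-up nonlinear relationship standing in for the true AVF.
avf = (0.4 * X[:, 1] + 0.3 * X[:, 3] + 0.2 * X[:, 2] - 0.1 * X[:, 0]
       + 0.05 * rng.standard_normal(n))
avf = np.clip(avf, 0.0, 1.0)

X_tr, X_te, y_tr, y_te = train_test_split(X, avf, test_size=0.25, random_state=0)
model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                  max_depth=3, random_state=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("MAE:", round(mean_absolute_error(y_te, pred), 4))
print("feature importances:", np.round(model.feature_importances_, 3))
```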

    Porting the Ericsson Bluetooth Stack - A Real-Time Analysis

    This master's thesis discusses the real-time issues of the operating system used by the Ericsson Bluetooth stack and the effects of replacing this operating system. The practical part consists of switching the operating system for the Ericsson Bluetooth stack and verifying whether it is still operational and fulfils all timing requirements.