714 research outputs found

    Semiannual report, 1 October 1990 - 31 March 1991

    Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science is summarized

    Efficient Placement and Migration Policies for an STT-RAM based Hybrid L1 Cache for Intermittently Powered Systems

    The number of battery-powered devices is rapidly increasing due to the widespread use of IoT-enabled nodes in various fields. Energy harvesters are a feasible alternative to batteries for powering embedded devices. The energy harvester stores energy in a capacitor until there is enough to power up the embedded device and compute the task. This type of computation is referred to as intermittent computing, because energy harvesters are unable to supply continuous power to embedded devices. All registers and caches in conventional processors are volatile, so a Non-Volatile Memory (NVM)-based Non-Volatile Processor (NVP) is required to retain register and cache contents across a power failure. However, NVM-based caches reduce system performance and consume more energy than SRAM-based caches. This paper proposes efficient placement and migration policies for a hybrid cache architecture that uses SRAM and STT-RAM at the first cache level. The proposed architecture includes cache block placement and migration policies to reduce the number of writes to STT-RAM. During a power failure, the backup strategy identifies the critical blocks and migrates them from SRAM to STT-RAM. Compared to the baseline architecture, the proposed architecture reduces STT-RAM writes from 63.35% to 35.93%, resulting in a 32.85% performance gain and a 23.42% reduction in energy consumption. The backup strategy reduces backup time by 34.46% compared to the baseline

    GPU 에러 안정성 보장을 위한 컴파일러 기법 (Compiler Techniques for Ensuring GPU Error Resilience)

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2020. 이재진.

    Due to semiconductor technology scaling and near-threshold voltage computing, soft error resilience has become more important. GPUs are now widely used in high performance computing (HPC) because of their efficient parallel processing, and modern GPUs designed for HPC use error correction code (ECC) to protect their storage, including register files. However, adopting ECC in the register file imposes high area and energy overhead. To replace the expensive hardware cost of ECC, we propose Penny, a lightweight compiler-directed resilience scheme for GPU register file protection. We combine recent advances in idempotent recovery with a low-cost error detection code (EDC). Our approach focuses on solving two important problems:

    1. Can we guarantee correct error recovery using idempotent execution with an error detection code? We show that when an error detection code is used with idempotent recovery, certain restrictions required by previous idempotent recovery schemes are no longer needed. We also propose a software-based scheme to prevent a checkpoint value from being overwritten before the end of the region where the value is required for correct recovery.

    2. How do we reduce the execution overhead caused by checkpointing? In GPUs, additional checkpointing store instructions inflict considerably higher overhead than in CPUs, due to architectural characteristics such as the lack of store buffers. We propose a number of compiler optimization techniques that significantly reduce the overhead.

    Contents:
    1 Introduction
      1.1 Why is Soft Error Resilience Important in GPUs
      1.2 How can the ECC Overhead be Reduced
      1.3 What are the Challenges
      1.4 How do We Solve the Challenges
    2 Comparison of Error Detection and Correction Coding Schemes for Register File Protection
      2.1 Error Correction Codes and Error Detection Codes
      2.2 Cost of Coding Schemes
      2.3 Soft Error Frequency of GPUs
    3 Idempotent Recovery and Challenges
      3.1 Idempotent Execution
      3.2 Previous Idempotent Schemes
        3.2.1 De Kruijf's Idempotent Translation
        3.2.2 Bolt's Idempotent Recovery
        3.2.3 Comparison between Idempotent Schemes
      3.3 Idempotent Recovery Process
      3.4 Idempotent Recovery Challenges for GPUs
        3.4.1 Checkpoint Overwriting
        3.4.2 Performance Overhead
    4 Correctness of Recovery
      4.1 Proof of Safe Recovery
        4.1.1 Prevention of Error Propagation
        4.1.2 Proof of Correct State Recovery
        4.1.3 Correctness in Multi-Threaded Execution
      4.2 Preventing Checkpoint Overwriting
        4.2.1 Register Renaming
        4.2.2 Storage Alternation by Checkpoint Coloring
        4.2.3 Automatic Algorithm Selection
        4.2.4 Future Works
    5 Performance Optimizations
      5.1 Compilation Phases of Penny
        5.1.1 Region Formation
        5.1.2 Bimodal Checkpoint Placement
        5.1.3 Storage Alternation
        5.1.4 Checkpoint Pruning
        5.1.5 Storage Assignment
        5.1.6 Code Generation and Low-level Optimizations
      5.2 Cost Estimation Model
      5.3 Region Formation
        5.3.1 De Kruijf's Heuristic Region Formation
        5.3.2 Region Splitting and Region Stitching
        5.3.3 Checkpoint-Cost Aware Optimal Region Formation
      5.4 Bimodal Checkpoint Placement
      5.5 Optimal Checkpoint Pruning
        5.5.1 Bolt's Naive Pruning Algorithm and Overview of Penny's Optimal Pruning Algorithm
        5.5.2 Phase 1: Collecting Global-Decision Independent Status
        5.5.3 Phase 2: Ordering and Finalizing Renaming Decisions
        5.5.4 Effectiveness of Eliminating the Checkpoints
      5.6 Automatic Checkpoint Storage Assignment
      5.7 Low-Level Optimizations and Code Generation
    6 Evaluation
      6.1 Test Environment
        6.1.1 GPU Architecture and Simulation Setup
        6.1.2 Tested Applications
        6.1.3 Register Assignment
      6.2 Performance Evaluation
        6.2.1 Overall Performance Overheads
        6.2.2 Impact of Penny's Optimizations
        6.2.3 Assigning Checkpoint Storage and Its Integrity
        6.2.4 Impact of Optimal Checkpoint Pruning
        6.2.5 Impact of Alias Analysis
      6.3 Repurposing the Saved ECC Area
      6.4 Energy Impact on Execution
      6.5 Performance Overhead on Volta Architecture
      6.6 Compilation Time
    7 Related Works
    8 Conclusion and Future Works
      8.1 Limitation and Future Work
    Docto

    Shear-promoted drug encapsulation into red blood cells: a CFD model and μ-PIV analysis

    The present work focuses on the main parameters that influence shear-promoted encapsulation of drugs into erythrocytes. A CFD model was built to investigate the fluid dynamics of a suspension of particles flowing in a commercial microchannel. Micro Particle Image Velocimetry (μ-PIV) made it possible to account for the real properties of red blood cells (RBCs), giving a deeper understanding of the process. By coupling these results with an analytical diffusion model, suitable working conditions were defined for different values of haematocrit

    Aeronautical engineering: A continuing bibliography with indexes (supplement 292)

    This bibliography lists 675 reports, articles, and other documents recently introduced into the NASA scientific and technical information system database. Subject coverage includes the following: design, construction and testing of aircraft and aircraft engines; aircraft components, equipment, and systems; ground support systems; and theoretical and applied aspects of aerodynamics and general fluid dynamics

    Aeronautical engineering: A continuing bibliography with indexes (supplement 296)

    This bibliography lists 592 reports, articles, and other documents introduced into the NASA scientific and technical information system in Oct. 1993. Subject coverage includes: design, construction and testing of aircraft and aircraft engines; aircraft components, equipment, and systems; ground support systems; and theoretical and applied aspects of aerodynamics and general fluid dynamics

    Support des compilateurs statiques et dynamiques pour les systèmes informatiques alimentés par intermittence (Static and Dynamic Compiler Support for Intermittently Powered Computing Systems)

    With the advent of the Internet of Things (IoT), there is a need to provide energy for a massive number of tiny smart devices without using large, heavy, and high-maintenance batteries. One promising way is to harvest energy from the environment and store it in an energy buffer such as a capacitor. Programs then execute as long as there is energy available in the capacitor, and crash when it is exhausted. Recently, different software- and hardware-based checkpointing strategies have been proposed to ensure forward progress of execution on energy-harvesting IoT devices. This thesis introduces two different software solutions based on static and dynamic compilation. The proposed static compiler inserts checkpoints based on the statically computed worst-case energy consumption of code sections. Moreover, it applies classical compiler optimizations in order to decrease the number of checkpoints required at runtime. The proposed dynamic compilation technique defers checkpoint placement and specialization to runtime and makes decisions based on past power failures and the execution paths taken before each power failure. Both proposed solutions guarantee forward progress and keep the memory consistent. Furthermore, they aim to increase portability by not relying on any hardware feature of the IoT device. In addition, they are transparent to the programmer

    Joint University Program for Air Transportation Research, 1988-1989

    The research conducted during 1988 to 1989 under the NASA/FAA-sponsored Joint University Program for Air Transportation Research is summarized. The Joint University Program is a coordinated set of three grants sponsored by NASA Langley Research Center and the Federal Aviation Administration, one each with the Massachusetts Institute of Technology, Ohio University, and Princeton University. Completed works, status reports, and annotated bibliographies are presented for research topics, which include computer science, guidance and control theory and practice, aircraft performance, flight dynamics, and applied experimental psychology. An overview of the year's activities for each university is also presented