Search CORE

3 research outputs found

Performance optimizations for compiler-based error detection

Author: Mitropoulou Konstantina
Publication venue: The University of Edinburgh
Publication date: 29/06/2015
Field of study

The trend towards smaller transistor technologies and lower operating voltages stresses the hardware and makes transistors more susceptible to transient errors. In future systems, performance and power gains will come at the cost of unreliable areas on the chip. For this reason, there is an increased need for low-overhead highly-reliable error detection methodologies. In the last years, several techniques have been proposed. The majority of them are based on redundancy which can be implemented at several levels (e.g., hardware, instruction, thread, process, etc). In instruction-level error detection approaches, the compiler replicates the instructions of the program and inserts checks wherever they are needed. The checks evaluate code correctness and decide whether or not an error has occurred. This type of error detection is more flexible than the hardware alternatives. It allows the programmer to choose the protected area of the program and it can be applied without any hardware modifications. On the other hand, the replicated instructions and the checks cause a large slowdown making software techniques less appealing. In this thesis, we propose two techniques that aim at reducing the error detection overhead of compiler-based approaches and improving system’s performance without sacrificing the fault-coverage. The first technique, DRIFT, achieves this by decoupling the execution of the code (original and replicated) from the checks. The checks are compare and jump instructions. The latter ones tend to make the code sequential and prohibit the compiler from performing aggressive instruction scheduling optimizations. We call this phenomenon basic-block fragmentation. DRIFT reduces the impact of basic-block fragmentation by breaking the synchronized execute-check-confirm-execute cycle. In this way, DRIFT generates a scheduler-friendly code with more instruction-level parallelism (ILP). As a result, it reduces the performance overhead down to 1.29× (on average) and outperforms the state-of-the-art by up to 29.7% retaining the same fault-coverage. Next, CASTED focuses on reducing the impact of error detection overhead on single-chip scalable architectures that are composed of tightly-coupled cores. The proposed compiler methodology adaptively distributes the error detection overhead to the available resources across multiple cores, fully exploiting the abundant ILP of these architectures. CASTED adapts to a wide range of architecture configurations (issue-width, inter-core communication). The results show that CASTED matches the performance of, and often outperforms, sometimes by as mush as 21.2%, the best fixed state-of-the-art approach while maintaining the same fault coverage

CiteSeerX

내장형 프로세서에서의 코드 크기 최적화를 위한 아키텍처 설계 및 컴파일러 지원

Author: 이종원
Publication venue: 서울대학교 대학원
Publication date: 01/02/2014
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2014. 2. 백윤흥.Embedded processors usually need to satisfy very tight design constraints to achieve low power consumption, small chip area, and high performance. One of the obstacles to meeting these requirements is related to delivering instructions from instruction memory/caches. The size of instruction memory/cache considerably contributes total chip area. Further, frequent access to caches incurs high power/energy consumption and significantly hampers overall system performance due to cache misses. To reduce the negative effects of the instruction delivery, therefore, this study focuses on the sizing of instruction memory/cache through code size optimization. One observation for code size optimization is that very long instruction word (VLIW) architectures often consume more power and memory space than necessary due to long instruction bit-width. One way to lessen this problem is to adopt a reduced bit-width ISA (Instruction Set Architecture) that has a narrower instruction word length. In practice, however, it is impossible to convert a given ISA fully into an equivalent reduced bit-width one because the narrow instruction word, due to bitwidth restrictions, can encode only a small subset of normal instructions in the original ISA. To explore the possibility of complete conversion of an existing 32-bit ISA into a 16-bit one that supports effectively all 32-bit instructions, we propose the reduced bit-width (e.g. 16-bit × 4-way) VLIW architectures that equivalently behave as their original bit-width (e.g. 32-bit × 4-way) architectures with the help of dynamic implied addressing mode (DIAM). Second, we observe that code duplication techniques have been proposed to increase the reliability against soft errors in multi-issue embedded systems such as VLIW by exploiting empty slots for duplicated instructions. Unfortunately, all duplicated instructions cannot be allocated to empty slots, which enforces generating additional VLIW packets to include the duplicated instructions. The increase of code size due to the extra VLIW packets is necessarily accompanied with the enhanced reliability. In order to minimize code size, we propose a novel approach compiler-assisted dynamic code duplication scheme, which accepts an assembly code composed of only original instructions as input, and generates duplicated instructions at runtime with the help of encoded information attached to original instructions. Since the duplicates of original instructions are not explicitly present in the assembly code, the increase of code size due to the duplicated instructions can be avoided in the proposed scheme. Lastly, the third observation is that, to cope with soft errors similarly to the second observation, a recently proposed software-based technique with TMR (Triple Modular Redundancy) implemented on coarse-grained reconfigurable architectures (CGRA) incurs the increase of configuration size, which is corresponding to the code size of CGRA, and thus extreme overheads in terms of runtime and energy consumption mainly due to expensive voting mechanisms for the outputs from the triplication of every operation. To reduce the expensive performance overhead due to the large configuration from the validation mechanism, we propose selective validation mechanisms for efficient modular redundancy techniques in the datapath on CGRA. The proposed techniques selectively validate the results at synchronous operations rather than every operation.Abstract i Chapter 1 Introduction 1 1.1 Instruction Delivery . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 The causes of code size increase . . . . . . . . . . . . . . . . . . . . 2 1.2.1 Instruction Bit-width in VLIW Architectures . . . . . . . . . 2 1.2.2 Instruction Redundancy . . . . . . . . . . . . . . . . . . . . 3 Chapter 2 Reducing Instruction Bit-width with Dynamic Implied Addressing Mode (DIAM) 7 2.1 Conceptual View . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.2 Architecture Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2.1 ISA Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2.2 Remote Operand Array Buffer . . . . . . . . . . . . . . . . . 15 2.2.3 Microarchitecture . . . . . . . . . . . . . . . . . . . . . . . . 17 2.3 Compiler Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.3.1 16-bit Instruction Generation . . . . . . . . . . . . . . . . . . 24 2.3.2 DDG Construction & Scheduling . . . . . . . . . . . . . . . 26 2.4 VLES(Variable Length Execution Set) Architecture with a Reduced Bit-width Instruction Set . . . . . . . . . . . . . . . . . . . . . . . . 29 2.4.1 Architecture Design . . . . . . . . . . . . . . . . . . . . . . 30 2.4.2 Compiler Support . . . . . . . . . . . . . . . . . . . . . . . . 34 2.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.5.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.5.3 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . 48 2.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 Chapter 3 Compiler-assisted Dynamic Code Duplication Scheme for Soft Error Resilient VLIW Architectures 53 3.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.2 Compiler-assisted Dynamic Code Duplication . . . . . . . . . . . . . 58 3.2.1 ISA Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.2.2 Modified Fetch Stage . . . . . . . . . . . . . . . . . . . . . . 62 3.3 Compilation Techniques . . . . . . . . . . . . . . . . . . . . . . . . 66 3.3.1 Static Code Duplication Algorithm . . . . . . . . . . . . . . 67 3.3.2 Vulnerability-aware Duplication Algorithm . . . . . . . . . . 68 3.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 3.4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . 71 3.4.2 Effectiveness of Compiler-assisted Dynamic Code Duplication 73 3.4.3 Effectiveness of Vulnerability-aware Duplication Algorithm . 77 Chapter 4 Selective Validation Techniques for Robust CGRAs against Soft Errors 85 4.1 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.3 Our Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.3.1 Selective Validation Mechanism . . . . . . . . . . . . . . . . 91 4.3.2 Compilation Flow and Performance Analysis . . . . . . . . . 92 4.3.3 Fault Coverage Analysis . . . . . . . . . . . . . . . . . . . . 96 4.3.4 Our Optimization - Minimizing Store Operation . . . . . . . . 97 4.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.4.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.4.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 100 Chapter 5 Conculsion 110 초록 122Docto

Technology 2001: The Second National Technology Transfer Conference and Exposition, volume 2

Author
Publication venue
Publication date
Field of study

Proceedings of the workshop are presented. The mission of the conference was to transfer advanced technologies developed by the Federal government, its contractors, and other high-tech organizations to U.S. industries for their use in developing new or improved products and processes. Volume two presents papers on the following topics: materials science, robotics, test and measurement, advanced manufacturing, artificial intelligence, biotechnology, electronics, and software engineering