Search CORE

977 research outputs found

FPGA acceleration of a quantized neural network for remote-sensed cloud detection

Author: Crockett Louise
Greenland Steve
Ireland Murray
Karagiannakis Philipp
Reiter Philippe
Publication venue
Publication date: 23/09/2020
Field of study

The capture and transmission of remote-sensed imagery for Earth observation is both computationally and bandwidth expensive. In the analyses of remote-sensed imagery in the visual band, atmospheric cloud cover can obstruct up to two-thirds of observations, resulting in costly imagery being discarded. Mission objectives and satellite operational details vary; however, assuming a cloud-free observation requirement, a doubling of useful data downlinked with an associated halving of delivery cost is possible through effective cloud detection. A minimal-resource, real-time inference neural network is ideally suited to perform automatic cloud detection, both for pre-processing captured images prior to transmission and preventing unnecessary images being taken by larger payload sensors. Much of the hardware complexity of modern neural network implementations resides in high-precision floating-point calculation pipelines. In recent years, research has been conducted in identifying quantized, or low-integer precision equivalents to known deep learning models, which do not require the extensive resources of their floating-point, full-precision counterparts. Our work leverages existing research on binary and quantized neural networks to develop a real-time, remote-sensed cloud detection solution using a commodity field-programmable gate array. This follows on developments of the Forwards Looking Imager for predictive cloud detection developed by Craft Prospect, a space engineering practice based in Glasgow, UK. The synthesized cloud detection accelerator achieved an inference throughput of 358.1 images per second with a maximum power consumption of 2.4 W. This throughput is an order of magnitude faster than alternate algorithmic options for the Forwards Looking Imager at around one third reduction in classification accuracy, and approximately two orders of magnitude faster than the CloudScout deep neural network, deployed with HyperScout 2 on the European Space Agency PhiSat-1 mission. Strategies for incorporating fault tolerance mechanisms are expounded

University of Strathclyde Institutional Repository

Hierarchically Clustered Adaptive Quantization CMAC and Its Learning Convergence

Author: Lai Edmund M-K.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2007
Field of study

No abstract availabl

Massey Research Online

Optimizing Scrubbing by Netlist Analysis for FPGA Configuration Bit Classification and Floorplanning

Author: Schmidt Bernhard
Teich Jürgen
Ziener Daniel
Zöllner Christian
Publication venue: 'Elsevier BV'
Publication date: 25/07/2017
Field of study

Existing scrubbing techniques for SEU mitigation on FPGAs do not guarantee an error-free operation after SEU recovering if the affected configuration bits do belong to feedback loops of the implemented circuits. In this paper, we a) provide a netlist-based circuit analysis technique to distinguish so-called critical configuration bits from essential bits in order to identify configuration bits which will need also state-restoring actions after a recovered SEU and which not. Furthermore, b) an alternative classification approach using fault injection is developed in order to compare both classification techniques. Moreover, c) we will propose a floorplanning approach for reducing the effective number of scrubbed frames and d), experimental results will give evidence that our optimization methodology not only allows to detect errors earlier but also to minimize the Mean-Time-To-Repair (MTTR) of a circuit considerably. In particular, we show that by using our approach, the MTTR for datapath-intensive circuits can be reduced by up to 48.5% in comparison to standard approaches

arXiv.org e-Print Archive

Dependable Embedded Systems

Author: Dutt Nikil
Henkel Jörg
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems

OAPEN Library

내장형 프로세서에서의 코드 크기 최적화를 위한 아키텍처 설계 및 컴파일러 지원

Author: 이종원
Publication venue: 서울대학교 대학원
Publication date: 01/02/2014
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2014. 2. 백윤흥.Embedded processors usually need to satisfy very tight design constraints to achieve low power consumption, small chip area, and high performance. One of the obstacles to meeting these requirements is related to delivering instructions from instruction memory/caches. The size of instruction memory/cache considerably contributes total chip area. Further, frequent access to caches incurs high power/energy consumption and significantly hampers overall system performance due to cache misses. To reduce the negative effects of the instruction delivery, therefore, this study focuses on the sizing of instruction memory/cache through code size optimization. One observation for code size optimization is that very long instruction word (VLIW) architectures often consume more power and memory space than necessary due to long instruction bit-width. One way to lessen this problem is to adopt a reduced bit-width ISA (Instruction Set Architecture) that has a narrower instruction word length. In practice, however, it is impossible to convert a given ISA fully into an equivalent reduced bit-width one because the narrow instruction word, due to bitwidth restrictions, can encode only a small subset of normal instructions in the original ISA. To explore the possibility of complete conversion of an existing 32-bit ISA into a 16-bit one that supports effectively all 32-bit instructions, we propose the reduced bit-width (e.g. 16-bit × 4-way) VLIW architectures that equivalently behave as their original bit-width (e.g. 32-bit × 4-way) architectures with the help of dynamic implied addressing mode (DIAM). Second, we observe that code duplication techniques have been proposed to increase the reliability against soft errors in multi-issue embedded systems such as VLIW by exploiting empty slots for duplicated instructions. Unfortunately, all duplicated instructions cannot be allocated to empty slots, which enforces generating additional VLIW packets to include the duplicated instructions. The increase of code size due to the extra VLIW packets is necessarily accompanied with the enhanced reliability. In order to minimize code size, we propose a novel approach compiler-assisted dynamic code duplication scheme, which accepts an assembly code composed of only original instructions as input, and generates duplicated instructions at runtime with the help of encoded information attached to original instructions. Since the duplicates of original instructions are not explicitly present in the assembly code, the increase of code size due to the duplicated instructions can be avoided in the proposed scheme. Lastly, the third observation is that, to cope with soft errors similarly to the second observation, a recently proposed software-based technique with TMR (Triple Modular Redundancy) implemented on coarse-grained reconfigurable architectures (CGRA) incurs the increase of configuration size, which is corresponding to the code size of CGRA, and thus extreme overheads in terms of runtime and energy consumption mainly due to expensive voting mechanisms for the outputs from the triplication of every operation. To reduce the expensive performance overhead due to the large configuration from the validation mechanism, we propose selective validation mechanisms for efficient modular redundancy techniques in the datapath on CGRA. The proposed techniques selectively validate the results at synchronous operations rather than every operation.Abstract i Chapter 1 Introduction 1 1.1 Instruction Delivery . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 The causes of code size increase . . . . . . . . . . . . . . . . . . . . 2 1.2.1 Instruction Bit-width in VLIW Architectures . . . . . . . . . 2 1.2.2 Instruction Redundancy . . . . . . . . . . . . . . . . . . . . 3 Chapter 2 Reducing Instruction Bit-width with Dynamic Implied Addressing Mode (DIAM) 7 2.1 Conceptual View . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.2 Architecture Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2.1 ISA Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2.2 Remote Operand Array Buffer . . . . . . . . . . . . . . . . . 15 2.2.3 Microarchitecture . . . . . . . . . . . . . . . . . . . . . . . . 17 2.3 Compiler Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.3.1 16-bit Instruction Generation . . . . . . . . . . . . . . . . . . 24 2.3.2 DDG Construction & Scheduling . . . . . . . . . . . . . . . 26 2.4 VLES(Variable Length Execution Set) Architecture with a Reduced Bit-width Instruction Set . . . . . . . . . . . . . . . . . . . . . . . . 29 2.4.1 Architecture Design . . . . . . . . . . . . . . . . . . . . . . 30 2.4.2 Compiler Support . . . . . . . . . . . . . . . . . . . . . . . . 34 2.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.5.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.5.3 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . 48 2.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 Chapter 3 Compiler-assisted Dynamic Code Duplication Scheme for Soft Error Resilient VLIW Architectures 53 3.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.2 Compiler-assisted Dynamic Code Duplication . . . . . . . . . . . . . 58 3.2.1 ISA Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.2.2 Modified Fetch Stage . . . . . . . . . . . . . . . . . . . . . . 62 3.3 Compilation Techniques . . . . . . . . . . . . . . . . . . . . . . . . 66 3.3.1 Static Code Duplication Algorithm . . . . . . . . . . . . . . 67 3.3.2 Vulnerability-aware Duplication Algorithm . . . . . . . . . . 68 3.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 3.4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . 71 3.4.2 Effectiveness of Compiler-assisted Dynamic Code Duplication 73 3.4.3 Effectiveness of Vulnerability-aware Duplication Algorithm . 77 Chapter 4 Selective Validation Techniques for Robust CGRAs against Soft Errors 85 4.1 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.3 Our Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.3.1 Selective Validation Mechanism . . . . . . . . . . . . . . . . 91 4.3.2 Compilation Flow and Performance Analysis . . . . . . . . . 92 4.3.3 Fault Coverage Analysis . . . . . . . . . . . . . . . . . . . . 96 4.3.4 Our Optimization - Minimizing Store Operation . . . . . . . . 97 4.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.4.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.4.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 100 Chapter 5 Conculsion 110 초록 122Docto

SNU Open Repository and Archive

Radiation Hardened by Design Methodologies for Soft-Error Mitigated Digital Architectures

Author
Publication venue
Publication date: 01/01/2017
Field of study

abstract: Digital architectures for data encryption, processing, clock synthesis, data transfer, etc. are susceptible to radiation induced soft errors due to charge collection in complementary metal oxide semiconductor (CMOS) integrated circuits (ICs). Radiation hardening by design (RHBD) techniques such as double modular redundancy (DMR) and triple modular redundancy (TMR) are used for error detection and correction respectively in such architectures. Multiple node charge collection (MNCC) causes domain crossing errors (DCE) which can render the redundancy ineffectual. This dissertation describes techniques to ensure DCE mitigation with statistical confidence for various designs. Both sequential and combinatorial logic are separated using these custom and computer aided design (CAD) methodologies. Radiation vulnerability and design overhead are studied on VLSI sub-systems including an advanced encryption standard (AES) which is DCE mitigated using module level coarse separation on a 90-nm process with 99.999% DCE mitigation. A radiation hardened microprocessor (HERMES2) is implemented in both 90-nm and 55-nm technologies with an interleaved separation methodology with 99.99% DCE mitigation while achieving 4.9% increased cell density, 28.5 % reduced routing and 5.6% reduced power dissipation over the module fences implementation. A DMR register-file (RF) is implemented in 55 nm process and used in the HERMES2 microprocessor. The RF array custom design and the decoders APR designed are explored with a focus on design cycle time. Quality of results (QOR) is studied from power, performance, area and reliability (PPAR) perspective to ascertain the improvement over other design techniques. A radiation hardened all-digital multiplying pulsed digital delay line (DDL) is designed for double data rate (DDR2/3) applications for data eye centering during high speed off-chip data transfer. The effect of noise, radiation particle strikes and statistical variation on the designed DDL are studied in detail. The design achieves the best in class 22.4 ps peak-to-peak jitter, 100-850 MHz range at 14 pJ/cycle energy consumption. Vulnerability of the non-hardened design is characterized and portions of the redundant DDL are separated in custom and auto-place and route (APR). Thus, a range of designs for mission critical applications are implemented using methodologies proposed in this work and their potential PPAR benefits explored in detail.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

ASU Digital Repository

Error Mitigation Using Approximate Logic Circuits: A Comparison of Probabilistic and Evolutionary Approaches

Author: Entrena Arrontes Luis Alfonso
Hrbacek Radek
Sekanina Lukas
Sánchez Clemente Antonio José
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Technology scaling poses an increasing challenge to the reliability of digital circuits. Hardware redundancy solutions, such as triple modular redundancy (TMR), produce very high area overhead, so partial redundancy is often used to reduce the overheads. Approximate logic circuits provide a general framework for optimized mitigation of errors arising from a broad class of failure mechanisms, including transient, intermittent, and permanent failures. However, generating an optimal redundant logic circuit that is able to mask the faults with the highest probability while minimizing the area overheads is a challenging problem. In this study, we propose and compare two new approaches to generate approximate logic circuits to be used in a TMR schema. The probabilistic approach approximates a circuit in a greedy manner based on a probabilistic estimation of the error. The evolutionary approach can provide radically different solutions that are hard to reach by other methods. By combining these two approaches, the solution space can be explored in depth. Experimental results demonstrate that the evolutionary approach can produce better solutions, but the probabilistic approach is close. On the other hand, these approaches provide much better scalability than other existing partial redundancy techniques.This work was supported by the Ministry of Economy and Competitiveness of Spain under project ESP2015-68245-C4-1-P, and by the Czech science foundation project GA16-17538S and the Ministry of Education, Youth and Sports of the Czech Republic from the National Programme of Sustainability (NPU II); project IT4Innovations excellence in science - LQ1602

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Review of Fault Mitigation Approaches for Deep Neural Networks for Computer Vision in Autonomous Driving

Author: Cerino Mattia
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 21/07/2020
Field of study

The aim of this work is to identify and present challenges and risks related to the employment of DNNs in Computer Vision for Autonomous Driving. Nowadays one of the major technological challenges is to choose the right technology among the abundance that is available on the market. Specifically, in this thesis it is collected a synopsis of the state-of-the-art architectures, techniques and methodologies adopted for building fault-tolerant hardware and ensuring robustness in DNNs-based Computer Vision applications for Autonomous Driving

AMS Tesi di Laurea