1,473 research outputs found

    Improving reconfigurable systems reliability by combining periodical test and redundancy techniques: a case study

    Get PDF
    This paper revises and introduces to the field of reconfigurable computer systems, some traditional techniques used in the fields of fault-tolerance and testing of digital circuits. The target area is that of on-board spacecraft electronics, as this class of application is a good candidate for the use of reconfigurable computing technology. Fault tolerant strategies are used in order for the system to adapt itself to the severe conditions found in space. In addition, the paper describes some problems and possible solutions for the use of reconfigurable components, based on programmable logic, in space applications

    Evolvable hardware platform for fault-tolerant reconfigurable sensor electronics

    Get PDF

    VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing

    Full text link
    The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention: many applications in fact require high-speed operations that suit a hardware implementation. However, numerous elements and complex interconnections are usually required, leading to a large area occupation and copious power consumption. Stochastic computing has shown promising results for low-power area-efficient hardware implementations, even though existing stochastic algorithms require long streams that cause long latencies. In this paper, we propose an integer form of stochastic computation and introduce some elementary circuits. We then propose an efficient implementation of a DNN based on integral stochastic computing. The proposed architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62% average reductions in area and latency compared to the best reported architecture in literature. We also synthesize the circuits in a 65 nm CMOS technology and we show that the proposed integral stochastic architecture results in up to 21% reduction in energy consumption compared to the binary radix implementation at the same misclassification rate. Due to fault-tolerant nature of stochastic architectures, we also consider a quasi-synchronous implementation which yields 33% reduction in energy consumption w.r.t. the binary radix implementation without any compromise on performance.Comment: 11 pages, 12 figure

    Effective Monte Carlo simulation on System-V massively parallel associative string processing architecture

    Get PDF
    We show that the latest version of massively parallel processing associative string processing architecture (System-V) is applicable for fast Monte Carlo simulation if an effective on-processor random number generator is implemented. Our lagged Fibonacci generator can produce 10810^8 random numbers on a processor string of 12K PE-s. The time dependent Monte Carlo algorithm of the one-dimensional non-equilibrium kinetic Ising model performs 80 faster than the corresponding serial algorithm on a 300 MHz UltraSparc.Comment: 8 pages, 9 color ps figures embedde

    Stochastic Theater: Stochastic Datapath Generation Framework for Fault-Tolerant IoT Sensors

    Get PDF
    Stochastic Computing has emerged as a competitive computing paradigm that produces fast and simple implementations of arithmetic operations, while offering high levels of parallelism, and graceful degradation of the results when in the presence of errors. IoT devices are often operate under limited power and area constraints and subjected to harsh environments, for which, traditional computing paradigms struggle to provide high availability and fault-tolerance. Stochastic Computing is based on the computation of pseudo-random sequences of bits, hence requiring only a single bit per signal, rather than a data-bus. Notwithstanding, we haven’t witnessed its inclusion in custom computing systems. In this direction, this work presents Stochastic Theater, a framework to specify, simulate, and test Stochastic Datapaths to perform computations using stochastic bitstreams targeting IoT systems. In virtue of the granularity of the bitstreams, the bit-level specification of circuits, high-performance characteristics and reconfigurable capabilities, FPGAs were adopted to implement and test such systems. The proposed framework creates Stochastic Machines from a set of user defined arithmetic expressions, and then tests them with the corresponding input values and specific fault injection patterns. Besides the support to create autonomous Stochastic Computing systems, the presented framework also provides generation of stochastic units, being able to produce estimates on performance, resources and power. A demonstration is presented targeting KLT, typical method for data compression in IoT applications

    Proposal of an Adaptive Fault Tolerance Mechanism to Tolerate Intermittent Faults in RAM

    Full text link
    [EN] Due to transistor shrinking, intermittent faults are a major concern in current digital systems. This work presents an adaptive fault tolerance mechanism based on error correction codes (ECC), able to modify its behavior when the error conditions change without increasing the redundancy. As a case example, we have designed a mechanism that can detect intermittent faults and swap from an initial generic ECC to a specific ECC capable of tolerating one intermittent fault. We have inserted the mechanism in the memory system of a 32-bit RISC processor and validated it by using VHDL simulation-based fault injection. We have used two (39, 32) codes: a single error correction-double error detection (SEC-DED) and a code developed by our research group, called EPB3932, capable of correcting single errors and double and triple adjacent errors that include a bit previously tagged as error-prone. The results of injecting transient, intermittent, and combinations of intermittent and transient faults show that the proposed mechanism works properly. As an example, the percentage of failures and latent errors is 0% when injecting a triple adjacent fault after an intermittent stuck-at fault. We have synthesized the adaptive fault tolerance mechanism proposed in two types of FPGAs: non-reconfigurable and partially reconfigurable. In both cases, the overhead introduced is affordable in terms of hardware, time and power consumption.This research was supported in part by the Spanish Government, project TIN2016-81,075-R, and by Primeros Proyectos de Investigacion (PAID-06-18), Vicerrectorado de Investigacion, Innovacion y Transferencia de la Universitat Politecnica de Valencia (UPV), project 20190032.Baraza Calvo, JC.; Gracia-Morán, J.; Saiz-Adalid, L.; Gil Tomás, DA.; Gil, P. (2020). Proposal of an Adaptive Fault Tolerance Mechanism to Tolerate Intermittent Faults in RAM. Electronics. 9(12):1-30. https://doi.org/10.3390/electronics9122074S130912International Technology Roadmap for Semiconductors (ITRS)http://www.itrs2.net/2013-itrs.htmlJeng, S.-L., Lu, J.-C., & Wang, K. (2007). A Review of Reliability Research on Nanotechnology. IEEE Transactions on Reliability, 56(3), 401-410. doi:10.1109/tr.2007.903188Ibe, E., Taniguchi, H., Yahagi, Y., Shimbo, K., & Toba, T. (2010). Impact of Scaling on Neutron-Induced Soft Error in SRAMs From a 250 nm to a 22 nm Design Rule. IEEE Transactions on Electron Devices, 57(7), 1527-1538. doi:10.1109/ted.2010.2047907Boussif, A., Ghazel, M., & Basilio, J. C. (2020). Intermittent fault diagnosability of discrete event systems: an overview of automaton-based approaches. Discrete Event Dynamic Systems, 31(1), 59-102. doi:10.1007/s10626-020-00324-yConstantinescu, C. (2003). Trends and challenges in VLSI circuit reliability. IEEE Micro, 23(4), 14-19. doi:10.1109/mm.2003.1225959Bondavalli, A., Chiaradonna, S., Di Giandomenico, F., & Grandoni, F. (2000). Threshold-based mechanisms to discriminate transient from intermittent faults. IEEE Transactions on Computers, 49(3), 230-245. doi:10.1109/12.841127Contant, O., Lafortune, S., & Teneketzis, D. (2004). Diagnosis of Intermittent Faults. Discrete Event Dynamic Systems, 14(2), 171-202. doi:10.1023/b:disc.0000018570.20941.d2Sorensen, B. A., Kelly, G., Sajecki, A., & Sorensen, P. W. (s. f.). An analyzer for detecting intermittent faults in electronic devices. Proceedings of AUTOTESTCON ’94. doi:10.1109/autest.1994.381590Gracia-Moran, J., Gil-Tomas, D., Saiz-Adalid, L. J., Baraza, J. C., & Gil-Vicente, P. J. (2010). Experimental validation of a fault tolerant microcomputer system against intermittent faults. 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN). doi:10.1109/dsn.2010.5544288Fujiwara, E. (2005). Code Design for Dependable Systems. doi:10.1002/0471792748Hamming, R. W. (1950). Error Detecting and Error Correcting Codes. Bell System Technical Journal, 29(2), 147-160. doi:10.1002/j.1538-7305.1950.tb00463.xSaiz-Adalid, L.-J., Gil-Vicente, P.-J., Ruiz-García, J.-C., Gil-Tomás, D., Baraza, J.-C., & Gracia-Morán, J. (2013). Flexible Unequal Error Control Codes with Selectable Error Detection and Correction Levels. Computer Safety, Reliability, and Security, 178-189. doi:10.1007/978-3-642-40793-2_17Frei, R., McWilliam, R., Derrick, B., Purvis, A., Tiwari, A., & Di Marzo Serugendo, G. (2013). Self-healing and self-repairing technologies. The International Journal of Advanced Manufacturing Technology, 69(5-8), 1033-1061. doi:10.1007/s00170-013-5070-2Maiz, J., Hareland, S., Zhang, K., & Armstrong, P. (s. f.). Characterization of multi-bit soft error events in advanced SRAMs. IEEE International Electron Devices Meeting 2003. doi:10.1109/iedm.2003.1269335Schroeder, B., Pinheiro, E., & Weber, W.-D. (2011). DRAM errors in the wild. Communications of the ACM, 54(2), 100-107. doi:10.1145/1897816.1897844BanaiyanMofrad, A., Ebrahimi, M., Oboril, F., Tahoori, M. B., & Dutt, N. (2015). Protecting caches against multi-bit errors using embedded erasure coding. 2015 20th IEEE European Test Symposium (ETS). doi:10.1109/ets.2015.7138735Kim, J., Sullivan, M., Lym, S., & Erez, M. (2016). All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory. 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). doi:10.1109/isca.2016.60Hwang, A. A., Stefanovici, I. A., & Schroeder, B. (2012). Cosmic rays don’t strike twice. ACM SIGPLAN Notices, 47(4), 111-122. doi:10.1145/2248487.2150989Gil-Tomás, D., Gracia-Morán, J., Baraza-Calvo, J.-C., Saiz-Adalid, L.-J., & Gil-Vicente, P.-J. (2012). Studying the effects of intermittent faults on a microcontroller. Microelectronics Reliability, 52(11), 2837-2846. doi:10.1016/j.microrel.2012.06.004Plasma CPU Modelhttps://opencores.org/projects/plasmaArlat, J., Aguera, M., Amat, L., Crouzet, Y., Fabre, J.-C., Laprie, J.-C., 
 Powell, D. (1990). Fault injection for dependability validation: a methodology and some applications. IEEE Transactions on Software Engineering, 16(2), 166-182. doi:10.1109/32.44380Gil-Tomas, D., Gracia-Moran, J., Baraza-Calvo, J.-C., Saiz-Adalid, L.-J., & Gil-Vicente, P.-J. (2012). Analyzing the Impact of Intermittent Faults on Microprocessors Applying Fault Injection. IEEE Design & Test of Computers, 29(6), 66-73. doi:10.1109/mdt.2011.2179514Rashid, L., Pattabiraman, K., & Gopalakrishnan, S. (2010). Modeling the Propagation of Intermittent Hardware Faults in Programs. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing. doi:10.1109/prdc.2010.52Amiri, M., Siddiqui, F. M., Kelly, C., Woods, R., Rafferty, K., & Bardak, B. (2016). FPGA-Based Soft-Core Processors for Image Processing Applications. Journal of Signal Processing Systems, 87(1), 139-156. doi:10.1007/s11265-016-1185-7Hailesellasie, M., Hasan, S. R., & Mohamed, O. A. (2019). MulMapper: Towards an Automated FPGA-Based CNN Processor Generator Based on a Dynamic Design Space Exploration. 2019 IEEE International Symposium on Circuits and Systems (ISCAS). doi:10.1109/iscas.2019.8702589Mittal, S. (2018). A survey of FPGA-based accelerators for convolutional neural networks. Neural Computing and Applications, 32(4), 1109-1139. doi:10.1007/s00521-018-3761-1Intel Completes Acquisition of Alterahttps://newsroom.intel.com/news-releases/intel-completes-acquisition-of-altera/#gs.mi6ujuAMD to Acquire Xilinx, Creating the Industry’s High Performance Computing Leaderhttps://www.amd.com/en/press-releases/2020-10-27-amd-to-acquire-xilinx-creating-the-industry-s-high-performance-computingKim, K. H., & Lawrence, T. F. (s. f.). Adaptive fault tolerance: issues and approaches. [1990] Proceedings. Second IEEE Workshop on Future Trends of Distributed Computing Systems. doi:10.1109/ftdcs.1990.138292Gonzalez, O., Shrikumar, H., Stankovic, J. A., & Ramamritham, K. (s. f.). Adaptive fault tolerance and graceful degradation under dynamic hard real-time scheduling. Proceedings Real-Time Systems Symposium. doi:10.1109/real.1997.641271Jacobs, A., George, A. D., & Cieslewski, G. (2009). Reconfigurable fault tolerance: A framework for environmentally adaptive fault mitigation in space. 2009 International Conference on Field Programmable Logic and Applications. doi:10.1109/fpl.2009.5272313Shin, D., Park, J., Park, J., Paul, S., & Bhunia, S. (2017). Adaptive ECC for Tailored Protection of Nanoscale Memory. IEEE Design & Test, 34(6), 84-93. doi:10.1109/mdat.2016.2615844Silva, F., Muniz, A., Silveira, J., & Marcon, C. (2020). CLC-A: An Adaptive Implementation of the Column Line Code (CLC) ECC. 2020 33rd Symposium on Integrated Circuits and Systems Design (SBCCI). doi:10.1109/sbcci50935.2020.9189901Mukherjee, S. S., Emer, J., Fossum, T., & Reinhardt, S. K. (s. f.). Cache scrubbing in microprocessors: myth or necessity? 10th IEEE Pacific Rim International Symposium on Dependable Computing, 2004. Proceedings. doi:10.1109/prdc.2004.1276550Saleh, A. M., Serrano, J. J., & Patel, J. H. (1990). Reliability of scrubbing recovery-techniques for memory systems. IEEE Transactions on Reliability, 39(1), 114-122. doi:10.1109/24.52622X9SRA User’s Manual (Rev. 1.1)https://www.manualshelf.com/manual/supermicro/x9sra/user-s-manual-1-1.htmlChishti, Z., Alameldeen, A. R., Wilkerson, C., Wu, W., & Lu, S.-L. (2009). Improving cache lifetime reliability at ultra-low voltages. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42. doi:10.1145/1669112.1669126Datta, R., & Touba, N. A. (2011). Designing a fast and adaptive error correction scheme for increasing the lifetime of phase change memories. 29th VLSI Test Symposium. doi:10.1109/vts.2011.5783773Kim, J., Lim, J., Cho, W., Shin, K.-S., Kim, H., & Lee, H.-J. (2016). Adaptive Memory Controller for High-performance Multi-channel Memory. JSTS:Journal of Semiconductor Technology and Science, 16(6), 808-816. doi:10.5573/jsts.2016.16.6.808Yuan, L., Liu, H., Jia, P., & Yang, Y. (2015). Reliability-Based ECC System for Adaptive Protection of NAND Flash Memories. 2015 Fifth International Conference on Communication Systems and Network Technologies. doi:10.1109/csnt.2015.23Zhou, Y., Wu, F., Lu, Z., He, X., Huang, P., & Xie, C. (2019). SCORE. ACM Transactions on Architecture and Code Optimization, 15(4), 1-25. doi:10.1145/3291052Lu, S.-K., Li, H.-P., & Miyase, K. (2018). Adaptive ECC Techniques for Reliability and Yield Enhancement of Phase Change Memory. 2018 IEEE 24th International Symposium on On-Line Testing And Robust System Design (IOLTS). doi:10.1109/iolts.2018.8474118Chen, J., Andjelkovic, M., Simevski, A., Li, Y., Skoncej, P., & Krstic, M. (2019). Design of SRAM-Based Low-Cost SEU Monitor for Self-Adaptive Multiprocessing Systems. 2019 22nd Euromicro Conference on Digital System Design (DSD). doi:10.1109/dsd.2019.00080Wang, X., Jiang, L., & Chakrabarty, K. (2020). LSTM-based Analysis of Temporally- and Spatially-Correlated Signatures for Intermittent Fault Detection. 2020 IEEE 38th VLSI Test Symposium (VTS). doi:10.1109/vts48691.2020.9107600Ebrahimi, H., & G. Kerkhoff, H. (2018). Intermittent Resistance Fault Detection at Board Level. 2018 IEEE 21st International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS). doi:10.1109/ddecs.2018.00031Ebrahimi, H., & Kerkhoff, H. G. (2020). A New Monitor Insertion Algorithm for Intermittent Fault Detection. 2020 IEEE European Test Symposium (ETS). doi:10.1109/ets48528.2020.9131563Hsiao, M. Y. (1970). A Class of Optimal Minimum Odd-weight-column SEC-DED Codes. IBM Journal of Research and Development, 14(4), 395-401. doi:10.1147/rd.144.0395Benso, A., & Prinetto, P. (Eds.). (2004). Fault Injection Techniques and Tools for Embedded Systems Reliability Evaluation. Frontiers in Electronic Testing. doi:10.1007/b105828Gracia, J., Saiz, L. J., Baraza, J. C., Gil, D., & Gil, P. J. (2008). Analysis of the influence of intermittent faults in a microcontroller. 2008 11th IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems. doi:10.1109/ddecs.2008.4538761ZC702 Evaluation Board for the Zynq-7000 XC7Z020 SoChttps://www.xilinx.com/support/documentation/boards_and_kits/zc702_zvik/ug850-zc702-eval-bd.pd

    On Fault Resilient Network-on-Chip for Many Core Systems

    Get PDF
    Rapid scaling of transistor gate sizes has increased the density of on-chip integration and paved the way for heterogeneous many-core systems-on-chip, significantly improving the speed of on-chip processing. The design of the interconnection network of these complex systems is a challenging one and the network-on-chip (NoC) is now the accepted scalable and bandwidth efficient interconnect for multi-processor systems on-chip (MPSoCs). However, the performance enhancements of technology scaling come at the cost of reliability as on-chip components particularly the network-on-chip become increasingly prone to faults. In this thesis, we focus on approaches to deal with the errors caused by such faults. The results of these approaches are obtained not only via time-consuming cycle-accurate simulations but also by analytical approaches, allowing for faster and accurate evaluations, especially for larger networks. Redundancy is the general approach to deal with faults, the mode of which varies according to the type of fault. For the NoC, there exists a classification of faults into transient, intermittent and permanent faults. Transient faults appear randomly for a few cycles and may be caused by the radiation of particles. Intermittent faults are similar to transient faults, however, differing in the fact that they occur repeatedly at the same location, eventually leading to a permanent fault. Permanent faults by definition are caused by wires and transistors being permanently short or open. Generally, spatial redundancy or the use of redundant components is used for dealing with permanent faults. Temporal redundancy deals with failures by re-execution or by retransmission of data while information redundancy adds redundant information to the data packets allowing for error detection and correction. Temporal and information redundancy methods are useful when dealing with transient and intermittent faults. In this dissertation, we begin with permanent faults in NoC in the form of faulty links and routers. Our approach for spatial redundancy adds redundant links in the diagonal direction to the standard rectangular mesh topology resulting in the hexagonal and octagonal NoCs. In addition to redundant links, adaptive routing must be used to bypass faulty components. We develop novel fault-tolerant deadlock-free adaptive routing algorithms for these topologies based on the turn model without the use of virtual channels. Our results show that the hexagonal and octagonal NoCs can tolerate all 2-router and 3-router faults, respectively, while the mesh has been shown to tolerate all 1-router faults. To simplify the restricted-turn selection process for achieving deadlock freedom, we devised an approach based on the channel dependency matrix instead of the state-of-the-art Duato's method of observing the channel dependency graph for cycles. The approach is general and can be used for the turn selection process for any regular topology. We further use algebraic manipulations of the channel dependency matrix to analytically assess the fault resilience of the adaptive routing algorithms when affected by permanent faults. We present and validate this method for the 2D mesh and hexagonal NoC topologies achieving very high accuracy with a maximum error of 1%. The approach is very general and allows for faster evaluations as compared to the generally used cycle-accurate simulations. In comparison, existing works usually assume a limited number of faults to be able to analytically assess the network reliability. We apply the approach to evaluate the fault resilience of larger NoCs demonstrating the usefulness of the approach especially compared to cycle-accurate simulations. Finally, we concentrate on temporal and information redundancy techniques to deal with transient and intermittent faults in the router resulting in the dropping and hence loss of packets. Temporal redundancy is applied in the form of ARQ and retransmission of lost packets. Information redundancy is applied by the generation and transmission of redundant linear combinations of packets known as random linear network coding. We develop an analytic model for flexible evaluation of these approaches to determine the network performance parameters such as residual error rates and increased network load. The analytic model allows to evaluate larger NoCs and different topologies and to investigate the advantage of network coding compared to uncoded transmissions. We further extend the work with a small insight to the problem of secure communication over the NoC. Assuming large heterogeneous MPSoCs with components from third parties, the communication is subject to active attacks in the form of packet modification and drops in the NoC routers. Devising approaches to resolve these issues, we again formulate analytic models for their flexible and accurate evaluations, with a maximum estimation error of 7%

    Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack

    Get PDF
    Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Recon- figurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, use of adaptive techniques are seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. These include a Discrete Cosine Transform (DCT) core, Motion Estimation (ME) engine, Finite Impulse Response (FIR) Filter, Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks in addition to MCNC benchmark circuits. A iii significant reduction in power consumption is achieved ranging from 83% for low motion-activity scenes to 12.5% for high motion activity video scenes in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32dB. The diagnosability, reconfiguration latency, and resource overhead of each approach is analyzed. Compared to previous alternatives, PURE maintains a PSNR within a difference of 4.02dB to 6.67dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault-recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria

    Enhancing Real-time Embedded Image Processing Robustness on Reconfigurable Devices for Critical Applications

    Get PDF
    Nowadays, image processing is increasingly used in several application fields, such as biomedical, aerospace, or automotive. Within these fields, image processing is used to serve both non-critical and critical tasks. As example, in automotive, cameras are becoming key sensors in increasing car safety, driving assistance and driving comfort. They have been employed for infotainment (non-critical), as well as for some driver assistance tasks (critical), such as Forward Collision Avoidance, Intelligent Speed Control, or Pedestrian Detection. The complexity of these algorithms brings a challenge in real-time image processing systems, requiring high computing capacity, usually not available in processors for embedded systems. Hardware acceleration is therefore crucial, and devices such as Field Programmable Gate Arrays (FPGAs) best fit the growing demand of computational capabilities. These devices can assist embedded processors by significantly speeding-up computationally intensive software algorithms. Moreover, critical applications introduce strict requirements not only from the real-time constraints, but also from the device reliability and algorithm robustness points of view. Technology scaling is highlighting reliability problems related to aging phenomena, and to the increasing sensitivity of digital devices to external radiation events that can cause transient or even permanent faults. These faults can lead to wrong information processed or, in the worst case, to a dangerous system failure. In this context, the reconfigurable nature of FPGA devices can be exploited to increase the system reliability and robustness by leveraging Dynamic Partial Reconfiguration features. The research work presented in this thesis focuses on the development of techniques for implementing efficient and robust real-time embedded image processing hardware accelerators and systems for mission-critical applications. Three main challenges have been faced and will be discussed, along with proposed solutions, throughout the thesis: (i) achieving real-time performances, (ii) enhancing algorithm robustness, and (iii) increasing overall system's dependability. In order to ensure real-time performances, efficient FPGA-based hardware accelerators implementing selected image processing algorithms have been developed. Functionalities offered by the target technology, and algorithm's characteristics have been constantly taken into account while designing such accelerators, in order to efficiently tailor algorithm's operations to available hardware resources. On the other hand, the key idea for increasing image processing algorithms' robustness is to introduce self-adaptivity features at algorithm level, in order to maintain constant, or improve, the quality of results for a wide range of input conditions, that are not always fully predictable at design-time (e.g., noise level variations). This has been accomplished by measuring at run-time some characteristics of the input images, and then tuning the algorithm parameters based on such estimations. Dynamic reconfiguration features of modern reconfigurable FPGA have been extensively exploited in order to integrate run-time adaptivity into the designed hardware accelerators. Tools and methodologies have been also developed in order to increase the overall system dependability during reconfiguration processes, thus providing safe run-time adaptation mechanisms. In addition, taking into account the target technology and the environments in which the developed hardware accelerators and systems may be employed, dependability issues have been analyzed, leading to the development of a platform for quickly assessing the reliability and characterizing the behavior of hardware accelerators implemented on reconfigurable FPGAs when they are affected by such faults
    • 

    corecore