
    A Survey on the Best Choice for Modulus of Residue Code

    Nowadays, the development of technology and the growing need for dense and complex chips have led the chip industry to pay increasing attention to circuit testability. Moreover, the use of electronic chips in certain industries, such as the space industry, makes the design of fault-tolerant circuits a challenging issue. Coding is one of the most suitable methods for error detection and correction. The residue code, one of the best choices for error detection, is widely used in large arithmetic circuits such as multipliers and also finds a wide range of applications in processors and digital filters. The modulus value in this technique directly affects the area overhead. A large area overhead is one of the most important disadvantages, especially when testing small circuits. The purpose of this paper is to study and investigate the best choice of residue-code check base for simple and small circuits such as a simple ripple-carry adder. Performance is evaluated by applying stuck-at faults and transition faults in simulation, and efficiency is defined in terms of fault coverage and normalized area overhead. The results show that modulus 3 provides the best result, with 95% efficiency; compared with a duplex circuit, a residue code with this modulus for checking a ripple-carry adder improves the efficiency by 30%.
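
    As a minimal illustration of the kind of check the survey evaluates, the following Python sketch pairs a bit-level ripple-carry adder with a mod-3 residue check. The fault injection is a simplified stand-in for the stuck-at and transition faults applied by the simulators in the paper, and all names and values are illustrative.

        # Mod-3 residue check of a ripple-carry adder (illustrative sketch).
        MODULUS = 3  # the check base the survey finds most efficient

        def ripple_carry_add(a, b, width=8, faulty_bit=None, forced_value=0):
            """Bit-serial ripple-carry addition; optionally force one sum bit."""
            carry, result = 0, 0
            for i in range(width):
                ai, bi = (a >> i) & 1, (b >> i) & 1
                s = ai ^ bi ^ carry
                carry = (ai & bi) | (carry & (ai ^ bi))
                if i == faulty_bit:          # crude injection of a fault on one sum bit
                    s = forced_value
                result |= s << i
            return result

        def residue_check(a, b, s):
            """Concurrent check: the residue of the sum must equal the sum of residues."""
            return s % MODULUS == (a % MODULUS + b % MODULUS) % MODULUS

        a, b = 0x5A, 0x37
        print(residue_check(a, b, ripple_carry_add(a, b)))                                 # True
        print(residue_check(a, b, ripple_carry_add(a, b, faulty_bit=3, forced_value=1)))   # False

    Errors whose numerical effect happens to be a multiple of 3 still escape this check, which is why fault coverage, and not only area overhead, enters the efficiency metric.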

    Self-Checking Ripple-Carry Adder with Ambipolar Silicon Nanowire FET

    For the rapid adoption of new and aggressive technologies such as ambipolar Silicon NanoWire (SiNW) FETs, addressing fault tolerance is necessary. Traditionally, transient fault detection implies a large hardware overhead or a performance decrease compared to permanent fault detection. In this paper, we focus on on-line testing and its application to ambipolar SiNW technology. We demonstrate on a self-checking ripple-carry adder how the ambipolar design style can help reduce the hardware overhead. Compared with an equivalent CMOS process, the ambipolar SiNW design shows an area reduction of at least 56% (28%) with a 62% (6%) lower delay for the static (transmission-gate) design style.

    The Challenge of Detection and Diagnosis of Fugacious Hardware Faults in VLSI Designs

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-38789-0_7. Current integration scales are increasing the number and types of faults that embedded systems must face. Traditional approaches focus on dealing with those transient and permanent faults that impact the state or output of systems, whereas little research has targeted those faults that are logically, electrically or temporally masked, which we have named fugacious. A fast detection and precise diagnosis of fault occurrences, even if the provided service is unaffected, could be of invaluable help to determine, for instance, that systems are currently under the influence of environmental disturbances like radiation, suffering from wear-out, or being affected by an intermittent fault. Upon detection, systems may react to adapt the deployed fault tolerance mechanisms to the diagnosed problem. This paper explores these ideas, evaluates the challenges and requirements involved, and provides an outline of potential techniques to be applied. This work has been funded by the Spanish Ministry of Economy ARENES project (TIN2012-38308-C02-01). Espinosa García, J.; Andrés Martínez, DD.; Ruiz, JC.; Gil, P. (2013). The Challenge of Detection and Diagnosis of Fugacious Hardware Faults in VLSI Designs. In: Dependable Computing, pp. 76-87. Springer. https://doi.org/10.1007/978-3-642-38789-0_7

    A concurrent error detection based fault-tolerant 32 nm XOR-XNOR circuit implementation

    As modern processors and semiconductor circuits move into 32 nm technologies and below, designers face the major problem of process variations. This problem makes designing VLSI circuits harder and harder, affects circuit performance, and introduces faults that can cause critical failures. Therefore, fault-tolerant design is required to obtain the necessary level of reliability and availability, especially for safety-critical systems. Since XOR-XNOR circuits are basic building blocks in various digital and mixed-signal systems, especially in arithmetic circuits, these gates should be designed such that they indicate any malfunction during normal operation. This property of verifying the results delivered by a circuit during its normal operation is called Concurrent Error Detection (CED). In this paper, we propose a CED-based fault-tolerant XOR-XNOR circuit implementation. The proposed design is implemented using the 32 nm process technology.
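
    The CED principle behind such a cell can be sketched behaviourally: assuming the XOR and XNOR rails are monitored by a two-rail checker, any fault that makes the two outputs equal is flagged during normal operation. The sketch below only models this property; the transistor-level 32 nm implementation of the paper is not reproduced, and the fault label is illustrative.

        # Dual-rail XOR-XNOR cell with a concurrent complementarity check (sketch).
        def xor_xnor_cell(a: int, b: int, fault: str = None):
            """Fault-free, the two outputs are always complements of each other."""
            xor_out = a ^ b
            xnor_out = 1 - xor_out
            if fault == "xor_stuck_at_0":
                xor_out = 0                     # simulated internal defect
            return xor_out, xnor_out

        def error_detected(xor_out: int, xnor_out: int) -> bool:
            """Two-rail check: equal rails can only be produced by a fault."""
            return xor_out == xnor_out

        for a in (0, 1):
            for b in (0, 1):
                print(a, b, error_detected(*xor_xnor_cell(a, b)))            # always False
        print(error_detected(*xor_xnor_cell(1, 0, fault="xor_stuck_at_0")))  # True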

    Increasing the Dependability of VLSI Systems Through Early Detection of Fugacious Faults

    Technology advances provide a myriad of advantages for VLSI systems, but also increase the sensitivity of the combinational logic to different fault profiles. Increasingly short faults, which until now had been filtered out and which we name fugacious faults, require new attention, as they are considered a feasible warning sign of potential failures. Despite their increasing impact on modern VLSI systems, such faults are largely not considered today by the safety industry. Their early detection is, however, critical to enable an early evaluation of potential risks for the system and the subsequent deployment of suitable failure avoidance mechanisms. For instance, the early detection of fugacious faults will provide the necessary means to extend the mission time of a system thanks to the temporal avoidance of aging effects. Because classical detection mechanisms are not suited to cope with such fugacious faults, this paper proposes a method specifically designed to detect and diagnose them. Reported experiments show the feasibility and interest of the proposal. This work has been funded by the Spanish Ministry of Economy ARENES project (TIN2012-38308-C02-01). Espinosa García, J.; Andrés Martínez, DD.; Gil, P. (2015). Increasing the Dependability of VLSI Systems Through Early Detection of Fugacious Faults. IEEE Computer Society - Conference Publishing Services (CPS). https://doi.org/10.1109/EDCC.2015.13

    Nanowire systems: technology and design (invited paper)

    Nanosystems are large-scale integrated systems exploiting nanoelectronic devices. In this work, we consider double-independent-gate, vertically stacked nanowire FETs with gate-all-around structures and a typical diameter of 20 nm. These devices, which we have successfully fabricated and evaluated, control the ambipolar behavior of the nanostructure by selectively enabling one type of carrier. These transistors work as switches with electrically programmable polarity and thus realize an exclusive-or operation. The intrinsically higher expressive power of these FETs, as compared to standard CMOS, enables us to realize more efficient library cells, which we organize as tiles to build circuits from regular arrays. This article surveys both the technology for double-independent-gate FETs and the physical and logic design tools needed to realize digital systems with this fabrication technology.
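
    A heavily simplified behavioural view explains why such devices are said to realize an exclusive-or natively. Assuming the polarity gate selects n-type (1) or p-type (0) behaviour, a device conducts exactly when its control-gate value matches the selected polarity, so its pass condition is an XNOR of the two gate signals; the sketch below only enumerates that truth table and is not a device or circuit model.

        # Behavioural conduction model of a double-independent-gate ambipolar FET.
        # Assumption: polarity gate PG=1 selects n-type conduction, PG=0 selects p-type.
        def conducts(control_gate: int, polarity_gate: int) -> bool:
            # n-type passes when CG=1, p-type passes when CG=0,
            # so the device is on exactly when CG equals PG (an XNOR condition).
            return control_gate == polarity_gate

        for cg in (0, 1):
            for pg in (0, 1):
                print(f"CG={cg} PG={pg} -> on={conducts(cg, pg)}")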

    Tamper-Resistant Arithmetic for Public-Key Cryptography

    Cryptographic hardware has found many uses in ubiquitous and pervasive security devices with a small form factor, e.g. SIM cards, smart cards, electronic security tokens, and soon even RFIDs. With applications in banking, telecommunication, healthcare, e-commerce, and entertainment, these devices use cryptography to provide security services like authentication, identification, and confidentiality to the user. However, the widespread adoption of these devices into the mass market and the lack of a physical security perimeter have increased the risk of theft, reverse engineering, and cloning. Despite the use of strong cryptographic algorithms, these devices often succumb to powerful side-channel attacks. These attacks provide a motivated third party with access to the inner workings of the device and therefore the opportunity to circumvent the protection of the cryptographic envelope. Apart from passive side-channel analysis, which has been the subject of intense research for over a decade, active tampering attacks like fault analysis have recently gained increased attention from the academic and industrial research community. In this dissertation we address the question of how to protect cryptographic devices against this kind of attack. More specifically, we focus our attention on public-key algorithms like elliptic curve cryptography and their underlying arithmetic structure. In our research we address challenges such as the cost of implementation, the level of protection, and the error model in an adversarial situation. The approaches that we investigated all apply concepts from coding theory, in particular the theory of cyclic codes. This seems intuitive, since both public-key cryptography and cyclic codes share finite field arithmetic as a common foundation. The major contributions of our research are (a) a generalization of cyclic codes that allows the embedding of finite fields into redundant rings under a ring homomorphism, (b) a new family of non-linear arithmetic residue codes with very high error detection probability, (c) a set of new low-cost arithmetic primitives for optimal extension field arithmetic based on robust codes, and (d) design techniques for tamper-resilient finite state machines.
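
    A minimal sketch of the general idea of arithmetic residue checking for public-key arithmetic, in the spirit of embedding the computation into a redundant ring: the operation is carried out modulo n*A and its residue modulo a small check base A is compared against a cheap shadow computation. The concrete non-linear residue codes and robust-code constructions of the dissertation are not reproduced here; the check base A and all names are illustrative.

        # Residue-checked modular multiplication (illustrative sketch).
        A = 251  # small illustrative check base; detection strength grows with A

        def protected_modmul(x: int, y: int, n: int, inject_fault: bool = False):
            """Multiply in the redundant ring Z_{n*A}, then verify the mod-A residue."""
            z = (x * y) % (n * A)                # computation in the redundant ring
            if inject_fault:
                z ^= 1 << 5                      # flip one bit of z (simulated tampering)
            shadow = ((x % A) * (y % A)) % A     # cheap parallel shadow computation
            ok = (z % A) == shadow               # residues agree if z is intact
            return z % n, ok                     # project back to Z_n, report the check

        print(protected_modmul(1234, 5678, 7919))                      # (result, True)
        print(protected_modmul(1234, 5678, 7919, inject_fault=True))   # check fails here

    A random error escapes this check only when it leaves the residue modulo A unchanged, roughly with probability 1/A; the non-linear residue codes developed in the dissertation are designed to keep the detection probability high even for adversarially chosen errors.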

    Resiliency Mechanisms for In-Memory Column Stores

    The key objective of database systems is to reliably manage data, while high query throughput and low query latency are core requirements. To date, database research activities have mostly concentrated on the latter. However, due to the constant shrinking of transistor feature sizes, integrated circuits are becoming more and more unreliable, and transient hardware errors in the form of multi-bit flips are becoming more and more prominent. In a recent study (2013) of a large high-performance cluster with around 8,500 nodes, a failure rate of 40 FIT per DRAM device was measured. For that system, this means that a single- or multi-bit flip occurs every 10 hours, which is unacceptably high for enterprise and HPC scenarios. Causes can be cosmic rays, heat, or electrical crosstalk, with the latter being exploited actively through the RowHammer attack. It has been shown that memory cells are more prone to bit flips than logic gates, and several surveys have found multi-bit flip events in the main memory modules of today's data centers. Due to the shift towards in-memory data management systems, where all business-related data and query intermediate results are kept solely in fast main memory, such systems are in great danger of delivering corrupt results to their users. Hardware techniques cannot be scaled to compensate for the exponentially increasing error rates. In other domains, there is an increasing interest in software-based solutions to this problem, but the proposed methods come with huge runtime and/or storage overheads, which are unacceptable for in-memory data management systems. In this thesis, we investigate how to integrate bit flip detection mechanisms into in-memory data management systems. To achieve this goal, we first build an understanding of bit flip detection techniques and select two error codes, AN codes and XOR checksums, that suit the requirements of in-memory data management systems. The most important requirement is the effectiveness of the codes in detecting bit flips. We meet this goal through AN codes, which exhibit better and adaptable error detection capabilities compared to those found in today's hardware. The second most important goal is efficiency in terms of coding latency. We meet this by introducing fundamental performance improvements to AN codes and by vectorizing both chosen codes' operations. We integrate bit flip detection mechanisms into the lowest storage layer and the query processing layer in such a way that the rest of the data management system and the user can stay oblivious of any error detection. This covers both base columns and pointer-heavy index structures such as the ubiquitous B-Tree. Additionally, our approach allows adaptable, on-the-fly bit flip detection during query processing with only very little impact on query latency. AN coding allows intermediate results to be recoded with virtually no performance penalty. We support our claims by providing exhaustive runtime and throughput measurements throughout the thesis and with an end-to-end evaluation using the Star Schema Benchmark. To the best of our knowledge, we are the first to present such holistic and fast bit flip detection in a large software infrastructure such as an in-memory data management system.
    Finally, most of the source code fragments used to obtain the results in this thesis are open source and freely available. Outline:
    1 Introduction: 1.1 Contributions of this Thesis; 1.2 Outline
    2 Problem Description and Related Work: 2.1 Reliable Data Management on Reliable Hardware; 2.2 The Shift Towards Unreliable Hardware; 2.3 Hardware-Based Mitigation of Bit Flips; 2.4 Data Management System Requirements; 2.5 Software-Based Techniques For Handling Bit Flips; 2.5.1 Operating System-Level Techniques; 2.5.2 Compiler-Level Techniques; 2.5.3 Application-Level Techniques; 2.6 Summary and Conclusions
    3 Analysis of Coding Techniques: 3.1 Selection of Error Codes; 3.1.1 Hamming Coding; 3.1.2 XOR Checksums; 3.1.3 AN Coding; 3.1.4 Summary and Conclusions; 3.2 Probabilities of Silent Data Corruption; 3.2.1 Probabilities of Hamming Codes; 3.2.2 Probabilities of XOR Checksums; 3.2.3 Probabilities of AN Codes; 3.2.4 Concrete Error Models; 3.2.5 Summary and Conclusions; 3.3 Throughput Considerations; 3.3.1 Test Systems Descriptions; 3.3.2 Vectorizing Hamming Coding; 3.3.3 Vectorizing XOR Checksums; 3.3.4 Vectorizing AN Coding; 3.3.5 Summary and Conclusions; 3.4 Comparison of Error Codes; 3.4.1 Effectiveness; 3.4.2 Efficiency; 3.4.3 Runtime Adaptability; 3.5 Performance Optimizations for AN Coding; 3.5.1 The Modular Multiplicative Inverse; 3.5.2 Faster Softening; 3.5.3 Faster Error Detection; 3.5.4 Comparison to Original AN Coding; 3.5.5 The Multiplicative Inverse Anomaly; 3.6 Summary
    4 Bit Flip Detecting Storage: 4.1 Column Store Architecture; 4.1.1 Logical Data Types; 4.1.2 Storage Model; 4.1.3 Data Representation; 4.1.4 Data Layout; 4.1.5 Tree Index Structures; 4.1.6 Summary; 4.2 Hardened Data Storage; 4.2.1 Hardened Physical Data Types; 4.2.2 Hardened Lightweight Compression; 4.2.3 Hardened Data Layout; 4.2.4 UDI Operations; 4.2.5 Summary and Conclusions; 4.3 Hardened Tree Index Structures; 4.3.1 B-Tree Verification Techniques; 4.3.2 Justification For Further Techniques; 4.3.3 The Error Detecting B-Tree; 4.4 Summary
    5 Bit Flip Detecting Query Processing: 5.1 Column Store Query Processing; 5.2 Bit Flip Detection Opportunities; 5.2.1 Early Onetime Detection; 5.2.2 Late Onetime Detection; 5.2.3 Continuous Detection; 5.2.4 Miscellaneous Processing Aspects; 5.2.5 Summary and Conclusions; 5.3 Hardened Intermediate Results; 5.3.1 Materialization of Hardened Intermediates; 5.3.2 Hardened Bitmaps; 5.4 Summary
    6 End-to-End Evaluation: 6.1 Prototype Implementation; 6.1.1 AHEAD Architecture; 6.1.2 Diversity of Physical Operators; 6.1.3 One Concrete Operator Realization; 6.1.4 Summary and Conclusions; 6.2 Performance of Individual Operators; 6.2.1 Selection on One Predicate; 6.2.2 Selection on Two Predicates; 6.2.3 Join Operators; 6.2.4 Grouping and Aggregation; 6.2.5 Delta Operator; 6.2.6 Summary and Conclusions; 6.3 Star Schema Benchmark Queries; 6.3.1 Query Runtimes; 6.3.2 Improvements Through Vectorization; 6.3.3 Storage Overhead; 6.3.4 Summary and Conclusions; 6.4 Error Detecting B-Tree; 6.4.1 Single Key Lookup; 6.4.2 Key Value-Pair Insertion; 6.5 Summary
    7 Summary and Conclusions: 7.1 Future Work
    A Appendix: A.1 List of Golden As; A.2 More on Hamming Coding; A.2.1 Code examples; A.2.2 Vectorization
    Bibliography; List of Figures; List of Tables; List of Listings; List of Acronyms; List of Symbols; List of Definitions
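
    The core detection mechanism of the thesis, AN coding, can be illustrated in a few lines: a value x is stored as A * x, so any stored word whose residue modulo A is non-zero must have been corrupted. The sketch below is a minimal scalar illustration with an arbitrary odd constant; the carefully chosen "golden As" and the vectorized SIMD kernels of the thesis are not reproduced.

        # Minimal AN-coding sketch. A is an illustrative check constant only.
        A = 641  # odd: a single-bit error (+/- 2^k) can never be a multiple of A

        def encode(x: int) -> int:
            return A * x

        def is_valid(code_word: int) -> bool:
            # A corrupted word A*x + e goes undetected only if e is a multiple of A.
            return code_word % A == 0

        def decode(code_word: int) -> int:
            if not is_valid(code_word):
                raise ValueError("bit flip detected")
            return code_word // A

        c = encode(12345)
        print(decode(c))             # 12345
        c ^= 1 << 7                  # simulate a single-bit flip in memory
        print(is_valid(c))           # False: the residue modulo A is no longer zero

    Because A is odd, every single-bit flip changes the code word by a power of two and is therefore always detected; multi-bit errors escape only when their combined effect is a multiple of A.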

    The 1992 4th NASA SERC Symposium on VLSI Design

    Papers from the fourth annual NASA Symposium on VLSI Design, co-sponsored by the IEEE, are presented. Each year this symposium is organized by the NASA Space Engineering Research Center (SERC) at the University of Idaho and is held in conjunction with a quarterly meeting of the NASA Data System Technology Working Group (DSTWG). One task of the DSTWG is to develop new electronic technologies that will meet next-generation electronic data system needs. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data system performance. The NASA SERC is proud to offer, at its fourth symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories, the electronics industry, and universities. These speakers share insights into next-generation advances that will serve as a basis for future VLSI design.

    Fault-tolerant sub-lithographic design with rollback recovery

    Shrinking feature sizes and energy levels, coupled with high clock rates and decreasing node capacitance, lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and a general-purpose, conservative analysis, we show that the area overhead factor of our technique is roughly an order of magnitude lower than that of a gate-level feed-forward redundancy scheme.
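
    The rollback idea can be sketched in a few lines: a stage's inputs stay buffered until its result passes a check, and on a detected mismatch the stage is simply re-executed from the buffered inputs. In the sketch below, duplicate execution stands in for the paper's detection logic, and the fault rate, retry limit and stage function are illustrative rather than taken from the nanowire PLA analysis.

        # Fine-grained rollback recovery around an unreliable logic stage (sketch).
        import random

        def unreliable_stage(x: int, fault_rate: float = 1e-3) -> int:
            """Compute f(x) = 3*x + 1, occasionally hit by a transient bit flip."""
            y = 3 * x + 1
            if random.random() < fault_rate:
                y ^= 1 << random.randrange(8)     # transient single-event upset
            return y

        def stage_with_rollback(x: int, max_retries: int = 3) -> int:
            for _ in range(max_retries):
                first, second = unreliable_stage(x), unreliable_stage(x)
                if first == second:               # detection by duplicate-and-compare
                    return first                  # commit the checked result
                # mismatch: discard both results and roll back to the buffered input x
            raise RuntimeError("rollback retries exhausted (persistent fault?)")

        print([stage_with_rollback(i) for i in range(5)])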