
    Correct and Control Complex IoT Systems: Evaluation of a Classification for System Anomalies

    In practice, inter-team communication about system anomalies is often imprecise, which hampers troubleshooting and postmortem analysis across the different teams operating complex IoT systems. We evaluate the quality in use of an adaptation of IEEE Std. 1044-2009 whose objective is to differentiate the handling of fault detection and fault reaction from the handling of the underlying defect and its correction options. We extended the scope of IEEE Std. 1044-2009 from anomalies related to software only to anomalies related to complex IoT systems. To evaluate the quality in use of our classification, a study was conducted at Robert Bosch GmbH. We applied our adaptation to a postmortem analysis of an IoT solution and evaluated its quality in use by conducting interviews with three stakeholders. Our adaptation was applied effectively, and it enhanced inter-team communication as well as iterative and inductive learning for product improvement. Further training and practice are required.
    Comment: Submitted to QRS 2020 (IEEE Conference on Software Quality, Reliability and Security).
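
To make the distinction concrete, here is a minimal sketch (an illustrative data model of our own, not the classification defined in the paper) of how an anomaly record might keep the run-time fault reaction separate from the decision about the underlying defect:

```python
from dataclasses import dataclass
from enum import Enum, auto


class FaultReaction(Enum):
    """How the running system reacted when the anomaly surfaced."""
    NONE = auto()
    DEGRADED_OPERATION = auto()
    FAILOVER = auto()
    SAFE_SHUTDOWN = auto()


class DefectDisposition(Enum):
    """What is decided about the underlying defect, independent of the reaction."""
    FIX = auto()
    WORKAROUND = auto()
    DEFER = auto()
    NOT_A_DEFECT = auto()


@dataclass
class AnomalyRecord:
    """One anomaly report shared across the teams operating an IoT system."""
    anomaly_id: str
    detected_by: str               # e.g. "device watchdog", "cloud monitoring"
    affected_subsystem: str        # e.g. "firmware", "gateway", "backend"
    fault_reaction: FaultReaction
    defect_disposition: DefectDisposition
    notes: str = ""
```

Separating the two fields is the point: an operations team can record the fault reaction immediately, while the defect disposition can be filled in later by the development team without overwriting the operational record.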

    Combined Time and Information Redundancy for SEU-Tolerance in Energy-Efficient Real-Time Systems

    Recently, the trade-off between energy consumption and fault tolerance in real-time systems has been highlighted. Prior work has focused on dynamic voltage scaling (DVS) to reduce dynamic energy dissipation and on time redundancy to achieve transient-fault tolerance. While the time-redundancy technique exploits the available slack time to increase fault tolerance by performing recovery executions, DVS exploits the same slack time to save energy; there is therefore a resource conflict between the two. The first aim of this paper is to propose the use of information redundancy to resolve this conflict. We demonstrate through analytical and experimental studies that a combination of information and time redundancy achieves both higher transient-fault tolerance (tolerance to single-event upsets, SEUs) and lower energy consumption than time redundancy alone. The second aim of this paper is to analyze the interplay of transient-fault tolerance (SEU tolerance) and adaptive body biasing (ABB), which is used to reduce static leakage energy and has not been addressed in previous studies. We show that the same technique (the combination of time and information redundancy) is applicable to ABB-enabled systems and provides greater advantages than time redundancy alone.
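
As a concrete illustration of the information-redundancy idea, the sketch below uses a textbook Hamming(7,4) single-error-correcting code (the general technique, not necessarily the paper's exact code): a single SEU in a stored word is corrected in place, so no slack time has to be spent on a recovery execution:

```python
# Hamming(7,4): 4 data bits protected by 3 parity bits; any single bit
# flip in the 7-bit codeword can be located and corrected.

def hamming74_encode(d):
    """d: list of 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Detect and correct a single flipped bit, return the 4 data bits."""
    p1, p2, d1, p3, d2, d3, d4 = c
    s1 = p1 ^ d1 ^ d2 ^ d4        # checks codeword positions 1, 3, 5, 7
    s2 = p2 ^ d1 ^ d3 ^ d4        # checks codeword positions 2, 3, 6, 7
    s3 = p3 ^ d2 ^ d3 ^ d4        # checks codeword positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s3    # syndrome: 0 means no error
    if pos:
        c = c[:]
        c[pos - 1] ^= 1           # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
code = hamming74_encode(word)
code[4] ^= 1                      # inject a single-event upset
assert hamming74_correct(code) == word
```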

    Integration of tools for the Design and Assessment of High-Performance, Highly Reliable Computing Systems (DAHPHRS), phase 1

    Systems for Strategic Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process, supported by appropriate automated tools, must be used to assure that the system will meet its design objectives. This report describes an investigation of the methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of three parallel-computing architectures (the JPL Hypercubes, the Encore Multimax, and the C. S. Draper Laboratory Fault-Tolerant Parallel Processor (FTPP)), with candidate SDI weapons-to-target assignment algorithms as workloads, were built and analyzed as a means of identifying the necessary system models, how those models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed, and capabilities required for both individual tools and an integrated toolset were identified.
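
To give a flavor of the reliability side of such trade-off studies, here is a small sketch (illustrative only, not one of the report's actual tools) of a k-out-of-n reliability model for a parallel processor with spare nodes:

```python
# Probability that at least k of n identical nodes survive to time t,
# assuming independent exponential failures with rate lam. More spares
# raise reliability but add cost and power: the trade-off at issue.
from math import comb, exp

def k_of_n_reliability(k, n, lam, t):
    r = exp(-lam * t)             # single-node survival probability
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k, n + 1))

# 16 working nodes required; vary the number of provisioned nodes.
for n in (16, 18, 20):
    print(n, k_of_n_reliability(16, n, lam=1e-4, t=1000.0))
```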

    RescueSNN: Enabling Reliable Executions on Spiking Neural Network Accelerators under Permanent Faults

    To maximize the performance and energy efficiency of Spiking Neural Network (SNN) processing on resource-constrained embedded systems, specialized hardware accelerators/chips are employed. However, these SNN chips may suffer from permanent faults that affect the functionality of the weight memory and the neuron behavior, thereby causing potentially significant accuracy degradation and system malfunction. Such permanent faults may come from manufacturing defects during fabrication and/or from device/transistor damage (e.g., due to wear-out) during run-time operation. The impact of permanent faults on SNN chips and the respective mitigation techniques, however, have not yet been thoroughly investigated. To this end, we propose RescueSNN, a novel methodology to mitigate permanent faults in the compute engine of SNN chips without requiring retraining, thereby significantly cutting design time and retraining costs while maintaining throughput and quality. The key ideas of RescueSNN are (1) analyzing the characteristics of SNNs under permanent faults; (2) leveraging this analysis to improve SNN fault tolerance through effective fault-aware mapping (FAM); and (3) devising lightweight hardware enhancements to support FAM. Our FAM technique leverages the fault map of the SNN compute engine to (i) minimize weight corruption when mapping weight bits onto faulty memory cells, and (ii) selectively employ faulty neurons that do not cause significant accuracy degradation, thereby maintaining accuracy and throughput while respecting the SNN operations and processing dataflow. Experimental results show that RescueSNN improves accuracy by up to 80% while keeping the throughput reduction below 25% at a high fault rate (e.g., 0.5 of the potential fault locations), compared with running SNNs on the faulty chip without mitigation.
    Comment: Accepted for publication at Frontiers in Neuroscience - Section Neuromorphic Engineering.
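
The following toy sketch (names and data layout are our own, not the paper's) illustrates the flavor of fault-aware mapping: given a fault map of the weight memory, weights are placed in the rows whose stuck bits would corrupt them the least:

```python
# Greedy fault-aware placement: rank memory rows by the worst-case
# magnitude error their stuck bits could cause, and fill the cheapest
# rows first, so high-significance weight bits avoid faulty cells.

def corruption_cost(faulty_bits, bit_width=8):
    """Worst-case magnitude error if the given bit positions are stuck."""
    return sum(1 << b for b in faulty_bits if b < bit_width)

def fault_aware_mapping(num_rows_needed, fault_map):
    """fault_map: list mapping row index -> set of stuck bit positions."""
    rows = sorted(range(len(fault_map)),
                  key=lambda r: corruption_cost(fault_map[r]))
    return rows[:num_rows_needed]      # rows chosen for the weight tiles

fault_map = [{7}, set(), {0, 1}, {3}]  # e.g. row 0 has a stuck MSB
print(fault_aware_mapping(3, fault_map))  # -> [1, 2, 3]: the MSB fault is avoided
```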

    Study of fault-tolerant software technology

    Presented is an overview of the current state of the art of fault-tolerant software, together with an analysis of the quantitative techniques and models developed to assess its impact. The study examines research efforts as well as experience gained from commercial applications of these techniques. It also addresses the implications of using fault-tolerant software in real-time aerospace applications for computer architecture and the design of hardware, operating systems, and programming languages (including Ada). It concludes that fault-tolerant software has progressed beyond the pure research stage. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to implement software fault tolerance effectively and efficiently.
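
One of the classic schemes such surveys cover is the recovery block. A minimal sketch (illustrative, not taken from the paper): run a primary routine, check its result with an acceptance test, and fall back to simpler alternates on failure:

```python
import math

def recovery_block(alternates, acceptance_test, x):
    """Run alternates in order until one passes the acceptance test."""
    for alt in alternates:
        try:
            result = alt(x)
        except Exception:
            continue               # treat a crash like a failed acceptance test
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")

# Example: a primary square-root routine with a simpler alternate,
# checked by an inverse-function acceptance test.
primary = lambda x: x ** 0.5
alternate = lambda x: math.sqrt(x)
inverse_check = lambda x, r: abs(r * r - x) < 1e-9
print(recovery_block([primary, alternate], inverse_check, 2.0))
```

The acceptance test is the crux of the design: it must be cheap enough to run on every call, yet strong enough to catch the failures the alternates are meant to mask.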

    An Assessment to Benchmark the Seismic Performance of a Code-Conforming Reinforced-Concrete Moment-Frame Building

    This report describes a state-of-the-art performance-based earthquake engineering methodology that is used to assess the seismic performance of a four-story reinforced concrete (RC) office building that is generally representative of low-rise office buildings constructed in highly seismic regions of California. This “benchmark” building is considered to be located at a site in the Los Angeles basin, and it was designed with a ductile RC special moment-resisting frame, conforming to modern building codes and standards, as its seismic lateral system. The building’s performance is quantified in terms of structural behavior up to collapse, structural and nonstructural damage and associated repair costs, and the risk of fatalities and their associated economic costs. To account for different building configurations that may be designed in practice to meet requirements of building size and use, eight structural design alternatives are used in the performance assessments. Our performance assessments account for important sources of uncertainty in the ground motion hazard, the structural response, structural and nonstructural damage, repair costs, and life-safety risk. The ground motion hazard characterization employs a site-specific probabilistic seismic hazard analysis and the evaluation of controlling seismic sources (through disaggregation) at seven ground motion levels (encompassing return periods ranging from 7 to 2475 years). Innovative procedures for ground motion selection and scaling are used to develop acceleration time history suites corresponding to each of the seven ground motion levels. Structural modeling utilizes both “fiber” models and “plastic hinge” models. Structural modeling uncertainties are investigated through comparison of these two modeling approaches, and through variations in structural component modeling parameters (stiffness, deformation capacity, degradation, etc.). Structural and nonstructural damage (fragility) models are based on a combination of test data, observations from post-earthquake reconnaissance, and expert opinion. Structural damage and repair costs are modeled for the RC beams, columns, and slab-column connections. Damage and associated repair costs are considered for some nonstructural building components, including wallboard partitions, interior paint, exterior glazing, ceilings, sprinkler systems, and elevators. The risk of casualties and the associated economic costs are evaluated based on the risk of structural collapse, combined with recent models on earthquake fatalities in collapsed buildings and accepted economic modeling guidelines for the value of human life in loss and cost-benefit studies. The principal results of this work pertain to the building collapse risk, damage and repair cost, and life-safety risk; these are discussed in turn below. When accounting for uncertainties in structural modeling and record-to-record variability (i.e., conditional on a specified ground shaking intensity), the structural collapse probabilities of the various designs range from 2% to 7% for earthquake ground motions that have a 2% probability of exceedance in 50 years (2475-year return period). When integrated with the ground motion hazard for the southern California site, these collapse probabilities result in mean annual frequencies of collapse in the range of 0.4×10⁻⁴ to 1.4×10⁻⁴ for the various benchmark building designs.
In the development of these results, we made the following observations that are expected to be broadly applicable: (1) The ground motions selected for performance simulations must consider spectral shape (e.g., through use of the epsilon parameter) and should appropriately account for correlations between motions in both horizontal directions; (2) Lower-bound component models, which are commonly used in performance-based assessment procedures such as FEMA 356, can significantly bias collapse analysis results; it is more appropriate to use median component behavior, including all aspects of the component model (strength, stiffness, deformation capacity, cyclic deterioration, etc.); (3) Structural modeling uncertainties related to component deformation capacity and post-peak degrading stiffness can impact the variability of calculated collapse probabilities and mean annual rates to a similar degree as record-to-record variability of ground motions. Therefore, including the effects of such structural modeling uncertainties significantly increases the mean annual collapse rates. We found this increase to be roughly four to eight times relative to rates evaluated for the median structural model; (4) Nonlinear response analyses revealed at least six distinct collapse mechanisms, the most common of which was a story mechanism in the third story (differing from the multi-story mechanism predicted by nonlinear static pushover analysis); (5) Soil-foundation-structure interaction effects did not significantly affect the structural response, which was expected given the relatively flexible superstructure and stiff soils. The potential for financial loss is considerable. Overall, the calculated expected annual losses (EAL) are in the range of $52,000 to $97,000 for the various code-conforming benchmark building designs, or roughly 1% of the replacement cost of the building ($8.8M).
These losses are dominated by the expected repair costs of the wallboard partitions (including interior paint) and by the structural members. Loss estimates are sensitive to details of the structural models, especially the initial stiffness of the structural elements. Losses are also found to be sensitive to structural modeling choices, such as ignoring the tensile strength of the concrete (40% change in EAL) or the contribution of the gravity frames to overall building stiffness and strength (15% change in EAL). Although there are a number of factors identified in the literature as likely to affect the risk of human injury during seismic events, the casualty modeling in this study focuses on those factors (building collapse, building occupancy, and spatial location of building occupants) that directly inform the building design process. The expected annual number of fatalities is calculated for the benchmark building, assuming that an earthquake can occur at any time of any day with equal probability and using fatality probabilities conditioned on structural collapse and based on empirical data. The expected annual number of fatalities for the code-conforming buildings ranges between 0.05×10⁻² and 0.21×10⁻², and is equal to 2.30×10⁻² for a non-code-conforming design. The expected loss of life during a seismic event is perhaps the decision variable that owners and policy makers will be most interested in mitigating. The fatality estimation carried out for the benchmark building provides a methodology for comparing this important value for various building designs, and enables informed decision making during the design process. The expected annual loss associated with fatalities caused by building earthquake damage is estimated by converting the expected annual number of fatalities into economic terms. Assuming the value of a human life is $3.5M, the fatality rate translates to an EAL due to fatalities of $3,500 to $5,600 for the code-conforming designs, and $79,800 for the non-code-conforming design. Compared to the EAL due to repair costs of the code-conforming designs, which are on the order of $66,000, the monetary value associated with life loss is small, suggesting that the governing factor in this respect will be the maximum permissible life-safety risk deemed by the public (or its representative government) to be appropriate for buildings. Although the focus of this report is on one specific building, it can be used as a reference for other types of structures. This report is organized in such a way that the individual core chapters (4, 5, and 6) can be read independently. Chapter 1 provides background on the performance-based earthquake engineering (PBEE) approach. Chapter 2 presents the implementation of the PBEE methodology of the PEER framework, as applied to the benchmark building. Chapter 3 sets the stage for the choices of location and basic structural design. The subsequent core chapters focus on the hazard analysis (Chapter 4), the structural analysis (Chapter 5), and the damage and loss analyses (Chapter 6). Although the report is self-contained, readers interested in additional details can find them in the appendices.
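
As a back-of-the-envelope illustration of the PEER-style calculation described above (with made-up fragility and hazard parameters, not the report's data), the mean annual frequency of collapse is obtained by integrating a collapse fragility curve over the ground motion hazard curve:

```python
# lambda_c = sum over IM bins of P(collapse | im) * |d(hazard)|,
# with a lognormal collapse fragility and a power-law hazard fit.
import math

def lognormal_cdf(x, median, beta):
    """Collapse fragility: P(collapse | IM = x)."""
    return 0.5 * (1 + math.erf(math.log(x / median) / (beta * math.sqrt(2))))

def hazard(im, k0=1e-4, k=2.5):
    """Mean annual frequency of exceeding intensity im (illustrative fit)."""
    return k0 * im ** (-k)

def mean_annual_collapse_frequency(median, beta, ims):
    lam = 0.0
    for a, b in zip(ims, ims[1:]):
        mid = 0.5 * (a + b)
        lam += lognormal_cdf(mid, median, beta) * (hazard(a) - hazard(b))
    return lam

ims = [0.1 * i for i in range(1, 51)]   # IM grid, e.g. Sa in units of g
print(mean_annual_collapse_frequency(median=2.0, beta=0.6, ims=ims))
```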

    On the functional test of the BTB logic in pipelined and superscalar processors

    Electronic systems are increasingly used for safety-critical applications, where the effects of faults must be kept under control and, ideally, avoided. For this purpose, testing of manufactured devices is particularly important, both at the end of the production line and during the operational phase. This paper describes a method to test the logic implementing the Branch Prediction Unit in pipelined and superscalar processors when it follows the Branch Target Buffer (BTB) architecture; the proposed approach is functional, i.e., it is based on forcing the processor to execute a suitably devised test program and observing the produced results. Experimental results are provided on the DLX processor, showing that the method achieves high stuck-at fault coverage while also testing the memory in the BTB.
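
The following toy model (illustrative only; the paper's actual method runs an assembly test program on the DLX) captures the core idea of a functional BTB test: train every BTB entry with a distinct taken branch, then check that each entry predicts the target it was trained with:

```python
# A direct-mapped BTB model plus a march-like functional test pattern:
# each entry is written via a unique branch PC/target pair, then read
# back through prediction, exposing stuck entries or tag mismatches.

class DirectMappedBTB:
    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries        # each entry: (tag, target)

    def update(self, pc, target):
        idx = (pc >> 2) % self.entries       # word-aligned PCs index the table
        self.table[idx] = (pc, target)

    def predict(self, pc):
        idx = (pc >> 2) % self.entries
        entry = self.table[idx]
        return entry[1] if entry and entry[0] == pc else None

btb = DirectMappedBTB()
# "Test program": one taken branch per BTB entry, each with a unique target.
branches = [(0x1000 + 4 * i, 0x2000 + 8 * i) for i in range(btb.entries)]
for pc, target in branches:
    btb.update(pc, target)                   # train every entry
faults = [hex(pc) for pc, target in branches if btb.predict(pc) != target]
print("faulty entries:", faults if faults else "none")
```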