4 research outputs found
Development of low-overhead soft error mitigation technique for safety critical neural networks applications
Deep Neural Networks (DNNs) have been widely applied in healthcare applications. DNN-based healthcare applications are safety-critical systems that require highreliability implementation due to a high risk of human death or injury in case of malfunction. Several DNN accelerators are used to execute these DNN models, and GPUs are currently the most prominent and the dominated DNN accelerators. However, GPUs are prone to soft errors that dramatically impact the GPU behaviors; such error may corrupt data values or logic operations, which result in Silent Data Corruption (SDC). The SDC propagates from the physical level to the application level (SDC that occurs in hardware GPUs’ components) results in misclassification of objects in DNN models, leading to disastrous consequences. Food and Drug Administration (FDA) reported that 1078 of the adverse events (10.1%) were unintended errors (i.e., soft errors) encountered, including 52 injuries and two deaths. Several traditional techniques have been proposed to protect electronic devices from soft errors by replicating the DNN models. However, these techniques cause significant overheads of area, performance, and energy, making them challenging to implement in healthcare systems that have strict deadlines. To address this issue, this study developed a Selective Mitigation Technique based on the standard Triple Modular Redundancy (S-MTTM-R) to determine the model’s vulnerable parts, distinguishing Malfunction and Light-Malfunction errors. A comprehensive vulnerability analysis was performed using a SASSIFI fault injector at the CNN AlexNet and DenseNet201 models: layers, kernels, and instructions to show both models’ resilience and identify the most vulnerable portions and harden them by injecting them while implemented on NVIDIA’s GPUs. The experimental results showed that S-MTTM-R achieved a significant improvement in error masking. No-Malfunction have been improved from 54.90%, 67.85%, and 59.36% to 62.80%, 82.10%, and 80.76% in the three modes RF, IOA, and IOV, respectively for AlexNet. For DenseNet, NoMalfunction have been improved from 43.70%, 67.70%, and 54.68% to 59.90%, 84.75%, and 83.07% in the three modes RF, IOA, and IOV, respectively. Importantly, S-MTTMR decreased the percentage of errors that case misclassification (Malfunction) from 3.70% to 0.38% and 5.23% to 0.23%, for AlexNet and DenseNet, respectively. The performance analysis results showed that the S-MTTM-R achieved lower overhead compared to the well-known protection techniques: Algorithm-Based Fault Tolerance (ABFT), Double Modular Redundancy (DMR), and Triple Modular Redundancy (TMR). In light of these results, the study revealed strong evidence that the developed S-MTTMR was successfully mitigated the soft errors for the DNNs model on GPUs with lowoverheads in energy, performance, and area indicated a remarkable improvement in the healthcare domains’ model reliability
FireNN: Neural Networks Reliability Evaluation on Hybrid Platforms
The growth of neural networks complexity has led to adopt of hardware-accelerators to cope with the computational power required by the new architectures. The possibility to adapt the network for different platforms enhanced the interests of safety-critical applications. The reliability evaluation of neural networks are still premature and requires platforms to measure the safety standards required by mission-critical applications. For this reason, the interest in studying the reliability of neural networks is growing. We propose a new approach for evaluating the resiliency of neural networks by using hybrid platforms. The approach relies on the reconfigurable hardware for emulating the target hardware platform and performing the fault injection process. The main advantage of the proposed approach is to involve the on-hardware execution of the neural network in the reliability analysis without any intrusiveness into the network algorithm and addressing specific fault models. The implementation of FireNN, the platform based on the proposed approach, is described in the paper. Experimental analyses are performed using fault injection on AlexNet. The analyses are carried out using the FireNN platform and the results are compared with the outcome of traditional software-level evaluations. Results are discussed considering the insight into the hardware level achieved using FireNN
Novel Validation Techniques for Autonomous Vehicles
The automotive industry is facing challenges in producing electrical, connected, and autonomous vehicles. Even if these challenges are, from a technical point of view, independent from each other, the market and regulatory bodies require them to be developed and integrated simultaneously.
The development of autonomous vehicles implies the development of highly dependable systems. This is a multidisciplinary activity involving knowledge from robotics, computer science, electrical and mechanical engineering, psychology, social studies, and ethics.
Nowadays, many Advanced Driver Assistance Systems (ADAS), like Emergency Braking System, Lane Keep Assistant, and Park Assist, are available. Newer luxury cars can drive by themselves on highways or park automatically, but the end goal is to develop completely autonomous driving vehicles, able to go by themselves, without needing human interventions in any situation.
The more vehicles become autonomous, the greater the difficulty in keeping them reliable. It enhances the challenges in terms of development processes since their misbehaviors can lead to catastrophic consequences and, differently from the past, there is no more a human driver to mitigate the effects of erroneous behaviors.
Primary threats to dependability come from three sources: misuse from the drivers, design systematic errors, and random hardware failures.
These safety threats are addressed under various aspects, considering the particular type of item to be designed. In particular, for the sake of this work, we analyze those related to Functional Safety (FuSa), viewed as the ability of a system to react on time and in the proper way to the external environment.
From the technological point of view, these behaviors are implemented by electrical and electronic items.
Various standards to achieve FuSa have been released over the years. The first, released in 1998, was the IEC 61508. Its last version is the one released in 2010. This standard defines mainly:
• a Functional Safety Management System (FSMS);
• methods to determine a Safety Integrated Level (SIL);
• methods to determine the probability of failures.
To adapt the IEC61508 to the automotive industry’s peculiarity, a newer standard, the ISO26262, was released in 2011 then updated in 2018.
This standard provides guidelines about FSMS, called in this case Safety Lifecycle, describing how to develop software and hardware components suitable for functional safety. It also provides a different way to compute the SIL, called in this case Automotive SIL (ASIL), allowing us to consider the average driver’s abilities to control the vehicle in case of failures. Moreover, it describes a way to determine the probability of random hardware failures through Failure Mode, Effects, and Diagnostic Analysis (FMEDA).
This dissertation contains contributions to three topics:
• random hardware failures mitigation;
• improvementoftheISO26262HazardAnalysisandRiskAssessment(HARA); • real-time verification of the embedded software.
As the main contribution of this dissertation, I address the safety threats due to random hardware failures (RHFs).
For this purpose, I propose a novel simulation-based approach to aid the Failure Mode, Effects, and Diagnostic Analysis (FMEDA) required by the ISO26262 standard. Thanks to a SPICE-level model of the item, and the adoption of fault injection techniques, it is possible to simulate its behaviors obtaining useful information to classify the various failure modes. The proposed approach evolved from a mere simulation of the item, allowing only an item-level failure mode classification up to a vehicle-level analysis. The propagation of the failure modes’ effects on the whole vehicle enables us to assess the impacts on the vehicle’s drivability, improving the quality of the classifications. It can be advantageous where it is difficult to predict how the item-level misbehaviors propagate to the vehicle level, as in the case of a virtual differential gear or the mobility system of a robot. It has been chosen since it can be considered similar to the novel light vehicles, such as electric scooters, that are becoming more and more popular. Moreover, my research group has complete access to its design since it is realized by our university’s DIANA students’ team. When a SPICE-level simulation is too long to be performed, or it is not possible to develop a complete model of the item due to intellectual property protection rules, it is possible to aid this process through behavioral models of the item. A simulation of this kind has been performed on a mobile robotic system. Behavioral models of the electronic components were used, alongside mechanical simulations, to assess the software failure mitigation capabilities.
Another contribution has been obtained by modifying the main one. The idea was to make it possible to aid also the Hazard Analysis and Risk Assessment (HARA).
This assessment is performed during the concept phase, so before starting to design the item implementation. Its goal is to determine the hazards involved in the item functionality and their associated levels of risk. The end goal of this phase is a list of safety goals. For each one of these safety goals, an ASIL has to be determined. Since HARA relies only on designers expertise and knowledge, it lacks in objectivity and repeatability.
Thanks to the simulation results, it is possible to predict the effects of the failures on the vehicle’s drivability, allowing us to improve the severity and controllability assessment, thus improving the objectivity. Moreover, since simulation conditions can be stored, it is possible, at any time, to recheck the results and to add new scenarios, improving the repeatability.
The third group of contributions is about the real-time verification of embedded software. Through Hardware-In-the-Loop (HIL), a software integration verification has been performed to test a fundamental automotive component, mixed-criticality applications, and multi-agent robots.
The first of these contributions is about real-time tests on Body Control Modules (BCM). These modules manage various electronic accessories in the vehicle’s body, like power windows and mirrors, air conditioning, immobilizer, central locking. The main characteristics of BCMs are the communications with other embedded computers via the car’s vehicle bus (Controller Area Network) and to have a high number (hundreds) of low-speed I/Os.
As the second contribution, I propose a methodology to assess the error recovery system’s effects on mixed-criticality applications regarding deadline misses. The system runs two tasks: a critical airplane longitudinal control and a non-critical image compression algorithm. I start by presenting the approach on a benchmark application containing an instrumented bug into the lower criticality task; then, we improved it by injecting random errors inside the lower criticality task’s memory space through a debugger. In the latter case, thanks to the HIL, it is possible to pause the time domain simulation when the debugger operates and resume it once the injection is complete. In this way, it is possible to interact with the target without interfering with the simulation results, combining a full control of the target with an accurate time-domain assessment.
The last contribution of this third group is about a methodology to verify, on multi-agent robots, the synchronization between two agents in charge to move the end effector of a delta robot: the correct position and speed of the end effector at any time is strongly affected by a loss of synchronization.
The last two contributions may seem unrelated to the automotive industry, but interest in these applications is gaining. Mixed-criticality systems allow reducing the number of ECUs inside cars (for cost reduction), while the multi-agent approach is helpful to improve the cooperation of the connected cars with respect to other vehicles and the infrastructure.
The fourth contribution, contained in the appendix, is about a machine learning application to improve the social acceptance of autonomous vehicles.
The idea is to improve the comfort of the passengers by recognizing their emotions. I started with the idea to modify the vehicle’s driving style based on a real-time emotions recognition system but, due to the difficulties of performing such operations in an experimental setup, I move to analyze them offline. The emotions are determined on volunteers’ facial expressions recorded while viewing 3D representa- tions showing different calibrations. Thanks to the passengers’ emotional responses, it is possible to choose the better calibration from the comfort point of view
Novel Validation Techniques for Autonomous Vehicles
L'abstract è presente nell'allegato / the abstract is in the attachmen