55,115 research outputs found
Process of designing robust, dependable, safe and secure software for medical devices: Point of care testing device as a case study
This article has been made available through the Brunel Open Access Publishing Fund.Copyright © 2013 Sivanesan Tulasidas et al. This paper presents a holistic methodology for the design of medical device software, which encompasses of a new way of eliciting requirements, system design process, security design guideline, cloud architecture design, combinatorial testing process and agile project management. The paper uses point of care diagnostics as a case study where the software and hardware must be robust, reliable to provide accurate diagnosis of diseases. As software and software intensive systems are becoming increasingly complex, the impact of failures can lead to significant property damage, or damage to the environment. Within the medical diagnostic device software domain such failures can result in misdiagnosis leading to clinical complications and in some cases death. Software faults can arise due to the interaction among the software, the hardware, third party software and the operating environment. Unanticipated environmental changes and latent coding errors lead to operation faults despite of the fact that usually a significant effort has been expended in the design, verification and validation of the software system. It is becoming increasingly more apparent that one needs to adopt different approaches, which will guarantee that a complex software system meets all safety, security, and reliability requirements, in addition to complying with standards such as IEC 62304. There are many initiatives taken to develop safety and security critical systems, at different development phases and in different contexts, ranging from infrastructure design to device design. Different approaches are implemented to design error free software for safety critical systems. By adopting the strategies and processes presented in this paper one can overcome the challenges in developing error free software for medical devices (or safety critical systems).Brunel Open Access Publishing Fund
Improving reconfigurable systems reliability by combining periodical test and redundancy techniques: a case study
This paper revises and introduces to the field of reconfigurable computer systems, some traditional techniques used in the fields of fault-tolerance and testing of digital circuits. The target area is that of on-board spacecraft electronics, as this class of application is a good candidate for the use of reconfigurable computing technology. Fault tolerant strategies are used in order for the system to adapt itself to the severe conditions found in space. In addition, the paper describes some problems and possible solutions for the use of reconfigurable components, based on programmable logic, in space applications
Recommended from our members
Fault tolerance in super-scalar and VLIW processors
In this paper, we present a method for utilizing the spare capacity in super-scalar and very long instruction word (VLIW) processors to tolerate functional unit failures. Unlike previous work that was primarily interested in detection of transient faults, we are concerned with more permanent and/or intermittent faults which necessitate processor reconfiguration. Our method utilizes the VLIW compiler or the superscalar scheduler to insert redundant operations whenever idle functional units exist. The results of these redundant operations are used to detect and diagnose functional unit failures. For super-scalar processors, the scheduler can then utilize this information to ensure that operations are performed only on non-faulty units. In VLIW processors, this is equivalent to recompiling the code to run on the remaining non-faulty functional units. Since in certain applications, recompilation may not be possible, we consider two alternative reconfiguration strategies for VLIW processors. These strategies sacrifice storage space and execution time, respectively, in order to reconfigure without recompiling. We present Markov models that describe the behavior of processors using these different approaches and we evaluate their reliabilities. The results show that, while super-scalar and VLIW with recompilation provide the highest reliability, all proposed strategies significantly increase reliability over that of an unprotected processor
A Pattern Language for High-Performance Computing Resilience
High-performance computing systems (HPC) provide powerful capabilities for
modeling, simulation, and data analytics for a broad class of computational
problems. They enable extreme performance of the order of quadrillion
floating-point arithmetic calculations per second by aggregating the power of
millions of compute, memory, networking and storage components. With the
rapidly growing scale and complexity of HPC systems for achieving even greater
performance, ensuring their reliable operation in the face of system
degradations and failures is a critical challenge. System fault events often
lead the scientific applications to produce incorrect results, or may even
cause their untimely termination. The sheer number of components in modern
extreme-scale HPC systems and the complex interactions and dependencies among
the hardware and software components, the applications, and the physical
environment makes the design of practical solutions that support fault
resilience a complex undertaking. To manage this complexity, we developed a
methodology for designing HPC resilience solutions using design patterns. We
codified the well-known techniques for handling faults, errors and failures
that have been devised, applied and improved upon over the past three decades
in the form of design patterns. In this paper, we present a pattern language to
enable a structured approach to the development of HPC resilience solutions.
The pattern language reveals the relations among the resilience patterns and
provides the means to explore alternative techniques for handling a specific
fault model that may have different efficiency and complexity characteristics.
Using the pattern language enables the design and implementation of
comprehensive resilience solutions as a set of interconnected resilience
patterns that can be instantiated across layers of the system stack.Comment: Proceedings of the 22nd European Conference on Pattern Languages of
Program
A fault-tolerant intelligent robotic control system
This paper describes the concept, design, and features of a fault-tolerant intelligent robotic control system being developed for space and commercial applications that require high dependability. The comprehensive strategy integrates system level hardware/software fault tolerance with task level handling of uncertainties and unexpected events for robotic control. The underlying architecture for system level fault tolerance is the distributed recovery block which protects against application software, system software, hardware, and network failures. Task level fault tolerance provisions are implemented in a knowledge-based system which utilizes advanced automation techniques such as rule-based and model-based reasoning to monitor, diagnose, and recover from unexpected events. The two level design provides tolerance of two or more faults occurring serially at any level of command, control, sensing, or actuation. The potential benefits of such a fault tolerant robotic control system include: (1) a minimized potential for damage to humans, the work site, and the robot itself; (2) continuous operation with a minimum of uncommanded motion in the presence of failures; and (3) more reliable autonomous operation providing increased efficiency in the execution of robotic tasks and decreased demand on human operators for controlling and monitoring the robotic servicing routines
Prototype of Fault Adaptive Embedded Software for Large-Scale Real-Time Systems
This paper describes a comprehensive prototype of large-scale fault adaptive
embedded software developed for the proposed Fermilab BTeV high energy physics
experiment. Lightweight self-optimizing agents embedded within Level 1 of the
prototype are responsible for proactive and reactive monitoring and mitigation
based on specified layers of competence. The agents are self-protecting,
detecting cascading failures using a distributed approach. Adaptive,
reconfigurable, and mobile objects for reliablility are designed to be
self-configuring to adapt automatically to dynamically changing environments.
These objects provide a self-healing layer with the ability to discover,
diagnose, and react to discontinuities in real-time processing. A generic
modeling environment was developed to facilitate design and implementation of
hardware resource specifications, application data flow, and failure mitigation
strategies. Level 1 of the planned BTeV trigger system alone will consist of
2500 DSPs, so the number of components and intractable fault scenarios involved
make it impossible to design an `expert system' that applies traditional
centralized mitigative strategies based on rules capturing every possible
system state. Instead, a distributed reactive approach is implemented using the
tools and methodologies developed by the Real-Time Embedded Systems group.Comment: 2nd Workshop on Engineering of Autonomic Systems (EASe), in the 12th
Annual IEEE International Conference and Workshop on the Engineering of
Computer Based Systems (ECBS), Washington, DC, April, 200
Transparency in Complex Computational Systems
Scientists depend on complex computational systems that are often ineliminably opaque, to the detriment of our ability to give scientific explanations and detect artifacts. Some philosophers have s..
Test exploration and validation using transaction level models
The complexity of the test infrastructure and test strategies in systems-on-chip approaches the complexity of the functional design space. This paper presents test design space exploration and validation of test strategies and schedules using transaction level models (TLMs). Since many aspects of testing involve the transfer of a significant amount of test stimuli and responses, the communication-centric view of TLMs suits this purpose exceptionally wel
A framework for effective management of condition based maintenance programs in the context of industrial development of E-Maintenance strategies
CBM (Condition Based Maintenance) solutions are increasingly present in industrial systems due to two
main circumstances: rapid evolution, without precedents, in the capture and analysis of data and
significant cost reduction of supporting technologies. CBM programs in industrial systems can become
extremely complex, especially when considering the effective introduction of new capabilities provided
by PHM (Prognostics and Health Management) and E-maintenance disciplines. In this scenario, any CBM
solution involves the management of numerous technical aspects, that the maintenance manager needs
to understand, in order to be implemented properly and effectively, according to the company’s strategy.
This paper provides a comprehensive representation of the key components of a generic CBM solution,
this is presented using a framework or supporting structure for an effective management of the CBM
programs. The concept “symptom of failure”, its corresponding analysis techniques (introduced by ISO
13379-1 and linked with RCM/FMEA analysis), and other international standard for CBM open-software
application development (for instance, ISO 13374 and OSA-CBM), are used in the paper for the
development of the framework. An original template has been developed, adopting the formal structure
of RCM analysis templates, to integrate the information of the PHM techniques used to capture the failure
mode behaviour and to manage maintenance. Finally, a case study describes the framework using the
referred template.Gobierno de Andalucía P11-TEP-7303 M
- …