55,115 research outputs found

Process of designing robust, dependable, safe and secure software for medical devices: Point of care testing device as a case study

Author: Balachandran W
Craw P
Gkatzidou V
Hudson C
Mackay R
Tulasidas S
Publication venue: 'Scientific Research Publishing, Inc.'
Publication date: 01/01/2013

This article has been made available through the Brunel Open Access Publishing Fund. Copyright © 2013 Sivanesan Tulasidas et al. This paper presents a holistic methodology for the design of medical device software, which encompasses a new way of eliciting requirements, a system design process, a security design guideline, cloud architecture design, a combinatorial testing process and agile project management. The paper uses point of care diagnostics as a case study where the software and hardware must be robust and reliable to provide accurate diagnosis of diseases. As software and software-intensive systems become increasingly complex, failures can lead to significant property damage or damage to the environment. Within the medical diagnostic device software domain, such failures can result in misdiagnosis leading to clinical complications and, in some cases, death. Software faults can arise from the interaction among the software, the hardware, third-party software and the operating environment. Unanticipated environmental changes and latent coding errors lead to operational faults despite the fact that significant effort has usually been expended in the design, verification and validation of the software system. It is becoming increasingly apparent that one needs to adopt different approaches that will guarantee that a complex software system meets all safety, security, and reliability requirements, in addition to complying with standards such as IEC 62304. Many initiatives have been taken to develop safety- and security-critical systems, at different development phases and in different contexts, ranging from infrastructure design to device design. Different approaches are implemented to design error-free software for safety-critical systems. By adopting the strategies and processes presented in this paper, one can overcome the challenges in developing error-free software for medical devices (or safety-critical systems).

CiteSeerX

Crossref

Brunel University Research Archive

Improving reconfigurable systems reliability by combining periodical test and redundancy techniques: a case study

Author: Bezerra Eduardo Augusto
Gough Michael Paul
Vargas Fabian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2001

This paper reviews some traditional techniques from the fields of fault tolerance and digital circuit testing and introduces them to the field of reconfigurable computer systems. The target area is that of on-board spacecraft electronics, as this class of application is a good candidate for the use of reconfigurable computing technology. Fault-tolerant strategies are used so that the system can adapt itself to the severe conditions found in space. In addition, the paper describes some problems and possible solutions for the use of reconfigurable components, based on programmable logic, in space applications.

Sussex Research Online

Fault tolerance in super-scalar and VLIW processors

Author: Blough Douglas M.
Nicolau Alexandru
Publication venue: eScholarship, University of California
Publication date: 01/01/1991

In this paper, we present a method for utilizing the spare capacity in super-scalar and very long instruction word (VLIW) processors to tolerate functional unit failures. Unlike previous work that was primarily interested in detection of transient faults, we are concerned with more permanent and/or intermittent faults which necessitate processor reconfiguration. Our method utilizes the VLIW compiler or the super-scalar scheduler to insert redundant operations whenever idle functional units exist. The results of these redundant operations are used to detect and diagnose functional unit failures. For super-scalar processors, the scheduler can then utilize this information to ensure that operations are performed only on non-faulty units. In VLIW processors, this is equivalent to recompiling the code to run on the remaining non-faulty functional units. Since recompilation may not be possible in certain applications, we consider two alternative reconfiguration strategies for VLIW processors. These strategies sacrifice storage space and execution time, respectively, in order to reconfigure without recompiling. We present Markov models that describe the behavior of processors using these different approaches and we evaluate their reliabilities. The results show that, while super-scalar and VLIW with recompilation provide the highest reliability, all proposed strategies significantly increase reliability over that of an unprotected processor.

eScholarship - University of California

A Pattern Language for High-Performance Computing Resilience

Author: Chung Jinsuk
Mohror Kathryn
Saridakis Titos
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/10/2017

High-performance computing (HPC) systems provide powerful capabilities for modeling, simulation, and data analytics for a broad class of computational problems. They enable extreme performance of the order of quadrillion floating-point arithmetic calculations per second by aggregating the power of millions of compute, memory, networking and storage components. With the rapidly growing scale and complexity of HPC systems for achieving even greater performance, ensuring their reliable operation in the face of system degradations and failures is a critical challenge. System fault events often lead the scientific applications to produce incorrect results, or may even cause their untimely termination. The sheer number of components in modern extreme-scale HPC systems and the complex interactions and dependencies among the hardware and software components, the applications, and the physical environment make the design of practical solutions that support fault resilience a complex undertaking. To manage this complexity, we developed a methodology for designing HPC resilience solutions using design patterns. We codified the well-known techniques for handling faults, errors and failures that have been devised, applied and improved upon over the past three decades in the form of design patterns. In this paper, we present a pattern language to enable a structured approach to the development of HPC resilience solutions. The pattern language reveals the relations among the resilience patterns and provides the means to explore alternative techniques for handling a specific fault model that may have different efficiency and complexity characteristics. Using the pattern language enables the design and implementation of comprehensive resilience solutions as a set of interconnected resilience patterns that can be instantiated across layers of the system stack. Comment: Proceedings of the 22nd European Conference on Pattern Languages of Program

arXiv.org e-Print Archive

Crossref

A fault-tolerant intelligent robotic control system

Author: Marzwell Neville I.
Tso Kam Sing

This paper describes the concept, design, and features of a fault-tolerant intelligent robotic control system being developed for space and commercial applications that require high dependability. The comprehensive strategy integrates system-level hardware/software fault tolerance with task-level handling of uncertainties and unexpected events for robotic control. The underlying architecture for system-level fault tolerance is the distributed recovery block, which protects against application software, system software, hardware, and network failures. Task-level fault tolerance provisions are implemented in a knowledge-based system which utilizes advanced automation techniques such as rule-based and model-based reasoning to monitor, diagnose, and recover from unexpected events. The two-level design provides tolerance of two or more faults occurring serially at any level of command, control, sensing, or actuation. The potential benefits of such a fault-tolerant robotic control system include: (1) a minimized potential for damage to humans, the work site, and the robot itself; (2) continuous operation with a minimum of uncommanded motion in the presence of failures; and (3) more reliable autonomous operation providing increased efficiency in the execution of robotic tasks and decreased demand on human operators for controlling and monitoring the robotic servicing routines.

NASA Technical Reports Server

Prototype of Fault Adaptive Embedded Software for Large-Scale Real-Time Systems

Author: Haney Michael
Jung Mina
Messie Derek
Nordstrom Steven
Oh Jae C.
Shetty Shweta
Publication date: 01/01/2005

This paper describes a comprehensive prototype of large-scale fault adaptive embedded software developed for the proposed Fermilab BTeV high energy physics experiment. Lightweight self-optimizing agents embedded within Level 1 of the prototype are responsible for proactive and reactive monitoring and mitigation based on specified layers of competence. The agents are self-protecting, detecting cascading failures using a distributed approach. Adaptive, reconfigurable, and mobile objects for reliability are designed to be self-configuring to adapt automatically to dynamically changing environments. These objects provide a self-healing layer with the ability to discover, diagnose, and react to discontinuities in real-time processing. A generic modeling environment was developed to facilitate design and implementation of hardware resource specifications, application data flow, and failure mitigation strategies. Level 1 of the planned BTeV trigger system alone will consist of 2500 DSPs, so the number of components and intractable fault scenarios involved make it impossible to design an 'expert system' that applies traditional centralized mitigative strategies based on rules capturing every possible system state. Instead, a distributed reactive approach is implemented using the tools and methodologies developed by the Real-Time Embedded Systems group. Comment: 2nd Workshop on Engineering of Autonomic Systems (EASe), in the 12th Annual IEEE International Conference and Workshop on the Engineering of Computer Based Systems (ECBS), Washington, DC, April, 200

arXiv.org e-Print Archive

CiteSeerX

Syracuse University Research Facility and Collaborative Environment

Transparency in Complex Computational Systems

Author: Creel Kathleen A.
Publication date: 28/11/2019

Scientists depend on complex computational systems that are often ineliminably opaque, to the detriment of our ability to give scientific explanations and detect artifacts. Some philosophers have s..

PhilPapers

PhilSci Archive

Test exploration and validation using transaction level models

Author: Di Carlo Stefano
Imhof M.E
Khaligh R.S
Kochte M.A
Prinetto Paolo Ernesto
Radetzki M.
Wunderlich H.-J
Zollen C.G
Publication venue: IEEE Computer Society
Publication date: 01/01/2009

The complexity of the test infrastructure and test strategies in systems-on-chip approaches the complexity of the functional design space. This paper presents test design space exploration and validation of test strategies and schedules using transaction level models (TLMs). Since many aspects of testing involve the transfer of a significant amount of test stimuli and responses, the communication-centric view of TLMs suits this purpose exceptionally well.

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

A framework for effective management of condition based maintenance programs in the context of industrial development of E-Maintenance strategies

Author: Crespo Márquez Adolfo
Guillén López Antonio Jesús
Gómez Fernández Juan Francisco
Sanz María Dolores
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

CBM (Condition Based Maintenance) solutions are increasingly present in industrial systems due to two main circumstances: the unprecedentedly rapid evolution in the capture and analysis of data, and the significant cost reduction of supporting technologies. CBM programs in industrial systems can become extremely complex, especially when considering the effective introduction of new capabilities provided by PHM (Prognostics and Health Management) and E-maintenance disciplines. In this scenario, any CBM solution involves the management of numerous technical aspects that the maintenance manager needs to understand in order for the solution to be implemented properly and effectively, according to the company’s strategy. This paper provides a comprehensive representation of the key components of a generic CBM solution, presented using a framework or supporting structure for the effective management of CBM programs. The concept of “symptom of failure”, its corresponding analysis techniques (introduced by ISO 13379-1 and linked with RCM/FMEA analysis), and other international standards for CBM open-software application development (for instance, ISO 13374 and OSA-CBM) are used in the paper for the development of the framework. An original template has been developed, adopting the formal structure of RCM analysis templates, to integrate the information of the PHM techniques used to capture the failure mode behaviour and to manage maintenance. Finally, a case study describes the framework using the referred template. Gobierno de Andalucía P11-TEP-7303 M

idUS. Depósito de Investigación Universidad de Sevilla