3,546 research outputs found

    Software dependability modeling using an industry-standard architecture description language

    Full text link
    Performing dependability evaluation along with other analyses at architectural level allows both making architectural tradeoffs and predicting the effects of architectural decisions on the dependability of an application. This paper gives guidelines for building architectural dependability models for software systems using the AADL (Architecture Analysis and Design Language). It presents reusable modeling patterns for fault-tolerant applications and shows how the presented patterns can be used in the context of a subsystem of a real-life application

    Continuous maintenance and the future – Foundations and technological challenges

    Get PDF
    High value and long life products require continuous maintenance throughout their life cycle to achieve required performance with optimum through-life cost. This paper presents foundations and technologies required to offer the maintenance service. Component and system level degradation science, assessment and modelling along with life cycle ‘big data’ analytics are the two most important knowledge and skill base required for the continuous maintenance. Advanced computing and visualisation technologies will improve efficiency of the maintenance and reduce through-life cost of the product. Future of continuous maintenance within the Industry 4.0 context also identifies the role of IoT, standards and cyber security

    An Adaptive Modular Redundancy Technique to Self-regulate Availability, Area, and Energy Consumption in Mission-critical Applications

    Get PDF
    As reconfigurable devices\u27 capacities and the complexity of applications that use them increase, the need for self-reliance of deployed systems becomes increasingly prominent. A Sustainable Modular Adaptive Redundancy Technique (SMART) composed of a dual-layered organic system is proposed, analyzed, implemented, and experimentally evaluated. SMART relies upon a variety of self-regulating properties to control availability, energy consumption, and area used, in dynamically-changing environments that require high degree of adaptation. The hardware layer is implemented on a Xilinx Virtex-4 Field Programmable Gate Array (FPGA) to provide self-repair using a novel approach called a Reconfigurable Adaptive Redundancy System (RARS). The software layer supervises the organic activities within the FPGA and extends the self-healing capabilities through application-independent, intrinsic, evolutionary repair techniques to leverage the benefits of dynamic Partial Reconfiguration (PR). A SMART prototype is evaluated using a Sobel edge detection application. This prototype is shown to provide sustainability for stressful occurrences of transient and permanent fault injection procedures while still reducing energy consumption and area requirements. An Organic Genetic Algorithm (OGA) technique is shown capable of consistently repairing hard faults while maintaining correct edge detector outputs, by exploiting spatial redundancy in the reconfigurable hardware. A Monte Carlo driven Continuous Markov Time Chains (CTMC) simulation is conducted to compare SMART\u27s availability to industry-standard Triple Modular Technique (TMR) techniques. Based on nine use cases, parameterized with realistic fault and repair rates acquired from publically available sources, the results indicate that availability is significantly enhanced by the adoption of fast repair techniques targeting aging-related hard-faults. Under harsh environments, SMART is shown to improve system availability from 36.02% with lengthy repair techniques to 98.84% with fast ones. This value increases to five nines (99.9998%) under relatively more favorable conditions. Lastly, SMART is compared to twenty eight standard TMR benchmarks that are generated by the widely-accepted BL-TMR tools. Results show that in seven out of nine use cases, SMART is the recommended technique, with power savings ranging from 22% to 29%, and area savings ranging from 17% to 24%, while still maintaining the same level of availability

    Recovery Model for Survivable System through Resource Reconfiguration

    Get PDF
    A survivable system is able to fulfil its mission in a timely manner, in the presence of attacks, failures, or accidents. It has been realized that it is not always possible to anticipate every type of attack or failure or accident in a system, and to predict and protect against those threats. Consequently, recovering back from any damage caused by threats becomes an important attention to be taken into account. This research proposed another recovery model to enhance system survivability. The model focuses on how to preserve the system and resume its critical service while incident occurs by reconfiguring the damaged critical service resources based on available resources without affecting the stability and functioning of the system. There are three critical requisite conditions in this recovery model: the number of pre-empted non-critical service resources, the response time of resource allocation, and the cost of reconfiguration, which are used in some scenarios to find and re-allocate the available resource for the reconfiguration. A brief specifications using Z language are also explored as a preliminary proof before the implementation .. To validate the viability of the approach, two instance cases studies of real-time system, delivery units of post office and computer system of a company, are provided in ensuring the durative running of critical service. The adoption of fault-tolerance and survivability using redundancy re-allocation in this recovery model is discussed from a new perspective. Compared to the closest work done by other researchers, it is shown that the model can solve not only single fault and can reconfigure the damage resource with minimum disruption to other services

    Fault diagnostic instrumentation design for environmental control and life support systems

    Get PDF
    As a development phase moves toward flight hardware, the system availability becomes an important design aspect which requires high reliability and maintainability. As part of continous development efforts, a program to evaluate, design, and demonstrate advanced instrumentation fault diagnostics was successfully completed. Fault tolerance designs for reliability and other instrumenation capabilities to increase maintainability were evaluated and studied

    A critical analysis of research potential, challenges and future directives in industrial wireless sensor networks

    Get PDF
    In recent years, Industrial Wireless Sensor Networks (IWSNs) have emerged as an important research theme with applications spanning a wide range of industries including automation, monitoring, process control, feedback systems and automotive. Wide scope of IWSNs applications ranging from small production units, large oil and gas industries to nuclear fission control, enables a fast-paced research in this field. Though IWSNs offer advantages of low cost, flexibility, scalability, self-healing, easy deployment and reformation, yet they pose certain limitations on available potential and introduce challenges on multiple fronts due to their susceptibility to highly complex and uncertain industrial environments. In this paper a detailed discussion on design objectives, challenges and solutions, for IWSNs, are presented. A careful evaluation of industrial systems, deadlines and possible hazards in industrial atmosphere are discussed. The paper also presents a thorough review of the existing standards and industrial protocols and gives a critical evaluation of potential of these standards and protocols along with a detailed discussion on available hardware platforms, specific industrial energy harvesting techniques and their capabilities. The paper lists main service providers for IWSNs solutions and gives insight of future trends and research gaps in the field of IWSNs

    Fault-tolerant fpga for mission-critical applications.

    Get PDF
    One of the devices that play a great role in electronic circuits design, specifically safety-critical design applications, is Field programmable Gate Arrays (FPGAs). This is because of its high performance, re-configurability and low development cost. FPGAs are used in many applications such as data processing, networks, automotive, space and industrial applications. Negative impacts on the reliability of such applications result from moving to smaller feature sizes in the latest FPGA architectures. This increases the need for fault-tolerant techniques to improve reliability and extend system lifetime of FPGA-based applications. In this thesis, two fault-tolerant techniques for FPGA-based applications are proposed with a built-in fault detection region. A low cost fault detection scheme is proposed for detecting faults using the fault detection region used in both schemes. The fault detection scheme primarily detects open faults in the programmable interconnect resources in the FPGAs. In addition, Stuck-At faults and Single Event Upsets (SEUs) fault can be detected. For fault recovery, each scheme has its own fault recovery approach. The first approach uses a spare module and a 2-to-1 multiplexer to recover from any fault detected. On the other hand, the second approach recovers from any fault detected using the property of Partial Reconfiguration (PR) in the FPGAs. It relies on identifying a Partially Reconfigurable block (P_b) in the FPGA that is used in the recovery process after the first faulty module is identified in the system. This technique uses only one location to recover from faults in any of the FPGA’s modules and the FPGA interconnects. Simulation results show that both techniques can detect and recover from open faults. In addition, Stuck-At faults and Single Event Upsets (SEUs) fault can also be detected. Finally, both techniques require low area overhead

    Engineering Resilient Space Systems

    Get PDF
    Several distinct trends will influence space exploration missions in the next decade. Destinations are becoming more remote and mysterious, science questions more sophisticated, and, as mission experience accumulates, the most accessible targets are visited, advancing the knowledge frontier to more difficult, harsh, and inaccessible environments. This leads to new challenges including: hazardous conditions that limit mission lifetime, such as high radiation levels surrounding interesting destinations like Europa or toxic atmospheres of planetary bodies like Venus; unconstrained environments with navigation hazards, such as free-floating active small bodies; multielement missions required to answer more sophisticated questions, such as Mars Sample Return (MSR); and long-range missions, such as Kuiper belt exploration, that must survive equipment failures over the span of decades. These missions will need to be successful without a priori knowledge of the most efficient data collection techniques for optimum science return. Science objectives will have to be revised ‘on the fly’, with new data collection and navigation decisions on short timescales. Yet, even as science objectives are becoming more ambitious, several critical resources remain unchanged. Since physics imposes insurmountable light-time delays, anticipated improvements to the Deep Space Network (DSN) will only marginally improve the bandwidth and communications cadence to remote spacecraft. Fiscal resources are increasingly limited, resulting in fewer flagship missions, smaller spacecraft, and less subsystem redundancy. As missions visit more distant and formidable locations, the job of the operations team becomes more challenging, seemingly inconsistent with the trend of shrinking mission budgets for operations support. How can we continue to explore challenging new locations without increasing risk or system complexity? These challenges are present, to some degree, for the entire Decadal Survey mission portfolio, as documented in Vision and Voyages for Planetary Science in the Decade 2013–2022 (National Research Council, 2011), but are especially acute for the following mission examples, identified in our recently completed KISS Engineering Resilient Space Systems (ERSS) study: 1. A Venus lander, designed to sample the atmosphere and surface of Venus, would have to perform science operations as components and subsystems degrade and fail; 2. A Trojan asteroid tour spacecraft would spend significant time cruising to its ultimate destination (essentially hibernating to save on operations costs), then upon arrival, would have to act as its own surveyor, finding new objects and targets of opportunity as it approaches each asteroid, requiring response on short notice; and 3. A MSR campaign would not only be required to perform fast reconnaissance over long distances on the surface of Mars, interact with an unknown physical surface, and handle degradations and faults, but would also contain multiple components (launch vehicle, cruise stage, entry and landing vehicle, surface rover, ascent vehicle, orbiting cache, and Earth return vehicle) that dramatically increase the need for resilience to failure across the complex system. The concept of resilience and its relevance and application in various domains was a focus during the study, with several definitions of resilience proposed and discussed. While there was substantial variation in the specifics, there was a common conceptual core that emerged—adaptation in the presence of changing circumstances. These changes were couched in various ways—anomalies, disruptions, discoveries—but they all ultimately had to do with changes in underlying assumptions. Invalid assumptions, whether due to unexpected changes in the environment, or an inadequate understanding of interactions within the system, may cause unexpected or unintended system behavior. A system is resilient if it continues to perform the intended functions in the presence of invalid assumptions. Our study focused on areas of resilience that we felt needed additional exploration and integration, namely system and software architectures and capabilities, and autonomy technologies. (While also an important consideration, resilience in hardware is being addressed in multiple other venues, including 2 other KISS studies.) The study consisted of two workshops, separated by a seven-month focused study period. The first workshop (Workshop #1) explored the ‘problem space’ as an organizing theme, and the second workshop (Workshop #2) explored the ‘solution space’. In each workshop, focused discussions and exercises were interspersed with presentations from participants and invited speakers. The study period between the two workshops was organized as part of the synthesis activity during the first workshop. The study participants, after spending the initial days of the first workshop discussing the nature of resilience and its impact on future science missions, decided to split into three focus groups, each with a particular thrust, to explore specific ideas further and develop material needed for the second workshop. The three focus groups and areas of exploration were: 1. Reference missions: address/refine the resilience needs by exploring a set of reference missions 2. Capability survey: collect, document, and assess current efforts to develop capabilities and technology that could be used to address the documented needs, both inside and outside NASA 3. Architecture: analyze the impact of architecture on system resilience, and provide principles and guidance for architecting greater resilience in our future systems The key product of the second workshop was a set of capability roadmaps pertaining to the three reference missions selected for their representative coverage of the types of space missions envisioned for the future. From these three roadmaps, we have extracted several common capability patterns that would be appropriate targets for near-term technical development: one focused on graceful degradation of system functionality, a second focused on data understanding for science and engineering applications, and a third focused on hazard avoidance and environmental uncertainty. Continuing work is extending these roadmaps to identify candidate enablers of the capabilities from the following three categories: architecture solutions, technology solutions, and process solutions. The KISS study allowed a collection of diverse and engaged engineers, researchers, and scientists to think deeply about the theory, approaches, and technical issues involved in developing and applying resilience capabilities. The conclusions summarize the varied and disparate discussions that occurred during the study, and include new insights about the nature of the challenge and potential solutions: 1. There is a clear and definitive need for more resilient space systems. During our study period, the key scientists/engineers we engaged to understand potential future missions confirmed the scientific and risk reduction value of greater resilience in the systems used to perform these missions. 2. Resilience can be quantified in measurable terms—project cost, mission risk, and quality of science return. In order to consider resilience properly in the set of engineering trades performed during the design, integration, and operation of space systems, the benefits and costs of resilience need to be quantified. We believe, based on the work done during the study, that appropriate metrics to measure resilience must relate to risk, cost, and science quality/opportunity. Additional work is required to explicitly tie design decisions to these first-order concerns. 3. There are many existing basic technologies that can be applied to engineering resilient space systems. Through the discussions during the study, we found many varied approaches and research that address the various facets of resilience, some within NASA, and many more beyond. Examples from civil architecture, Department of Defense (DoD) / Defense Advanced Research Projects Agency (DARPA) initiatives, ‘smart’ power grid control, cyber-physical systems, software architecture, and application of formal verification methods for software were identified and discussed. The variety and scope of related efforts is encouraging and presents many opportunities for collaboration and development, and we expect many collaborative proposals and joint research as a result of the study. 4. Use of principled architectural approaches is key to managing complexity and integrating disparate technologies. The main challenge inherent in considering highly resilient space systems is that the increase in capability can result in an increase in complexity with all of the 3 risks and costs associated with more complex systems. What is needed is a better way of conceiving space systems that enables incorporation of capabilities without increasing complexity. We believe principled architecting approaches provide the needed means to convey a unified understanding of the system to primary stakeholders, thereby controlling complexity in the conception and development of resilient systems, and enabling the integration of disparate approaches and technologies. A representative architectural example is included in Appendix F. 5. Developing trusted resilience capabilities will require a diverse yet strategically directed research program. Despite the interest in, and benefits of, deploying resilience space systems, to date, there has been a notable lack of meaningful demonstrated progress in systems capable of working in hazardous uncertain situations. The roadmaps completed during the study, and documented in this report, provide the basis for a real funded plan that considers the required fundamental work and evolution of needed capabilities. Exploring space is a challenging and difficult endeavor. Future space missions will require more resilience in order to perform the desired science in new environments under constraints of development and operations cost, acceptable risk, and communications delays. Development of space systems with resilient capabilities has the potential to expand the limits of possibility, revolutionizing space science by enabling as yet unforeseen missions and breakthrough science observations. Our KISS study provided an essential venue for the consideration of these challenges and goals. Additional work and future steps are needed to realize the potential of resilient systems—this study provided the necessary catalyst to begin this process

    Zero-maintenance of electronic systems: Perspectives, challenges, and opportunities

    Get PDF
    Self-engineering systems that are capable of repairing themselves in-situ without the need for human decision (or intervention) could be used to achieve zero-maintenance. This philosophy is synonymous to the way in which the human body heals and repairs itself up to a point. This article synthesises issues related to an emerging area of self-healing technologies that links software and hardware mitigations strategies. Efforts are concentrated on built-in detection, masking and active mitigation that comprises self-recovery or self-repair capability, and has a focus on system resilience and recovering from fault events. Design techniques are critically reviewed to clarify the role of fault coverage, resource allocation and fault awareness, set in the context of existing and emerging printable/nanoscale manufacturing processes. The qualitative analysis presents new opportunities to form a view on the research required for a successful integration of zero-maintenance. Finally, the potential cost benefits and future trends are enumerated
    • …
    corecore