15,702 research outputs found

    Parallel Architectures for Planetary Exploration Requirements (PAPER)

    Get PDF
    The Parallel Architectures for Planetary Exploration Requirements (PAPER) project is essentially research oriented towards technology insertion issues for NASA's unmanned planetary probes. It was initiated to complement and augment the long-term efforts for space exploration with particular reference to NASA/LaRC's (NASA Langley Research Center) research needs for planetary exploration missions of the mid and late 1990s. The requirements for space missions as given in the somewhat dated Advanced Information Processing Systems (AIPS) requirements document are contrasted with the new requirements from JPL/Caltech involving sensor data capture and scene analysis. It is shown that more stringent requirements have arisen as a result of technological advancements. Two possible architectures, the AIPS Proof of Concept (POC) configuration and the MAX Fault-tolerant dataflow multiprocessor, were evaluated. The main observation was that the AIPS design is biased towards fault tolerance and may not be an ideal architecture for planetary and deep space probes due to high cost and complexity. The MAX concepts appears to be a promising candidate, except that more detailed information is required. The feasibility for adding neural computation capability to this architecture needs to be studied. Key impact issues for architectural design of computing systems meant for planetary missions were also identified

    What does fault tolerant Deep Learning need from MPI?

    Full text link
    Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithm for large scale data analysis. DL algorithms are computationally expensive - even distributed DL implementations which use MPI require days of training (model learning) time on commonly studied datasets. Long running DL applications become susceptible to faults - requiring development of a fault tolerant system infrastructure, in addition to fault tolerant DL algorithms. This raises an important question: What is needed from MPI for de- signing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); a need (or lack thereof) for check-pointing of any critical data structures; and most importantly, consideration for several fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by ex- tending MaTEx-Caffe for using ULFM-based implementation. Our evaluation using the ImageNet dataset and AlexNet, and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI based ULFM

    Application of advanced technology to space automation

    Get PDF
    Automated operations in space provide the key to optimized mission design and data acquisition at minimum cost for the future. The results of this study strongly accentuate this statement and should provide further incentive for immediate development of specific automtion technology as defined herein. Essential automation technology requirements were identified for future programs. The study was undertaken to address the future role of automation in the space program, the potential benefits to be derived, and the technology efforts that should be directed toward obtaining these benefits

    Fault diagnostic instrumentation design for environmental control and life support systems

    Get PDF
    As a development phase moves toward flight hardware, the system availability becomes an important design aspect which requires high reliability and maintainability. As part of continous development efforts, a program to evaluate, design, and demonstrate advanced instrumentation fault diagnostics was successfully completed. Fault tolerance designs for reliability and other instrumenation capabilities to increase maintainability were evaluated and studied

    Real-time predictive maintenance for wind turbines using Big Data frameworks

    Full text link
    This work presents the evolution of a solution for predictive maintenance to a Big Data environment. The proposed adaptation aims for predicting failures on wind turbines using a data-driven solution deployed in the cloud and which is composed by three main modules. (i) A predictive model generator which generates predictive models for each monitored wind turbine by means of Random Forest algorithm. (ii) A monitoring agent that makes predictions every 10 minutes about failures in wind turbines during the next hour. Finally, (iii) a dashboard where given predictions can be visualized. To implement the solution Apache Spark, Apache Kafka, Apache Mesos and HDFS have been used. Therefore, we have improved the previous work in terms of data process speed, scalability and automation. In addition, we have provided fault-tolerant functionality with a centralized access point from where the status of all the wind turbines of a company localized all over the world can be monitored, reducing O&M costs

    Decentralized fault-tolerant control of inland navigation networks: a challenge

    Get PDF
    Inland waterways are large-scale networks used principally for navigation. Even if the transport planning is an important issue, the water resource management is a crucial point. Indeed, navigation is not possible when there is too little or too much water inside the waterways. Hence, the water resource management of waterways has to be particularly efficient in a context of climate change and increase of water demand. This management has to be done by considering different time and space scales and still requires the development of new methodologies and tools in the topics of the Control and Informatics communities. This work addresses the problem of waterways management in terms of modeling, control, diagnosis and fault-tolerant control by focusing in the inland waterways of the north of France. A review of proposed tools and the ongoing research topics are provided in this paper.Peer ReviewedPostprint (published version

    Improving reconfigurable systems reliability by combining periodical test and redundancy techniques: a case study

    Get PDF
    This paper revises and introduces to the field of reconfigurable computer systems, some traditional techniques used in the fields of fault-tolerance and testing of digital circuits. The target area is that of on-board spacecraft electronics, as this class of application is a good candidate for the use of reconfigurable computing technology. Fault tolerant strategies are used in order for the system to adapt itself to the severe conditions found in space. In addition, the paper describes some problems and possible solutions for the use of reconfigurable components, based on programmable logic, in space applications
    • …
    corecore