3 research outputs found

    FPGA-based fault injector for SEU-robustness analysis of ScOSA

    The Scalable On-board Computer for Space Avionics (ScOSA) project aims to develop an on-board computer that offers both reliability and high performance through the use of a heterogeneous distributed system of commercial-off-the-shelf and radiation-hardened processors. This system should operate without failures even in the presence of single-event upsets (SEUs), which are common occurrences for electronic systems in space. The ScOSA middleware includes several fault detection, isolation and recovery (FDIR) mechanisms for coping with faults, but their effectiveness in the presence of radiation has not yet been proven, as testing such effects on the ground is challenging. This paper presents our approach to investigating the effect of single-event upsets on the ScOSA system and the effectiveness of its error handling mechanisms in their presence. A fault injector has been instantiated in the FPGA co-processor of a commercial-off-the-shelf Xilinx system-on-chip from the Zynq 7000 family using a MicroBlaze soft processor, which is used to simulate the effect of SEUs by flipping bits in the main memory used by the kernel, middleware and applications. A machine-learning-based image processing algorithm will be used as an example application and run using the ScOSA middleware while the fault injector is active. The system will be executed multiple times, with faults injected into different memory locations and at different times in each run. The system will be monitored for FDIR events and unrecoverable failures. The operation of the middleware and the results of the sample application will be compared to the results of a golden run, where no faults are injected, to assess the number of unhandled errors at the middleware and application levels. The results will be classified by severity, such as incorrect algorithm results, handled FDIR events and unhandled system crashes. These results will then be correlated with the fault location, such as kernel or application memory.
By applying SEU simulation techniques to an on-board software system, we aim to demonstrate the usefulness of such simulations, to guide the further development of the ScOSA system by targeting SEU mitigation efforts where they improve robustness most, and to characterize the system's robustness to SEUs occurring in different locations.
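The campaign described in this abstract (flip a bit, run the workload, compare against a golden run, classify by severity and location) can be sketched in a few lines. This is a purely illustrative software model, not the FPGA-based injector from the paper: the memory layout in `REGIONS`, the toy `run_workload`, and all function names are assumptions made for the example, and the timing of injections is not modeled.

```python
import random
from collections import Counter

# Hypothetical memory map; the real system's layout is not given in the abstract.
REGIONS = {
    "kernel": range(0, 16),
    "middleware": range(16, 32),
    "application": range(32, 64),
    "unused": range(64, 96),
}
MEMORY_SIZE = 96


def fresh_memory() -> bytearray:
    """Build the fault-free memory image used at the start of every run."""
    mem = bytearray(MEMORY_SIZE)
    for i in REGIONS["application"]:
        mem[i] = i  # arbitrary input data for the toy workload
    return mem


def flip_bit(mem: bytearray, byte_index: int, bit_index: int) -> None:
    """Model a single-event upset by inverting one bit in place."""
    mem[byte_index] ^= 1 << bit_index


def run_workload(mem: bytearray) -> tuple[int, int]:
    """Toy stand-in for kernel + middleware + application.

    Returns (result, fdir_events); raises RuntimeError for an unhandled crash.
    """
    if any(mem[i] for i in REGIONS["kernel"]):
        raise RuntimeError("corrupted kernel state")
    fdir_events = 0
    for i in REGIONS["middleware"]:
        if mem[i] != 0:  # assumed redundancy check: detect and repair
            mem[i] = 0
            fdir_events += 1
    return sum(mem[i] for i in REGIONS["application"]), fdir_events


def region_of(byte_index: int) -> str:
    """Map a byte address to the name of the region that contains it."""
    for name, span in REGIONS.items():
        if byte_index in span:
            return name
    raise ValueError(f"address {byte_index} outside memory map")


def classify_run(byte_index: int, bit_index: int, golden: int) -> str:
    """Inject one fault, run the workload, and classify the outcome."""
    mem = fresh_memory()
    flip_bit(mem, byte_index, bit_index)
    try:
        result, fdir_events = run_workload(mem)
    except RuntimeError:
        return "unhandled_crash"
    if result != golden:
        return "incorrect_result"
    return "handled_fdir_event" if fdir_events else "no_effect"


def campaign(runs: int, seed: int = 0) -> Counter:
    """Run many single-fault experiments and count outcomes per region."""
    rng = random.Random(seed)
    golden, _ = run_workload(fresh_memory())  # golden run: no fault injected
    outcomes: Counter = Counter()
    for _ in range(runs):
        byte_index = rng.randrange(MEMORY_SIZE)
        bit_index = rng.randrange(8)
        outcome = classify_run(byte_index, bit_index, golden)
        outcomes[(region_of(byte_index), outcome)] += 1
    return outcomes
```

Calling `campaign(1000)` returns a counter keyed by `(region, outcome)`, which is the kind of location-versus-severity correlation the abstract describes.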

    Enabling Rapid Development of On-board Applications: Securing a Spacecraft Middleware by Separation and Isolation

    Today’s space missions require increasingly powerful hardware to achieve their mission objectives, such as high-resolution Earth observation or autonomous decision-making in deep space. At the same time, system availability and reliability requirements remain high due to the harsh environment in which the system operates. This leads to an engineering trade-off between reliable and high-performance hardware. To overcome this trade-off, the German Aerospace Center (DLR) is developing a special computer architecture that combines reliable computing hardware with high-performance commercial-off-the-shelf (COTS) hardware. This computer architecture is called Scalable On-Board Computing for Space Avionics (ScOSA) and is currently being prepared for demonstration on a CubeSat, also known as the ScOSA Flight Experiment [1]. The ScOSA software consists of a middleware to execute distributed applications, perform critical on-board software functionalities, and carry out fault detection and recovery tasks. The software is based on the Distributed Tasking Framework, which is a derivative of the open-source, data-flow-oriented Tasking Framework [2]. For this reason, developers organize their applications as a set of tasks and channels. The middleware handles the task distribution among the nodes [3]. ScOSA will detect failing compute nodes and reallocate tasks to maintain the availability of the entire system. The middleware can also change the set of allocated tasks to support different mission phases. Thus, ScOSA allows software to be reloaded and executed after startup. In this way, software can be tested quickly and safely on the system. Combined with an upload strategy, ScOSA can be used for in-situ testing of on-board applications. Since ScOSA will also perform mission-critical tasks, such as an Attitude and Orbit Control System or a Command and Data Handling System, the opening of the platform leads to the problem of mixed criticality [4].
This problem is already present in the ScOSA Flight Experiment, since the demonstration will include typical satellite applications developed by different teams at DLR. Thus, not only do the teams apply different quality standards to their software, but the applications themselves also have different Technology Readiness Levels (TRLs). The challenge of mixed criticality is often met by completely separating and isolating the different software components, e.g. by using a hypervisor or a separation kernel [5], [6]. Due to the distributed nature of the ScOSA system and its execution platform, separation using a hypervisor is not easily achievable. For this reason, we discuss in this work how we separate the critical services and communication components into their own Linux process to guarantee that best-effort applications do not interfere with the critical components of the middleware. We also discuss how to use further mechanisms of the Linux kernel, namely cgroups and kernel namespaces, to strengthen the separation. However, complete isolation between software components is undesirable because they need to interact. Given that the applications themselves can be spread over several nodes, the application tasks need to communicate, and this is only possible if the critical software components relay messages from other nodes to the separated application processes. For this reason, the middleware provides a relay service which takes care of the intra-node inter-process communication. Using a relaying mechanism simplifies development and does not require a complete rewrite of the existing middleware network stack. The proposed techniques were applied in a case study to integrate applications of unknown quality standards into the ScOSA software system in an agile way.
We discuss how the presented measures ensure that the resultant software is sufficiently tested and meets the required quality level. Finally, we discuss possible improvements to our existing separation and isolation solution for ScOSA and outline how these techniques can be used on other platforms such as the RTEMS operating system.
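The relay idea in this abstract, where a critical process forwards messages arriving from other nodes to separated application processes, can be illustrated with a minimal routing table. This is a sketch under stated assumptions and not the ScOSA middleware API: the `Relay` class, its methods and the task identifiers are hypothetical, and a `multiprocessing.Pipe` connection merely stands in for whatever local IPC channel the middleware actually uses.

```python
from multiprocessing import Pipe
from multiprocessing.connection import Connection


class Relay:
    """Hypothetical relay living in the critical middleware process.

    It keeps one local IPC connection per application task and forwards
    payloads received from the network to the process hosting that task.
    """

    def __init__(self) -> None:
        self._routes: dict[str, Connection] = {}

    def register(self, task_id: str, connection: Connection) -> None:
        """Associate an application task with the relay's end of its pipe."""
        self._routes[task_id] = connection

    def deliver(self, task_id: str, payload: bytes) -> bool:
        """Forward a payload to the task's process; False if the task is unknown."""
        connection = self._routes.get(task_id)
        if connection is None:
            return False  # unknown destinations are dropped, not forwarded
        connection.send_bytes(payload)
        return True


# Usage: in a real deployment the second pipe end would be handed to a
# separate application process; both ends stay in one process here for brevity.
relay_end, app_end = Pipe()
relay = Relay()
relay.register("image_classifier", relay_end)
relay.deliver("image_classifier", b"frame-1")
received = app_end.recv_bytes()
```

Because applications only ever see their own pipe end, the network stack stays inside the critical process, which matches the abstract's point that relaying avoids rewriting the existing middleware network stack.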

    ScOSA on the Way to Orbit: Reconfigurable High-Performance Computing for Spacecraft

    The German Aerospace Center (DLR) is developing ScOSA (Scalable On-board Computing for Space Avionics) as a distributed on-board computing architecture for future space missions. The ScOSA architecture consists of commercial off-the-shelf (COTS) and radiation-tolerant nodes interconnected by a SpaceWire network. The system software provides services to enable parallel computing and system reconfiguration. This allows ScOSA to adapt to the node errors and failures to which COTS hardware is susceptible in the space environment. In the ongoing ScOSA Flight Experiment project, a ScOSA system consisting of eight Xilinx Zynq systems-on-chip with dual-core ARM-based processors and a LEON3 radiation-tolerant processor is being built for launch on DLR's next CubeSat in late 2024. In this flight experiment, not only all 18 cores but also the programmable logic will be used for high-performance on-board data processing. This paper presents the current hardware and software architecture of ScOSA. The scalability of ScOSA is highlighted from both hardware and software perspectives. We present benchmark results of the ScOSA system and experiments with the ScOSA system software on ESA's OPS-SAT in orbit in combination with a machine learning application for image classification.
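Reconfiguration after a node failure, as described in this abstract, amounts to moving the failed node's tasks onto the remaining healthy nodes. The sketch below uses a simple least-loaded policy chosen only for illustration; the abstract does not state how ScOSA selects target nodes, and the function name, node names and task names are all hypothetical.

```python
def reallocate(assignment: dict[str, str], healthy: list[str], failed: str) -> dict[str, str]:
    """Return a new task-to-node mapping with the failed node's tasks moved.

    Policy (an assumption for this example): each orphaned task goes to the
    healthy node that currently hosts the fewest tasks, ties broken by name.
    """
    if not healthy:
        raise ValueError("no healthy node left to host tasks")
    # Count how many tasks each healthy node already hosts.
    load = {node: 0 for node in healthy}
    for node in assignment.values():
        if node in load:
            load[node] += 1
    new_assignment = dict(assignment)
    # Process orphaned tasks in sorted order so the result is deterministic.
    for task in sorted(t for t, n in assignment.items() if n == failed):
        target = min(load, key=lambda node: (load[node], node))
        new_assignment[task] = target
        load[target] += 1
    return new_assignment


# Usage: node "n2" fails and its two tasks are spread over "n1" and "n3".
before = {"camera": "n1", "classifier": "n2", "downlink": "n2", "aocs": "n3"}
after = reallocate(before, healthy=["n1", "n3"], failed="n2")
```

Returning a new mapping instead of mutating the old one keeps the previous configuration available, which is convenient if a reconfiguration has to be rolled back.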