3 research outputs found
FPGA-based fault injector for SEU-robustness analysis of ScOSA
The Scalable On-board Computer for Space Avionics (ScOSA) project aims to develop an on-board computer
which offers both reliability and high performance through the use of a heterogeneous distributed system of
commercial-off-the-shelf and radiation-hardened processors. This system should operate without failures even
in the presence of single-event upsets (SEUs), which are common occurrences for electronic systems in space.
The ScOSA middleware includes several fault detection, isolation and recovery (FDIR) mechanisms for coping
with faults, but their effectiveness in the presence of radiation has not yet been proven, as testing such effects
on the ground is challenging. This paper presents our approach to investigating the effect of single-event upsets
on the ScOSA system and the effectiveness of its error-handling mechanisms in their presence. A fault injector
based on a MicroBlaze soft processor has been instantiated in the FPGA co-processor of a commercial-off-the-shelf
Xilinx system-on-chip from the Zynq 7000 family. The injector simulates the effect of SEUs by flipping
bits in the main memory used by the kernel, middleware and applications.
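
The bit-flip mechanism described above can be illustrated with a minimal Python sketch. It operates on an ordinary bytearray standing in for a memory region; the actual injector runs on the soft processor and accesses physical memory, which this sketch does not attempt to reproduce. The names flip_bit and inject_random_seu are hypothetical and chosen for illustration only.

```python
import random
from typing import Tuple


def flip_bit(memory: bytearray, byte_offset: int, bit_index: int) -> None:
    """Invert a single bit in place, emulating one single-event upset."""
    if not 0 <= bit_index < 8:
        raise ValueError("bit_index must be in the range 0..7")
    memory[byte_offset] ^= 1 << bit_index


def inject_random_seu(memory: bytearray, rng: random.Random) -> Tuple[int, int]:
    """Flip one uniformly chosen bit and return (byte_offset, bit_index).

    Returning the location lets a campaign log where each fault landed.
    """
    byte_offset = rng.randrange(len(memory))
    bit_index = rng.randrange(8)
    flip_bit(memory, byte_offset, bit_index)
    return byte_offset, bit_index


if __name__ == "__main__":
    region = bytearray(64)              # stand-in for a memory region (assumption)
    rng = random.Random(1234)           # fixed seed so a run can be repeated
    location = inject_random_seu(region, rng)
    print("flipped bit at", location)
```

Seeding the random generator is a design choice of the sketch: it makes an injection sequence repeatable, which helps when a run that produced a failure needs to be reproduced.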
A machine-learning-based image processing algorithm will be used as an example application and run using
the ScOSA middleware while the fault injector is active. The system will be executed multiple times, with
faults injected into different memory locations and at different times in each run. The system will be monitored
for FDIR events and unrecoverable failures. The operation of the middleware and the results of the sample
application will be compared to the results of a golden run, where no faults are injected, to assess the number
of unhandled errors at the middleware and application levels. The results will be classified by severity, for example as
incorrect algorithm results, handled FDIR events and unhandled system crashes. These results will then be
correlated with the fault location, such as kernel or application memory. By applying SEU simulation techniques
to an on-board software system, we aim to demonstrate the usefulness of such simulations, to characterize the
system's robustness to SEUs occurring in different locations, and to guide the further development of the ScOSA
system by targeting SEU mitigation efforts that improve its robustness.
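
The comparison against a golden run and the correlation with the fault location can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation tooling used in the paper: the record fields, the region labels and the precedence of the outcome classes (crash before incorrect result before handled FDIR event) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    NO_EFFECT = "no_effect"
    HANDLED_FDIR = "handled_fdir_event"
    INCORRECT_RESULT = "incorrect_result"
    CRASH = "unhandled_crash"


@dataclass(frozen=True)
class RunRecord:
    region: str        # where the fault was injected, e.g. "kernel" (assumed label)
    crashed: bool      # the system stopped without recovering
    fdir_events: int   # number of FDIR events observed during the run
    output: bytes      # result produced by the example application


def classify(run: RunRecord, golden_output: bytes) -> Outcome:
    """Assign one outcome per run; the precedence order is an assumption."""
    if run.crashed:
        return Outcome.CRASH
    if run.output != golden_output:
        return Outcome.INCORRECT_RESULT
    if run.fdir_events > 0:
        return Outcome.HANDLED_FDIR
    return Outcome.NO_EFFECT


def tally_by_region(runs: Iterable[RunRecord], golden_output: bytes) -> Counter:
    """Count outcomes per injection region to relate severity to location."""
    return Counter((run.region, classify(run, golden_output)) for run in runs)
```

A campaign would build one RunRecord per execution and pass the whole list to tally_by_region together with the output of the fault-free run.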
Enabling Rapid Development of On-board Applications: Securing a Spacecraft Middleware by Separation and Isolation
Today’s space missions require increasingly powerful hardware to achieve their mission objectives, such as high-resolution
Earth observation or autonomous decision-making in deep space. At the same time, system availability and reliability requirements
remain high due to the harsh environment in which the system operates. This leads to an engineering trade-off between
reliable hardware and high-performance hardware. To overcome this trade-off, the German Aerospace Center (DLR) is developing
a special computer architecture that combines reliable computing hardware with high-performance commercial-off-the-shelf
(COTS) hardware. This computer architecture is called Scalable On-Board Computing for Space Avionics (ScOSA) and
is currently being prepared for a demonstration on a CubeSat, known as the ScOSA Flight Experiment [1].
The ScOSA software consists of a middleware to execute distributed applications, perform critical on-board software
functionalities, and carry out fault detection and recovery tasks. The software is based on the Distributed Tasking Framework, which is
a derivative of the open-source, data-flow-oriented Tasking Framework [2]. For this reason, developers organize their applications
as a set of tasks and channels. The middleware handles the task distribution among the nodes [3]. ScOSA will detect failing
compute nodes and reallocate tasks to maintain the availability of the entire system. The middleware can also change the set
of allocated tasks to support different mission phases. Thus, ScOSA allows software to be reloaded and executed after startup.
In this way, the software can be tested quickly and safely on the system. Combined with an upload strategy, ScOSA can be used
for in-situ testing of on-board applications.
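
The idea of reallocating tasks after a node failure can be illustrated with a short Python sketch. It is not the ScOSA reconfiguration logic, whose policy is not described here; the least-loaded placement rule, the function name reallocate and the task and node names in the usage note are assumptions made purely for illustration.

```python
from typing import Dict, List


def reallocate(assignment: Dict[str, str],
               failed_node: str,
               healthy_nodes: List[str]) -> Dict[str, str]:
    """Return a new task-to-node mapping with no task left on failed_node.

    Each orphaned task is moved to the healthy node that currently hosts
    the fewest tasks (ties broken by node name). This placement rule is
    an illustrative assumption, not the policy of the real middleware.
    """
    if not healthy_nodes:
        raise RuntimeError("no healthy node left to host the tasks")

    # Current number of tasks on every healthy node.
    load = {node: 0 for node in healthy_nodes}
    for node in assignment.values():
        if node in load:
            load[node] += 1

    new_assignment = dict(assignment)
    for task in sorted(assignment):          # sorted for a deterministic result
        if assignment[task] == failed_node:
            target = min(healthy_nodes, key=lambda n: (load[n], n))
            new_assignment[task] = target
            load[target] += 1
    return new_assignment
```

For example, with tasks "camera" on node "n1", "classify" and "downlink" on "n2", and "aocs" on "n3", a failure of "n2" moves "classify" to "n1" and "downlink" to "n3", while the other two tasks stay where they are.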
Since ScOSA will also perform mission-critical tasks, such as an Attitude and Orbit Control System or a Command and Data
Handling System, the opening of the platform leads to the problem of mixed criticality [4]. This problem is already present in
the ScOSA Flight Experiment, since the demonstration will include typical satellite applications developed by different teams at
DLR. Thus, not only do the teams apply different quality standards to their software, but the applications themselves also
have different Technology Readiness Levels (TRLs).
The challenge of mixed criticality is often met by completely separating and isolating the different software components,
e.g. by using a hypervisor or a separation kernel [5], [6]. Due to the distributed nature of the ScOSA system and its execution
platform, separation using a hypervisor is not easily achievable.
For this reason, we discuss in this work how we separate the critical services and communication components into their
own Linux process to guarantee that best-effort applications cannot interfere with the critical components of the middleware. We
also discuss how further mechanisms of the Linux kernel, namely cgroups and kernel namespaces, can be used to strengthen
the separation. However, complete isolation between software components is
undesirable, due to the necessary interaction between them. Given that the applications themselves can be spread over several
nodes, the application tasks need to communicate, and this is only possible if the critical software components relay messages
from other nodes to the separated application processes. For this reason, the middleware provides a relay service which takes
care of the intra-node inter-process communication. Using a relaying mechanism simplifies development and does not require
a complete rewrite of the existing middleware network stack.
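
The relaying idea can be sketched in a few lines of Python. This is only a conceptual illustration and not the ScOSA relay service: the class Relay, its methods and the task name used in the usage note are hypothetical, and multiprocessing.Pipe from the Python standard library stands in for whatever inter-process channel the middleware actually uses. In a real deployment the application end of each pipe would live in a separate Linux process; the sketch itself does not start any processes.

```python
from multiprocessing import Pipe
from multiprocessing.connection import Connection
from typing import Any, Dict


class Relay:
    """Forwards messages received by the critical process to application processes.

    Only the relay holds the network-facing side; each application process
    sees nothing but its own pipe end (illustrative assumption).
    """

    def __init__(self) -> None:
        self._routes: Dict[str, Connection] = {}

    def register(self, task_name: str, conn: Connection) -> None:
        """Associate a task with the pipe end leading to its process."""
        self._routes[task_name] = conn

    def deliver(self, task_name: str, payload: Any) -> bool:
        """Forward one message; return False if the task is unknown."""
        conn = self._routes.get(task_name)
        if conn is None:
            return False
        conn.send(payload)
        return True


if __name__ == "__main__":
    relay_end, app_end = Pipe()          # app_end would be handed to the application process
    relay = Relay()
    relay.register("classifier", relay_end)
    relay.deliver("classifier", b"frame-0001")
    print(app_end.recv())
```

Keeping the routing table inside the critical process means an application can only receive what the relay decides to forward, which matches the intent of leaving the existing network stack untouched.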
The proposed techniques were applied in a case study to integrate applications of unknown quality standards into the ScOSA
software system in an agile way. We discuss how the presented measures ensure that the resultant software is sufficiently tested
and meets the required quality level.
Finally, we discuss possible improvements to our existing separation and isolation solution for ScOSA and outline how these
techniques can be used on other platforms such as the RTEMS operating system.
ScOSA on the Way to Orbit: Reconfigurable High-Performance Computing for Spacecraft
The German Aerospace Center (DLR) is developing ScOSA (Scalable On-board Computing for Space Avionics) as a distributed on-board computing architecture for future space missions. The ScOSA architecture consists of commercial off-the-shelf (COTS) and radiation-tolerant nodes interconnected by a SpaceWire network. The system software provides services to enable parallel computing and system reconfiguration. This allows ScOSA to adapt to the node errors and failures to which COTS hardware is susceptible in the space environment. In the ongoing ScOSA Flight Experiment project, a ScOSA system consisting of eight Xilinx Zynq systems-on-chip with dual-core ARM-based processors and a LEON3 radiation-tolerant processor is being built for launch on DLR's next CubeSat in late 2024. In this flight experiment, not only all 18 cores but also the programmable logic will be used for high-performance on-board data processing. This paper presents the current hardware and software architecture of ScOSA. The scalability of ScOSA is highlighted from both hardware and software perspectives. We present benchmark results of the ScOSA system and experiments with the ScOSA system software on ESA's OPS-SAT in orbit in combination with a machine learning application for image classification.