60 research outputs found
OSS architecture for mixed-criticality systems – a dual view from a software and system engineering perspective
Computer-based automation in industrial appliances led to a growing number of
logically dependent, but physically separated embedded control units per
appliance. Many of those components are safety-critical systems, and require
adherence to safety standards, which is inconsonant with the relentless demand
for features in those appliances. Features lead to a growing amount of control
units per appliance, and to a increasing complexity of the overall software
stack, being unfavourable for safety certifications. Modern CPUs provide means
to revise traditional separation of concerns design primitives: the consolidation
of systems, which yields new engineering challenges that concern the entire
software and system stack.
Multi-core CPUs favour economic consolidation of formerly separated
systems with one efficient single hardware unit. Nonetheless, the system
architecture must provide means to guarantee the freedom from interference
between domains of different criticality. System consolidation demands for
architectural and engineering strategies to fulfil requirements (e.g., real-time
or certifiability criteria) in safety-critical environments.
In parallel, there is an ongoing trend to substitute ordinary proprietary base
platform software components by mature OSS variants for economic and
engineering reasons. There are fundamental differences of processual properties
in development processes of OSS and proprietary software. OSS in
safety-critical systems requires development process assessment techniques to
build an evidence-based fundament for certification efforts that is based upon
empirical software engineering methods.
In this thesis, I will approach from both sides: the software and system
engineering perspective. In the first part of this thesis, I focus on the
assessment of OSS components: I develop software engineering techniques
that allow to quantify characteristics of distributed OSS development
processes. I show that ex-post analyses of software development processes can
be used to serve as a foundation for certification efforts, as it is required
for safety-critical systems.
In the second part of this thesis, I present a system architecture based on
OSS components that allows for consolidation of mixed-criticality systems
on a single platform. Therefore, I exploit virtualisation extensions of modern
CPUs to strictly isolate domains of different criticality. The proposed
architecture shall eradicate any remaining hypervisor activity in order to
preserve real-time capabilities of the hardware by design, while
guaranteeing strict isolation across domains.Computergestützte Automatisierung industrieller Systeme führt zu einer
wachsenden Anzahl an logisch abhängigen, aber physisch voneinander getrennten
Steuergeräten pro System. Viele der Einzelgeräte sind sicherheitskritische
Systeme, welche die Einhaltung von Sicherheitsstandards erfordern, was durch
die unermüdliche Nachfrage an Funktionalitäten erschwert wird. Diese führt zu
einer wachsenden Gesamtzahl an Steuergeräten, einhergehend mit wachsender
Komplexität des gesamten Softwarekorpus, wodurch Zertifizierungsvorhaben
erschwert werden. Moderne Prozessoren stellen Mittel zur Verfügung, welche es
ermöglichen, das traditionelle >Trennung von Belangen< Designprinzip zu
erneuern: die Systemkonsolidierung. Sie stellt neue ingenieurstechnische
Herausforderungen, die den gesamten Software und Systemstapel betreffen.
Mehrkernprozessoren begünstigen die ökonomische und effiziente Konsolidierung
vormals getrennter Systemen zu einer effizienten Hardwareeinheit. Geeignete
Systemarchitekturen müssen jedoch die Rückwirkungsfreiheit zwischen Domänen
unterschiedlicher Kritikalität sicherstellen. Die Konsolidierung erfordert
architektonische, als auch ingenieurstechnische Strategien um die Anforderungen
(etwa Echtzeit- oder Zertifizierbarkeitskriterien) in sicherheitskritischen
Umgebungen erfüllen zu können.
Zunehmend werden herkömmliche proprietär entwickelte Basisplattformkomponenten
aus ökonomischen und technischen Gründen vermehrt durch ausgereifte OSS
Alternativen ersetzt. Jedoch hindern fundamentale Unterschiede bei prozessualen
Eigenschaften des Entwicklungsprozesses bei OSS den Einsatz in
sicherheitskritischen Systemen. Dieser erfordert Techniken, welche es erlauben
die Entwicklungsprozesse zu bewerten um ein evidenzbasiertes Fundament für
Zertifizierungsvorhaben basierend auf empirischen Methoden des Software
Engineerings zur Verfügung zu stellen.
In dieser Arbeit nähere ich mich von beiden Seiten: der Softwaretechnik, und
der Systemarchitektur. Im ersten Teil befasse ich mich mit der Beurteilung von
OSS Komponenten: Ich entwickle Softwareanalysetechniken, welche es
ermöglichen, prozessuale Charakteristika von verteilten OSS
Entwicklungsvorhaben zu quantifizieren. Ich zeige, dass rückschauende Analysen
des Entwicklungsprozess als Grundlage für Softwarezertifizierungsvorhaben
genutzt werden können.
Im zweiten Teil dieser Arbeit widme ich mich der Systemarchitektur. Ich stelle
eine OSS-basierte Systemarchitektur vor, welche die Konsolidierung von
Systemen gemischter Kritikalität auf einer alleinstehenden Plattform
ermöglicht. Dazu nutze ich Virtualisierungserweiterungen moderner Prozessoren
aus, um die Hardware in strikt voneinander isolierten Rechendomänen unterschiedlicher
Kritikalität unterteilen zu können. Die vorgeschlagene Architektur soll jegliche
Betriebsstörungen des Hypervisors beseitigen, um die Echtzeitfähigkeiten der
Hardware bauartbedingt aufrecht zu erhalten, während strikte Isolierung
zwischen Domänen stets sicher gestellt ist
Response Time Analysis for Sporadic Server based Budget Scheduling in Real Time Virtualization Environments
Virtualization techniques for embedded real-time systems typically employ TDMA scheduling to achieve temporal isolation among different virtualized applications. Recent work already introduced sporadic server based solutions relying on budgets instead of a fixed TDMA schedule. While providing better average-case response times for IRQs and tasks, a formal response time analysis for the worst-case is still missing. In order to confirm the advantage of a sporadic server based budget scheduling, this paper provides a worst-case response time analysis. To improve the sporadic server based budget scheduling even more, we provide a background scheduling implementation which will also be covered by the formal analysis. We show correctness of the analysis approach and compare it against TDMA based systems. In addition to that, we provide response time measurements from a working hypervisor implementation on an ARM based development board
A TrustZone-assisted hypervisor supporting dynamic partial reconfiguration
Dissertação de mestrado em Engenharia Eletrónica Industrial e ComputadoresTraditionally, embedded systems were dedicated single-purpose systems characterised
by hardware resource constraints and real-time requirements. However,
with the growing computing abilities and resources on general purpose platforms,
systems that were formerly divided to provide different functions are now merging
into one System on Chip. One of the solutions that allows the coexistence
of heterogeneous environments on the same hardware platform is virtualization
technology, usually in the form of an hypervisor that manage different instances
of OSes and arbitrate their execution and resource usage, according to the chosen
policy.
ARM TrustZone has been one of the technologies used to implement a virtualization
solution with low overhead and low footprint. µRTZVisor a TrustZoneassisted
hypervisor with a microkernel-like architecture - is a bare-metal embedded
hypervisor that relies on TrustZone hardware to provide the foundation to implement
strong spatial and temporal isolation between multiple guest OSes.
The use of Partial Reconfiguration allows the designer to define partial reconfigurable
regions in the FPGA and reconfigure them during runtime. This allows
the system to have its functionalities changed during runtime using Dynamic Partial
Reconfiguration (DPR), without needing to reconfigure all the FPGA. This
is a major advantage, as it decreases the configuration overhead since partial bitstreams
are smaller than full bitstreams and the reconfiguration time is shorter.
Another advantage is reducing the need for larger logic areas and consequently
reducing their power consumption.
Therefore, a hypervisor that supports DPR brings benefits to the system. Aside
from better FPGA resources usage, another improvement that it brings, is when
critical hardware modules misbehave and the hardware module can be replaced.
It also enables the controlling and changing of hardware accelerators dynamically,
which can be used to meet the guest OSes requests for hardware resources as the
need appears. The propose of this thesis is extending the µRTZVisor to have a
DPR mechanism.Tradicionalmente, os sistemas embebidos eram sistemas dedicados a uma única
tarefa e apenas limitados pelos seus requisitos de tempo real e de hardware. Contudo,
como as plataformas de uso geral têm cada vez mais recursos e capacidade
de processamento, muitos dos sistemas que executavam separadamente, passaram
a apenas um sistema em plataforma recorrendo à tecnologia de virtualização, normalmente
como um hipervisor que é capaz de gerir múltiplos sistemas operativos
arbitrando a sua execução e acesso aos recursos da plataforma de acordo com uma
politica predefinida.
A tecnologia TrustZone da ARM tem sido uma das soluções implementadas
sem ter grande impacto na performance dos sistemas operativos. µRTZVisor é um
dos hipervisores baseados na TrustZone para implementar um isolamento espacial
e temporal entre múltiplos sistemas operativos, sendo que defere de outras uma
vez que é de arquitectura microkernel.
O uso de Reconfiguração Parcial Dinâmica (RPD) permite ao designer definir
várias regiões reconfiguráveis no FPGA que podem ser dinamicamente reconfiguradas
durante o período de execução. Esta é uma grande vantagem, porque reduz
os tempos de reconfiguração de módulos reconfiguráveis uma vez que os seus bitstreams
são mais pequenos que bitstreams para a plataforma toda. A tecnologia
também permite que nos FPGAs não sejam necessárias áreas lógicas tão grandes,
o que também reduz o consumo de energia da plataforma.
Um hipervisor que suporte RPD traz grandes benefícios para o sistema, nomeadamente
melhor uso dos recursos de FPGA, implementação de aceleradores em
hardware dinamicamente reconfiguráveis, e tratamento de falhas no hardware. Se
houverem módulos que estejam a demonstrar comportamentos inesperados estes
podem ser reconfigurados. O uso de aceleradores reconfiguráveis permite que o
hardware seja adaptável conforme a necessidade destes pelos diferentes sistemas
operativos. A proposta desta dissertação é então estender o µRTZVisor para ter
a capacidade de usar módulos reconfiguráveis por RPD
Development and certification of mixed-criticality embedded systems based on probabilistic timing analysis
An increasing variety of emerging systems relentlessly replaces or augments the functionality of mechanical subsystems with embedded electronics. For quantity, complexity, and use, the safety of such subsystems is an increasingly important matter. Accordingly, those systems are subject to safety certification to demonstrate system's safety by rigorous development processes and hardware/software constraints. The massive augment in embedded processors' complexity renders the arduous certification task significantly harder to achieve. The focus of this thesis is to address the certification challenges in multicore architectures: despite their potential to integrate several applications on a single platform, their inherent complexity imperils their timing predictability and certification. Recently, the Measurement-Based Probabilistic Timing Analysis (MBPTA) technique emerged as an alternative to deal with hardware/software complexity. The innovation that MBPTA brings about is, however, a major step from current certification procedures and standards. The particular contributions of this Thesis include: (i) the definition of certification arguments for mixed-criticality integration upon multicore processors. In particular we propose a set of safety mechanisms and procedures as required to comply with functional safety standards. For timing predictability, (ii) we present a quantitative approach to assess the likelihood of execution-time exceedance events with respect to the risk reduction requirements on safety standards. To this end, we build upon the MBPTA approach and we present the design of a safety-related source of randomization (SoR), that plays a key role in the platform-level randomization needed by MBPTA. And (iii) we evaluate current certification guidance with respect to emerging high performance design trends like caches. Overall, this Thesis pushes the certification limits in the use of multicore and MBPTA technology in Critical Real-Time Embedded Systems (CRTES) and paves the way towards their adoption in industry.Una creciente variedad de sistemas emergentes reemplazan o aumentan la funcionalidad de subsistemas mecánicos con componentes electrónicos embebidos. El aumento en la cantidad y complejidad de dichos subsistemas electrónicos así como su cometido, hacen de su seguridad una cuestión de creciente importancia. Tanto es así que la comercialización de estos sistemas críticos está sujeta a rigurosos procesos de certificación donde se garantiza la seguridad del sistema mediante estrictas restricciones en el proceso de desarrollo y diseño de su hardware y software. Esta tesis trata de abordar los nuevos retos y dificultades dadas por la introducción de procesadores multi-núcleo en dichos sistemas críticos: aunque su mayor rendimiento despierta el interés de la industria para integrar múltiples aplicaciones en una sola plataforma, suponen una mayor complejidad. Su arquitectura desafía su análisis temporal mediante los métodos tradicionales y, asimismo, su certificación es cada vez más compleja y costosa. Con el fin de lidiar con estas limitaciones, recientemente se ha desarrollado una novedosa técnica de análisis temporal probabilístico basado en medidas (MBPTA). La innovación de esta técnica, sin embargo, supone un gran cambio cultural respecto a los estándares y procedimientos tradicionales de certificación. En esta línea, las contribuciones de esta tesis están agrupadas en tres ejes principales: (i) definición de argumentos de seguridad para la certificación de aplicaciones de criticidad-mixta sobre plataformas multi-núcleo. Se definen, en particular, mecanismos de seguridad, técnicas de diagnóstico y reacción de faltas acorde con el estándar IEC 61508 sobre una arquitectura multi-núcleo de referencia. Respecto al análisis temporal, (ii) presentamos la cuantificación de la probabilidad de exceder un límite temporal y su relación con los requisitos de reducción de riesgos derivados de los estándares de seguridad funcional. Con este fin, nos basamos en la técnica MBPTA y presentamos el diseño de una fuente de números aleatorios segura; un componente clave para conseguir las propiedades aleatorias requeridas por MBPTA a nivel de plataforma. Por último, (iii) extrapolamos las guías actuales para la certificación de arquitecturas multi-núcleo a una solución comercial de 8 núcleos y las evaluamos con respecto a las tendencias emergentes de diseño de alto rendimiento (caches). Con estas contribuciones, esta tesis trata de abordar los retos que el uso de procesadores multi-núcleo y MBPTA implican en el proceso de certificación de sistemas críticos de tiempo real y facilita, de esta forma, su adopción por la industria.Postprint (published version
GPU devices for safety-critical systems: a survey
Graphics Processing Unit (GPU) devices and their associated software programming languages and frameworks can deliver the computing performance required to facilitate the development of next-generation high-performance safety-critical systems such as autonomous driving systems. However, the integration of complex, parallel, and computationally demanding software functions with different safety-criticality levels on GPU devices with shared hardware resources contributes to several safety certification challenges. This survey categorizes and provides an overview of research contributions that address GPU devices’ random hardware failures, systematic failures, and independence of execution.This work has been partially supported by the European Research Council with Horizon 2020 (grant agreements No. 772773 and 871465), the Spanish Ministry of Science and Innovation under grant PID2019-107255GB, the HiPEAC Network of Excellence and the Basque Government under grant KK-2019-00035. The Spanish Ministry of Economy and Competitiveness has also partially supported Leonidas Kosmidis with a Juan de la Cierva Incorporación postdoctoral fellowship (FJCI-2020- 045931-I).Peer ReviewedPostprint (author's final draft
Real-Time I/O System for Many-core Embedded Systems
In modern real-time embedded systems, time predictability is vital. This extends to I/O operations which require predictability, timing-accuracy, enhanced performance, scalability, parallel access and isolation. Currently, existing approaches cannot achieve all these requirements at the same time. In this thesis, we propose a framework of hardware-implemented real-time I/O virtualization system to meet all these requirements simultaneously, termed BlueIO.
BlueIO integrates the important functionalities of I/O virtualization and low layer I/O drivers (achieved via Virtualized Complicated Device Controller (VCDC)), as well as a clock cycle level timing-accurate I/O controller (i.e. GPIO Command Processor (GPIOCP)). BlueIO provides this functionality in the hardware layer, supporting abstract virtualized access to I/O devices from the software domain. The hardware implementation includes I/O virtualization and I/O drivers provide isolation and parallel (concurrent) access to I/O operations and improves I/O performance. Furthermore, the approach includes GPIOCP to guarantee that I/O operations will occur at a specific clock cycle (i.e. be timing-accurate and predictable).
This thesis proposes the design and implementation of BlueIO, together with its components — GPIOCP and VCDC. It is demonstrated how a BlueIO-based system can be exploited to meet real-time requirements with significant improvements in I/O performance and low running cost on different OSs. The thesis presents a hardware consumption analysis of BlueIO, in order to show that it linearly scales with the number of CPUs and I/O devices, evidenced by the implementation which targets both FPGA and VLSI.
Finally, the thesis proposes a scalable real-time hardware hypervisor termed BlueVisor, which is built upon GPIOCP, VCDC and BlueIO. BlueVisor enables predictable virtualization on CPU, memory, and I/O; together with fast interrupt handling and inter-virtual machine communication. BlueVisor shows that the approaches towards I/O proposed in this thesis can be applied and expanded to different architectures and platforms, whilst maintaining real-time properties, performance and protection
Self-Aware Scheduling for Mixed-Criticality Component-Based Systems
A basic mixed-criticality requirement in real-time systems is temporal isolation, which ensures that applications receive a guaranteed (CPU) service and impose a bounded interference on other applications. Providing operating system support for temporal isolation is often inefficient, in terms of utilisation and achieved latencies, or complex and hard to implement or model correctly. Correct models are, however, a prerequisite when response times are bounded by formal analyses. We provide a novel approach to this challenge by applying self-aware computing methodologies that involve run-time monitoring to detect (and correct) model deviations of a budget-based scheduler
Recommended from our members
Provenance-based computing
Relying on computing systems that become increasingly complex is difficult:
with many factors potentially affecting the result of a computation or its
properties, understanding where problems appear and fixing them is a
challenging proposition. Typically, the process of finding solutions is driven
by trial and error or by experience-based insights.
In this dissertation, I examine the idea of using provenance metadata (the set
of elements that have contributed to the existence of a piece of data, together
with their relationships) instead. I show that considering provenance a
primitive of computation enables the exploration of system behaviour, targeting
both retrospective analysis (root cause analysis, performance tuning) and
hypothetical scenarios (what-if questions). In this context, provenance can be
used as part of feedback loops, with a double purpose: building software that
is able to adapt for meeting certain quality and performance targets
(semi-automated tuning) and enabling human operators to exert high-level
runtime control with limited previous knowledge of a system's internal architecture.
My contributions towards this goal are threefold: providing low-level
mechanisms for meaningful provenance collection considering OS-level resource
multiplexing, proving that such provenance data can be used in inferences about
application behaviour and generalising this to a set of primitives necessary for
fine-grained provenance disclosure in a wider context.
To derive such primitives in a bottom-up manner, I first present Resourceful, a
framework that enables capturing OS-level measurements in the context of
application activities. It is the contextualisation that allows tying the
measurements to provenance in a meaningful way, and I look at a number of
use-cases in understanding application performance. This also provides a good
setup for evaluating the impact and overheads of fine-grained provenance
collection.
I then show that the collected data enables new ways of understanding
performance variation by attributing it to specific components within a
system. The resulting set of tools, Soroban, gives developers and operation
engineers a principled way of examining the impact of various configuration, OS and virtualization parameters on application behaviour.
Finally, I consider how this supports the idea that provenance should be
disclosed at application level and discuss why such disclosure is necessary for
enabling the use of collected metadata efficiently and at a granularity which
is meaningful in relation to application semantics.CHESS Scholarship Scheme
EPSR
- …