146 research outputs found
Engineering real-time applications with WorldFIP: tools and analysis
WorldFIP is standardised as European Norm EN 50170 - General Purpose
Field Communication System. Field communication systems (fieldbuses) started to be
widely used as the communication support for distributed computer-controlled systems
(DCCS), and are being used in all sorts of process control and manufacturing
applications within different types of industries. There are several advantages in using
fieldbuses as a replacement of for the traditional point-to-point links between
sensors/actuators and computer-based control systems. Indeed they concern economical
ones (cable savings) but, importantly, fieldbuses allow an increased decentralisation
and distribution of the processing power over the field. Typically DCCS have real-time
requirements that must be fulfilled. By this, we mean that process data must be
transferred between network computing nodes within a maximum admissible time span.
WorldFIP has very interesting mechanisms to schedule data transfers. It explicit
distinguishes to types of traffic: periodic and aperiodic. In this paper we describe how
WorldFIP handles these two types of traffic, and more importantly, we provide a
comprehensive analysis for guaranteeing the real-time requirements of both types of
traffic. A major contribution is made in the analysis of worst-case response time of
aperiodic transfer requests
CSP channels for CAN-bus connected embedded control systems
Closed loop control system typically contains multitude of sensors and actuators operated simultaneously. So they are parallel and distributed in its essence. But when mapping this parallelism to software, lot of obstacles concerning multithreading communication and synchronization issues arise. To overcome this problem, the CT kernel/library based on CSP algebra has been developed. This project (TES.5410) is about developing communication extension to the CT library to make it applicable in distributed systems. Since the library is tailored for control systems, properties and requirements of control systems are taken into special consideration. Applicability of existing middleware solutions is examined. A comparison of applicable fieldbus protocols is done in order to determine most suitable ones and CAN fieldbus is chosen to be first fieldbus used. Brief overview of CSP and existing CSP based libraries is given. Middleware architecture is proposed along with few novel ideas
Flexible time-triggered protocol for CAN: new scheduling and dispatching solutions
One of the possibilities to build robust communication systems with respect to their temporal behaviour is to use autonomous control based on the time-triggered paradigm. The FTT-CAN - flexible time-triggered protocol, relies on centralised scheduling but makes use of the CAN native distributed arbitration to reduce communication overhead. There, a planning scheduler is used within a master node to reduce the scheduling run-time overhead. On-line changes to the communication requirements can then be made under guaranteed timeliness. In addition FTT-CAN also allows an efficient combination of both time-triggered and event- triggered traffic with temporal isolation.
In this paper, recent evolutions of the initial protocol definition concerning transmission of synchronous and asynchronous messages are presented. These consist in a time division of the elementary transmission window which optimises the available bandwidth for asynchronous messages, keeping the timeliness of synchronous messages without jeopardising their transmission jitter. A novel solution for the planning scheduler is also presented. It consists in an FPGA-based coprocessor which implements the planning scheduler technique without imposing overhead to the arbiter CPU. With it, it is possible to reduce strongly the plan duration thus allowing on-line admission demanded by system elements and, also, to extend the protocol application to high-speed networks
Controller Area Network
Controller Area Network (CAN) is a popular and very well-known bus system, both in academia and in industry. CAN protocol was introduced in the mid eighties by Robert Bosch GmbH [7] and it was internationally standardized in 1993 as ISO 11898-1 [24]. It was initially designed to distributed automotive control systems, as a single digital bus to replace traditional point-to-point cables that were growing in complexity, weight and cost with the introduction of new electrical and electronic systems. Nowadays CAN is still used extensively in automotive applications, with an excess of 400 million CAN enabled microcontrollers manufactured each year [14].
The widespread and successful use of CAN in the automotive industry, the low cost asso- ciated with high volume production of controllers and CAN's inherent technical merit, have driven to CAN adoption in other application domains such as: industrial communications,
medical equipment, machine tool, robotics and in distributed embedded systems in general.
CAN provides two layers of the stack of the Open Systems Interconnection (OSI) reference model: the physical layer and the data link layer. Optionally, it could also provide an additional application layer, not included on the CAN standard. Notice that CAN physical layer was not dened in Bosch original specication, only the data link layer was dened. However, the CAN ISO specication lled this gap and the physical layer was then fully specied. CAN is a message-oriented transmission protocol, i.e., it denes message contents rather than nodes and node addresses. Every message has an associated message identier, which is unique within the whole network, dening both the content and the priority of the message. Transmission rates are dened up to 1 Mbps.
The large installed base of CAN nodes with low failure rates over almost two decades, led to the use of CAN in some critical applications such as Anti-locking Brake Systems (ABS) and Electronic Stability Program (ESP) in cars. In parallel with the wide dissemination of CAN in industry, the academia also devoted a large eort to CAN analysis and research, making CAN one of the must studied eldbuses. That is why a large number of books or book chapters describing CAN were published. The rst CAN book, written in French by D. Paret, was published in 1997 and presents the CAN basics [32]. More implementation oriented approaches, including CAN node implementation and application examples, can be found in Lorenz [28] and in Etschberger [16], while more compact descriptions of CAN can be found in [11] and in some chapters of [31].
Despite its success story, CAN application designers would be happier if CAN could be made faster, cover longer distances, be more deterministic and more dependable [34]. Over the years, several protocols based in CAN were presented, taking advantage of some CAN properties and trying to improve some known CAN drawbacks. This chapter, besides presenting an overview of CAN, describes also some other relevant higher level protocols based on CAN, such as CANopen [13], DeviceNet [6], FTT-CAN [1] and TTCAN [25]
Analysis, evaluation and improvement of RT-WMP for real-time and QoS wireless communication: Applications in confined environments
En los ultimos años, la innovaciĂłn tecnolĂłgica, la caracterĂstica de flexibilidad y el rápido despligue de las redes inalámbricas, han favorecido la difusiĂłn de la redes mĂłviles ad-hoc (MANETs), capaces de ofrecer servicios para tareas especĂficas entre nodos mĂłviles. Los aspectos relacionados al dinamismo de la topologĂa mĂłvil y el acceso a un medio compartido por naturaleza hacen que sea preciso enfrentarse a clases de problemas distintos de los relacionados con la redes cableadas, atrayendo de este modo el interĂ©s de la comunidad cientĂfica. Las redes ad-hoc suelen soportar tráfico con garantĂa de servicio mĂnimo y la mayor parte de las propuestas presentes en literatura tratan de dar garantĂas de ancho de banda o minimizar el retardo de los mensajes. Sin embargo hay situaciones en las que estas garantĂas no son suficientes. Este es el caso de los sistemas que requieren garantĂas mas fuertes en la entrega de los mensajes, como es el caso de los sistemas de tiempo real donde la pĂ©rdida o el retraso de un sĂłlo mensaje puede provocar problemas graves. Otras aplicaciones como la videoconferencia, cada vez más extendidas, implican un tráfico de datos con requisitos diferentes, como la calidad de servicio (QoS). Los requisitos de tiempo real y de QoS añaden nuevos retos al ya exigente servicio de comunicaciĂłn inalámbrica entre estaciones mĂłviles de una MANET. Además, hay aplicaciones en las que hay que tener en cuenta algo más que el simple encaminamiento de los mensajes. Este es el caso de aplicaciones en entornos subterráneos, donde el conocimiento de la evoluciĂłn de propagaciĂłn de la señal entre los diferentes nodos puede ser Ăştil para mejorar la calidad de servicio y mantener la conectividad en cada momento. A pesar de Ă©sto, dentro del amplio abanicos de propuestas presente en la literatura, existen un conjunto de limitaciones que van de el mero uso de protocolos simulados a propuestas que no tienen en cuenta entornos no convencionales o que resultan aisladas desde el punto de vista de la integraciĂłn en sistemas complejos. En esta tesis doctoral, se propone un estudio completo sobre un plataforma inalámbrica de tiempo real, utilizando el protocolo RT-WMP capaz de gestionar trafĂco multimedia al mismo tiempo y adaptado al entorno de trabajo. Se propone una extensiĂłn para el soporte a los datos con calidad de servicio sin limitar las caractaristĂcas temporales del protocolo básico. Y con el fin de tener en cuenta el efecto de la propagaciĂłn de la señal, se caracteriza el entorno por medio de un conjunto de restricciones de conectividad. La soluciĂłn ha sido desarrollada y su validez ha sido demostrada extensamente en aplicaciones reales en entornos subterráneos, en redes malladas y aplicaciones robĂłticas
Schedulability analysis and optimization of time-partitioned distributed real-time systems
RESUMEN: La creciente complejidad de los sistemas de control modernos lleva a muchas empresas a tener que re-dimensionar o re-diseñar sus soluciones para adecuarlas a nuevas funcionalidades y requisitos. Un caso paradigmático de esta situaciĂłn se ha dado en el sector ferroviario, donde la implementaciĂłn de las aplicaciones de señalizaciĂłn se ha llevado a cabo empleando tĂ©cnicas tradicionales que, si bien ahora mismo cumplen con los requisitos básicos, su rendimiento temporal y escalabilidad funcional son sustancialmente mejorables. A partir de las soluciones propuestas en esta tesis, además de contribuir a la validaciĂłn de sistemas que requieren certificaciĂłn de seguridad funcional, tambiĂ©n se creará la tecnologĂa base de análisis de planificabilidad y optimizaciĂłn de sistemas de tiempo real distribuidos generales y tambiĂ©n basados en particionado temporal, que podrá ser aplicada en distintos entornos en los que los sistemas ciberfĂsicos juegan un rol clave, por ejemplo en aplicaciones de Industria 4.0, en los que pueden presentarse problemas similares en el futuro.ABSTRACT:he increasing complexity of modern control systems leads many companies to have to resize or redesign their solutions to adapt them to new functionalities and requirements. A paradigmatic case of this situation has occurred in the railway sector, where the implementation of signaling applications has been carried out using traditional techniques that, although they currently meet the basic requirements, their time performance and functional scalability can be substantially improved. From the solutions proposed in this thesis, besides contributing to the assessment of systems that require functional safety certification, the base technology for schedulability analysis and optimization of general as well as time-partitioned distributed real-time systems will be derived, which can be applied in different environments where cyber-physical systems play a key role, for example in Industry 4.0 applications, where similar problems may arise in the future
DEPENDABILITY BENCHMARKING OF NETWORK FUNCTION VIRTUALIZATION
Network Function Virtualization (NFV) is an emerging networking paradigm that aims to reduce costs and time-to-market, improve manageability, and foster competition and innovative services. NFV exploits virtualization and cloud computing technologies to turn physical network functions into Virtualized Network Functions (VNFs), which will be implemented in software, and will run as Virtual Machines (VMs) on commodity hardware located in high-performance data centers, namely Network Function Virtualization Infrastructures (NFVIs). The NFV paradigm relies on cloud computing and virtualization technologies to provide carrier-grade services, i.e., the ability of a service to be highly reliable and available, within fast and automatic failure recovery mechanisms.
The availability of many virtualization solutions for NFV poses the question on which virtualization technology should be adopted for NFV, in order to fulfill the requirements described above. Currently, there are limited solutions for analyzing, in quantitative terms, the performance and reliability trade-offs, which are important concerns for the adoption of NFV.
This thesis deals with assessment of the reliability and of the performance of NFV systems. It proposes a methodology, which includes context, measures, and faultloads, to conduct dependability benchmarks in NFV, according to the general principles of dependability benchmarking. To this aim, a fault injection framework for the virtualization technologies has been designed and implemented for the virtualized technologies being used as case studies in this thesis.
This framework is successfully used to conduct an extensive experimental campaign, where we compare two candidate virtualization technologies for NFV adoption: the commercial, hypervisor-based virtualization platform VMware vSphere, and the open-source, container-based virtualization platform Docker. These technologies are assessed in the context of a high-availability, NFV-oriented IP Multimedia Subsystem (IMS).
The analysis of experimental results reveal that i) fault management mechanisms are crucial in NFV, in order to provide accurate failure detection and start the subsequent failover actions, and ii) fault injection proves to be valuable way to introduce uncommon scenarios in the NFVI, which can be fundamental to provide a high reliable service in production
Worst-case delay analysis of real-time switched Ethernet networks with flow local synchronization
Les réseaux Ethernet commuté full-duplex constituent des solutions intéressantes pour des applications industrielles. Mais le non-déterminisme d’un commutateur IEEE 802.1d, fait que l’analyse pire cas de délai de flux critiques est encore un problème ouvert. Plusieurs méthodes ont été proposées pour obtenir des bornes supérieures des délais de communication sur des réseaux Ethernet commuté full duplex temps réels, faisant l’hypothèse que le trafic en entrée du réseau peut être borné. Le problème principal reste le pessimisme introduit par la méthode de calcul de cette borne supérieure du délai. Ces méthodes considèrent que tous les flux transmis sur le réseau sont indépendants. Ce qui est vrai pour les flux émis par des nœuds sources différents car il n’existe pas, dans le cas général, d’horloge globale permettant de synchroniser les flux. Mais pour les flux émis par un même nœud source, il est possible de faire l’hypothèse d’une synchronisation locale de ces flux. Une telle hypothèse permet de bâtir un modèle plus précis des flux et en conséquence élimine des scénarios impossibles qui augmentent le pessimisme du calcul. Le sujet principal de cette thèse est d’étudier comment des flux périodiques synchronisés par des offsets peuvent être gérés dans le calcul des bornes supérieures des délais sur un réseau Ethernet commuté temps-réel. Dans un premier temps, il s’agit de présenter l’impact des contraintes d’offsets sur le calcul des bornes supérieures des délais de bout en bout. Il s’agit ensuite de présenter comment intégrer ces contraintes d’offsets dans les approches de calcul basées sur le Network Calculus et la méthode des Trajectoires. Une méthode Calcul Réseau modifiée et une méthode Trajectoires modifiée sont alors développées et les performances obtenues sont comparées. Le réseau avionique AFDX (Avionics Full-Duplex Switched Ethernet) est pris comme exemple d’un réseau Ethernet commuté full-duplex. Une configuration AFDX industrielle avec un millier de flux est présentée. Cette configuration industrielle est alors évaluée à l’aide des deux approches, selon un choix d’allocation d’offsets donné. De plus, différents algorithmes d’allocation des offsets sont testés sur cette configuration industrielle, pour trouver un algorithme d’allocation quasi-optimal. Une analyse de pessimisme des bornes supérieures calculées est alors proposée. Cette analyse est basée sur l’approche des trajectoires (rendue optimiste) qui permet de calculer une sous-approximation du délai pire-cas. La différence entre la borne supérieure du délai (calculée par une méthode donnée) et la sous-approximation du délai pire cas donne une borne supérieure du pessimisme de la méthode. Cette analyse fournit des résultats intéressants sur le pessimisme des approches Calcul Réseau et méthode des Trajectoires. La dernière partie de la thèse porte sur une architecture de réseau temps réel hétérogène obtenue par connexion de réseaux CAN via des ponts sur un réseau fédérateur de type Ethernet commuté. Deux approches, une basée sur les composants et l’autre sur les Trajectoires sont proposées pour permettre une analyse des délais pire-cas sur un tel réseau. La capacité de calcul d’une borne supérieure des délais pire-cas dans le contexte d’une architecture hétérogène est intéressante pour les domaines industriels. ABSTRACT : Full-duplex switched Ethernet is a promising candidate for interconnecting real-time industrial applications. But due to IEEE 802.1d indeterminism, the worst-case delay analysis of critical flows supported by such a network is still an open problem. Several methods have been proposed for upper-bounding communication delays on a real-time switched Ethernet network, assuming that the incoming traffic can be upper bounded. The main problem remaining is to assess the tightness, i.e. the pessimism, of the method calculating this upper bound on the communication delay. These methods consider that all flows transmitted over the network are independent. This is true for flows emitted by different source nodes since, in general, there is no global clock synchronizing them. But the flows emitted by the same source node are local synchronized. Such an assumption helps to build a more precise flow model that eliminates some impossible communication scenarios which lead to a pessimistic delay upper bounds. The core of this thesis is to study how local periodic flows synchronized with offsets can be handled when computing delay upper-bounds on a real-time switched Ethernet. In a first step, the impact of these offsets on the delay upper-bound computation is illustrated. Then, the integration of offsets in the Network Calculus and the Trajectory approaches is introduced. Therefore, a modified Network Calculus approach and a modified Trajectory approach are developed whose performances are compared on an Avionics Full-DupleX switched Ethernet (AFDX) industrial configuration with one thousand of flows. It has been shown that, in the context of this AFDX configuration, the Trajectory approach leads to slightly tighter end-to-end delay upper bounds than the ones of the Network Calculus approach. But offsets of local flows have to be chosen. Different offset assignment algorithms are then investigated on the AFDX industrial configuration. A near-optimal assignment can be exhibited. Next, a pessimism analysis of the computed upper-bounds is proposed. This analysis is based on the Trajectory approach (made optimistic) which computes an under-estimation of the worst-case delay. The difference between the upper-bound (computed by a given method) and the under-estimation of the worst-case delay gives an upper-bound of the pessimism of the method. This analysis gives interesting comparison results on the Network Calculus and the Trajectory approaches pessimism. The last part of the thesis, deals with a real-time heterogeneous network architecture where CAN buses are interconnected through a switched Ethernet backbone using dedicated bridges. Two approaches, the component-based approach and the Trajectory approach, are developed to conduct a worst-case delay analysis for such a network. Clearly, the ability to compute end-to-end delays upper-bounds in the context of heterogeneous network architecture is promising for industrial domains
Tolerância a falhas em sistemas de comunicação de tempo-real flexĂveis
Nas Ăşltimas dĂ©cadas, os sistemas embutidos distribuĂdos, tĂŞm sido usados em
variados domĂnios de aplicação, desde o controlo de processos industriais atĂ©
ao controlo de aviões e automóveis, sendo expectável que esta tendência se
mantenha e até se intensifique durante os próximos anos.
Os requisitos de confiabilidade de algumas destas aplicações são
extremamente importantes, visto que o não cumprimento de serviços de uma
forma previsĂvel e pontual pode causar graves danos econĂłmicos ou atĂ© pĂ´r
em risco vidas humanas.
A adopção das melhores práticas de projecto no desenvolvimento destes
sistemas nĂŁo elimina, por si sĂł, a ocorrĂŞncia de falhas causadas pelo
comportamento nĂŁo determinĂstico do ambiente onde o sistema embutido
distribuĂdo operará. Desta forma, Ă© necessário incluir mecanismos de
tolerância a falhas que impeçam que eventuais falhas possam comprometer
todo o sistema.
Contudo, para serem eficazes, os mecanismos de tolerância a falhas
necessitam ter conhecimento a priori do comportamento correcto do sistema
de modo a poderem ser capazes de distinguir os modos correctos de
funcionamento dos incorrectos.
Tradicionalmente, quando se projectam mecanismos de tolerância a falhas, o
conhecimento a priori significa que todos os possĂveis modos de
funcionamento sĂŁo conhecidos na fase de projecto, nĂŁo os podendo adaptar
nem fazer evoluir durante a operação do sistema. Como consequência, os
sistemas projectados de acordo com este princĂpio ou sĂŁo completamente
estáticos ou permitem apenas um pequeno número de modos de operação.
Contudo, é desejável que os sistemas disponham de alguma flexibilidade de
modo a suportarem a evolução dos requisitos durante a fase de operação,
simplificar a manutenção e reparação, bem como melhorar a eficiência usando
apenas os recursos do sistema que são efectivamente necessários em cada
instante. Além disto, esta eficiência pode ter um impacto positivo no custo do
sistema, em virtude deste poder disponibilizar mais funcionalidades com o
mesmo custo ou a mesma funcionalidade a um menor custo.
Porém, flexibilidade e confiabilidade têm sido encarados como conceitos
conflituais.
Isto deve-se ao facto de flexibilidade implicar a capacidade de permitir a
evolução dos requisitos que, por sua vez, podem levar a cenários de operação
imprevisĂveis e possivelmente inseguros. Desta fora, Ă© comummente aceite
que apenas um sistema completamente estático pode ser tornado confiável, o
que significa que todos os aspectos operacionais tĂŞm de ser completamente
definidos durante a fase de projecto.
Num sentido lato, esta constatação é verdadeira. Contudo, se os modos como
o sistema se adapta a requisitos evolutivos puderem ser restringidos e
controlados, entĂŁo talvez seja possĂvel garantir a confiabilidade permanente
apesar das alterações aos requisitos durante a fase de operação.
A tese suportada por esta dissertação defende que Ă© possĂvel flexibilizar um
sistema, dentro de limites bem definidos, sem comprometer a sua
confiabilidade e propõe alguns mecanismos que permitem a construção de
sistemas de segurança crĂtica baseados no protocolo Controller Area Network
(CAN). Mais concretamente, o foco principal deste trabalho incide sobre o
protocolo Flexible Time-Triggered CAN (FTT-CAN), que foi especialmente
desenvolvido para disponibilizar um grande nĂvel de flexibilidade operacional
combinando, nĂŁo sĂł as vantagens dos paradigmas de transmissĂŁo de
mensagens baseados em eventos e em tempo, mas também a flexibilidade
associada ao escalonamento dinâmico do tráfego cuja transmissão é
despoletada apenas pela evolução do tempo.
Este facto condiciona e torna mais complexo o desenvolvimento de
mecanismos de tolerância a falhas para FTT-CAN do que para outros
protocolos como por exemplo, TTCAN ou FlexRay, nos quais existe um
conhecimento estático, antecipado e comum a todos os nodos, do
escalonamento de mensagens cuja transmissão é despoletada pela evolução
do tempo.
Contudo, e apesar desta complexidade adicional, este trabalho demonstra que
Ă© possĂvel construir mecanismos de tolerância a falhas para FTT-CAN
preservando a sua flexibilidade operacional.
É também defendido nesta dissertação que um sistema baseado no protocolo
FTT-CAN e equipado com os mecanismos de tolerância a falhas propostos é
passĂvel de ser usado em aplicações de segurança crĂtica.
Esta afirmação é suportada, no âmbito do protocolo FTT-CAN, através da
definição de uma arquitectura tolerante a falhas integrando nodos com modos
de falha tipo falha-silĂŞncio e nodos mestre replicados.
Os vários problemas resultantes da replicação dos nodos mestre são, também
eles, analisados e várias soluções são propostas para os obviar.
Concretamente, Ă© proposto um protocolo que garante a consistĂŞncia das
estruturas de dados replicadas a quando da sua actualização e um outro
protocolo que permite a transferĂŞncia dessas estruturas de dados para um
nodo mestre que se encontre nĂŁo sincronizado com os restantes depois de
inicializado ou reinicializado de modo assĂncrono.
Além disto, esta dissertação também discute o projecto de nodos FTT-CAN
que exibam um modo de falha do tipo falha-silêncio e propõe duas soluções
baseadas em componentes de hardware localizados no interface de rede de
cada nodo, para resolver este problema. Uma das soluções propostas baseiase
em bus guardians que permitem a imposição de comportamento falhasilêncio
nos nodos escravos e suportam o escalonamento dinâmico de tráfego
na rede. A outra solução baseia-se num interface de rede que arbitra o acesso
de dois microprocessadores ao barramento. Este interface permite que a
replicação interna de um nodo seja efectuada de forma transparente e
assegura um comportamento falha-silĂŞncio quer no domĂnio temporal quer no
domĂnio do valor ao permitir transmissões do nodo apenas quando ambas as
réplicas coincidam no conteúdo das mensagens e nos instantes de
transmissão. Esta última solução está mais adaptada para ser usada nos
nodos mestre, contudo também poderá ser usada nos nodos escravo, sempre
que tal se revele fundamental.Distributed embedded systems (DES) have been widely used in the last few
decades in several application fields, ranging from industrial process control to
avionics and automotive systems. In fact, it is expectable that this trend will
continue over the years to come.
In some of these application domains the dependability requirements are of
utmost importance since failing to provide services in a timely and predictable
manner may cause important economic losses or even put human life in risk.
The adoption of the best practices in the design of distributed embedded
systems does not fully avoid the occurrence of faults, arising from the nondeterministic
behavior of the environment where each particular DES operates.
Thus, fault-tolerance mechanisms need to be included in the DES to prevent
possible faults leading to system failure.
To be effective, fault-tolerance mechanisms require an a priori knowledge of
the correct system behavior to be capable of distinguishing them from the
erroneous ones.
Traditionally, when designing fault-tolerance mechanisms, the a priori
knowledge means that all possible operational modes are known at system
design time and cannot adapt nor evolve during runtime. As a consequence,
systems designed according to this principle are either fully static or allow a
small number of operational modes only. Flexibility, however, is a desired
property in a system in order to support evolving requirements, simplify
maintenance and repair, and improve the efficiency in using system resources
by using only the resources that are effectively required at each instant. This
efficiency might impact positively on the system cost because with the same
resources one can add more functionality or one can offer the same
functionality with fewer resources.
However, flexibility and dependability are often regarded as conflicting
concepts. This is so because flexibility implies the ability to deal with evolving
requirements that, in turn, can lead to unpredictable and possibly unsafe
operating scenarios. Therefore, it is commonly accepted that only a fully static
system can be made dependable, meaning that all operating conditions are
completely defined at pre-runtime.
In the broad sense and assuming unbounded flexibility this assessment is true,
but if one restricts and controls the ways the system could adapt to evolving
requirements, then it might be possible to enforce continuous dependability.
This thesis claims that it is possible to provide a bounded degree of flexibility
without compromising dependability and proposes some mechanisms to build
safety-critical systems based on the Controller Area Network (CAN).
In particular, the main focus of this work is the Flexible Time-Triggered CAN
protocol (FTT-CAN), which was specifically developed to provide such high
level of operational flexibility, not only combining the advantages of time- and
event-triggered paradigms but also providing flexibility to the time-triggered
traffic. This fact makes the development of fault-tolerant mechanisms more
complex in FTT-CAN than in other protocols, such as TTCAN or FlexRay, in
which there is a priori static common knowledge of the time-triggered message
schedule shared by all nodes. Nevertheless, as it is demonstrated in this work,
it is possible to build fault-tolerant mechanisms for FTT-CAN that preserve its
high level of operational flexibility, particularly concerning the time-triggered
traffic. With such mechanisms it is argued that FTT-CAN is suitable for safetycritical
applications, too.
This claim was validated in the scope of the FTT-CAN protocol by presenting a
fault-tolerant system architecture with replicated masters and fail-silent nodes.
The specific problems and mechanisms related with master replication,
particularly a protocol to enforce consistency during updates of replicated data
structures and another protocol to transfer these data structures to an
unsynchronized node upon asynchronous startup or restart, are also
addressed.
Moreover, this thesis also discusses the implementations of fail-silence in FTTCAN
nodes and proposes two solutions, both based on hardware components
that are attached to the node network interface. One solution relies on bus
guardians that allow enforcing fail-silence in the time domain. These bus
guardians are adapted to support dynamic traffic scheduling and are fit for use
in FTT-CAN slave nodes, only. The other solution relies on a special network
interface, with duplicated microprocessor interface, that supports internal
replication of the node, transparently. In this case, fail-silence can be assured
both in the time and value domain since transmissions are carried out only if
both internal nodes agree on the transmission instant and message contents.
This solution is well adapted for use in the masters but it can also be used, if
desired, in slave nodes
- …