1,347 research outputs found
Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 1: Army fault tolerant architecture overview
Digital computing systems needed for Army programs such as the Computer-Aided Low Altitude Helicopter Flight Program and the Armored Systems Modernization (ASM) vehicles may be characterized by high computational throughput and input/output bandwidth, hard real-time response, high reliability and availability, and maintainability, testability, and producibility requirements. In addition, such a system should be affordable to produce, procure, maintain, and upgrade. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and constructed under a three-year program comprised of a conceptual study, detailed design and fabrication, and demonstration and validation phases. Described here are the results of the conceptual study phase of the AFTA development. Given here is an introduction to the AFTA program, its objectives, and key elements of its technical approach. A format is designed for representing mission requirements in a manner suitable for first order AFTA sizing and analysis, followed by a discussion of the current state of mission requirements acquisition for the targeted Army missions. An overview is given of AFTA's architectural theory of operation
Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis
Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions
Advanced flight control system study
The architecture, requirements, and system elements of an ultrareliable, advanced flight control system are described. The basic criteria are functional reliability of 10 to the minus 10 power/hour of flight and only 6 month scheduled maintenance. A distributed system architecture is described, including a multiplexed communication system, reliable bus controller, the use of skewed sensor arrays, and actuator interfaces. Test bed and flight evaluation program are proposed
Advancing automation and robotics technology for the Space Station Freedom and for the US economy
The progress made by levels 1, 2, and 3 of the Office of Space Station in developing and applying advanced automation and robotics technology is described. Emphasis is placed upon the Space Station Freedom Program responses to specific recommendations made in the Advanced Technology Advisory Committee (ATAC) progress report 10, the flight telerobotic servicer, and the Advanced Development Program. Assessments are presented for these and other areas as they apply to the advancement of automation and robotics technology for the Space Station Freedom
Byzantine fault-tolerant agreement protocols for wireless Ad hoc networks
Tese de doutoramento, Informática (Ciências da Computação), Universidade de Lisboa, Faculdade de Ciências, 2010.The thesis investigates the problem of fault- and intrusion-tolerant consensus
in resource-constrained wireless ad hoc networks. This is a fundamental
problem in distributed computing because it abstracts the need
to coordinate activities among various nodes. It has been shown to be a
building block for several other important distributed computing problems
like state-machine replication and atomic broadcast.
The thesis begins by making a thorough performance assessment of existing
intrusion-tolerant consensus protocols, which shows that the performance
bottlenecks of current solutions are in part related to their system
modeling assumptions. Based on these results, the communication failure
model is identified as a model that simultaneously captures the reality
of wireless ad hoc networks and allows the design of efficient protocols.
Unfortunately, the model is subject to an impossibility result stating that
there is no deterministic algorithm that allows n nodes to reach agreement
if more than n2 omission transmission failures can occur in a communication
step. This result is valid even under strict timing assumptions (i.e.,
a synchronous system).
The thesis applies randomization techniques in increasingly weaker variants
of this model, until an efficient intrusion-tolerant consensus protocol
is achieved. The first variant simplifies the problem by restricting the
number of nodes that may be at the source of a transmission failure at
each communication step. An algorithm is designed that tolerates f dynamic
nodes at the source of faulty transmissions in a system with a total
of n 3f + 1 nodes.
The second variant imposes no restrictions on the pattern of transmission
failures. The proposed algorithm effectively circumvents the Santoro-
Widmayer impossibility result for the first time. It allows k out of n nodes
to decide despite dn
2 e(nk)+k2 omission failures per communication
step. This algorithm also has the interesting property of guaranteeing
safety during arbitrary periods of unrestricted message loss.
The final variant shares the same properties of the previous one, but relaxes
the model in the sense that the system is asynchronous and that a
static subset of nodes may be malicious. The obtained algorithm, called
Turquois, admits f < n
3 malicious nodes, and ensures progress in communication
steps where dnf
2 e(n k f) + k 2. The algorithm is
subject to a comparative performance evaluation against other intrusiontolerant
protocols. The results show that, as the system scales, Turquois
outperforms the other protocols by more than an order of magnitude.Esta tese investiga o problema do consenso tolerante a faltas acidentais
e maliciosas em redes ad hoc sem fios. Trata-se de um problema fundamental
que captura a essência da coordenação em actividades envolvendo
vários nós de um sistema, sendo um bloco construtor de outros importantes
problemas dos sistemas distribuÃdos como a replicação de máquina
de estados ou a difusão atómica.
A tese começa por efectuar uma avaliação de desempenho a protocolos
tolerantes a intrusões já existentes na literatura. Os resultados mostram
que as limitações de desempenho das soluções existentes estão em parte
relacionadas com o seu modelo de sistema. Baseado nestes resultados, é
identificado o modelo de falhas de comunicação como um modelo que simultaneamente
permite capturar o ambiente das redes ad hoc sem fios e
projectar protocolos eficientes. Todavia, o modelo é restrito por um resultado
de impossibilidade que afirma não existir algoritmo algum que permita
a n nós chegaram a acordo num sistema que admita mais do que n2
transmissões omissas num dado passo de comunicação. Este resultado é
válido mesmo sob fortes hipóteses temporais (i.e., em sistemas sÃncronos)
A tese aplica técnicas de aleatoriedade em variantes progressivamente
mais fracas do modelo até ser alcançado um protocolo eficiente e tolerante
a intrusões. A primeira variante do modelo, de forma a simplificar
o problema, restringe o número de nós que estão na origem de transmissões
faltosas. É apresentado um algoritmo que tolera f nós dinâmicos na
origem de transmissões faltosas em sistemas com um total de n 3f + 1
nós.
A segunda variante do modelo não impõe quaisquer restrições no padrão
de transmissões faltosas. É apresentado um algoritmo que contorna efectivamente
o resultado de impossibilidade Santoro-Widmayer pela primeira
vez e que permite a k de n nós efectuarem progresso nos passos de comunicação
em que o número de transmissões omissas seja dn
2 e(n
k) + k 2. O algoritmo possui ainda a interessante propriedade de tolerar
perÃodos arbitrários em que o número de transmissões omissas seja
superior a .
A última variante do modelo partilha das mesmas caracterÃsticas da variante
anterior, mas com pressupostos mais fracos sobre o sistema. Em particular,
assume-se que o sistema é assÃncrono e que um subconjunto estático
dos nós pode ser malicioso. O algoritmo apresentado, denominado
Turquois, admite f < n
3 nós maliciosos e assegura progresso nos passos
de comunicação em que dnf
2 e(n k f) + k 2. O algoritmo é
sujeito a uma análise de desempenho comparativa com outros protocolos
na literatura. Os resultados demonstram que, à medida que o número de
nós no sistema aumenta, o desempenho do protocolo Turquois ultrapassa
os restantes em mais do que uma ordem de magnitude.FC
Technology for the Future: In-Space Technology Experiments Program, part 2
The purpose of the Office of Aeronautics and Space Technology (OAST) In-Space Technology Experiments Program In-STEP 1988 Workshop was to identify and prioritize technologies that are critical for future national space programs and require validation in the space environment, and review current NASA (In-Reach) and industry/ university (Out-Reach) experiments. A prioritized list of the critical technology needs was developed for the following eight disciplines: structures; environmental effects; power systems and thermal management; fluid management and propulsion systems; automation and robotics; sensors and information systems; in-space systems; and humans in space. This is part two of two parts and contains the critical technology presentations for the eight theme elements and a summary listing of critical space technology needs for each theme
Advanced information processing system: Authentication protocols for network communication
In safety critical I/O and intercomputer communication networks, reliable message transmission is an important concern. Difficulties of communication and fault identification in networks arise primarily because the sender of a transmission cannot be identified with certainty, an intermediate node can corrupt a message without certainty of detection, and a babbling node cannot be identified and silenced without lengthy diagnosis and reconfiguration . Authentication protocols use digital signature techniques to verify the authenticity of messages with high probability. Such protocols appear to provide an efficient solution to many of these problems. The objective of this program is to develop, demonstrate, and evaluate intercomputer communication architectures which employ authentication. As a context for the evaluation, the authentication protocol-based communication concept was demonstrated under this program by hosting a real-time flight critical guidance, navigation and control algorithm on a distributed, heterogeneous, mixed redundancy system of workstations and embedded fault-tolerant computers
A fault-tolerant multiprocessor architecture for aircraft, volume 1
A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed
Tolerância a falhas em sistemas de comunicação de tempo-real flexÃveis
Nas últimas décadas, os sistemas embutidos distribuÃdos, têm sido usados em
variados domÃnios de aplicação, desde o controlo de processos industriais até
ao controlo de aviões e automóveis, sendo expectável que esta tendência se
mantenha e até se intensifique durante os próximos anos.
Os requisitos de confiabilidade de algumas destas aplicações são
extremamente importantes, visto que o não cumprimento de serviços de uma
forma previsÃvel e pontual pode causar graves danos económicos ou até pôr
em risco vidas humanas.
A adopção das melhores práticas de projecto no desenvolvimento destes
sistemas não elimina, por si só, a ocorrência de falhas causadas pelo
comportamento não determinÃstico do ambiente onde o sistema embutido
distribuÃdo operará. Desta forma, é necessário incluir mecanismos de
tolerância a falhas que impeçam que eventuais falhas possam comprometer
todo o sistema.
Contudo, para serem eficazes, os mecanismos de tolerância a falhas
necessitam ter conhecimento a priori do comportamento correcto do sistema
de modo a poderem ser capazes de distinguir os modos correctos de
funcionamento dos incorrectos.
Tradicionalmente, quando se projectam mecanismos de tolerância a falhas, o
conhecimento a priori significa que todos os possÃveis modos de
funcionamento são conhecidos na fase de projecto, não os podendo adaptar
nem fazer evoluir durante a operação do sistema. Como consequência, os
sistemas projectados de acordo com este princÃpio ou são completamente
estáticos ou permitem apenas um pequeno número de modos de operação.
Contudo, é desejável que os sistemas disponham de alguma flexibilidade de
modo a suportarem a evolução dos requisitos durante a fase de operação,
simplificar a manutenção e reparação, bem como melhorar a eficiência usando
apenas os recursos do sistema que são efectivamente necessários em cada
instante. Além disto, esta eficiência pode ter um impacto positivo no custo do
sistema, em virtude deste poder disponibilizar mais funcionalidades com o
mesmo custo ou a mesma funcionalidade a um menor custo.
Porém, flexibilidade e confiabilidade têm sido encarados como conceitos
conflituais.
Isto deve-se ao facto de flexibilidade implicar a capacidade de permitir a
evolução dos requisitos que, por sua vez, podem levar a cenários de operação
imprevisÃveis e possivelmente inseguros. Desta fora, é comummente aceite
que apenas um sistema completamente estático pode ser tornado confiável, o
que significa que todos os aspectos operacionais têm de ser completamente
definidos durante a fase de projecto.
Num sentido lato, esta constatação é verdadeira. Contudo, se os modos como
o sistema se adapta a requisitos evolutivos puderem ser restringidos e
controlados, então talvez seja possÃvel garantir a confiabilidade permanente
apesar das alterações aos requisitos durante a fase de operação.
A tese suportada por esta dissertação defende que é possÃvel flexibilizar um
sistema, dentro de limites bem definidos, sem comprometer a sua
confiabilidade e propõe alguns mecanismos que permitem a construção de
sistemas de segurança crÃtica baseados no protocolo Controller Area Network
(CAN). Mais concretamente, o foco principal deste trabalho incide sobre o
protocolo Flexible Time-Triggered CAN (FTT-CAN), que foi especialmente
desenvolvido para disponibilizar um grande nÃvel de flexibilidade operacional
combinando, não só as vantagens dos paradigmas de transmissão de
mensagens baseados em eventos e em tempo, mas também a flexibilidade
associada ao escalonamento dinâmico do tráfego cuja transmissão é
despoletada apenas pela evolução do tempo.
Este facto condiciona e torna mais complexo o desenvolvimento de
mecanismos de tolerância a falhas para FTT-CAN do que para outros
protocolos como por exemplo, TTCAN ou FlexRay, nos quais existe um
conhecimento estático, antecipado e comum a todos os nodos, do
escalonamento de mensagens cuja transmissão é despoletada pela evolução
do tempo.
Contudo, e apesar desta complexidade adicional, este trabalho demonstra que
é possÃvel construir mecanismos de tolerância a falhas para FTT-CAN
preservando a sua flexibilidade operacional.
É também defendido nesta dissertação que um sistema baseado no protocolo
FTT-CAN e equipado com os mecanismos de tolerância a falhas propostos é
passÃvel de ser usado em aplicações de segurança crÃtica.
Esta afirmação é suportada, no âmbito do protocolo FTT-CAN, através da
definição de uma arquitectura tolerante a falhas integrando nodos com modos
de falha tipo falha-silêncio e nodos mestre replicados.
Os vários problemas resultantes da replicação dos nodos mestre são, também
eles, analisados e várias soluções são propostas para os obviar.
Concretamente, é proposto um protocolo que garante a consistência das
estruturas de dados replicadas a quando da sua actualização e um outro
protocolo que permite a transferência dessas estruturas de dados para um
nodo mestre que se encontre não sincronizado com os restantes depois de
inicializado ou reinicializado de modo assÃncrono.
Além disto, esta dissertação também discute o projecto de nodos FTT-CAN
que exibam um modo de falha do tipo falha-silêncio e propõe duas soluções
baseadas em componentes de hardware localizados no interface de rede de
cada nodo, para resolver este problema. Uma das soluções propostas baseiase
em bus guardians que permitem a imposição de comportamento falhasilêncio
nos nodos escravos e suportam o escalonamento dinâmico de tráfego
na rede. A outra solução baseia-se num interface de rede que arbitra o acesso
de dois microprocessadores ao barramento. Este interface permite que a
replicação interna de um nodo seja efectuada de forma transparente e
assegura um comportamento falha-silêncio quer no domÃnio temporal quer no
domÃnio do valor ao permitir transmissões do nodo apenas quando ambas as
réplicas coincidam no conteúdo das mensagens e nos instantes de
transmissão. Esta última solução está mais adaptada para ser usada nos
nodos mestre, contudo também poderá ser usada nos nodos escravo, sempre
que tal se revele fundamental.Distributed embedded systems (DES) have been widely used in the last few
decades in several application fields, ranging from industrial process control to
avionics and automotive systems. In fact, it is expectable that this trend will
continue over the years to come.
In some of these application domains the dependability requirements are of
utmost importance since failing to provide services in a timely and predictable
manner may cause important economic losses or even put human life in risk.
The adoption of the best practices in the design of distributed embedded
systems does not fully avoid the occurrence of faults, arising from the nondeterministic
behavior of the environment where each particular DES operates.
Thus, fault-tolerance mechanisms need to be included in the DES to prevent
possible faults leading to system failure.
To be effective, fault-tolerance mechanisms require an a priori knowledge of
the correct system behavior to be capable of distinguishing them from the
erroneous ones.
Traditionally, when designing fault-tolerance mechanisms, the a priori
knowledge means that all possible operational modes are known at system
design time and cannot adapt nor evolve during runtime. As a consequence,
systems designed according to this principle are either fully static or allow a
small number of operational modes only. Flexibility, however, is a desired
property in a system in order to support evolving requirements, simplify
maintenance and repair, and improve the efficiency in using system resources
by using only the resources that are effectively required at each instant. This
efficiency might impact positively on the system cost because with the same
resources one can add more functionality or one can offer the same
functionality with fewer resources.
However, flexibility and dependability are often regarded as conflicting
concepts. This is so because flexibility implies the ability to deal with evolving
requirements that, in turn, can lead to unpredictable and possibly unsafe
operating scenarios. Therefore, it is commonly accepted that only a fully static
system can be made dependable, meaning that all operating conditions are
completely defined at pre-runtime.
In the broad sense and assuming unbounded flexibility this assessment is true,
but if one restricts and controls the ways the system could adapt to evolving
requirements, then it might be possible to enforce continuous dependability.
This thesis claims that it is possible to provide a bounded degree of flexibility
without compromising dependability and proposes some mechanisms to build
safety-critical systems based on the Controller Area Network (CAN).
In particular, the main focus of this work is the Flexible Time-Triggered CAN
protocol (FTT-CAN), which was specifically developed to provide such high
level of operational flexibility, not only combining the advantages of time- and
event-triggered paradigms but also providing flexibility to the time-triggered
traffic. This fact makes the development of fault-tolerant mechanisms more
complex in FTT-CAN than in other protocols, such as TTCAN or FlexRay, in
which there is a priori static common knowledge of the time-triggered message
schedule shared by all nodes. Nevertheless, as it is demonstrated in this work,
it is possible to build fault-tolerant mechanisms for FTT-CAN that preserve its
high level of operational flexibility, particularly concerning the time-triggered
traffic. With such mechanisms it is argued that FTT-CAN is suitable for safetycritical
applications, too.
This claim was validated in the scope of the FTT-CAN protocol by presenting a
fault-tolerant system architecture with replicated masters and fail-silent nodes.
The specific problems and mechanisms related with master replication,
particularly a protocol to enforce consistency during updates of replicated data
structures and another protocol to transfer these data structures to an
unsynchronized node upon asynchronous startup or restart, are also
addressed.
Moreover, this thesis also discusses the implementations of fail-silence in FTTCAN
nodes and proposes two solutions, both based on hardware components
that are attached to the node network interface. One solution relies on bus
guardians that allow enforcing fail-silence in the time domain. These bus
guardians are adapted to support dynamic traffic scheduling and are fit for use
in FTT-CAN slave nodes, only. The other solution relies on a special network
interface, with duplicated microprocessor interface, that supports internal
replication of the node, transparently. In this case, fail-silence can be assured
both in the time and value domain since transmissions are carried out only if
both internal nodes agree on the transmission instant and message contents.
This solution is well adapted for use in the masters but it can also be used, if
desired, in slave nodes
Ordering, timeliness and reliability for publish/subscribe systems over WAN
In the last few years, the increasing use of the Internet and geo-political, sociological and financial changes induced by globalization, are paving the way for a connected world where the information is always available at the right place and the right time. As such, applications previously deployed for ``closed'' environmets, are now federating into geographically distributed systems connected through a Wide Area Network (WAN). By this evolution, in the near future no system will be isolated: every system will be composed by interconnected systems, i.e., it will be a System of Systems (SoS). Example of SoS are the Large-scale Complex Critical Infrastructure (LCCIs), such as power grids, transport infrastructures (airports and seaports), financial infrastructures, next generation intelligence platforms, to cite a few. In these systems, multiple sources of information generate a high volume of events that need to be delivered to all intended destinations by respecting several Quality of Service (QoS) constraints imposed by the critical nature of LCCIs. As such, particular attention is devoted to the middleware solution used to disseminate information in the SoS. Due to its inherently scalability provided by space, time and synchronization decoupling properties, the publish/subscribe paradigm is becoming attractive for the implementation of a middleware service for LCCIs. However, scalability is not the only requirement exhibited by SoS. Several services need to control a broader set of QoS requirements, such as timeliness, ordering and reliability. Unfortunately, current middleware solutions do not address QoS constraints required by SoS. Current publish/subscribe middleware solutions for the WAN environment offer only a best effort event dissemination, with no additional control on QoS. Just a few implementations try to address some isolated QoS policy, making them not suitable for a SoS scenario. The contribution of this thesis is to devise a QoS layer that can be posed on top of a generic publish/subscribe middleware that enriches its service by addressing: (i) ordering, (ii) reliability and (iii) timeliness in event dissemination in SoS over WAN. Specifically, we first analyze several real case studies, by highlighting their QoS requirements in terms of ordering, reliability and timeliness, and compare these requirements with both current research prototypes and commercial systems. Then, we fill the gap by proposing novel algorithms to address those requirements. The proposed protocols can also be combined together in order to provide the QoS level required by the particular application. In this way, QoS issues do not need to be addressed at application level, so as to leave applications to implement just their native functionalities
- …