52 research outputs found

    A metaobject architecture for fault-tolerant distributed systems : the FRIENDS approach

    Get PDF
    The FRIENDS system developed at LAAS-CNRS is a metalevel architecture providing libraries of metaobjects for fault tolerance, secure communication, and group-based distributed applications. The use of metaobjects provides a nice separation of concerns between mechanisms and applications. Metaobjects can be used transparently by applications and can be composed according to the needs of a given application, a given architecture, and its underlying properties. In FRIENDS, metaobjects are used recursively to add new properties to applications. They are designed using an object oriented design method and implemented on top of basic system services. This paper describes the FRIENDS software-based architecture, the object-oriented development of metaobjects, the experiments that we have done, and summarizes the advantages and drawbacks of a metaobject approach for building fault-tolerant system

    Patterns for building dependable systems with trusted bases

    Get PDF
    We propose a set of patterns for structuring a system to be dependable by design. The key idea is to localize the system's most critical requirements into small, reliable parts called trusted bases. We describe two instances of trusted bases: (1) the end-to-end check, which localizes the correctness checking of a computation to end points of a system, and (2) the trusted kernel, which ensures the safety of a set of resources with a small core of a system.Northrop Grumman Cybersecurity Research ConsortiumNational Science Foundation (U.S.) (Deep and Scalable Analysis of Software Grant 0541183)National Science Foundation (U.S.) (CRI: CRD - Development of Alloy Technology and Materials Grant 0707612

    Operating System Support for Redundant Multithreading

    Get PDF
    Failing hardware is a fact and trends in microprocessor design indicate that the fraction of hardware suffering from permanent and transient faults will continue to increase in future chip generations. Researchers proposed various solutions to this issue with different downsides: Specialized hardware components make hardware more expensive in production and consume additional energy at runtime. Fault-tolerant algorithms and libraries enforce specific programming models on the developer. Compiler-based fault tolerance requires the source code for all applications to be available for recompilation. In this thesis I present ASTEROID, an operating system architecture that integrates applications with different reliability needs. ASTEROID is built on top of the L4/Fiasco.OC microkernel and extends the system with Romain, an operating system service that transparently replicates user applications. Romain supports single- and multi-threaded applications without requiring access to the application's source code. Romain replicates applications and their resources completely and thereby does not rely on hardware extensions, such as ECC-protected memory. In my thesis I describe how to efficiently implement replication as a form of redundant multithreading in software. I develop mechanisms to manage replica resources and to make multi-threaded programs behave deterministically for replication. I furthermore present an approach to handle applications that use shared-memory channels with other programs. My evaluation shows that Romain provides 100% error detection and more than 99.6% error correction for single-bit flips in memory and general-purpose registers. At the same time, Romain's execution time overhead is below 14% for single-threaded applications running in triple-modular redundant mode. The last part of my thesis acknowledges that software-implemented fault tolerance methods often rely on the correct functioning of a certain set of hardware and software components, the Reliable Computing Base (RCB). I introduce the concept of the RCB and discuss what constitutes the RCB of the ASTEROID system and other fault tolerance mechanisms. Thereafter I show three case studies that evaluate approaches to protecting RCB components and thereby aim to achieve a software stack that is fully protected against hardware errors

    Evaluation of MILS and reduced kernel security concepts for SCADA remote terminal units.

    Get PDF
    The purpose of this project is to study the benefits that the Multiple Independent Levels of Security (MILS) approach can provide to Supervisory Control and Data Acquisition (SCADA) remote terminal units. This is accomplished through a heavy focus on MILS concepts such as resource separation, verification, and kernel minimization and reduction. Two architectures are leveraged to study the application of reduced kernel concepts for a remote terminal unit (RTU). The first is the LynxOS embedded operating system, which is used to create a bootable image of a working RTU. The second is the Pistachio microkernel, the features and development environment of which are analyzed and catalogued to provide the basis for a future RTU. A survey of recent literature is included that focuses on the state of SCADA security, the MILS standard, and microkernel research. The design methodology for a MILS compliant RTU is outlined, including a benefit analysis of applying MILS in an industrial network setting. Also included are analyses of the concepts of MILS which are relevant to the design and how LynxOS and Pistachio can be used to study some of these concepts. A section detailing the prototyping of RTUs on LynxOS and Pistachio is also included, followed by an initial security and performance analysis for both systems

    Uso de riscos na validação de sistemas baseados em componentes

    Get PDF
    Orientadores: Eliane Martins, Henrique Santos do Carmo MadeiraTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: A sociedade moderna está cada vez mais dependente dos serviços prestados pelos computadores e, conseqüentemente, dependente do software que está sendo executado para prover estes serviços. Considerando a tendência crescente do desenvolvimento de produtos de software utilizando componentes reutilizáveis, a dependabilidade do software, ou seja, a segurança de que o software irá funcionar adequadamente, recai na dependabilidade dos componentes que são integrados. Os componentes são normalmente adquiridos de terceiros ou produzidos por outras equipes de desenvolvimento. Dessa forma, os critérios utilizados na fase de testes dos componentes dificilmente estão disponíveis. A falta desta informação aliada ao fato de se estar utilizando um componente que não foi produzido para o sistema e o ambiente computacional específico faz com que a reutilização de componentes apresente um risco para o sistema que os integra. Estudos tradicionais do risco de um componente de software definem dois fatores que caracteriza o risco, a probabilidade de existir uma falha no componente e o impacto que isso causa no sistema computacional. Este trabalho propõe o uso da análise do risco para selecionar pontos de injeção e monitoração para campanhas de injeção de falhas. Também propõe uma abordagem experimental para a avaliação do risco de um componente para um sistema. Para se estimar a probabilidade de existir uma falha no componente, métricas de software foram combinadas num modelo estatístico. O impacto da manifestação de uma falha no sistema foi estimado experimentalmente utilizando a injeção de falhas. Considerando esta abordagem, a avaliação do risco se torna genérica e repetível embasando-se em medidas bem definidas. Dessa forma, a metodologia pode ser utilizada como um benchmark de componentes quanto ao risco e pode ser utilizada quando é preciso escolher o melhor componente para um sistema computacional, entre os vários componentes que provêem a mesma funcionalidade. Os resultados obtidos na aplicação desta abordagem em estudos de casos nos permitiram escolher o melhor componente, considerando diversos objetivos e necessidades dos usuáriosAbstract: Today's societies have become increasingly dependent on information services. A corollary is that we have also become increasingly dependent on computer software products that provide such services. The increasing tendency of software development to employ reusable components means that software dependability has become even more reliant on the dependability of integrated components. Components are usually acquired from third parties or developed by unknown development teams. In this way, the criteria employed in the testing phase of components-based systems are hardly ever been available. This lack of information, coupled with the use of components that are not specifically developed for a particular system and computational environment, makes components reutilization risky for the integrating system. Traditional studies on the risk of software components suggest that two aspects must be considered when risk assessment tests are performed, namely the probability of residual fault in software component, and the probability of such fault activation and impact on the computational system. The present work proposes the use of risk analysis to select the injection and monitoring points for fault injection campaigns. It also proposes an experimental approach to evaluate the risk a particular component may represent to a system. In order to determine the probability of a residual fault in the component, software metrics are combined in a statistical mode!. The impact of fault activation is estimated using fault injection. Through this experimental approach, risk evaluation becomes replicable and buttressed on well-defined measurements. In this way, the methodology can be used as a components' risk benchmark, and can be employed when it is necessary to choose the most suitable among several functionally-similar components for a particular computational system. The results obtained in the application of this approach to specific case studies allowed us to choose the best component in each case, without jeopardizing the diverse objectives and needs of their usersDoutoradoDoutor em Ciência da Computaçã

    A first look at RISC-V virtualization from an embedded systems perspective

    Get PDF
    This article describes the first public implementation and evaluation of the latest version of the RISC-V hypervisor extension (H-extension v0.6.1) specification in a Rocket chip core. To perform a meaningful evaluation for modern multi-core embedded and mixedcriticality systems, we have ported Bao, an open-source static partitioning hypervisor, to RISC-V. We have also extended the RISC-V platformlevel interrupt controller (PLIC) to enable direct guest interrupt injection with low and deterministic latency and we have enhanced the timer infrastructure to avoid trap and emulation overheads. Experiments were carried out in FireSim, a cycle-accurate, FPGA-accelerated simulator, and the system was also successfully deployed and tested in a Zynq UltraScale+ MPSoC ZCU104. Our hardware implementation was opensourced and is currently in use by the RISC-V community towards the ratification of the H-extension specification.This work has been supported by FCT - undação para a Ciência e a Tecnologia within the R&D Units Project Scope: UIDB/00319/2020. This work has also been supported by FCT within the PhD Scholarship Project Scope: SFRH/BD/138660/2018

    Vulnerability detection in device drivers

    Get PDF
    Tese de doutoramento, Informática (Ciência da Computação), Universidade de Lisboa, Faculdade de Ciências, 2017The constant evolution in electronics lets new equipment/devices to be regularly made available on the market, which has led to the situation where common operating systems (OS) include many device drivers(DD) produced by very diverse manufactures. Experience has shown that the development of DD is error prone, as a majority of the OS crashes can be attributed to flaws in their implementation. This thesis addresses the challenge of designing methodologies and tools to facilitate the detection of flaws in DD, contributing to decrease the errors in this kind of software, their impact in the OS stability, and the security threats caused by them. This is especially relevant because it can help developers to improve the quality of drivers during their implementation or when they are integrated into a system. The thesis work started by assessing how DD flaws can impact the correct execution of the Windows OS. The employed approach used a statistical analysis to obtain the list of kernel functions most used by the DD, and then automatically generated synthetic drivers that introduce parameter errors when calling a kernel function, thus mimicking a faulty interaction. The experimental results showed that most targeted functions were ineffective in the defence of the incorrect parameters. A reasonable number of crashes and a small number of hangs were observed suggesting a poor error containment capability of these OS functions. Then, we produced an architecture and a tool that supported the automatic injection of network attacks in mobile equipment (e.g., phone), with the objective of finding security flaws (or vulnerabilities) in Wi-Fi drivers. These DD were selected because they are of easy access to an external adversary, which simply needs to create malicious traffic to exploit them, and therefore the flaws in their implementation could have an important impact. Experiments with the tool uncovered a previously unknown vulnerability that causes OS hangs, when a specific value was assigned to the TIM element in the Beacon frame. The experiments also revealed a potential implementation problem of the TCP-IP stack by the use of disassociation frames when the target device was associated and authenticated with a Wi-Fi access point. Next, we developed a tool capable of registering and instrumenting the interactions between a DD and the OS. The solution used a wrapper DD around the binary of the driver under test, enabling full control over the function calls and parameters involved in the OS-DD interface. This tool can support very diverse testing operations, including the log of system activity and to reverse engineer the driver behaviour. Some experiments were performed with the tool, allowing to record the insights of the behaviour of the interactions between the DD and the OS, the parameter values and return values. Results also showed the ability to identify bugs in drivers, by executing tests based on the knowledge obtained from the driver’s dynamics. Our final contribution is a methodology and framework for the discovery of errors and vulnerabilities in Windows DD by resorting to the execution of the drivers in a fully emulated environment. This approach is capable of testing the drivers without requiring access to the associated hardware or the DD source code, and has a granular control over each machine instruction. Experiments performed with Off the Shelf DD confirmed a high dependency of the correctness of the parameters passed by the OS, identified the precise location and the motive of memory leaks, the existence of dormant and vulnerable code.A constante evolução da eletrónica tem como consequência a disponibilização regular no mercado de novos equipamentos/dispositivos, levando a uma situação em que os sistemas operativos (SO) mais comuns incluem uma grande quantidade de gestores de dispositivos (GD) produzidos por diversos fabricantes. A experiência tem mostrado que o desenvolvimento dos GD é sujeito a erros uma vez que a causa da maioria das paragens do SO pode ser atribuída a falhas na sua implementação. Esta tese centra-se no desafio da criação de metodologias e ferramentas que facilitam a deteção de falhas nos GD, contribuindo para uma diminuição nos erros neste tipo de software, o seu impacto na estabilidade do SO, e as ameaças de segurança por eles causadas. Isto é especialmente relevante porque pode ajudar a melhorar a qualidade dos GD tanto na sua implementação como quando estes são integrados em sistemas. Este trabalho inicia-se com uma avaliação de como as falhas nos GD podem levar a um funcionamento incorreto do SO Windows. A metodologia empregue usa uma análise estatística para obter a lista das funções do SO que são mais utilizadas pelos GD, e posteriormente constrói GD sintéticos que introduzem erros nos parâmetros passados durante a chamada às funções do SO, e desta forma, imita a integração duma falta. Os resultados das experiências mostraram que a maioria das funções testadas não se protege eficazmente dos parâmetros incorretos. Observou-se a ocorrência de um número razoável de paragens e um pequeno número de bloqueios, o que sugere uma pobre capacidade das funções do SO na contenção de erros. Posteriormente, produzimos uma arquitetura e uma ferramenta que suporta a injeção automática de ataques em equipamentos móveis (e.g., telemóveis), com o objetivo de encontrar falhas de segurança (ou vulnerabilidades) em GD de placas de rede Wi-Fi. Estes GD foram selecionados porque são de fácil acesso a um atacante remoto, o qual apenas necessita de criar tráfego malicioso para explorar falhas na sua implementação podendo ter um impacto importante. As experiências realizadas com a ferramenta revelaram uma vulnerabilidade anteriormente desconhecida que provoca um bloqueio no SO quando é atribuído um valor específico ao campo TIM da mensagem de Beacon. As experiências também revelaram um potencial problema na implementação do protocolo TCP-IP no uso das mensagens de desassociação quando o dispositivo alvo estava associado e autenticado com o ponto de acesso Wi-Fi. A seguir, desenvolvemos uma ferramenta com a capacidade de registar e instrumentar as interações entre os GD e o SO. A solução usa um GD que envolve o código binário do GD em teste, permitindo um controlo total sobre as chamadas a funções e aos parâmetros envolvidos na interface SO-GD. Esta ferramenta suporta diversas operações de teste, incluindo o registo da atividade do sistema e compreensão do comportamento do GD. Foram realizadas algumas experiências com esta ferramenta, permitindo o registo das interações entre o GD e o SO, os valores dos parâmetros e os valores de retorno das funções. Os resultados mostraram a capacidade de identificação de erros nos GD, através da execução de testes baseados no conhecimento da dinâmica do GD. A nossa contribuição final é uma metodologia e uma ferramenta para a descoberta de erros e vulnerabilidades em GD Windows recorrendo à execução do GD num ambiente totalmente emulado. Esta abordagem permite testar GD sem a necessidade do respetivo hardware ou o código fonte, e possuí controlo granular sobre a execução de cada instrução máquina. As experiências realizadas com GD disponíveis comercialmente confirmaram a grande dependência que os GD têm nos parâmetros das funções do SO, e identificaram o motivo e a localização precisa de fugas de memória, a existência de código não usado e vulnerável

    Microkernel mechanisms for improving the trustworthiness of commodity hardware

    Full text link
    The thesis presents microkernel-based software-implemented mechanisms for improving the trustworthiness of computer systems based on commercial off-the-shelf (COTS) hardware that can malfunction when the hardware is impacted by transient hardware faults. The hardware anomalies, if undetected, can cause data corruptions, system crashes, and security vulnerabilities, significantly undermining system dependability. Specifically, we adopt the single event upset (SEU) fault model and address transient CPU or memory faults. We take advantage of the functional correctness and isolation guarantee provided by the formally verified seL4 microkernel and hardware redundancy provided by multicore processors, design the redundant co-execution (RCoE) architecture that replicates a whole software system (including the microkernel) onto different CPU cores, and implement two variants, loosely-coupled redundant co-execution (LC-RCoE) and closely-coupled redundant co-execution (CC-RCoE), for the ARM and x86 architectures. RCoE treats each replica of the software system as a state machine and ensures that the replicas start from the same initial state, observe consistent inputs, perform equivalent state transitions, and thus produce consistent outputs during error-free executions. Compared with other software-based error detection approaches, the distinguishing feature of RCoE is that the microkernel and device drivers are also included in redundant co-execution, significantly extending the sphere of replication (SoR). Based on RCoE, we introduce two kernel mechanisms, fingerprint validation and kernel barrier timeout, detecting fault-induced execution divergences between the replicated systems, with the flexibility of tuning the error detection latency and coverage. The kernel error-masking mechanisms built on RCoE enable downgrading from triple modular redundancy (TMR) to dual modular redundancy (DMR) without service interruption. We run synthetic benchmarks and system benchmarks to evaluate the performance overhead of the approach, observe that the overhead varies based on the characteristics of workloads and the variants (LC-RCoE or CC-RCoE), and conclude that the approach is applicable for real-world applications. The effectiveness of the error detection mechanisms is assessed by conducting fault injection campaigns on real hardware, and the results demonstrate compelling improvement
    corecore