7 research outputs found

    RAMpage: Graceful Degradation Management for Memory Errors in Commodity Linux Servers

    Memory errors are a major source of reliability problems in current computers. Undetected errors may result in program termination or, even worse, silent data corruption. Recent studies have shown that the frequency of permanent memory errors is an order of magnitude higher than previously assumed and regularly affects everyday operation. Often, neither additional circuitry to support hardware-based error detection nor downtime for performing hardware tests can be afforded. In the case of permanent memory errors, a system faces two challenges: detecting errors as early as possible and handling them while avoiding system downtime. To increase system reliability, we have developed RAMpage, an online memory testing infrastructure for commodity x86-64-based Linux servers, which is capable of efficiently detecting memory errors and which provides graceful degradation by withdrawing affected memory pages from further use. We describe the design and implementation of RAMpage and present results of an extensive qualitative as well as quantitative evaluation. Keywords: Fault tolerance, DRAM chips, Operating systems
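    RAMpage's scanner is integrated with the Linux kernel and is not reproduced in this abstract; the sketch below is only a minimal user-space illustration of the kind of write/verify pattern pass such an online tester performs on a candidate page. The 4 KiB page size, the pattern set, and the test_page helper are assumptions for illustration, not RAMpage's actual algorithm; page retirement is only indicated by a comment.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096  /* assumed page size for this sketch */

/* Write a pattern to every word of the page, then read it back.
 * Returns 0 if all words match, -1 on the first mismatch. */
static int test_page(volatile uint64_t *page, uint64_t pattern)
{
    size_t words = PAGE_SIZE / sizeof(uint64_t);

    for (size_t i = 0; i < words; i++)
        page[i] = pattern;
    for (size_t i = 0; i < words; i++)
        if (page[i] != pattern)
            return -1;
    return 0;
}

int main(void)
{
    /* Stand-in for a candidate physical page claimed from the kernel. */
    uint64_t *page = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
    const uint64_t patterns[] = { 0x0ULL, ~0x0ULL,
                                  0x5555555555555555ULL,
                                  0xAAAAAAAAAAAAAAAAULL };

    if (!page)
        return 1;

    for (size_t p = 0; p < sizeof(patterns) / sizeof(patterns[0]); p++) {
        if (test_page(page, patterns[p]) != 0) {
            /* A real implementation would withdraw the page here. */
            fprintf(stderr, "permanent error suspected, retire page\n");
            free(page);
            return 1;
        }
    }
    puts("page passed all patterns");
    free(page);
    return 0;
}
```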

    Experimental evaluation of distributed middleware with a virtualized java environment

    The correctness and performance of large-scale service-oriented systems depend on distributed middleware components performing various communication and coordination functions. It is, however, very difficult to experimentally assess such middleware components, as interesting behavior often arises exclusively in large-scale settings, but such deployments are costly and time-consuming. We address this challenge with MINHA, a system that virtualizes multiple JVM instances within a single JVM while simulating key environment components, thus reproducing the concurrency, distribution, and performance characteristics of the actual system. The usefulness of MINHA is demonstrated by applying it to the WS4D Java stack, a popular implementation of the Devices Profile for Web Services (DPWS) specification.

    Uma ferramenta para modelagem e simulação de computação aproximada em hardware [A tool for modeling and simulating approximate computing in hardware]

    Advisor: Lucas Francisco Wanner. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação.
    Recent research has introduced approximate hardware units that produce incorrect outputs deterministically or probabilistically for some small subset of inputs. On the other hand, they allow significantly higher throughput or lower power than their error-free counterparts. The integration, validation, and evaluation of these approximate units in architectures and processors, however, remains challenging. The lack of tools to represent and evaluate approximate hardware leads designers to verify their solutions independently, without considering interactions with other components, demanding high-effort modeling and simulation. In this work, we introduce ADeLe, a high-level language for the description, configuration, and integration of approximate hardware units into processors. ADeLe reduces the design effort for approximate hardware by modeling approximations at a high level of abstraction and automatically injecting them into a processor model for architectural simulation. In the ADeLe framework, approximations may modify or completely replace the functional behavior of instructions according to user-defined policies. Instructions may be approximated deterministically or probabilistically (e.g., based on operating voltage and frequency). To allow for controlled testing, approximations may be enabled and disabled from software. Energy is automatically accounted for based on customizable models that consider the potential power savings of the approximations enabled in the system. Thus, the framework provides a generic and flexible verification method, allowing for easy evaluation of the energy-quality trade-off of applications subjected to approximate hardware. We demonstrate the framework by introducing different approximation techniques into a processor model, on top of which we run selected applications. Modeling dedicated hardware modules, we show how ADeLe can represent approximate arithmetic and reduced-precision computation units executing 4 image processing and 2 floating-point applications. Using a different method of approximation, we also show how the framework is used to study the impact of voltage-overscaled memories on 9 applications. Our experiments show the framework's capabilities and how it may be used to generate approximate virtual CPUs and to evaluate energy-quality trade-offs for different applications with reduced effort.
    Master's degree in Computer Science (Mestre em Ciência da Computação). Grant 2017/08015-8, FAPES
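    The abstract does not show ADeLe's syntax; as a language-neutral sketch of the kind of user-defined policy it describes, the following C fragment approximates an add instruction probabilistically, with an assumed error probability that grows as the operating voltage is scaled down. The approx_add helper and the voltage/error model are illustrative assumptions, not part of ADeLe.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy error model: lower supply voltage -> higher chance of an
 * approximate result. Purely illustrative numbers. */
static double error_probability(double vdd)
{
    if (vdd >= 1.0) return 0.0;   /* nominal voltage: exact result */
    return (1.0 - vdd) * 0.5;     /* scaled-down voltage: some error */
}

/* Approximate add: with probability p, truncate the 4 low-order bits,
 * mimicking a reduced-precision functional unit. */
static uint32_t approx_add(uint32_t a, uint32_t b, double vdd)
{
    uint32_t exact = a + b;
    double p = error_probability(vdd);

    if ((double)rand() / RAND_MAX < p)
        return exact & ~0xFu;     /* deterministic error pattern */
    return exact;
}

int main(void)
{
    const double volts[] = { 1.0, 0.9, 0.8, 0.7 };

    srand(42);
    for (size_t i = 0; i < sizeof(volts) / sizeof(volts[0]); i++)
        printf("vdd=%.1f  100+27 -> %u\n",
               volts[i], approx_add(100, 27, volts[i]));
    return 0;
}
```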

    Simulation-based Fault Injection with QEMU for Speeding-up Dependability Analysis of Embedded Software

    Simulation-based fault injection (SFI) represents a valuable solution for early analysis of software dependability and fault-tolerance properties before the physical prototype of the target platform is available. Some SFI approaches base the fault injection strategy on cycle-accurate models implemented by means of Hardware Description Languages (HDLs). However, cycle-accurate simulation has proved too time-consuming when the objective is to emulate the effect of soft errors on complex microprocessors. To overcome this issue, SFI solutions based on virtual prototypes of the target platform have started to be proposed. However, current approaches still present drawbacks: for example, they work only for specific CPU architectures, they require code instrumentation, or they have a different target (i.e., design errors instead of dependability analysis). To address these disadvantages, this paper presents an efficient fault injection approach based on QEMU, one of the most efficient and popular instruction-accurate emulators for several microprocessor architectures. The proposed approach provides a non-intrusive technique for simulating hardware faults affecting CPU behavior. Permanent and transient/intermittent hardware fault models have been abstracted without losing quality for software dependability analysis. The approach minimizes the impact of the fault injection procedure on emulator performance by preserving the original dynamic binary translation mechanism of QEMU. Experimental results for both x86 and ARM processors are presented, proving the efficiency and effectiveness of the proposed approach.
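    The paper's QEMU integration is not detailed in the abstract, so the sketch below deliberately avoids QEMU's internal interfaces; it only illustrates the abstracted fault models mentioned above, corrupting a general-purpose register either once (transient) or on every read (permanent stuck-at). The cpu_state structure and read_reg hook are stand-ins invented for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the emulated CPU's register file (not QEMU's real state). */
struct cpu_state {
    uint64_t regs[16];
};

enum fault_kind { FAULT_TRANSIENT, FAULT_PERMANENT };

struct fault {
    enum fault_kind kind;
    int reg;        /* target register index */
    int bit;        /* bit position to corrupt */
    int fired;      /* transient faults trigger only once */
};

/* Apply the fault model on a register read; called by the emulation loop. */
static uint64_t read_reg(struct cpu_state *cpu, int reg, struct fault *f)
{
    uint64_t value = cpu->regs[reg];

    if (f->reg == reg) {
        if (f->kind == FAULT_PERMANENT) {
            value |= 1ULL << f->bit;           /* stuck-at-1 */
        } else if (!f->fired) {
            value ^= 1ULL << f->bit;           /* single bit flip */
            f->fired = 1;
        }
    }
    return value;
}

int main(void)
{
    struct cpu_state cpu = { .regs = { [3] = 0x1000 } };
    struct fault f = { .kind = FAULT_TRANSIENT, .reg = 3, .bit = 0 };

    printf("first read:  0x%llx\n", (unsigned long long)read_reg(&cpu, 3, &f));
    printf("second read: 0x%llx\n", (unsigned long long)read_reg(&cpu, 3, &f));
    return 0;
}
```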

    Microkernel mechanisms for improving the trustworthiness of commodity hardware

    The thesis presents microkernel-based, software-implemented mechanisms for improving the trustworthiness of computer systems based on commercial off-the-shelf (COTS) hardware that can malfunction when impacted by transient hardware faults. These hardware anomalies, if undetected, can cause data corruption, system crashes, and security vulnerabilities, significantly undermining system dependability. Specifically, we adopt the single event upset (SEU) fault model and address transient CPU or memory faults. We take advantage of the functional correctness and isolation guarantees provided by the formally verified seL4 microkernel and of the hardware redundancy provided by multicore processors: we design the redundant co-execution (RCoE) architecture, which replicates a whole software system (including the microkernel) onto different CPU cores, and implement two variants, loosely-coupled redundant co-execution (LC-RCoE) and closely-coupled redundant co-execution (CC-RCoE), for the ARM and x86 architectures. RCoE treats each replica of the software system as a state machine and ensures that the replicas start from the same initial state, observe consistent inputs, perform equivalent state transitions, and thus produce consistent outputs during error-free executions. Compared with other software-based error detection approaches, the distinguishing feature of RCoE is that the microkernel and device drivers are also included in redundant co-execution, significantly extending the sphere of replication (SoR). Based on RCoE, we introduce two kernel mechanisms, fingerprint validation and kernel barrier timeout, that detect fault-induced execution divergences between the replicated systems, with the flexibility of tuning the error detection latency and coverage. The kernel error-masking mechanisms built on RCoE enable downgrading from triple modular redundancy (TMR) to dual modular redundancy (DMR) without service interruption. We run synthetic benchmarks and system benchmarks to evaluate the performance overhead of the approach, observe that the overhead varies with the characteristics of the workloads and the variant used (LC-RCoE or CC-RCoE), and conclude that the approach is applicable to real-world applications. The effectiveness of the error detection mechanisms is assessed by conducting fault injection campaigns on real hardware, and the results demonstrate compelling improvements.
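    As a rough illustration of the fingerprint-validation idea (not the seL4 implementation), the sketch below folds each replica's outputs into a running FNV-1a hash and compares the hashes at a synchronization barrier; the two-replica setup and the hash choice are assumptions made for brevity.

```c
#include <stdint.h>
#include <stdio.h>

/* Fold a 64-bit output word into a replica's running fingerprint (FNV-1a). */
static uint64_t fold(uint64_t fp, uint64_t word)
{
    for (int i = 0; i < 8; i++) {
        fp ^= (word >> (8 * i)) & 0xFF;
        fp *= 0x100000001B3ULL;
    }
    return fp;
}

int main(void)
{
    const uint64_t FNV_OFFSET = 0xCBF29CE484222325ULL;
    uint64_t fp_a = FNV_OFFSET, fp_b = FNV_OFFSET;

    /* Outputs produced by two replicas between barriers; the divergence
     * in replica B models a fault-induced error. */
    uint64_t out_a[] = { 1, 2, 3 };
    uint64_t out_b[] = { 1, 2, 7 };

    for (int i = 0; i < 3; i++) {
        fp_a = fold(fp_a, out_a[i]);
        fp_b = fold(fp_b, out_b[i]);
    }

    /* Barrier: the kernel would compare fingerprints and raise an error. */
    if (fp_a != fp_b)
        puts("divergence detected: replicas disagree");
    else
        puts("replicas consistent");
    return 0;
}
```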

    MINHA: avaliação realista de aplicações distribuídas num ambiente centralizado [MINHA: realistic evaluation of distributed applications in a centralized environment]

    Master's dissertation in Informatics Engineering.
    In recent years, distributed systems have experienced exponential growth. These systems, typically implemented on the Java platform, are composed of a wide range of middleware components that perform various communication and coordination tasks. This trend influences the modelling and architecture of new, increasingly complex applications, making the evaluation of their performance costly and labour-intensive. Concurrency and distribution, as well as the fact that many problems only manifest at large scale, rule out evaluation with simple tools that do not take these characteristics into account. Realistic and controlled evaluation of distributed applications is still very hard to achieve, especially in large-scale scenarios. Pure simulation models can be a solution to this problem, but creating abstract models from real implementations is not always possible or even desirable, particularly during development, when some components may not yet exist or their functionality may be incomplete. To fill this gap, this dissertation presents Minha, a platform that enables realistic evaluation of applications by combining abstract simulation models and real implementations in a centralized environment. The platform combines the execution of the real code under analysis with simulation models of the surrounding environment, i.e., the network and the application. The main goal of this dissertation is the creation of a network model to be used by the Minha platform. This model introduces new variables into the evaluation, such as the time needed for message exchange, leading to more accurate results. Furthermore, a calibration method is presented that improves the faithfulness of the model to the real environment. The system reproduces the conditions of a large-scale deployment and, through Java bytecode manipulation, supports unmodified middleware components. Its usefulness is demonstrated by applying it to WS4D, a stack that complies with the Devices Profile for Web Services specification.

    Zuverlässigkeitsbewertung von vernetzten eingebetteten Systemen mittels Fehlereffektsimulation [Reliability assessment of networked embedded systems using fault effect simulation]

    The importance of embedded systems is growing continuously, as their widespread deployment already shows. Beyond their sheer number, the complexity of the individual systems is also increasing. This results not only in greater design effort but also in greater analysis effort. In addition, these systems increasingly take on safety-relevant tasks; driver assistance and vehicle automation systems are an illustrative example. Given the rapid progress of recent years, such systems are expected to enable highly automated driving within the next few years. For such systems, a failure or an incorrectly delivered service has severe consequences for the environment and for people nearby, and a safety assessment is mandatory. Because the individual systems are highly interconnected, an isolated analysis is no longer sufficient: the analysis must account not only for the increased complexity of the individual systems but also for their interactions with other systems. For safety assessment, current standards often recommend techniques such as brainstorming, failure mode and effects analysis, or fault tree analysis. The success of these techniques is usually strongly shaped by the people involved and demands comprehensive system knowledge; those involved must consider and analyze the increased complexity and interconnection described above. This thesis presents an approach to support safety assessment. The goal is to transfer the required system knowledge from the people involved to a simulation model. Using the simulation model, the user determines the system-wide fault effects; the analysis of fault propagation forms the basis of traditional safety analyses. Since the simulation model captures the system complexity and the system dependencies, the demands on the people involved, and consequently the analysis effort, are reduced. To enable such a procedure, a method for fault injection into simulation models is presented. Support for different levels of abstraction, in particular very abstract system-level models, is especially important here. Furthermore, an approach for comprehensive fault specification is presented, which allows fault causes to be specified at different levels of abstraction and the faults to be introduced into the simulation automatically. Besides introducing the faults, observing and analyzing the fault effects are further important aspects. A model-based specification rounds off the approach and simplifies integration into a model-driven design flow.
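    As a language-neutral illustration of the fault-injection idea described above (the thesis itself targets system-level simulation models), the sketch below wraps signal propagation in a saboteur that overlays a stuck-at or bit-flip fault during a specified simulation-time window; all names and the simple time model are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

enum fault_type { FAULT_NONE, FAULT_STUCK_AT_0, FAULT_STUCK_AT_1, FAULT_BIT_FLIP };

/* Fault specification: which bit, which effect, and when it is active. */
struct fault_spec {
    enum fault_type type;
    int bit;
    unsigned t_start, t_end;   /* active simulation-time window */
};

/* Saboteur: applied to a signal value each time it propagates. */
static uint32_t propagate(uint32_t value, unsigned now, const struct fault_spec *f)
{
    if (f->type == FAULT_NONE || now < f->t_start || now > f->t_end)
        return value;

    switch (f->type) {
    case FAULT_STUCK_AT_0: return value & ~(1u << f->bit);
    case FAULT_STUCK_AT_1: return value |  (1u << f->bit);
    case FAULT_BIT_FLIP:   return value ^  (1u << f->bit);
    default:               return value;
    }
}

int main(void)
{
    struct fault_spec f = { FAULT_STUCK_AT_1, 2, 10, 20 };

    /* Drive the same signal value through the saboteur at different times
     * and observe where the fault becomes visible downstream. */
    for (unsigned t = 0; t <= 30; t += 10)
        printf("t=%2u  signal=0x%x\n", t, propagate(0x0, t, &f));
    return 0;
}
```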