10 research outputs found

    Mitigation of failures in high performance computing via runtime techniques

    Get PDF
    As machines increase in scale, it is predicted that failure rates of supercomputers will correspondingly increase. Even though the mean time to failure (MTTF) of individual component is high, the large number of components significantly decreases the system MTTF. Meanwhile, the decreasing size of transistors has been critical to the increase in capacity of supercomputers. The smaller the transistors are, silent data corruptions (SDC) are likely to occur more frequently. SDCs do not inhibit execution, but may silently lead to incorrect results. In this thesis, we leverage runtime system and compiler techniques to mitigate a significant fraction of failures automatically with low overhead. The main goals of various system-level fault tolerance strategies designed in this thesis are: reducing the extra cost added to application execution while improving system reliability; automatically adjusting fault tolerance decisions without user intervention based on environmental changes; protecting applications not only from fail-stop failures but also from silent data corruptions. The main contributions of this thesis are development of a semi-blocking checkpoint protocol that overlaps application execution with fault tolerance operation to reduce the overhead of checkpointing, a runtime system technique for automatic checkpoint and restart without user intervention, a holistic framework (ACR) for automatically detecting and recovering from silent data corruptions and a framework called FlipBack that provides targeted protection against silent data corruption with low cost

    ADDING PERSISTENCE TO MAIN MEMORY PROGRAMMING

    Get PDF
    Unlocking the true potential of the new persistent memories (PMEMs) requires eliminating traditional persistent I/O abstractions altogether, by introducing persistent semantics directly into main memory programming. Such a programming model elevates failure atomicity to a first-class application property in addition to in-memory data layout, concurrency-control, and fault tolerance, and therefore requires redesign of programming abstractions for both program correctness and maximum performance gains. To address these challenges, this thesis proposes a set of system software designs that integrate persistence with main memory programming, and makes the following contributions. First, this thesis proposes a PMEM-aware I/O runtime, NVStream, that supports fast durable streaming I/O. NVStream uses a memory-based I/O interface that integrates with existing I/O data movement operations of an application to accelerate persistent data writes. NVStream carefully designs its persistent data storage layout and crash-consistent semantics to match both application and PMEM characteristics. Specifically, we leverage the streaming nature of I/O in HPC workflows, to benefit from using a log-structured PMEM storage engine design, that uses relaxed write orderings and append-only failure-atomic semantics to form strongly consistent application checkpoints. Furthermore, we identify that optimizing the I/O software stack exposes the PMEM bandwidth limitations as a bottleneck during parallel HPC I/O writes, and propose a novel data movement design – PHX. PHX uses alternative network data movement paths available in datacenters to ease up the bandwidth pressure on the PMEM memory interconnects, all while maintaining the correctness of the persistent data. Next, the thesis explores the challenges and opportunities of using PMEM for true main memory persistent programming – a single data domain for both runtime and persistent applicationstate. Such a programming model includes maintaining ACID properties during each and every update to applications persistent structures. ACID-qualified persistent programming for multi-threaded applications is hard, as the programmer has to reason about both crash-consistency and synchronization – crash-sync – semantics for programming correctness. The thesis contributes new understanding of the correctness requirements for mixing different crash-consistent and synchronization protocols, characterizes the performance of different crash-sync realizations for different applications and hardware architectures, and draws actionable insights for future designs of PMEM systems. Finally, the application state stored on node-local persistent memory is still vulnerable to catastrophic node failures. The thesis proposes a replicated persistent memory runtime, Blizzard, that supports truly fault tolerant, concurrent and persistent data-structure programming. Blizzard carefully integrates userspace networking with byte addressable PMEM for a fast, persistent memory replication runtime. The design also incorporates a replication-aware crash-sync protocol that supports consistent and concurrent updates on persistent data-structures. Blizzard offers applications the flexibility to use the data structures that best match their functional requirements, while offering better performance, and providing crucial reliability guarantees lacking from existing persistent memory runtimes.Ph.D

    Dynamic shared memory architecture, systems, and optimizations for high performance and secure virtualized cloud

    Get PDF
    Dynamic memory consolidation is an important enabler for high performance virtual machine (VM) execution in virtualized Cloud. Efficient just-in-time memory balancing requires three core capabilities: (i) Detecting memory pressure across VMs hosted on a physical machine; (ii) Allocation of memory to respective VMs; (iii) Enabling fast recovery upon making newly allocated memory available at the high pressure VMs. Although the Balloon driver technology facilitates the second task, it remains difficult to accurately predict the VM memory demands at affordable overhead, especially under unpredictable and changing workloads. Furthermore, no prior study analyzed the effect of slow response of VM execution to the newly available memory due to paging based application recovery. In this dissertation research, I have made four original contributions to dynamic shared memory management in terms of architecture, systems and optimizations for improving VM execution performance and security. First, we designed and developed MemPipe, a shared memory inter-VM communication channel for fast inter-VM network I/O. MemPipe increases the shared memory utilization by adaptively adjusting the shared memory size according to workloads demands. It also reduces the inter-VM network communication overhead by directly copying the packets from the sender VM's user space to the shared memory area. Second, we developed iBalloon, a light-weight and transparent prediction based facility to enable automated or semi-automated ballooning with more customizable, accurate, and efficient memory balancing policies among VMs. Third, we developed MemFlex, a novel shared memory swapping facility that can effectively utilizes host idle memory by a hybrid memory swap-out model and a fast swap-in optimization. Fourth, we introduced SecureStack, which is a kernel backed tool to prevent the sensitive data on the function stack from being illegally accessed by the untrusted functions. SecureStack introduces three procedures to protect, restore, and clear the stack in a reliable and low cost manner. It is highly transparent to the users and does not bring any new vulnerability to the existing system. The above research developments are packaged into MemLego, a new memory management framework for memory-centric computing in the big data era.Ph.D

    Gestão e engenharia de CAP na nuvem híbrida

    Get PDF
    Doutoramento em InformáticaThe evolution and maturation of Cloud Computing created an opportunity for the emergence of new Cloud applications. High-performance Computing, a complex problem solving class, arises as a new business consumer by taking advantage of the Cloud premises and leaving the expensive datacenter management and difficult grid development. Standing on an advanced maturing phase, today’s Cloud discarded many of its drawbacks, becoming more and more efficient and widespread. Performance enhancements, prices drops due to massification and customizable services on demand triggered an emphasized attention from other markets. HPC, regardless of being a very well established field, traditionally has a narrow frontier concerning its deployment and runs on dedicated datacenters or large grid computing. The problem with common placement is mainly the initial cost and the inability to fully use resources which not all research labs can afford. The main objective of this work was to investigate new technical solutions to allow the deployment of HPC applications on the Cloud, with particular emphasis on the private on-premise resources – the lower end of the chain which reduces costs. The work includes many experiments and analysis to identify obstacles and technology limitations. The feasibility of the objective was tested with new modeling, architecture and several applications migration. The final application integrates a simplified incorporation of both public and private Cloud resources, as well as HPC applications scheduling, deployment and management. It uses a well-defined user role strategy, based on federated authentication and a seamless procedure to daily usage with balanced low cost and performance.O desenvolvimento e maturação da Computação em Nuvem abriu a janela de oportunidade para o surgimento de novas aplicações na Nuvem. A Computação de Alta Performance, uma classe dedicada à resolução de problemas complexos, surge como um novo consumidor no Mercado ao aproveitar as vantagens inerentes à Nuvem e deixando o dispendioso centro de computação tradicional e o difícil desenvolvimento em grelha. Situando-se num avançado estado de maturação, a Nuvem de hoje deixou para trás muitas das suas limitações, tornando-se cada vez mais eficiente e disseminada. Melhoramentos de performance, baixa de preços devido à massificação e serviços personalizados a pedido despoletaram uma atenção inusitada de outros mercados. A CAP, independentemente de ser uma área extremamente bem estabelecida, tradicionalmente tem uma fronteira estreita em relação à sua implementação. É executada em centros de computação dedicados ou computação em grelha de larga escala. O maior problema com o tipo de instalação habitual é o custo inicial e o não aproveitamento dos recursos a tempo inteiro, fator que nem todos os laboratórios de investigação conseguem suportar. O objetivo principal deste trabalho foi investigar novas soluções técnicas para permitir o lançamento de aplicações CAP na Nuvem, com particular ênfase nos recursos privados existentes, a parte peculiar e final da cadeia onde se pode reduzir custos. O trabalho inclui várias experiências e análises para identificar obstáculos e limitações tecnológicas. A viabilidade e praticabilidade do objetivo foi testada com inovação em modelos, arquitetura e migração de várias aplicações. A aplicação final integra uma agregação de recursos de Nuvens, públicas e privadas, assim como escalonamento, lançamento e gestão de aplicações CAP. É usada uma estratégia de perfil de utilizador baseada em autenticação federada, assim como procedimentos transparentes para a utilização diária com um equilibrado custo e performance

    Co-designing reliability and performance for datacenter memory

    Get PDF
    Memory is one of the key components that affects reliability and performance of datacenter servers. Memory in today’s servers is organized and shared in several ways to provide the most performant and efficient access to data. For example, cache hierarchy in multi-core chips to reduce access latency, non-uniform memory access (NUMA) in multi-socket servers to improve scalability, disaggregation to increase memory capacity. In all these organizations, hardware coherence protocols are used to maintain memory consistency of this shared memory and implicitly move data to the requesting cores. This thesis aims to provide fault-tolerance against newer models of failure in the organization of memory in datacenter servers. While designing for improved reliability, this thesis explores solutions that can also enhance performance of applications. The solutions build over modern coherence protocols to achieve these properties. First, we observe that DRAM memory system failure rates have increased, demanding stronger forms of memory reliability. To combat this, the thesis proposes Dvé, a hardware driven replication mechanism where data blocks are replicated across two different memory controllers in a cache-coherent NUMA system. Data blocks are accompanied by a code with strong error detection capabilities so that when an error is detected, correction is performed using the replica. Dvé’s organization offers two independent points of access to data which enables: (a) strong error correction that can recover from a range of faults affecting any of the components in the memory and (b) higher performance by providing another nearer point of memory access. Dvé’s coherent replication keeps the replicas in sync for reliability and also provides coherent access to read replicas during fault-free operation for improved performance. Dvé can flexibly provide these benefits on-demand at runtime. Next, we observe that the coherence protocol itself requires to be hardened against failures. Memory in datacenter servers is being disaggregated from the compute servers into dedicated memory servers, driven by standards like CXL. CXL specifies the coherence protocol semantics for compute servers to access and cache data from a shared region in the disaggregated memory. However, the CXL specification lacks the requisite level of fault-tolerance necessary to operate at an inter-server scale within the datacenter. Compute servers can fail or be unresponsive in the datacenter and therefore, it is important that the coherence protocol remain available in the presence of such failures. The thesis proposes Āpta, a CXL-based, shared disaggregated memory system for keeping the cached data consistent without compromising availability in the face of compute server failures. Āpta architects a high-performance fault-tolerant object-granular memory server that significantly improves performance for stateless function-as-a-service (FaaS) datacenter applications

    A flight software development and simulation framework for advanced space systems

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2002.Includes bibliographical references (p. 293-302).Distributed terrestrial computer systems employ middleware software to provide communications abstractions and reduce software interface complexity. Embedded applications are adopting the same approaches, but must make provisions to ensure that hard real-time temporal performance can be maintained. This thesis presents the development and validation of a middleware system tailored to spacecraft flight software development. Our middleware runs on the Generalized Flight Operations Processing Simulator (GFLOPS) and is called the GFLOPS Rapid Real-time Development Environment (GRRDE). GRRDE provides publish-subscribe communication services between software components. These services help to reduce the complexity of managing software interfaces. The hard real-time performance of these services has been verified with General Timed Automata modelling and extensive run-time testing. Several example applications illustrate the use of GRRDE to support advanced flight software development. Two technology-focused studies examine automatic code generation and autonomous fault protection within the GRRDE framework. A complex simulation of the TechSat 21 distributed spacebased radar mission highlights the utility of the approach for large-scale applications.by John Patrick Enright.Ph.D

    Fault-tolerant satellite computing with modern semiconductors

    Get PDF
    Miniaturized satellites enable a variety space missions which were in the past infeasible, impractical or uneconomical with traditionally-designed heavier spacecraft. Especially CubeSats can be launched and manufactured rapidly at low cost from commercial components, even in academic environments. However, due to their low reliability and brief lifetime, they are usually not considered suitable for life- and safety-critical services, complex multi-phased solar-system-exploration missions, and missions with a longer duration. Commercial electronics are key to satellite miniaturization, but also responsible for their low reliability: Until 2019, there existed no reliable or fault-tolerant computer architectures suitable for very small satellites. To overcome this deficit, a novel on-board-computer architecture is described in this thesis.Robustness is assured without resorting to radiation hardening, but through software measures implemented within a robust-by-design multiprocessor-system-on-chip. This fault-tolerant architecture is component-wise simple and can dynamically adapt to changing performance requirements throughout a mission. It can support graceful aging by exploiting FPGA-reconfiguration and mixed-criticality.  Experimentally, we achieve 1.94W power consumption at 300Mhz with a Xilinx Kintex Ultrascale+ proof-of-concept, which is well within the powerbudget range of current 2U CubeSats. To our knowledge, this is the first COTS-based, reproducible on-board-computer architecture that can offer strong fault coverage even for small CubeSats.European Space AgencyComputer Systems, Imagery and Medi

    Hyperscale Data Processing With Network-Centric Designs

    Get PDF
    Today’s largest data processing workloads are hosted in cloud data centers. Due to unprecedented data growth and the end of Moore’s Law, these workloads have ballooned to the hyperscale level, encompassing billions to trillions of data items and hundreds to thousands of machines per query. Enabling and expanding with these workloads are highly scalable data center networks that connect up to hundreds of thousands of networked servers. These massive scales fundamentally challenge the designs of both data processing systems and data center networks, and the classic layered designs are no longer sustainable. Rather than optimize these massive layers in silos, we build systems across them with principled network-centric designs. In current networks, we redesign data processing systems with network-awareness to minimize the cost of moving data in the network. In future networks, we propose new interfaces and services that the cloud infrastructure offers to applications and codesign data processing systems to achieve optimal query processing performance. To transform the network to future designs, we facilitate network innovation at scale. This dissertation presents a line of systems work that covers all three directions. It first discusses GraphRex, a network-aware system that combines classic database and systems techniques to push the performance of massive graph queries in current data centers. It then introduces data processing in disaggregated data centers, a promising new cloud proposal. It details TELEPORT, a compute pushdown feature that eliminates data processing performance bottlenecks in disaggregated data centers, and Redy, which provides high-performance caches using remote disaggregated memory. Finally, it presents MimicNet, a fine-grained simulation framework that evaluates network proposals at datacenter scale with machine learning approximation. These systems demonstrate that our ideas in network-centric designs achieve orders of magnitude higher efficiency compared to the state of the art at hyperscale
    corecore