269 research outputs found

    TANDEM: taming failures in next-generation datacenters with emerging memory

    Get PDF
    The explosive growth of online services, leading to unforeseen scales, has made modern datacenters highly prone to failures. Taming these failures hinges on fast and correct recovery, minimizing service interruptions. Applications, owing to recovery, entail additional measures to maintain a recoverable state of data and computation logic during their failure-free execution. However, these precautionary measures have severe implications on performance, correctness, and programmability, making recovery incredibly challenging to realize in practice. Emerging memory, particularly non-volatile memory (NVM) and disaggregated memory (DM), offers a promising opportunity to achieve fast recovery with maximum performance. However, incorporating these technologies into datacenter architecture presents significant challenges; Their distinct architectural attributes, differing significantly from traditional memory devices, introduce new semantic challenges for implementing recovery, complicating correctness and programmability. Can emerging memory enable fast, performant, and correct recovery in the datacenter? This thesis aims to answer this question while addressing the associated challenges. When architecting datacenters with emerging memory, system architects face four key challenges: (1) how to guarantee correct semantics; (2) how to efficiently enforce correctness with optimal performance; (3) how to validate end-to-end correctness including recovery; and (4) how to preserve programmer productivity (Programmability). This thesis aims to address these challenges through the following approaches: (a) defining precise consistency models that formally specify correct end-to-end semantics in the presence of failures (consistency models also play a crucial role in programmability); (b) developing new low-level mechanisms to efficiently enforce the prescribed models given the capabilities of emerging memory; and (c) creating robust testing frameworks to validate end-to-end correctness and recovery. We start our exploration with non-volatile memory (NVM), which offers fast persistence capabilities directly accessible through the processor’s load-store (memory) interface. Notably, these capabilities can be leveraged to enable fast recovery for Log-Free Data Structures (LFDs) while maximizing performance. However, due to the complexity of modern cache hierarchies, data hardly persist in any specific order, jeop- ardizing recovery and correctness. Therefore, recovery needs primitives that explicitly control the order of updates to NVM (known as persistency models). We outline the precise specification of a novel persistency model – Release Persistency (RP) – that provides a consistency guarantee for LFDs on what remains in non-volatile memory upon failure. To efficiently enforce RP, we propose a novel microarchitecture mechanism, lazy release persistence (LRP). Using standard LFDs benchmarks, we show that LRP achieves fast recovery while incurring minimal overhead on performance. We continue our discussion with memory disaggregation which decouples memory from traditional monolithic servers, offering a promising pathway for achieving very high availability in replicated in-memory data stores. Achieving such availability hinges on transaction protocols that can efficiently handle recovery in this setting, where compute and memory are independent. However, there is a challenge: disaggregated memory (DM) fails to work with RPC-style protocols, mandating one-sided transaction protocols. Exacerbating the problem, one-sided transactions expose critical low-level ordering to architects, posing a threat to correctness. We present a highly available transaction protocol, Pandora, that is specifically designed to achieve fast recovery in disaggregated key-value stores (DKVSes). Pandora is the first one-sided transactional protocol that ensures correct, non-blocking, and fast recovery in DKVS. Our experimental implementation artifacts demonstrate that Pandora achieves fast recovery and high availability while causing minimal disruption to services. Finally, we introduce a novel target litmus-testing framework – DART – to validate the end-to-end correctness of transactional protocols with recovery. Using DART’s target testing capabilities, we have found several critical bugs in Pandora, highlighting the need for robust end-to-end testing methods in the design loop to iteratively fix correctness bugs. Crucially, DART is lightweight and black-box, thereby eliminating any intervention from the programmers

    Statistical Anomaly Discovery Through Visualization

    Get PDF
    Developing a deep understanding of data is a crucial part of decision-making processes. It often takes substantial time and effort to develop a solid understanding to make well-informed decisions. Data analysts often perform statistical analyses through visualization to develop such understanding. However, applicable insight can be difficult due to biases and anomalies in data. An often overlooked phenomenon is mix effects, in which subgroups of data exhibit patterns opposite to the data as a whole. This phenomenon is widespread and often leads inexperienced analysts to draw contradictory conclusions. Discovering such anomalies in data becomes challenging as data continue to grow in volume, dimensionality, and cardinality. Effectively designed data visualizations empower data analysts to reveal and understand patterns in data for studying such paradoxical anomalies. This research explores several approaches for combining statistical analysis and visualization to discover and examine anomalies in multidimensional data. It starts with an automatic anomaly detection method based on correlation comparison and experiments to determine the running time and complexity of the algorithm. Subsequently, the research investigates the design, development, and implementation of a series of visualization techniques to fulfill the needs of analysis through a variety of statistical methods. We create an interactive visual analysis system, Wiggum, for revealing various forms of mix effects. A user study to evaluate Wiggum strengthens understanding of the factors that contribute to the comprehension of statistical concepts. Furthermore, a conceptual model, visual correspondence, is presented to study how users can determine the identity of items between visual representations by interpreting the relationships between their respective visual encodings. It is practical to build visualizations with highly linked views informed by visual correspondence theory. We present a hybrid tree visualization technique, PatternTree, which applies the visual correspondence theory. PatternTree supports users to more readily discover statistical anomalies and explore their relationships. Overall, this dissertation contributes a merging of new visualization theory and designs for analysis of statistical anomalies, thereby leading the way to the creation of effective visualizations for statistical analysis

    Lessons from Formally Verified Deployed Software Systems (Extended version)

    Full text link
    The technology of formal software verification has made spectacular advances, but how much does it actually benefit the development of practical software? Considerable disagreement remains about the practicality of building systems with mechanically-checked proofs of correctness. Is this prospect confined to a few expensive, life-critical projects, or can the idea be applied to a wide segment of the software industry? To help answer this question, the present survey examines a range of projects, in various application areas, that have produced formally verified systems and deployed them for actual use. It considers the technologies used, the form of verification applied, the results obtained, and the lessons that can be drawn for the software industry at large and its ability to benefit from formal verification techniques and tools. Note: a short version of this paper is also available, covering in detail only a subset of the considered systems. The present version is intended for full reference.Comment: arXiv admin note: text overlap with arXiv:1211.6186 by other author

    On The Use of Over-Approximate Analysis in Support of Software Development and Testing

    Get PDF
    The effectiveness of dynamic program analyses, such as profiling and memory-leak detection, crucially depend on the quality of the test inputs. However, adequate sets of inputs are rarely available. Existing automated input generation techniques can help but tend to be either too expensive or ineffective. For example, traditional symbolic execution scales poorly to real-world programs and random input generation may never reach deep states within the program. For scalable, effective, automated input generation that can better support dynamic analysis, I propose an approach that extends traditional symbolic execution by targeting increasingly small fragments of a program. The approach starts by generating inputs for the whole program and progressively introduces additional unconstrained state until it reaches a given program coverage objective. This approach is applicable to any client dynamic analysis requiring high coverage that is also tolerant of over-approximated program behavior--behavior that cannot occur on a complete execution. To assess the effectiveness of my approach, I applied it to two client techniques. The first technique infers the actual path taken by a program execution by observing the CPU's electromagnetic emanations and requires inputs to generate a model that can recognize executed path segments. The client inference works by piece wise matching the observed emanation waveform to those recorded in a model. It requires the model to be complete (i.e. contain every piece) and the waveforms are sufficiently distinct that the inclusion of extra samples is unlikely to cause a misinference. After applying my approach to generate inputs covering all subsegments of the program’s execution paths, I designed a source generator to automatically construct a harness and scaffolding to replay these inputs against fragments of the original program. The inference client constructs the model by recording the harness execution. The second technique performs automated regression testing by identifying behavioral differences between two program versions and requires inputs to perform differential testing. It explores local behavior in a neighborhood of the program changes by generating inputs to functions near (as measured by call-graph) to the modified code. The inputs are then concretely executed on both versions, periodically checking internal state for behavioral differences. The technique requires high coverage inputs for a full examination, and tolerates infeasible local state since both versions likely execute it equivalently. I will then present a separate technique to improve the coverage obtained by symbolic execution of floating-point programs. This technique is equally applicable to both traditional symbolic execution and my progressively under-constrained symbolic execution. Its key idea is to approximate floating-point expressions with fixed-point analogs. In concluding, I will also discuss future research directions, including additional empirical evaluations and the investigation of additional client analyses that could benefit from my approach.Ph.D

    Towards Scalable OLTP Over Fast Networks

    Get PDF
    Online Transaction Processing (OLTP) underpins real-time data processing in many mission-critical applications, from banking to e-commerce. These applications typically issue short-duration, latency-sensitive transactions that demand immediate processing. High-volume applications, such as Alibaba's e-commerce platform, achieve peak transaction rates as high as 70 million transactions per second, exceeding the capacity of a single machine. Instead, distributed OLTP database management systems (DBMS) are deployed across multiple powerful machines. Historically, such distributed OLTP DBMSs have been primarily designed to avoid network communication, a paradigm largely unchanged since the 1980s. However, fast networks challenge the conventional belief that network communication is the main bottleneck. In particular, emerging network technologies, like Remote Direct Memory Access (RDMA), radically alter how data can be accessed over a network. RDMA's primitives allow direct access to the memory of a remote machine within an order of magnitude of local memory access. This development invalidates the notion that network communication is the primary bottleneck. Given that traditional distributed database systems have been designed with the premise that the network is slow, they cannot efficiently exploit these fast network primitives, which requires us to reconsider how we design distributed OLTP systems. This thesis focuses on the challenges RDMA presents and its implications on the design of distributed OLTP systems. First, we examine distributed architectures to understand data access patterns and scalability in modern OLTP systems. Drawing on these insights, we advocate a distributed storage engine optimized for high-speed networks. The storage engine serves as the foundation of a database, ensuring efficient data access through three central components: indexes, synchronization primitives, and buffer management (caching). With the introduction of RDMA, the landscape of data access has undergone a significant transformation. This requires a comprehensive redesign of the storage engine components to exploit the potential of RDMA and similar high-speed network technologies. Thus, as the second contribution, we design RDMA-optimized tree-based indexes — especially applicable for disaggregated databases to access remote data efficiently. We then turn our attention to the unique challenges of RDMA. One-sided RDMA, one of the network primitives introduced by RDMA, presents a performance advantage in enabling remote memory access while bypassing the remote CPU and the operating system. This allows the remote CPU to process transactions uninterrupted, with no requirement to be on hand for network communication. However, that way, specialized one-sided RDMA synchronization primitives are required since traditional CPU-driven primitives are bypassed. We found that existing RDMA one-sided synchronization schemes are unscalable or, even worse, fail to synchronize correctly, leading to hard-to-detect data corruption. As our third contribution, we address this issue by offering guidelines to build scalable and correct one-sided RDMA synchronization primitives. Finally, recognizing that maintaining all data in memory becomes economically unattractive, we propose a distributed buffer manager design that efficiently utilizes cost-effective NVMe flash storage. By leveraging low-latency RDMA messages, our buffer manager provides a transparent memory abstraction, accessing the aggregated DRAM and NVMe storage across nodes. Central to our approach is a distributed caching protocol that dynamically caches data. With this approach, our system can outperform RDMA-enabled in-memory distributed databases while managing larger-than-memory datasets efficiently

    Naval Postgraduate School Academic Catalog - February 2023

    Get PDF

    The Blockchain Of Oz : Specifying Blockchain Failures for Scalable Protocols Offering Unprecedented Safety and Decentralization

    Get PDF
    Blockchains have starred an outstanding increase in interest from both business and research since Nakamoto’s 2008 Bitcoin. Unfortunately, many questions in terms of results that establish upper-bounds, and of proposals that approach these bounds. Furthermore, the sudden hype surrounding the blockchain world has led to several proposals that are either only partially public, informal, or not proven correct. The main contribution of this dissertation is to build upon works that steer clear of blockchain puffery, following research methodology. The works of this dissertation converge towards a blockchain that for the first time formally proves and empirically shows deterministic guarantees in the presence of classical Byzantine adversaries, while at the same time pragmatically resolves unlucky cases in which the adversary corrupts an unprecedented percentage of the system. This blockchain is decentralized and scalable, and needs no strong assumptions like synchrony. For this purpose, we build upon previous work and propose a novel attack of synchronous offchain protocols. We then introduce Platypus, an offchain protocol without synchrony. Secondly, we present Trap, a Byzantine fault-tolerant consensus protocol for blockchains that also tolerates up to less than half of the processes deviating. Thirdly, we present Basilic, a class of protocols that solves consensus both against a resilient-optimal Byzantine adversary and against an adversary controlling up to less than 2/3 of combined liveness and safety faults. Then, we use Basilic to present Zero-loss Blockchain (ZLB), a blockchain that tolerates less than 2/3 of safety faults of which less than 1/3 can be Byzantine. Finally, we present two random beacon protocols for committee sortition: Kleroterion and Kleroterion+ , that improve previous works in terms of communication complexity and in the number of faults tolerated, respectively

    Automatic generation of highly concurrent, hierarchical and heterogeneous cache coherence protocols from atomic specifications

    Get PDF
    Cache coherence protocols are often specified using only stable states and atomic transactions for a single cache hierarchy level. Designing highly-concurrent, hierarchical and heterogeneous directory cache coherence protocols from these atomic specifications for modern multicore architectures is a complicated task. To overcome these design challenges we have developed the novel *Gen algorithms (ProtoGen, HieraGen and HeteroGen). Using the *Gen algorithms highly-concurrent, hierarchical and heterogeneous cache coherence protocols can be automatically generated for a wide range of atomic input stable state protocol (SSP) speci fications - including the MOESI variants, as well as for protocols that are targeted towards Total Store Order and Release Consistency. In addition, for each *Gen algorithm we have developed and published an eponymous tool. The ProtoGen tool takes as input a single SSP (i.e., no concurrency) generating the corresponding protocol for a multicore architecture with non-atomic transactions. The ProtoGen algorithm automatically enforces the correct interleaving of conflicting coherence transactions for a given atomic coherence protocol specification. HieraGen is a tool for automatically generating hierarchical cache coherence protocols. Its inputs are SSPs for each level of the hierarchy and its output is a highly concurrent hierarchical protocol. HieraGen thus reduces the complexity that architects face by offloading the challenging task of composing protocols and managing concurrency. HeteroGen is a tool for automatically generating heterogeneous protocols that adhere to precise consistency models. As input, HeteroGen takes SSPs of the per-cluster coherence protocols, each of which satisfies its own per-cluster consistency model. The output is a concurrent (i.e., with transient states) heterogeneous protocol that satisfies a precisely defined consistency model that we refer to as a compound consistency model. To validate the correctness of the *Gen algorithms, the generated output protocols were verified for safety and deadlock freedom using a model checker. To verify the correctness of protocols that need to adhere to a specific compound consistency model generated by HeteroGen, novel litmus tests for multiple compound consistency models were developed. The protocols automatically generated using the *Gen tools have a comparable or better performance than manually generated cache coherence protocols, often discovering opportunities to reduce stalls. Thus, the *Gen tools reduce the complexity that architects face by offloading the challenging tasks of composing protocols and managing concurrency

    19th SC@RUG 2022 proceedings 2021-2022

    Get PDF
    • …
    corecore