
    Synthesis-Aided Crash Consistency for Storage Systems

    Reliable storage systems must be crash consistent: guaranteed to recover to a consistent state after a crash. Crash consistency is non-trivial, as it requires maintaining complex invariants about persistent data structures in the presence of caching, reordering, and system failures. Current programming models offer little support for implementing crash consistency, forcing storage system developers to roll their own consistency mechanisms. Bugs in these mechanisms can lead to severe data loss for applications that rely on persistent storage. This paper presents a new synthesis-aided programming model for building crash-consistent storage systems. In this approach, storage systems can assume an angelic crash-consistency model, where the underlying storage stack promises to resolve crashes in favor of consistency whenever possible. To realize this model, we introduce a new labeled-writes interface for developers to identify their writes to disk, and develop a program synthesis tool, DepSynth, that generates dependency rules to enforce crash consistency over these labeled writes. We evaluate our model in a case study on a production storage system at Amazon Web Services. We find that DepSynth can automate crash consistency for this complex storage system, with results similar to existing expert-written code, and can automatically identify and correct consistency and performance issues.
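
    The abstract does not spell out the labeled-writes interface or DepSynth's rule language, so the following Python sketch is purely illustrative (the labels, rule format, and invariant are invented): writes carry developer-chosen labels, a dependency rule forbids a later label from persisting before an earlier write it depends on, and every crash state permitted by the rules satisfies a toy consistency invariant.

        # Hypothetical sketch of a "labeled writes + dependency rules" discipline.
        # Names (labels, rule format, invariant) are illustrative, not DepSynth's API.
        from itertools import chain, combinations

        # Each write to disk carries a developer-chosen label.
        writes = [
            ("data",   "block-1"),
            ("data",   "block-2"),
            ("commit", "txn-record"),   # must not persist before its data blocks
        ]

        # A rule (a, b) means: a write labeled `b` may reach disk only after
        # every earlier write labeled `a` has reached disk.
        rules = {("data", "commit")}

        def crash_states(writes):
            """All subsets of writes a crash could leave on disk (no enforcement)."""
            return chain.from_iterable(combinations(range(len(writes)), k)
                                       for k in range(len(writes) + 1))

        def allowed(state, writes, rules):
            """A crash state respects the rules if no write persisted while an
            earlier write it depends on was lost."""
            persisted = set(state)
            for j in persisted:
                for i in range(j):
                    if (writes[i][0], writes[j][0]) in rules and i not in persisted:
                        return False
            return True

        def consistent(state, writes):
            """Toy invariant: a commit record implies both data blocks are present."""
            labels = [writes[i][0] for i in state]
            return "commit" not in labels or labels.count("data") == 2

        # Every crash state the rules allow is consistent; some unconstrained ones are not.
        assert all(consistent(s, writes) for s in crash_states(writes) if allowed(s, writes, rules))
        assert not all(consistent(s, writes) for s in crash_states(writes))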

    A DNA-Based Archival Storage System

    Demand for data storage is growing exponentially, but the capacity of existing storage media is not keeping up. Using DNA to archive data is an attractive possibility because it is extremely dense, with a raw limit of 1 exabyte/mm³ (10⁹ GB/mm³), and long-lasting, with an observed half-life of over 500 years. This paper presents an architecture for a DNA-based archival storage system. It is structured as a key-value store, and leverages common biochemical techniques to provide random access. We also propose a new encoding scheme that offers controllable redundancy, trading off reliability for density. We demonstrate feasibility, random access, and robustness of the proposed encoding with wet lab experiments involving 151 kB of synthesized DNA and a 42 kB random-access subset, and with simulation experiments on larger sets calibrated to the wet lab experiments. Finally, we highlight trends in biotechnology that indicate the impending practicality of DNA storage for much larger datasets.
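
    The paper's actual encoding (addressing primers, homopolymer avoidance, tunable redundancy) is not reproduced here; the sketch below is a deliberately simplified two-bits-per-base mapping in Python, plus the unit arithmetic behind the quoted density figure.

        # Simplified illustration only: the paper's real scheme adds addressing
        # primers, avoids homopolymer runs, and offers controllable redundancy.
        BASES = "ACGT"

        def encode(data: bytes) -> str:
            """Map each byte to four nucleotides (2 bits per base)."""
            return "".join(BASES[(b >> s) & 0b11] for b in data for s in (6, 4, 2, 0))

        def decode(strand: str) -> bytes:
            out = bytearray()
            for i in range(0, len(strand), 4):
                b = 0
                for ch in strand[i:i + 4]:
                    b = (b << 2) | BASES.index(ch)
                out.append(b)
            return bytes(out)

        payload = b"key=42"
        assert decode(encode(payload)) == payload

        # Density figure from the abstract: 1 exabyte/mm^3 equals 10^9 GB/mm^3,
        # since 1 EB = 10^18 bytes and 1 GB = 10^9 bytes.
        assert 10**18 // 10**9 == 10**9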

    Automatically Comparing Memory Consistency Models

    A memory consistency model (MCM) is the part of a programming language or computer architecture specification that defines which values can legally be read from shared memory locations. Because MCMs take into account various optimisations employed by architectures and compilers, they are often complex and counterintuitive, which makes them challenging to design and to understand. We identify four tasks involved in designing and understanding MCMs: generating conformance tests, distinguishing two MCMs, checking compiler optimisations, and checking compiler mappings. We show that all four tasks are instances of a general constraint-satisfaction problem to which the solution is either a program or a pair of programs. Although this problem is intractable for automatic solvers when phrased over programs directly, we show how to solve analogous constraints over program executions, and then construct programs that satisfy the original constraints. Our technique, which is implemented in the Alloy modelling framework, is illustrated on several software- and architecture-level MCMs, both axiomatically and operationally defined. We automatically recreate several known results, often in a simpler form, including: distinctions between variants of the C11 MCM; a failure of the ‘SC-DRF guarantee’ in an early C11 draft; that x86 is ‘multi-copy atomic’ and Power is not; bugs in common C11 compiler optimisations; and bugs in a compiler mapping from OpenCL to AMD-style GPUs. We also use our technique to develop and validate a new MCM for NVIDIA GPUs that supports a natural mapping from OpenCL.
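
    A concrete instance of the "distinguishing two MCMs" task is the classic store-buffering litmus test: sequential consistency forbids the outcome r1 = r2 = 0, while x86-TSO allows it. The sketch below is not the paper's Alloy-based technique (which works over axiomatic execution constraints); it merely enumerates sequentially consistent interleavings in Python to confirm the SC-forbidden outcome.

        from itertools import permutations

        # Store-buffering litmus test:
        #   Thread 0: x = 1; r1 = y        Thread 1: y = 1; r2 = x
        THREADS = [
            [("store", "x", 1), ("load", "y", "r1")],
            [("store", "y", 1), ("load", "x", "r2")],
        ]

        def sc_outcomes(threads):
            """Final register values under every sequentially consistent interleaving."""
            events = [(t, i) for t, th in enumerate(threads) for i in range(len(th))]
            outcomes = set()
            for order in permutations(events):
                # keep only orders that preserve each thread's program order
                if any(a[0] == b[0] and a[1] > b[1]
                       for i, a in enumerate(order) for b in order[i + 1:]):
                    continue
                mem, regs = {"x": 0, "y": 0}, {}
                for t, i in order:
                    kind, loc, arg = threads[t][i]
                    if kind == "store":
                        mem[loc] = arg          # arg is the stored value
                    else:
                        regs[arg] = mem[loc]    # arg is the destination register
                outcomes.add((regs["r1"], regs["r2"]))
            return outcomes

        # SC forbids r1 = r2 = 0; x86-TSO (store buffers) additionally allows it.
        assert (0, 0) not in sc_outcomes(THREADS)
        assert {(1, 1), (0, 1), (1, 0)} <= sc_outcomes(THREADS)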

    Optimizing the Automated Programming Stack

    Thesis (Ph.D.)--University of Washington, 2019. The scale and pervasiveness of modern software pose a challenge for programmers: software reliability is more important than ever, but the complexity of computer systems continues to grow. Automated programming tools are a powerful way for programmers to tackle this challenge: verifiers that check software correctness, and synthesizers that generate new correct-by-construction programs. These tools are most effective when they apply domain-specific optimizations, but doing so today requires considerable formal methods expertise. This dissertation shows that new abstractions and techniques can empower programmers to build specialized automated programming tools that ensure software reliability. We first demonstrate the importance and effectiveness of automated tools in the context of memory consistency models, which define the behavior of multiprocessor CPUs and whose subtleties often elude even experts. MemSynth is a tool that automatically synthesizes formal descriptions of memory consistency models from examples of CPU behavior, and has found ambiguities and underspecifications in two major computer architectures. We then introduce two new techniques for developing automated programming tools. Metasketches are a new abstraction for building program synthesis tools that integrate search strategy into the problem definition, allowing a metasketch solver to solve synthesis problems that other tools cannot. Symbolic profiling is a technique for systematically identifying and resolving scalability bottlenecks in automated programming tools. Symbolic profiling generalizes across different symbolic evaluation engines and has been used to improve the performance of state-of-the-art automated tools by orders of magnitude. Together, these three contributions demonstrate the value of automated programming tools for building reliable software, and offer guidance on how to build such tools efficiently for new problem domains.
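
    As a rough illustration of the metasketch idea (an ordered family of search spaces explored cheapest-first), the toy Python sketch below synthesizes an affine function from examples; it is not the actual metasketch or MemSynth interface, and every name in it is invented.

        # Toy illustration of the metasketch idea (an ordered family of search
        # spaces, cheapest first); this is not the actual metasketch API.
        from itertools import product

        examples = [(0, 3), (1, 5), (2, 7)]            # target: f(x) = 2*x + 3

        def spaces():
            """Yield progressively larger coefficient ranges for f(x) = a*x + b."""
            bound = 1
            while bound <= 16:
                yield [(a, b) for a, b in product(range(-bound, bound + 1), repeat=2)]
                bound *= 2

        def synthesize(examples):
            for space in spaces():                      # search cheap spaces first
                for a, b in space:
                    if all(a * x + b == y for x, y in examples):
                        return a, b
            return None

        assert synthesize(examples) == (2, 3)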

    Uncertain<T>: A first-order type for uncertain data

    Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages developers to pretend it is not probabilistic, which causes three types of uncertainty bugs. (1) Using estimates as facts ignores random error in estimates. (2) Computation compounds that error. (3) Boolean questions on probabilistic data induce false positives and negatives. This paper introduces Uncertain<T>, a new programming language abstraction for uncertain data. We implement a Bayesian network semantics for computation and conditionals that improves program correctness. The runtime uses sampling and hypothesis tests to evaluate computation and conditionals lazily and efficiently. We illustrate with sensor and machine learning applications that Uncertain<T> improves expressiveness and accuracy. Whereas previous probabilistic programming languages focus on experts, Uncertain<T> serves a wide range of developers. Experts still identify error distributions. However, both experts and application writers compute with distributions, improve estimates with domain knowledge, and ask questions with conditionals. The Uncertain<T> type system and operators encourage developers to expose and reason about uncertainty explicitly, controlling false positives and false negatives. These benefits make Uncertain<T> a compelling programming model for modern applications facing the challenge of uncertainty.
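
    The real Uncertain<T> runtime is a library with Bayesian-network semantics and sequential hypothesis tests; the Python sketch below is a much smaller stand-in for the core idea only (values are samplers, arithmetic composes samplers, and a comparison yields evidence that must be queried explicitly), with an invented GPS example.

        # Minimal sketch of the Uncertain<T> idea (values as distributions, lazy
        # conditionals); the real library's API and hypothesis-test machinery differ.
        import random

        class Uncertain:
            def __init__(self, sampler):
                self.sample = sampler                   # () -> float

            def __add__(self, other):
                return Uncertain(lambda: self.sample() + lift(other).sample())

            def __gt__(self, threshold):
                # A comparison is itself uncertain: it yields Bernoulli evidence.
                return Uncertain(lambda: float(self.sample() > threshold))

            def prob(self, n=10_000):
                """Estimate Pr[self is true] by sampling (fixed n, not a sequential test)."""
                return sum(self.sample() for _ in range(n)) / n

        def lift(v):
            return v if isinstance(v, Uncertain) else Uncertain(lambda: v)

        # GPS speed estimate: true speed plus Gaussian sensor noise (illustrative).
        speed = Uncertain(lambda: random.gauss(3.5, 2.0))   # metres per second

        # "Are you speeding?" becomes "is it likely that speed > 4?"
        likely_speeding = (speed > 4.0).prob() > 0.9
        print(likely_speeding)   # almost always False: the evidence is too weak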

    Uncertain<T>: A first-order type for uncertain data

    Sampled data from sensors, the web, and people is inherently probabilistic. Because programming languages use discrete types (floats, integers, and booleans), applications, ranging from GPS navigation to web search to polling, express and reason about uncertainty in idiosyncratic ways. This mismatch causes three problems. (1) Using an estimate as a fact introduces errors (walking through walls). (2) Computation on estimates compounds errors (walking at 59 mph). (3) Inference asks questions incorrectly when the data can only answer probabilistic questions (e.g., "are you speeding?" versus "are you speeding with high probability?"). This paper introduces the uncertain type (Uncertain<T>), an abstraction that expresses, propagates, and exposes uncertainty to solve these problems. We present its semantics and a recipe for (a) identifying distributions, (b) computing, (c) inferring, and (d) leveraging domain knowledge in uncertain data. Because Uncertain<T> computations express an algebra over probabilities, Bayesian statistics ease inference over disparate information (physics, calendars, and maps). Uncertain<T> leverages statistics, learning algorithms, and domain expertise for experts and abstracts them for nonexpert developers. We demonstrate Uncertain<T> on two applications. The result is improved correctness, productivity, and expressiveness for probabilistic data.
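
    To illustrate step (d) of the recipe, leveraging domain knowledge, the self-contained sketch below reweights a noisy GPS speed estimate with a pedestrian-speed prior by importance sampling; the distributions and thresholds are invented for illustration and are not taken from the paper.

        # Illustrative only: combining a GPS speed estimate (likelihood) with a
        # pedestrian-speed prior by importance sampling; numbers are invented.
        import math
        import random

        def gps_samples(mean=6.0, sd=3.0, n=20_000):
            """Raw GPS speed estimate: heavy sensor noise around 6 m/s."""
            return [random.gauss(mean, sd) for _ in range(n)]

        def pedestrian_prior(v):
            """Domain knowledge: walkers rarely exceed ~2.5 m/s (unnormalised)."""
            return math.exp(-0.5 * ((v - 1.4) / 0.6) ** 2) if v >= 0 else 0.0

        samples = gps_samples()
        weights = [pedestrian_prior(v) for v in samples]
        total = sum(weights)

        # Probability of "speeding" (say, v > 2.5 m/s) before and after reweighting.
        p_speeding_raw = sum(v > 2.5 for v in samples) / len(samples)
        p_speeding_post = sum(w for v, w in zip(samples, weights) if v > 2.5) / total

        print(f"raw GPS:    Pr[v > 2.5] ~ {p_speeding_raw:.2f}")    # high (~0.88)
        print(f"with prior: Pr[v > 2.5] ~ {p_speeding_post:.2f}")   # much lower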

    The model is not enough: Understanding energy consumption in mobile devices

    Although battery life has always constrained embedded and mobile hardware developers, the rise of smart phones and tablets has foisted energy onto software developers as a fundamental constraint. Whereas on the desktop, software developers mostly …
