
    NearPM: A Near-Data Processing System for Storage-Class Applications

    Persistent Memory (PM) technologies enable programs to recover to a consistent state in the case of a failure. To ensure this crash-consistent behavior, programs need to enforce persist ordering by employing mechanisms such as logging and checkpointing, which introduce additional data movement. Emerging near-data processing (NDP) architectures can effectively reduce this data movement overhead. In this work we propose NearPM, a near-data processor that supports accelerable primitives in crash-consistent programs. Using these primitives, NearPM accelerates commonly used crash consistency mechanisms: logging, checkpointing, and shadow paging. NearPM further reduces the synchronization overhead between the NDP device and the CPU required to guarantee persist ordering by moving ordering handling near memory. We ensure correct persist ordering between the CPU and NDP devices, as well as among multiple NDP devices, with Partitioned Persist Ordering (PPO). We prototype NearPM on an FPGA platform. NearPM executes the data-intensive operations of crash consistency mechanisms with correct ordering guarantees, while the rest of the program runs on the CPU. We evaluate nine PM workloads, where each workload supports three crash consistency mechanisms: logging, checkpointing, and shadow paging. Overall, NearPM achieves a 4.3-9.8x speedup in the NDP-offloaded operations and a 1.22-1.35x speedup in end-to-end execution.
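For readers unfamiliar with why these mechanisms move so much data, the sketch below shows a conventional software undo-logging update with explicit persist ordering on the CPU; the extra log writes and flushes are exactly the kind of data-intensive work a near-data processor could offload. The structure and function names (UndoEntry, logged_store, persist) are illustrative assumptions for this example, not NearPM's interface.

```cpp
// A minimal sketch of software undo logging with explicit persist ordering.
// Assumes a CPU with CLWB support (compile with -mclwb); names are hypothetical.
#include <immintrin.h>  // _mm_clwb, _mm_sfence
#include <cstddef>
#include <cstdint>

struct UndoEntry {
    uint64_t* addr;    // location being modified
    uint64_t  old_val; // value to restore on recovery
    uint64_t  valid;   // set last, so recovery can trust the entry
};

// Write back every cache line overlapping [p, p+len), then order the flushes.
static inline void persist(const void* p, size_t len) {
    uintptr_t start = reinterpret_cast<uintptr_t>(p) & ~uintptr_t{63};
    uintptr_t end   = reinterpret_cast<uintptr_t>(p) + len;
    for (uintptr_t a = start; a < end; a += 64)
        _mm_clwb(reinterpret_cast<void*>(a));
    _mm_sfence();
}

void logged_store(UndoEntry* log, uint64_t* addr, uint64_t new_val) {
    log->addr = addr;                 // 1. record the old value...
    log->old_val = *addr;
    persist(log, sizeof(*log));       //    ...and make the entry durable
    log->valid = 1;                   // 2. durably mark the entry valid
    persist(&log->valid, sizeof(log->valid));
    *addr = new_val;                  // 3. only now update the data in place
    persist(addr, sizeof(*addr));
    log->valid = 0;                   // 4. retire the entry
    persist(&log->valid, sizeof(log->valid));
}
```

Every step above requires a flush and an ordering fence before the next one may proceed, which is why offloading this sequence near memory removes both data movement and synchronization from the CPU's critical path.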

    ADDING PERSISTENCE TO MAIN MEMORY PROGRAMMING

    Unlocking the true potential of the new persistent memories (PMEMs) requires eliminating traditional persistent I/O abstractions altogether, by introducing persistent semantics directly into main memory programming. Such a programming model elevates failure atomicity to a first-class application property, alongside in-memory data layout, concurrency control, and fault tolerance, and therefore requires a redesign of programming abstractions for both program correctness and maximum performance gains. To address these challenges, this thesis proposes a set of system software designs that integrate persistence with main memory programming, and makes the following contributions. First, this thesis proposes a PMEM-aware I/O runtime, NVStream, that supports fast durable streaming I/O. NVStream uses a memory-based I/O interface that integrates with an application's existing I/O data movement operations to accelerate persistent data writes. NVStream carefully designs its persistent data storage layout and crash-consistent semantics to match both application and PMEM characteristics. Specifically, we leverage the streaming nature of I/O in HPC workflows to benefit from a log-structured PMEM storage engine design that uses relaxed write orderings and append-only failure-atomic semantics to form strongly consistent application checkpoints. Furthermore, we identify that optimizing the I/O software stack exposes the PMEM bandwidth limitations as a bottleneck during parallel HPC I/O writes, and propose a novel data movement design – PHX. PHX uses alternative network data movement paths available in datacenters to ease the bandwidth pressure on the PMEM memory interconnects, all while maintaining the correctness of the persistent data. Next, the thesis explores the challenges and opportunities of using PMEM for true main memory persistent programming – a single data domain for both runtime and persistent application state. Such a programming model includes maintaining ACID properties during every update to an application's persistent structures. ACID-qualified persistent programming for multi-threaded applications is hard, as the programmer has to reason about both crash-consistency and synchronization – crash-sync – semantics for programming correctness. The thesis contributes new understanding of the correctness requirements for mixing different crash-consistency and synchronization protocols, characterizes the performance of different crash-sync realizations for different applications and hardware architectures, and draws actionable insights for future designs of PMEM systems. Finally, the application state stored on node-local persistent memory is still vulnerable to catastrophic node failures. The thesis proposes a replicated persistent memory runtime, Blizzard, that supports truly fault-tolerant, concurrent, and persistent data-structure programming. Blizzard carefully integrates userspace networking with byte-addressable PMEM for a fast persistent memory replication runtime. The design also incorporates a replication-aware crash-sync protocol that supports consistent and concurrent updates on persistent data structures. Blizzard offers applications the flexibility to use the data structures that best match their functional requirements, while offering better performance and providing crucial reliability guarantees lacking from existing persistent memory runtimes.
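As an illustration of the append-only, failure-atomic semantics described above, the following sketch persists a record's payload with relaxed ordering and then persists a single commit flag. The layout and names (RecordHeader, pmem_append) are assumptions made for this example, not NVStream's actual storage format or API.

```cpp
// A minimal sketch of an append-only, failure-atomic record write on PMEM.
// Assumes CLWB support; record layout and names are hypothetical.
#include <immintrin.h>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct RecordHeader {
    uint64_t length;    // payload size in bytes
    uint64_t checksum;  // integrity check over the payload
    uint64_t committed; // persisted last: the single commit point
};

// Write back every cache line overlapping [p, p+len), then order the flushes.
static inline void persist_range(const void* p, size_t len) {
    uintptr_t start = reinterpret_cast<uintptr_t>(p) & ~uintptr_t{63};
    uintptr_t end   = reinterpret_cast<uintptr_t>(p) + len;
    for (uintptr_t a = start; a < end; a += 64)
        _mm_clwb(reinterpret_cast<void*>(a));
    _mm_sfence();
}

// Append one record at 'tail'. The payload stores may drain in any order;
// only the committed flag must be persisted after everything else.
char* pmem_append(char* tail, const void* payload, uint64_t len, uint64_t csum) {
    auto* hdr  = reinterpret_cast<RecordHeader*>(tail);
    char* body = tail + sizeof(RecordHeader);
    std::memcpy(body, payload, len);   // relaxed: no per-store ordering needed
    hdr->length   = len;
    hdr->checksum = csum;
    persist_range(tail, sizeof(RecordHeader) + len);
    hdr->committed = 1;                // commit point
    persist_range(&hdr->committed, sizeof(hdr->committed));
    return body + len;                 // new tail (alignment ignored here)
}
```

Because only the final flag acts as the commit point, a crash at any earlier step simply leaves an uncommitted tail record that recovery can discard, which is what allows the relaxed write ordering.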

    On the Secure and Resilient Design of Connected Vehicles: Methods and Guidelines

    Vehicles have come a long way from being purely mechanical systems to systems that consist of an internal network of more than 100 microcontrollers and that communicate with external entities, such as other vehicles, road infrastructure, the manufacturer’s cloud, and external applications. This combination of resource constraints, safety-criticality, a large attack surface, and the fact that millions of people own and use them each day makes securing vehicles particularly challenging, as security practices and methods need to be tailored to meet these requirements. This thesis investigates how security demands should be structured to ease discussions and collaboration between the involved parties and how requirements engineering can be accelerated by introducing generic security requirements. Practitioners are also assisted in choosing appropriate techniques for securing vehicles by identifying and categorising security and resilience techniques suitable for automotive systems. Furthermore, three specific mechanisms for securing automotive systems and providing resilience are designed and evaluated. The first part focuses on cyber security requirements and the identification of suitable techniques based on three different approaches, namely (i) providing a mapping to security levels based on a review of existing security standards and recommendations; (ii) proposing a taxonomy for resilience techniques based on a literature review; and (iii) combining security and resilience techniques to protect automotive assets that have been subject to attacks. The second part presents the design and evaluation of three techniques. First, an extension of an existing freshness mechanism to protect in-vehicle communication against replay attacks is presented and evaluated. Second, a trust model for Vehicle-to-Vehicle communication is developed with respect to cyber resilience, to allow a vehicle to include trust in neighbouring vehicles in its decision-making processes. Third, a framework is presented that enables vehicle manufacturers to protect their fleet by detecting anomalies and security attacks using vehicle trust and the available data in the cloud.
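The replay-protection idea behind such a freshness mechanism can be summarized in a few lines: accept a frame only if it authenticates and carries a freshness value strictly newer than the last accepted one. The sketch below is a generic, simplified illustration in the spirit of SecOC-style freshness counters, not the specific extension developed in the thesis; verify_mac and the frame layout are placeholders.

```cpp
// Illustrative replay protection for in-vehicle frames using a freshness counter.
// All names and the frame layout are hypothetical.
#include <cstddef>
#include <cstdint>

struct SecuredFrame {
    uint32_t       pdu_id;       // which message this is
    uint64_t       freshness;    // monotonically increasing counter
    const uint8_t* payload;
    size_t         payload_len;
    const uint8_t* mac;          // authentication code over id | freshness | payload
    size_t         mac_len;
};

// Stand-in for an HSM-backed CMAC/HMAC verification; must be replaced in practice.
static bool verify_mac(const SecuredFrame&, const uint8_t* /*key*/, size_t /*key_len*/) {
    return true;  // placeholder so the sketch compiles
}

// Accept a frame only if its MAC verifies and its freshness value is strictly
// newer than the last accepted one for this PDU, which rejects replayed frames.
bool accept_frame(const SecuredFrame& f, uint64_t& last_accepted,
                  const uint8_t* key, size_t key_len) {
    if (!verify_mac(f, key, key_len)) return false;  // forged or corrupted
    if (f.freshness <= last_accepted) return false;  // replayed or stale
    last_accepted = f.freshness;                     // advance the window
    return true;
}
```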

    Infrastructure for Performance Monitoring and Analysis of Systems and Applications

    The growth of High Performance Computing (HPC) systems increases the complexity of understanding resource utilization, system management, and performance issues. HPC performance monitoring tools need to collect information at both the application and system levels to yield a complete performance picture. Existing approaches limit users' ability to perform meaningful analysis on an actionable timescale. Efficient infrastructure is required to support large-scale system performance data analysis for both run-time troubleshooting and post-run processing. In this dissertation, we present methods to fill these gaps in the infrastructure for HPC performance monitoring and analysis. First, we enhance the architecture of a monitoring system to integrate streaming analysis capabilities at arbitrary locations within its data collection, transport, and aggregation facilities. Next, we present an approach to streaming collection of application performance data. We integrate these methods with a monitoring system used on large-scale computational platforms. Finally, we present a new approach for constructing durable transactional linked data structures that takes advantage of byte-addressable non-volatile memory technologies. Transactional data structures are building blocks of in-memory databases that are used by HPC monitoring systems to store and retrieve data efficiently. We evaluate the presented approaches on a series of case studies. The experimental results demonstrate the impact of our tools while keeping the overhead within an acceptable margin.
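To give a flavor of what a durable linked data structure on byte-addressable non-volatile memory must guarantee, the sketch below shows the basic ordering discipline for a crash-consistent list insert: persist a fully initialized node before publishing a pointer to it. This is a minimal illustration under assumed names, not the transactional design proposed in the dissertation.

```cpp
// A minimal sketch of a crash-consistent linked-list insert on NVM.
// Assumes CLWB support and NVM-resident nodes; names are hypothetical.
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

struct Node {
    uint64_t key;
    uint64_t value;
    Node*    next;
};

// Write back every cache line overlapping [p, p+len), then order the flushes.
static inline void persist(const void* p, size_t len) {
    uintptr_t start = reinterpret_cast<uintptr_t>(p) & ~uintptr_t{63};
    uintptr_t end   = reinterpret_cast<uintptr_t>(p) + len;
    for (uintptr_t a = start; a < end; a += 64)
        _mm_clwb(reinterpret_cast<void*>(a));
    _mm_sfence();
}

// Insert 'node' after 'prev'. If a crash happens before the final persist,
// recovery sees either the old list or the new node fully initialized.
void durable_insert_after(Node* prev, Node* node) {
    node->next = prev->next;                   // initialize the node completely
    persist(node, sizeof(*node));              // durable before it is reachable
    prev->next = node;                         // publish
    persist(&prev->next, sizeof(prev->next));  // make the link durable
}
```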

    Runtime Systems for Persistent Memories

    Emerging persistent memory (PM) technologies promise the performance of DRAM with the durability of disk. However, several challenges remain in existing hardware, programming, and software systems that inhibit wide-scale PM adoption. This thesis focuses on building efficient mechanisms that span hardware, operating systems, and programming languages for integrating PMs in future systems. First, this thesis proposes a mechanism to solve the low-endurance problem in PMs. PMs suffer from limited write endurance---PM cells can be written only 10^7-10^9 times before they wear out. Without any wear management, PM lifetime might be as low as 1.1 months. This thesis presents Kevlar, an OS-based wear-management technique for PM that requires no new hardware. Kevlar uses existing virtual memory mechanisms to remap pages, enabling it to perform both wear leveling---shuffling pages in PM to even out wear---and wear reduction---transparently migrating heavily written pages to DRAM. Crucially, Kevlar avoids the need for hardware support to track wear at fine grain. It relies on a novel wear-estimation technique that builds upon Intel's Precise Event Based Sampling to approximately track processor cache contents via a software-maintained Bloom filter and estimate write-back rates at fine grain. Second, this thesis proposes a persistency model for high-level languages to enable the integration of PMs into future programming systems. Prior works extend language memory models with a persistency model prescribing semantics for updates to PM. These approaches require high-overhead mechanisms, are restricted to certain synchronization constructs, provide incomplete semantics, and/or may recover to a state that cannot arise in fault-free program execution. This thesis argues for persistency semantics that guarantee failure atomicity of synchronization-free regions (SFRs)---program regions delimited by synchronization operations. The proposed approach provides clear semantics for the PM state that recovery code may observe and extends C++11's "sequential consistency for data-race-free" guarantee to post-failure recovery code. To this end, this thesis investigates two designs for failure-atomic SFRs that vary in performance and in the degree to which the commit of persistent state may lag execution. Finally, this thesis proposes StrandWeaver, a hardware persistency model that minimally constrains ordering on PM operations. Several language-level persistency models have emerged recently to aid programming of recoverable data structures in PM. These language-level persistency models are built upon hardware primitives that impose stricter ordering constraints on PM operations than the persistency models require. StrandWeaver manages PM order within a strand, a logically independent sequence of PM operations within a thread. PM operations that lie on separate strands are unordered and may drain concurrently to PM. StrandWeaver implements primitives under strand persistency to allow programmers to improve concurrency and relax ordering constraints on updates as they drain to PM. Furthermore, StrandWeaver proposes mechanisms that map persistency semantics in high-level language persistency models to the primitives implemented by StrandWeaver.
    Ph.D. thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/155100/1/vgogte_1.pd
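A rough sketch of the Bloom-filter idea behind the wear-estimation technique is shown below: sampled store addresses are inserted at cache-line granularity, and membership queries give an approximate, false-positive-prone view of what was written. The sizes, hash mixing, and names here are arbitrary assumptions for illustration; Kevlar's actual estimator is driven by Intel PEBS samples.

```cpp
// A simplified software Bloom filter over cache-line addresses, as one might
// use to approximate which lines were written. Parameters are arbitrary.
#include <bitset>
#include <cstddef>
#include <cstdint>

class CacheLineBloom {
    static constexpr size_t kBits = 1 << 20;   // 1 Mi bits (~128 KiB)
    std::bitset<kBits> bits_;

    // Simple 64-bit mixer (splitmix-style) reduced modulo the filter size.
    static size_t hash(uint64_t line, uint64_t seed) {
        uint64_t x = line ^ seed;
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;
        return static_cast<size_t>(x % kBits);
    }

public:
    // Record a sampled store to 'addr', tracked at 64-byte cache-line granularity.
    void insert(uint64_t addr) {
        uint64_t line = addr >> 6;
        bits_.set(hash(line, 0x9e3779b97f4a7c15ULL));
        bits_.set(hash(line, 0xbf58476d1ce4e5b9ULL));
    }

    // May return false positives, never false negatives: good enough for
    // approximate per-page write-rate estimation.
    bool maybe_written(uint64_t addr) const {
        uint64_t line = addr >> 6;
        return bits_.test(hash(line, 0x9e3779b97f4a7c15ULL)) &&
               bits_.test(hash(line, 0xbf58476d1ce4e5b9ULL));
    }
};
```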

    Architectural support for persistent memory systems

    The long-stated vision of persistent memory is set to be realized with the release of 3D XPoint memory by Intel and Micron. Persistent memory, as the name suggests, amalgamates the persistence (non-volatility) of storage devices (like disks) with the byte-addressability and low latency of memory. These properties of persistent memory, coupled with its accessibility through the processor load/store interface, enable programmers to design in-memory persistent data structures. An important challenge in designing persistent memory systems is to provide support for maintaining the crash consistency of these in-memory data structures. Crash consistency is necessary to ensure the correct recovery of program state after a crash. Ordering is a primitive that can be used to design crash-consistent programs. It provides guarantees on the order of updates to persistent memory. Atomicity can also be used to design crash-consistent programs, via two primitives. First, as an atomic durability primitive, which guarantees that in the presence of system crashes updates are made durable atomically: either all or none of the updates become durable. Second, in the form of ACID transactions that guarantee atomic visibility and atomic durability. Existing systems do not support ordering, let alone atomic durability or ACID. In fact, these systems implement various performance-enhancing optimizations that deliberately reorder updates to memory. Moreover, software in these systems cannot explicitly control the movement of data from the volatile cache to persistent memory. Therefore, any ordering requirement has to be enforced synchronously, which degrades performance because program execution stalls while waiting for updates to reach persistent memory. This thesis aims to provide the design principles and efficient implementations for three crash consistency primitives: ordering, atomic durability, and ACID transactions. A set of persistency models has recently been proposed to support the ordering primitive. This thesis extends the taxonomy of these models by adding buffering, which allows the hardware to enforce ordering in the background, as a new layer of classification. It then goes on to show how the existing implementation of a buffered model degenerates to a performance-inefficient non-buffered model in the presence of conflicts, and proposes efficient solutions to eliminate or limit the impact of these conflicts with minimal hardware modifications. This thesis also proposes the first implementation of a buffered model for a server-class processor with multi-banked caches and multiple memory controllers. Write-ahead logging (WAL) is a commonly used approach to provide atomic durability. This thesis argues that existing implementations of WAL in software are not only inefficient, because of their fine-grained ordering dependencies, but also waste precious execution cycles on what is fundamentally a data movement task. It then proposes ATOM, a hardware log manager based on undo logging that performs the logging operation out of the critical path. This thesis presents the design principles behind ATOM and two techniques that optimize its performance. These techniques enable the memory controller to enforce the fine-grained ordering required for logging and, in some cases, to even perform the logging itself. In doing so, ATOM significantly reduces processor stall cycles and improves performance.
    The most commonly used abstraction for atomically updating persistent data is durable transactions with ACID (Atomicity, Consistency, Isolation and Durability) semantics, which make the updates within a transaction both visible and durable atomically. As a final contribution, this thesis tackles the problem of providing efficient support for durable transactions in hardware by integrating hardware support for atomic durability with hardware transactional memory (HTM). It proposes DHTM (durable hardware transactional memory), in which durability is treated as a first-class design constraint. DHTM guarantees atomic durability via hardware redo logging and integrates this logging support with a commercial HTM to provide atomic visibility. Furthermore, DHTM leverages the same logging infrastructure to extend the supported transaction size from being L1-limited to LLC-limited, with minor changes to the coherence protocol.
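To make the redo-logging idea concrete, the sketch below shows a software version of the protocol that a hardware scheme like DHTM performs transparently: new values are persisted to a log, a commit flag is persisted, and only then are the home locations updated. The structures and names are illustrative assumptions, not DHTM's actual design.

```cpp
// A software sketch of a redo-logged durable transaction.
// Assumes CLWB support and an NVM-resident log; names are hypothetical.
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

struct RedoEntry { uint64_t* addr; uint64_t new_val; };

struct RedoLog {
    RedoEntry entries[64];   // fixed capacity; no overflow handling in this sketch
    size_t    count = 0;
    uint64_t  committed = 0; // persisted last: the transaction commit point
};

// Write back every cache line overlapping [p, p+len), then order the flushes.
static inline void persist(const void* p, size_t len) {
    uintptr_t start = reinterpret_cast<uintptr_t>(p) & ~uintptr_t{63};
    uintptr_t end   = reinterpret_cast<uintptr_t>(p) + len;
    for (uintptr_t a = start; a < end; a += 64)
        _mm_clwb(reinterpret_cast<void*>(a));
    _mm_sfence();
}

// Buffer a transactional write; the in-place location is untouched until commit.
void tx_write(RedoLog& log, uint64_t* addr, uint64_t val) {
    if (log.count < 64) log.entries[log.count++] = {addr, val};
}

// Commit: persist the log, persist the commit flag, then apply the updates.
// After a crash, recovery replays the log only if committed == 1.
void tx_commit(RedoLog& log) {
    persist(log.entries, log.count * sizeof(RedoEntry));
    log.committed = 1;
    persist(&log.committed, sizeof(log.committed));
    for (size_t i = 0; i < log.count; ++i) {
        *log.entries[i].addr = log.entries[i].new_val;
        persist(log.entries[i].addr, sizeof(uint64_t));
    }
    log.committed = 0;   // retire the log
    log.count = 0;
    persist(&log.committed, sizeof(log.committed));
}
```

The fine-grained orderings visible here (log before flag, flag before in-place updates) are exactly the dependencies that hardware logging schemes enforce off the critical path.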