13 research outputs found

    Runtime Systems for Persistent Memories

    Full text link
    Emerging persistent memory (PM) technologies promise the performance of DRAM with the durability of disk. However, several challenges remain in existing hardware, programming, and software systems that inhibit wide-scale PM adoption. This thesis focuses on building efficient mechanisms that span hardware and operating systems, and programming languages for integrating PMs in future systems. First, this thesis proposes a mechanism to solve low-endurance problem in PMs. PMs suffer from limited write endurance---PM cells can be written only 10^7-10^9 times before they wear out. Without any wear management, PM lifetime might be as low as 1.1 months. This thesis presents Kevlar, an OS-based wear-management technique for PM, that requires no new hardware. Kevlar uses existing virtual memory mechanisms to remap pages, enabling it to perform both wear leveling---shuffling pages in PM to even wear; and wear reduction---transparently migrating heavily written pages to DRAM. Crucially, Kevlar avoids the need for hardware support to track wear at fine grain. It relies on a novel wear-estimation technique that builds upon Intel's Precise Event Based Sampling to approximately track processor cache contents via a software-maintained Bloom filter and estimate write-back rates at fine grain. Second, this thesis proposes a persistency model for high-level languages to enable integration of PMs in to future programming systems. Prior works extend language memory models with a persistency model prescribing semantics for updates to PM. These approaches require high-overhead mechanisms, are restricted to certain synchronization constructs, provide incomplete semantics, and/or may recover to state that cannot arise in fault-free program execution. This thesis argues for persistency semantics that guarantee failure atomicity of synchronization-free regions (SFRs) --- program regions delimited by synchronization operations. The proposed approach provides clear semantics for the PM state that recovery code may observe and extends C++11's "sequential consistency for data-race-free" guarantee to post-failure recovery code. To this end, this thesis investigates two designs for failure-atomic SFRs that vary in performance and the degree to which commit of persistent state may lag execution. Finally, this thesis proposes StrandWeaver, a hardware persistency model that minimally constrains ordering on PM operations. Several language-level persistency models have emerged recently to aid programming recoverable data structures in PM. The language-level persistency models are built upon hardware primitives that impose stricter ordering constraints on PM operations than the persistency models require. StrandWeaver manages PM order within a strand, a logically independent sequence of PM operations within a thread. PM operations that lie on separate strands are unordered and may drain concurrently to PM. StrandWeaver implements primitives under strand persistency to allow programmers to improve concurrency and relax ordering constraints on updates as they drain to PM. Furthermore, StrandWeaver proposes mechanisms that map persistency semantics in high-level language persistency models to the primitives implemented by StrandWeaver.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/155100/1/vgogte_1.pd

    A Practical Oblivious Map Data Structure with Secure Deletion and History Independence

    Get PDF
    We present a new oblivious RAM that supports variable-sized storage blocks (vORAM), which is the first ORAM to allow varying block sizes without trivial padding. We also present a new history-independent data structure (a HIRB tree) that can be stored within a vORAM. Together, this construction provides an efficient and practical oblivious data structure (ODS) for a key/value map, and goes further to provide an additional privacy guarantee as compared to prior ODS maps: even upon client compromise, deleted data and the history of old operations remain hidden to the attacker. We implement and measure the performance of our system using Amazon Web Services, and the single-operation time for a realistic database (up to 2182^{18} entries) is less than 1 second. This represents a 100x speed-up compared to the current best oblivious map data structure (which provides neither secure deletion nor history independence) by Wang et al. (CCS 14)

    Towards Automatic Load Balancing of a Parallel File System with Subfile Based Migration

    Get PDF
    Nowadays scientific applications address complex problems in nature. As a consequence they have a high demand for computation power and for an I/O infrastructure providing performant access to data. These demands are satisfied by various supercomputers in order to engage grand challenges. From the operator's point of view it is important to keep the available resources of such a multi dollar machine busy, while the end-user is concerned about the runtime of the application. Evidently it is vital to avoid idle times in an application and congestion of network as well as of the I/O subsystem to ensure maximum concurrency and thus efficiency. Load balancing is a technique which tackles this issues. While load balancing has been evaluated in detail for computational parts of a program, analysis of load imbalance for complex storage environments in High Performance Computing has to be addressed as well. Often parallel file systems like Lustre, GPFS, or PVFS2 are deployed to meet the needs of a fast I/O infrastructure. This thesis evaluates the impact of unbalanced workloads in such parallel file systems exemplarily on PVFS2 and extends the environment to allow dynamic (and adaptive) load balancing. Some cases leading to unbalanced workloads are discussed, namely unbalanced access patterns, inhomogeneous hardware, and rebuilds after crashes in an environment promising high availability. Important factors related to the performance are described, this allows to build simple performance models on which the impact of such load imbalances can be assessed. Some potential countermeasures to fix these unbalanced workloads are discussed in the thesis. While most cases could be alleviated by static load balancing mechanisms a dynamic load balancing seems important to make up for environments with fluctuating performance characteristics. In the thesis extensions to the software environment are designed and realized that provide capabilities to detect bottlenecks and to fix them by moving data from higher loaded servers to lower loaded servers. Therefore, further mechanisms are integrated into PVFS2, which allow and support dynamic decisions to move data by a load-balancer. A first heuristics is implemented using the extensions to demonstrate how they can be used to build a dynamic load-balancer. Experiments are run with balanced as well as unbalanced workloads to show the server behavior.Also a few experiments with the developed load-balancer in a real environment are made. These results demonstrate problematic issues and demonstrate that load balancing techniques could be successfully applied to increase productivity

    Elastic phone : towards detecting and mitigating computation and energy inefficiencies in mobile apps

    Get PDF
    Mobile devices have become ubiquitous and their ever evolving capabilities are bringing them closer to personal computers. Nonetheless, due to their mobility and small size factor constraints, they still present many hardware and software challenges. Their limited battery life time has led to the design of mobile networks that are inherently different from previous networks (e.g., wifi) and more restrictive task scheduling. Additionally, mobile device ecosystems are more susceptible to the heterogeneity of hardware and from conflicting interests of distributors, internet service providers, manufacturers, developers, etc. The high number of stakeholders ultimately responsible for the performance of a device, results in an inconsistent behavior and makes it very challenging to build a solution that improves resource usage in most cases. The focus of this thesis is on the study and development of techniques to detect and mitigate computation and energy inefficiencies in mobile apps. It follows a bottom-up approach, starting from the challenges behind detecting inefficient execution scheduling by looking only at apps’ implementations. It shows that scheduling APIs are largely misused and have a great impact on devices wake up frequency and on the efficiency of existing energy saving techniques (e.g., batching scheduled executions). Then it addresses many challenges of app testing in the dynamic analysis field. More specifically, how to scale mobile app testing with realistic user input and how to analyze closed source apps’ code at runtime, showing that introducing humans in the app testing loop improves the coverage of app’s code and generated network volume. Finally, using the combined knowledge of static and dynamic analysis, it focuses on the challenges of identifying the resource hungry sections of apps and how to improve their execution via offloading. There is a special focus on performing non-intrusive offloading transparent to existing apps and on in-network computation offloading and distribution. It shows that, even without a custom OS or app modifications, in-network offloading is still possible, greatly improving execution times, energy consumption and reducing both end-user experienced latency and request drop rates. It concludes with a real app measurement study, showing that a good portion of the most popular apps’ code can indeed be offloaded and proposes future directions for the app testing and computation offloading fields.Los dispositivos móviles se han tornado omnipresentes y sus capacidades están en constante evolución acercándolos a los computadoras personales. Sin embargo, debido a su movilidad y tamaño reducido, todavía presentan muchos desafíos de hardware y software. Su duración limitada de batería ha llevado al diseño de redes móviles que son inherentemente diferentes de las redes anteriores y una programación de tareas más restrictiva. Además, los ecosistemas de dispositivos móviles son más susceptibles a la heterogeneidad de hardware y los intereses conflictivos de las entidades responsables por el rendimiento final de un dispositivo. El objetivo de esta tesis es el estudio y desarrollo de técnicas para detectar y mitigar las ineficiencias de computación y energéticas en las aplicaciones móviles. Empieza con los desafíos detrás de la detección de planificación de ejecución ineficientes, mirando sólo la implementación de las aplicaciones. Se muestra que las API de planificación son en gran medida mal utilizadas y tienen un gran impacto en la frecuencia con que los dispositivos despiertan y en la eficiencia de las técnicas de ahorro de energía existentes. A continuación, aborda muchos desafíos de las pruebas de aplicaciones en el campo de análisis dinámica. Más específicamente, cómo escalar las pruebas de aplicaciones móviles con una interacción realista y cómo analizar código de aplicaciones de código cerrado durante la ejecución, mostrando que la introducción de humanos en el bucle de prueba de aplicaciones mejora la cobertura del código y el volumen de comunicación de red generado. Por último, combinando la análisis estática y dinámica, se centra en los desafíos de identificar las secciones de aplicaciones con uso intensivo de recursos y cómo mejorar su ejecución a través de la ejecución remota (i.e.,"offload"). Hay un enfoque especial en el "offload" no intrusivo y transparente a las aplicaciones existentes y en el "offload"y distribución de computación dentro de la red. Demuestra que, incluso sin un sistema operativo personalizado o modificaciones en la aplicación, el "offload" en red sigue siendo posible, mejorando los tiempos de ejecución, el consumo de energía y reduciendo la latencia del usuario final y las tasas de caída de solicitudes de "offload". Concluye con un estudio real de las aplicaciones más populares, mostrando que una buena parte de su código puede de hecho ser ejecutado remotamente y propone direcciones futuras para los campos de "offload" de aplicaciones

    Elastic phone : towards detecting and mitigating computation and energy inefficiencies in mobile apps

    Get PDF
    Mobile devices have become ubiquitous and their ever evolving capabilities are bringing them closer to personal computers. Nonetheless, due to their mobility and small size factor constraints, they still present many hardware and software challenges. Their limited battery life time has led to the design of mobile networks that are inherently different from previous networks (e.g., wifi) and more restrictive task scheduling. Additionally, mobile device ecosystems are more susceptible to the heterogeneity of hardware and from conflicting interests of distributors, internet service providers, manufacturers, developers, etc. The high number of stakeholders ultimately responsible for the performance of a device, results in an inconsistent behavior and makes it very challenging to build a solution that improves resource usage in most cases. The focus of this thesis is on the study and development of techniques to detect and mitigate computation and energy inefficiencies in mobile apps. It follows a bottom-up approach, starting from the challenges behind detecting inefficient execution scheduling by looking only at apps’ implementations. It shows that scheduling APIs are largely misused and have a great impact on devices wake up frequency and on the efficiency of existing energy saving techniques (e.g., batching scheduled executions). Then it addresses many challenges of app testing in the dynamic analysis field. More specifically, how to scale mobile app testing with realistic user input and how to analyze closed source apps’ code at runtime, showing that introducing humans in the app testing loop improves the coverage of app’s code and generated network volume. Finally, using the combined knowledge of static and dynamic analysis, it focuses on the challenges of identifying the resource hungry sections of apps and how to improve their execution via offloading. There is a special focus on performing non-intrusive offloading transparent to existing apps and on in-network computation offloading and distribution. It shows that, even without a custom OS or app modifications, in-network offloading is still possible, greatly improving execution times, energy consumption and reducing both end-user experienced latency and request drop rates. It concludes with a real app measurement study, showing that a good portion of the most popular apps’ code can indeed be offloaded and proposes future directions for the app testing and computation offloading fields.Los dispositivos móviles se han tornado omnipresentes y sus capacidades están en constante evolución acercándolos a los computadoras personales. Sin embargo, debido a su movilidad y tamaño reducido, todavía presentan muchos desafíos de hardware y software. Su duración limitada de batería ha llevado al diseño de redes móviles que son inherentemente diferentes de las redes anteriores y una programación de tareas más restrictiva. Además, los ecosistemas de dispositivos móviles son más susceptibles a la heterogeneidad de hardware y los intereses conflictivos de las entidades responsables por el rendimiento final de un dispositivo. El objetivo de esta tesis es el estudio y desarrollo de técnicas para detectar y mitigar las ineficiencias de computación y energéticas en las aplicaciones móviles. Empieza con los desafíos detrás de la detección de planificación de ejecución ineficientes, mirando sólo la implementación de las aplicaciones. Se muestra que las API de planificación son en gran medida mal utilizadas y tienen un gran impacto en la frecuencia con que los dispositivos despiertan y en la eficiencia de las técnicas de ahorro de energía existentes. A continuación, aborda muchos desafíos de las pruebas de aplicaciones en el campo de análisis dinámica. Más específicamente, cómo escalar las pruebas de aplicaciones móviles con una interacción realista y cómo analizar código de aplicaciones de código cerrado durante la ejecución, mostrando que la introducción de humanos en el bucle de prueba de aplicaciones mejora la cobertura del código y el volumen de comunicación de red generado. Por último, combinando la análisis estática y dinámica, se centra en los desafíos de identificar las secciones de aplicaciones con uso intensivo de recursos y cómo mejorar su ejecución a través de la ejecución remota (i.e.,"offload"). Hay un enfoque especial en el "offload" no intrusivo y transparente a las aplicaciones existentes y en el "offload"y distribución de computación dentro de la red. Demuestra que, incluso sin un sistema operativo personalizado o modificaciones en la aplicación, el "offload" en red sigue siendo posible, mejorando los tiempos de ejecución, el consumo de energía y reduciendo la latencia del usuario final y las tasas de caída de solicitudes de "offload". Concluye con un estudio real de las aplicaciones más populares, mostrando que una buena parte de su código puede de hecho ser ejecutado remotamente y propone direcciones futuras para los campos de "offload" de aplicaciones.Postprint (published version

    Goddard Conference on Mass Storage Systems and Technologies, Volume 1

    Get PDF
    Copies of nearly all of the technical papers and viewgraphs presented at the Goddard Conference on Mass Storage Systems and Technologies held in Sep. 1992 are included. The conference served as an informational exchange forum for topics primarily relating to the ingestion and management of massive amounts of data and the attendant problems (data ingestion rates now approach the order of terabytes per day). Discussion topics include the IEEE Mass Storage System Reference Model, data archiving standards, high-performance storage devices, magnetic and magneto-optic storage systems, magnetic and optical recording technologies, high-performance helical scan recording systems, and low end helical scan tape drives. Additional topics addressed the evolution of the identifiable unit for processing purposes as data ingestion rates increase dramatically, and the present state of the art in mass storage technology

    ADDING PERSISTENCE TO MAIN MEMORY PROGRAMMING

    Get PDF
    Unlocking the true potential of the new persistent memories (PMEMs) requires eliminating traditional persistent I/O abstractions altogether, by introducing persistent semantics directly into main memory programming. Such a programming model elevates failure atomicity to a first-class application property in addition to in-memory data layout, concurrency-control, and fault tolerance, and therefore requires redesign of programming abstractions for both program correctness and maximum performance gains. To address these challenges, this thesis proposes a set of system software designs that integrate persistence with main memory programming, and makes the following contributions. First, this thesis proposes a PMEM-aware I/O runtime, NVStream, that supports fast durable streaming I/O. NVStream uses a memory-based I/O interface that integrates with existing I/O data movement operations of an application to accelerate persistent data writes. NVStream carefully designs its persistent data storage layout and crash-consistent semantics to match both application and PMEM characteristics. Specifically, we leverage the streaming nature of I/O in HPC workflows, to benefit from using a log-structured PMEM storage engine design, that uses relaxed write orderings and append-only failure-atomic semantics to form strongly consistent application checkpoints. Furthermore, we identify that optimizing the I/O software stack exposes the PMEM bandwidth limitations as a bottleneck during parallel HPC I/O writes, and propose a novel data movement design – PHX. PHX uses alternative network data movement paths available in datacenters to ease up the bandwidth pressure on the PMEM memory interconnects, all while maintaining the correctness of the persistent data. Next, the thesis explores the challenges and opportunities of using PMEM for true main memory persistent programming – a single data domain for both runtime and persistent applicationstate. Such a programming model includes maintaining ACID properties during each and every update to applications persistent structures. ACID-qualified persistent programming for multi-threaded applications is hard, as the programmer has to reason about both crash-consistency and synchronization – crash-sync – semantics for programming correctness. The thesis contributes new understanding of the correctness requirements for mixing different crash-consistent and synchronization protocols, characterizes the performance of different crash-sync realizations for different applications and hardware architectures, and draws actionable insights for future designs of PMEM systems. Finally, the application state stored on node-local persistent memory is still vulnerable to catastrophic node failures. The thesis proposes a replicated persistent memory runtime, Blizzard, that supports truly fault tolerant, concurrent and persistent data-structure programming. Blizzard carefully integrates userspace networking with byte addressable PMEM for a fast, persistent memory replication runtime. The design also incorporates a replication-aware crash-sync protocol that supports consistent and concurrent updates on persistent data-structures. Blizzard offers applications the flexibility to use the data structures that best match their functional requirements, while offering better performance, and providing crucial reliability guarantees lacking from existing persistent memory runtimes.Ph.D

    Fifth NASA Goddard Conference on Mass Storage Systems and Technologies

    Get PDF
    This document contains copies of those technical papers received in time for publication prior to the Fifth Goddard Conference on Mass Storage Systems and Technologies held September 17 - 19, 1996, at the University of Maryland, University Conference Center in College Park, Maryland. As one of an ongoing series, this conference continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include storage architecture, database management, data distribution, file system performance and modeling, and optical recording technology. There will also be a paper on Application Programming Interfaces (API) for a Physical Volume Repository (PVR) defined in Version 5 of the Institute of Electrical and Electronics Engineers (IEEE) Reference Model (RM). In addition, there are papers on specific archives and storage products

    Embedded electronic systems driven by run-time reconfigurable hardware

    Get PDF
    Abstract This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology –available through SRAM-based FPGA/SoC devices– aimed at contributing to enhance the life quality of the human beings. This work does research on the conception of the system architecture and the reconfiguration engine that provides to the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned in processing tasks which are multiplexed in time and space, optimizing thus its physical implementation –silicon area, processing time, complexity, flexibility, functional density, cost and power consumption– in comparison with other alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of such technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a high enough level of maturity for its exploitation in the industry.Resumen Esta tesis doctoral abarca el diseño de sistemas electrónicos embebidos basados en tecnología hardware dinámicamente reconfigurable –disponible a través de dispositivos lógicos programables SRAM FPGA/SoC– que contribuyan a la mejora de la calidad de vida de la sociedad. Se investiga la arquitectura del sistema y del motor de reconfiguración que proporcione a la FPGA la capacidad de reconfiguración dinámica parcial de sus recursos programables, con objeto de sintetizar, mediante codiseño hardware/software, una determinada aplicación particionada en tareas multiplexadas en tiempo y en espacio, optimizando así su implementación física –área de silicio, tiempo de procesado, complejidad, flexibilidad, densidad funcional, coste y potencia disipada– comparada con otras alternativas basadas en hardware estático (MCU, DSP, GPU, ASSP, ASIC, etc.). Se evalúa el flujo de diseño de dicha tecnología a través del prototipado de varias aplicaciones de ingeniería (sistemas de control, coprocesadores aritméticos, procesadores de imagen, etc.), evidenciando un nivel de madurez viable ya para su explotación en la industria.Resum Aquesta tesi doctoral està orientada al disseny de sistemes electrònics empotrats basats en tecnologia hardware dinàmicament reconfigurable –disponible mitjançant dispositius lògics programables SRAM FPGA/SoC– que contribueixin a la millora de la qualitat de vida de la societat. S’investiga l’arquitectura del sistema i del motor de reconfiguració que proporcioni a la FPGA la capacitat de reconfiguració dinàmica parcial dels seus recursos programables, amb l’objectiu de sintetitzar, mitjançant codisseny hardware/software, una determinada aplicació particionada en tasques multiplexades en temps i en espai, optimizant així la seva implementació física –àrea de silici, temps de processat, complexitat, flexibilitat, densitat funcional, cost i potència dissipada– comparada amb altres alternatives basades en hardware estàtic (MCU, DSP, GPU, ASSP, ASIC, etc.). S’evalúa el fluxe de disseny d’aquesta tecnologia a través del prototipat de varies aplicacions d’enginyeria (sistemes de control, coprocessadors aritmètics, processadors d’imatge, etc.), demostrant un nivell de maduresa viable ja per a la seva explotació a la indústria

    High Level Concurrency in C∀

    Get PDF
    Concurrent programs are notoriously hard to write and even harder to debug. Furthermore concurrent programs must be performant, as the introduction of concurrency into a program is often done to achieve some form of speedup. This thesis presents a suite of high-level concurrent-language features in the new programming language C∀, all of which are implemented with the aim of improving the performance, productivity, and safety of concurrent programs. C∀ is a non-object-oriented programming language that extends C. The foundation for concurrency in C∀ was laid by Thierry Delisle [15], who implemented coroutines, user-level threads, and monitors. This thesis builds upon that work and introduces a suite of new concurrent features as its main contribution. The features include a mutex statement (similar to a C++ scoped lock or Java synchronized statement), Go-like channels and select statement, and an actor system. The root ideas behind these features are not new, but the C∀ implementations extends the original ideas in performance, productivity, and safety
    corecore