88 research outputs found

    Fast and Correct Load-Link/Store-Conditional Instruction Handling in DBT Systems

    Get PDF
    Dynamic Binary Translation (DBT) requires the implementation of load-link/store-conditional (LL/SC) primitives for guest systems that rely on this form of synchronization. When targeting e.g. x86 host systems, LL/SC guest instructions are typically emulated using atomic Compare-and-Swap (CAS) instructions on the host. Whilst this direct mapping is efficient, this approach is problematic due to subtle differences between LL/SC and CAS semantics. In this paper, we demonstrate that this is a real problem, and we provide code examples that fail to execute correctly on QEMU and a commercial DBT system, which both use the CAS approach to LL/SC emulation. We then develop two novel and provably correct LL/SC emulation schemes: (1) A purely software based scheme, which uses the DBT system’s page translation cache for correctly selecting between fast, but unsynchronized, and slow, but fully synchronized memory accesses, and (2) a hardware accelerated scheme that leverages hardware transactional memory (HTM) provided by the host. We have implemented these two schemes in the Synopsys DesignWare® ARC® nSIM DBT system, and we evaluate our implementations against full applications, and targeted micro-benchmarks. We demonstrate that our novel schemes are not only correct, but also deliver competitive performance on-par or better than the widely used, but broken CAS scheme.Postprin

    Effective software design and development for the new graph architecture HPC machines.

    Full text link

    Practical Dynamic Transactional Data Structures

    Get PDF
    Multicore programming presents the challenge of synchronizing multiple threads. Traditionally, mutual exclusion locks are used to limit access to a shared resource to a single thread at a time. Whether this lock is applied to an entire data structure, or only a single element, the pitfalls of lock-based programming persist. Deadlock, livelock, starvation, and priority inversion are some of the hazards of lock-based programming that can be avoided by using non-blocking techniques. Non-blocking data structures allow scalable and thread-safe access to shared data by guaranteeing, at least, system-wide progress. In this work, we present the first wait-free hash map which allows a large number of threads to concurrently insert, get, and remove information. Wait-freedom means that all threads make progress in a finite amount of time --- an attribute that can be critical in real-time environments. We only use atomic operations that are provided by the hardware; therefore, our hash map can be utilized by a variety of data-intensive applications including those within the domains of embedded systems and supercomputers. The challenges of providing this guarantee make the design and implementation of wait-free objects difficult. As such, there are few wait-free data structures described in the literature; in particular, there are no wait-free hash maps. It often becomes necessary to sacrifice performance in order to achieve wait-freedom. However, our experimental evaluation shows that our hash map design is, on average, 7 times faster than a traditional blocking design. Our solution outperforms the best available alternative non-blocking designs in a large majority of cases, typically by a factor of 15 or higher. The main drawback of non-blocking data structures is that only one linearizable operation can be executed by each thread, at any one time. To overcome this limitation we present a framework for developing dynamic transactional data containers. Transactional containers are those that execute a sequence of operations atomically and in such a way that concurrent transactions appear to take effect in some sequential order. We take an existing algorithm that transforms non-blocking sets into static transactional versions (LFTT), and we modify it to support maps. We implement a non-blocking transactional hash map using this new approach. We continue to build on LFTT by implementing a lock-free vector using a methodology to allow LFTT to be compatible with non-linked data structures. A static transaction requires all operands and operations to be specified at compile-time, and no code may be executed between transactions. These limitations render static transactions impractical for most use cases. We modify LFTT to support dynamic transactions, and we enhance it with additional features. Dynamic transactions allow operands to be specified at runtime rather than compile-time, and threads can execute code between the data structure operations of a transaction. We build a framework for transforming non-blocking containers into dynamic transactional data structures, called Dynamic Transactional Transformation (DTT), and provide a library of novel transactional containers. Our library provides the wait-free progress guarantee and supports transactions among multiple data structures, whereas previous work on data structure transactions has been limited to operating on a single container. Our approach is 3 times faster than software transactional memory, and its performance matches its lock-free transactional counterpart

    Coarse-Grained, Fine-Grained, and Lock-Free Concurrency Approaches for Self-Balancing B-Tree

    Full text link
    This dissertation examines the concurrency approaches for a standard, unmodified B-Tree which is one of the more complex data structures. This includes the coarse grained, fine-grained locking, and the lock-free approaches. The basic industry standard coarse-grained approach is used as a base-line for comparison to the more advanced fine-grained and lock-free approaches. The fine-grained approach is explored and algorithms are presented for the fine-grained B-Tree insertion and deletion. The lock-free approach is addressed and an algorithm for a lock-free B- Tree insertion is provided. The issues associated with a lock-free deletion are discussed. Comparison trade-offs are presented and discussed. As a final part of this effort, specific testing processes are discussed and presented

    Practical Control-Flow Integrity

    Get PDF
    Control-Flow Integrity (CFI) is effective at defending against prevalent control-flow hijacking attacks. CFI extracts a control-flow graph (CFG) for a given program and instruments the program to respect the CFG. Specifically, checks are inserted before indirect branch instructions. Before these instructions are executed during runtime, the checks consult the CFG to ensure that the indirect branch is allowed to reach the intended target. Hence, any sort of control-flow hijacking would be prevented.However, CFI traditionally suffered from several problems that thwarted its practicality. The first problem is about precise CFG generation. CFI’s security squarely relies on the CFG, therefore the more precise the CFG is, the more security CFI improves, but precise CFG generation was considered hard. The second problem is modularity, or support for dynamic linking. When two CFI modules are linked together dynamically, their CFGs also need to be merged. However, the merge process has to be thread-safe to avoid concurrency issues. The third problem is efficiency. CFI instrumentation adds extra instructions to programs, so it is critical to minimize the performance impact of the CFI checks. Fourth, interoperability is required for CFI solutions to enable gradual adoption in practice, which means that CFI-instrumented modules can be linked with uninstrumented modules without breaking the program.In this dissertation, we propose several practical solutions to the above problems. To generate a precise CFG, we compile the program being protected using a modified compilation toolchain, which can propagate source-level information such as type information to the binary level. At runtime, such information is gathered to generate a relatively precise CFG. On top of this CFG, we further instrument the code so that only if a function’s address is dynamically taken can it be reachable. This approach results in lazily computed per-input CFGs, which provide better precision. To address modularity, we design a lightweight Software Transactional Memory (STM) algorithm to synchronize accesses to the CFG’s data structure at runtime. To minimize the performance overhead, we optimize the CFG representation and access operations so that no heavy buslockinginstructions are needed. For interoperability, we consider addresses in uninstrumented modules as special targets and make the CFI instrumentation aware of them. Finally, we propose a new architecture for Just-In-Time compilers to adopt our proposed CFI schemes

    2016-17 Graduate Bulletin

    Get PDF
    After 2003 the University of Dayton Bulletin went exclusively online. This copy was downloaded from the University of Dayton\u27s website in March 2018.https://ecommons.udayton.edu/bulletin_grad/1047/thumbnail.jp

    2015-16 Graduate Bulletin

    Get PDF
    After 2003 the University of Dayton Bulletin went exclusively online. This copy was downloaded from the University of Dayton\u27s website in March of 2018. Please note: Even though this copy says draft it is the final copy.https://ecommons.udayton.edu/bulletin_grad/1046/thumbnail.jp

    2017-18 Graduate Bulletin

    Get PDF
    After 2003 the University of Dayton Bulletin went exclusively online. This copy was downloaded from the University of Dayton\u27s website in March 2018.https://ecommons.udayton.edu/bulletin_grad/1048/thumbnail.jp

    2014-15 Graduate Bulletin

    Get PDF
    After 2003 the University of Dayton Bulletin went exclusively online. This copy was downloaded from the University of Dayton\u27s website.https://ecommons.udayton.edu/bulletin_grad/1009/thumbnail.jp
    • …
    corecore