14 research outputs found

    Compiler Aided Manual Speculation for High Performance Concurrent Data Structures

    Speculation is a well-known means of increasing parallelism among concurrent methods that are usually but not always independent. Traditional nonblocking data structures employ a particularly restrictive form of speculation. Software transactional memory (STM) systems employ a much more general (though typically blocking) form, and there is a wealth of options in between. Using several different concurrent data structures as examples, we show that manual addition of speculation to traditional lock-based code can lead to significant performance improvements. Successful speculation requires careful consideration of profitability, and of how and when to validate consistency. Unfortunately, it also requires substantial modifications to code structure and a deep understanding of the memory model. These latter requirements make it difficult to use in its purely manual form, even for expert programmers. To simplify the process, we present a compiler tool, CSpec, that automatically generates speculative code from baseline lock-based code with user annotations. Compiler-aided manual speculation keeps the original code structure for better readability and maintenance, while providing the flexibility to choose speculation and validation strategies. Experiments on UltraSPARC and x86 platforms demonstrate that with a small number of annotations added to lock-based code, CSpec can generate speculative code that matches the performance of best-effort hand-written versions.
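The idea of adding speculation to lock-based code by hand can be illustrated with a small sketch. This is not CSpec output and not code from the paper: the class `SpeculativeSortedSet`, its version counter, and the retry loop are all hypothetical names and choices made for illustration. The search runs speculatively outside the critical section, and the lock is held only to validate the result and apply the update.

```python
import bisect
import threading


class SpeculativeSortedSet:
    """Illustrative only: a sorted set whose search runs outside the lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []    # sorted list of keys
        self._version = 0   # bumped under the lock on every successful insert

    def insert(self, key):
        while True:
            seen = self._version                        # snapshot before speculating
            pos = bisect.bisect_left(self._items, key)  # speculative search, no lock held
            with self._lock:
                if self._version != seen:
                    continue    # validation failed: the list changed, so retry
                if pos < len(self._items) and self._items[pos] == key:
                    return False    # key already present
                self._items.insert(pos, key)
                self._version += 1
                return True

    def snapshot(self):
        with self._lock:
            return list(self._items)


s = SpeculativeSortedSet()
s.insert(3)
s.insert(1)
s.insert(3)          # duplicate, rejected
print(s.snapshot())  # [1, 3]
```

The sketch relies on CPython executing each list operation atomically; in a language with a weaker memory model, the unlocked reads would need explicit ordering guarantees, which is the kind of difficulty the abstract attributes to purely manual speculation.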

    Software Partitioning of Hardware Transactions

    Best-effort hardware transactional memory (HTM) allows complex operations to execute atomically and in parallel, so long as hardware buffers do not overflow, and conflicts are not encountered with concurrent operations. We describe a programming technique and compiler support to reduce both overflow and conflict rates by partitioning common operations into read-mostly (planning) and write-mostly (completion) operations, which then execute separately. The completion operation remains transactional; planning can often occur in ordinary code. High-level (semantic) atomicity for the overall operation is ensured by passing an application-specific validator object between planning and completion. Transparent composition of partitioned operations is made possible through fully-automated compiler support, which migrates all planning operations out of the parent transaction while respecting all program data flow and dependences. For both micro- and macro-benchmarks, experiments on IBM z-Series and Intel Haswell machines demonstrate that partitioning can lead to dramatically lower abort rates and higher scalability.
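The planning/completion split can be sketched on a sorted linked list. Everything here is an assumption made for illustration: Python has no hardware transactions, so a global lock named `_htm` stands in for the transactional region, and the `Validator`, `plan_insert`, and `insert` names are hypothetical rather than taken from the paper. Planning walks the list in ordinary code; completion only checks the validator and links the new node.

```python
import threading
from dataclasses import dataclass
from typing import Optional

_htm = threading.Lock()  # stand-in for a hardware transaction (assumption)


class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = nxt
        self.deleted = False


@dataclass
class Validator:
    """Records what planning observed so completion can re-check it cheaply."""
    pred: Node
    succ: Optional[Node]

    def still_valid(self) -> bool:
        return (not self.pred.deleted) and self.pred.next is self.succ


class SortedList:
    def __init__(self):
        self.head = Node(float("-inf"))  # sentinel smaller than any key

    def plan_insert(self, key) -> Validator:
        # Planning: read-mostly traversal, performed outside the transaction.
        pred = self.head
        while pred.next is not None and pred.next.key < key:
            pred = pred.next
        return Validator(pred, pred.next)

    def insert(self, key) -> bool:
        while True:
            v = self.plan_insert(key)
            with _htm:  # completion: short, write-mostly, "transactional"
                if not v.still_valid():
                    continue  # plan is stale, so re-plan
                if v.succ is not None and v.succ.key == key:
                    return False
                v.pred.next = Node(key, v.succ)
                return True

    def remove(self, key) -> bool:
        # Left unpartitioned for contrast: the traversal runs inside the region.
        with _htm:
            pred = self.head
            while pred.next is not None and pred.next.key < key:
                pred = pred.next
            victim = pred.next
            if victim is None or victim.key != key:
                return False
            victim.deleted = True
            pred.next = victim.next
            return True

    def to_list(self):
        out, node = [], self.head.next
        while node is not None:
            out.append(node.key)
            node = node.next
        return out
```

In this sketch the region that would be transactional touches only two nodes during an insert, regardless of list length, which is the intuition behind the smaller memory footprint the abstract describes.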

    Compiler assisted speculation for multithreaded systems

    Thesis (Ph. D.)--University of Rochester. Department of Computer Science, 2015.
    Multithreaded programming requires synchronization, to coordinate access to shared resources. As more and more processor cores become available on a single machine, synchronization tends to be the performance bottleneck of many multithreaded programs. This thesis focuses on two important synchronization scenarios: lock-based critical sections and transactional atomic blocks. Since speculation is a well-known means of increasing parallelism among concurrent executions that are usually but not always independent, this thesis first explores the manual addition of speculation to lock-based critical sections (in concurrent data structures). With simple language extensions accompanied by compiler assistance, a technique which we refer to as CSpec, the programmer can exploit high-level program knowledge to move speculative work out of lock-based critical sections, thereby improving scalability while still maintaining correctness. Speculation is also a major approach to improved parallelism in transactional memory systems. Frequent failures of speculative execution, however, may render the technique unprofitable. In recently emerged best-effort hardware transactional memory, speculation fails mainly due to two reasons: hardware overflow and data conflicts. In this thesis we develop a programming technique and compiler support, ParT, to reduce the duration and memory footprint of hardware transactions, leading to lower abort rates while preserving deadlock-free composability. To reduce the incidence of conflict, we propose an automatic, high-level mechanism, Staggered Transactions, that uses advisory locks to serialize (just) the portions of the transactions in which conflicting accesses occur. In all proposed techniques, compiler assistance is essential to maintaining ease of programming. We thus conclude that compiler assistance can significantly improve speculation in multithreaded systems.

    The Design and Implementation of the DVS Based Dynamic Compiler for Power Reduction

    Abstract. In recent years, with the wide deployment of embedded and mobile devices, reducing power consumption to extend battery life has become a major factor that designers must consider when designing a new architecture. Dynamic voltage scaling (DVS) is regarded as one of the most effective power reduction techniques. This paper focuses on run-time, compiler-driven DVS for power reduction, and in particular on two key design issues: the DVS analysis model and the DVS decision algorithm. Based on the design framework presented in this work, we also implement a run-time DVS compiler that is fine-grained and adapts to the program's running environment without changing its behavior. The resulting system is deployed on a real hardware platform. Experimental results on a set of benchmarks show that, with an average performance loss of 5%, the benchmarks achieve 26% energy savings and a 22% improvement in energy-delay product (EDP).
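The three reported figures can be checked against each other with a short calculation. It assumes that EDP is the product of energy and execution time, and that the 5% performance loss means a 5% longer execution time; the abstract does not state either definition explicitly.

```latex
\frac{\mathrm{EDP}_{\mathrm{DVS}}}{\mathrm{EDP}_{\mathrm{base}}}
  = \frac{E_{\mathrm{DVS}}}{E_{\mathrm{base}}} \cdot \frac{T_{\mathrm{DVS}}}{T_{\mathrm{base}}}
  = (1 - 0.26)(1 + 0.05)
  = 0.74 \times 1.05
  = 0.777
\qquad\Longrightarrow\qquad
1 - 0.777 = 0.223 \approx 22\%
```

Under those assumptions the stated 22% EDP improvement is consistent with the energy and performance numbers.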

    Implementation of LDPC encoding to DTMB standard based on FPGA

    Conference Name: 2011 10th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2011. Conference Address: Sanya, Hainan Island, China. Time: May 16, 2011 - May 18, 2011. IEEE Computer Society; Int. Assoc. Comput. Inf. Sci. (ACIS); Institute of Electrical and Electronics Engineers (IEEE); Hainan University.
    In this paper, an implementation of a Low-Density Parity-Check (LDPC) encoder is introduced that meets the requirements of the Chinese Digital Terrestrial Multimedia Broadcasting (DTMB) standard. A design of the LDPC encoder that uses a partially-parallel encoding structure based on the Shift Register Adder Accumulator (SRAA) circuit is studied, according to the irregular quasi-cyclic characteristic of the LDPC encoding specified by the standard. We then implement the design on an FPGA. The simulation and implementation results show that the design meets the requirements of the DTMB standard and reduces resource usage. © 2011 IEEE
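The SRAA principle for quasi-cyclic codes can be modeled in software on a single circulant block. This is a toy sketch, not the paper's FPGA design: the function names, the 5-bit block size, and the bit patterns are hypothetical, and neither the DTMB code parameters nor the partially-parallel arrangement is reproduced. A shift register holds the current row of the circulant, each message bit gates whether that row is XORed into the accumulator, and the register is cyclically shifted after every bit.

```python
def sraa_multiply(message_bits, first_row):
    """Multiply a b-bit message block by a b x b binary circulant, SRAA style."""
    b = len(first_row)
    assert len(message_bits) == b
    shift_reg = list(first_row)   # current row of the circulant
    accumulator = [0] * b         # running XOR of the selected rows
    for bit in message_bits:
        if bit:  # the message bit gates the adder
            accumulator = [a ^ g for a, g in zip(accumulator, shift_reg)]
        shift_reg = [shift_reg[-1]] + shift_reg[:-1]  # cyclic right shift by one
    return accumulator


def circulant_multiply(message_bits, first_row):
    """Reference result: build the full circulant and multiply over GF(2)."""
    b = len(first_row)
    rows = [[first_row[(j - i) % b] for j in range(b)] for i in range(b)]
    return [sum(message_bits[i] * rows[i][j] for i in range(b)) % 2
            for j in range(b)]


message = [1, 0, 1, 1, 0]
generator_row = [1, 1, 0, 1, 0]
print(sraa_multiply(message, generator_row))  # [0, 0, 1, 1, 1]
```

Because only the first row of each circulant has to be stored, the structure needs far less memory than a full generator matrix, which fits the resource savings the abstract reports.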