45 research outputs found

    Silent stores and store value locality

    This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

    Redeeming IPC as a performance metric for multithreaded programs

    Recent work has shown that multithreaded workloads running in execution-driven, full-system simulation environments cannot use instructions per cycle (IPC) as a valid performance metric due to non-deterministic program behavior. Unfortunately, invalidating IPC as a performance metric introduces its own host of difficulties: special workload setup, consideration of cold-start and end-effects, statistical methodologies leading to increased simulation bandwidth, and workload-specific, higher-level metrics to measure performance. This paper explores the non-determinism problem in multithreaded programs, describes a method to eliminate non-determinism across simulations of different experimental machine models, and demonstrates the suitability of this methodology for performing architectural performance analysis, thus redeeming IPC as a performance metric for multithreaded programs.

    A dynamic binary translation approach to architectural simulation


    Compiler-based prefetching for recursive data structures


    Correctly Implementing Value Prediction in Microprocessors that Support Multithreading or Multiprocessing

    This paper explores the interaction of value prediction with thread-level parallelism techniques, including multithreading and multiprocessing, where correctness is defined by a memory consistency model. Value prediction subtly interacts with the memory consistency model by allowing data-dependent instructions to be reordered. We find that predicting a value and later verifying that the value eventually calculated is the same as the value predicted is not always sufficient. We present an example of a multithreaded pointer manipulation that can generate a surprising and erroneous result when value prediction is implemented without considering memory consistency correctness. We show that this problem can occur with real software, and we discuss how to apply existing techniques to eliminate the problem in both sequentially consistent systems and systems that obey relaxed memory consistency models.
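
To make the abstract's central claim concrete, the following is a minimal, deterministic sketch of how a value-correct prediction might still expose stale data. It is not the paper's own example or any real processor's mechanism: the address `NODE`, the `simulate` function, the `check_consistency` flag, and the per-location version counters are all hypothetical names introduced for illustration. A writer initializes a node and then publishes a pointer to it; a reader predicts the pointer, runs the dependent load early, and later verifies only that the predicted pointer value was right.

```python
NODE = 100  # hypothetical address of the node being published


def simulate(check_consistency):
    """Return the data the reader observes through the published pointer."""
    memory = {"head": 0, NODE: 0}   # head starts null (0); node holds stale data 0
    version = {"head": 0, NODE: 0}  # bumped on each store (stand-in for a coherence event)

    def store(addr, value):
        memory[addr] = value
        version[addr] += 1

    # Reader: predict that the load of head returns NODE and run ahead.
    predicted_head = NODE
    speculative_data = memory[predicted_head]  # dependent load executes early
    seen_version = version[predicted_head]

    # Writer: initialize the node, then publish it, in program order.
    store(NODE, 42)
    store("head", NODE)

    # Reader: the load of head finally executes and the prediction is checked.
    actual_head = memory["head"]
    value_matches = actual_head == predicted_head

    # Assumed fix: also replay if the speculatively read location was
    # written after the early load (an illustrative check, not the paper's).
    stale = version[NODE] != seen_version
    if not value_matches or (check_consistency and stale):
        return memory[actual_head]  # squash and replay the dependent load
    return speculative_data


print(simulate(check_consistency=False))  # value-only verification
print(simulate(check_consistency=True))   # verification plus staleness check
```

With value-only verification the reader sees the published pointer together with the node's old contents, an outcome that program order on both threads should rule out. The staleness check stands in for the general idea of detecting intervening writes; how real designs do this is left to the paper.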