2 research outputs found

    Impact of Cache on Data-Sharing in Multi-Threaded Programmes

    This thesis answers the question of whether a scheduler needs to take into account where communicating threads in multi-threaded applications are executed. The impact of cache on data-sharing in multi-threaded environments is measured. This work investigates a common base-case scenario in the telecommunication industry, where a programme has one thread that writes data and one thread that reads it. A taxonomy of inter-thread communication is defined, and a mathematical model that describes inter-thread communication is presented. Two cycle-level experiments were designed to measure the latency of CPU registers, cache and main memory, and their results were used to quantify the model. Three application-level experiments verified the model by comparing its predictions against data collected in a real-life setting. The model broadens the applicability of the experimental results, and it describes the three types of communication outlined in the taxonomy. Where communicated data is stored across the levels of cache does have an impact on the speed of data-intensive multi-threaded applications. Scheduling the threads of a sender-receiver scenario to different dies in a multi-chip processor slows execution of such programmes by up to 37%; pinning such threads to different cores in the same chip results in up to a 5% decrease in execution speed. The findings of this study show how threads need to be scheduled by a cache-aware scheduler. This project extends the author's previous work, which investigated cache interference.
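
    As an illustration of the sender-receiver scenario the thesis measures, the following is a minimal, hypothetical sketch (not code from the thesis), assuming Linux and pthreads: a writer thread and a reader thread are pinned to chosen CPUs and communicate through a single shared variable, so the same binary can be timed with the two threads on the same chip versus on different dies. The core ids 0 and 1 and the iteration count are placeholders.

    /* Hypothetical sketch of the one-writer/one-reader scenario. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>

    #define ITERS 1000000L

    static _Atomic long shared = 0;

    /* Pin the calling thread to one CPU (core ids are placeholders). */
    static void pin_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void *writer(void *arg) {
        pin_to_cpu(0);                    /* e.g. a core on the first die */
        for (long i = 1; i <= ITERS; i++)
            atomic_store(&shared, i);     /* sender writes the cache line */
        return NULL;
    }

    static void *reader(void *arg) {
        pin_to_cpu(1);                    /* same chip or the other die */
        long last = 0;
        while (last < ITERS)
            last = atomic_load(&shared);  /* receiver pulls the line over */
        return NULL;
    }

    int main(void) {
        pthread_t w, r;
        pthread_create(&w, NULL, writer, NULL);
        pthread_create(&r, NULL, reader, NULL);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        return 0;
    }

    Compiled with gcc -O2 -pthread and timed under different core-id pairs, such a loop exposes the kind of same-chip versus cross-die cost difference the abstract reports.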

    A Data Alignment Technique for Improving Cache Performance

    No full text
    We address the problem of improving the data cache performance of numerical applications -- specifically, those with blocked (or tiled) loops. We present DAT, a data alignment technique that uses array padding to improve program performance by minimizing cache conflict misses. We describe algorithms for selecting tile sizes that maximize data cache utilization and for computing pad sizes that eliminate self-interference conflicts in the chosen tile. We also present a generalization of the technique to handle applications with several tiled arrays. Our experimental results, comparing our technique with previously published approaches on machines with different cache configurations, show consistently good performance on several benchmark programs for a variety of problem sizes.

    1 Introduction

    The growing disparity between processor and memory speeds makes the efficient utilization of cache memory a critical factor in determining program performance. Compiler optimizations to exploit t..
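
    To make the padding idea concrete, the following is a minimal, hypothetical sketch in C (not the paper's DAT algorithm, which derives tile and pad sizes from the cache parameters): with a power-of-two leading dimension, successive rows of a tile in a row-major array can map to the same cache sets and evict one another; widening each row by a few pad elements shifts the rows onto different sets. N, TILE and PAD below are illustrative values only.

    /* Hypothetical sketch: intra-array padding for a tiled loop nest. */
    #include <stdio.h>

    #define N    1024        /* pathological power-of-two dimension */
    #define PAD  8           /* pad elements per row (illustrative; DAT computes this) */
    #define LD   (N + PAD)   /* padded leading dimension */
    #define TILE 32          /* tile (block) size */

    static double a[N * LD], b[N * LD], c[N * LD];

    int main(void) {
        /* Tiled matrix multiply over the padded arrays; every access
           uses the padded leading dimension LD instead of N. */
        for (int ii = 0; ii < N; ii += TILE)
            for (int kk = 0; kk < N; kk += TILE)
                for (int jj = 0; jj < N; jj += TILE)
                    for (int i = ii; i < ii + TILE; i++)
                        for (int k = kk; k < kk + TILE; k++) {
                            double aik = a[i * LD + k];
                            for (int j = jj; j < jj + TILE; j++)
                                c[i * LD + j] += aik * b[k * LD + j];
                        }
        printf("c[0] = %f\n", c[0]);
        return 0;
    }

    Setting PAD to 0 restores the conflict-prone layout, so the two variants can be compared directly; in the paper the pad size would be computed for the chosen tile rather than fixed as here.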