5 research outputs found

    DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

    Full text link
    Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data Processing (NDP), where some computation is moved close to memory. Our goal is to methodically identify potential sources of data movement over a broad set of applications and to comprehensively compare traditional compute-centric data movement mitigation techniques to more memory-centric techniques, thereby developing a rigorous understanding of the best techniques to mitigate each source of data movement. With this goal in mind, we perform the first large-scale characterization of a wide variety of applications, across a wide range of application domains, to identify fundamental program properties that lead to data movement to/from main memory. We develop the first systematic methodology to classify applications based on the sources contributing to data movement bottlenecks. From our large-scale characterization of 77K functions across 345 applications, we select 144 functions to form the first open-source benchmark suite (DAMOV) for main memory data movement studies. We select a diverse range of functions that (1) represent different types of data movement bottlenecks, and (2) come from a wide range of application domains. Using NDP as a case study, we identify new insights about the different data movement bottlenecks and use these insights to determine the most suitable data movement mitigation mechanism for a particular application. We open-source DAMOV and the complete source code for our new characterization methodology at https://github.com/CMU-SAFARI/DAMOV.Comment: Our open source software is available at https://github.com/CMU-SAFARI/DAMO

    Storageless Value Prediction Using Prior Register Values

    No full text
    This paper presents a technique called register value prediction (RVP) which uses a type of locality called register-value reuse. By predicting that an instruction will produce the value that is already stored in the destination register, we eliminate the need for large value buffers to enable value prediction. Even without the large buffers, register-value prediction can be made as or more effective than last-value prediction, particularly with the aid of compiler management of values in the register file. Both static and dynamic register value prediction techniques are demonstrated to exploit register-value reuse, the former requiring minimal instruction set architecture changes and the latter requiring a set of small confidence counters. We show an average gain of 12 % with dynamic RVP and moderate compiler assistance on a next generation processor, and 15 % on a 16-wide processor

    Storageless value prediction using prior register values

    No full text
    corecore