Cache Memory Access Patterns in the GPU Architecture

Abstract

Data exchange between a Central Processing Unit (CPU) and a Graphic Processing Unit (GPU) can be very expensive in terms of performance. The characterization of data and cache memory access patterns differ between a CPU and a GPU. The motivation of this research is to analyze the cache memory access patterns of GPU architectures and to potentially improve data exchange between a CPU and GPU. The methodology of this work uses Multi2Sim GPU simulator for AMD Radeon and NVIDIA Kepler GPU architectures. This simulator, used to emulate the GPU architecture in software, enables certain code modifications for the L1 and L2 cache memory blocks. Multi2Sim was configured to run multiple benchmarks to analyze and record how the benchmarks access GPU cache memory. The recorded results were used to study three main metrics: (1) Most Recently Used (MRU) and Least Recently Used (LRU) accesses for L1 and L2 caches, (2) Inter-warp and Intra-warp cache memory accesses in the GPU architecture for different sets of workloads, and (3) To record and compare the GPU cache access patterns for certain machine learning benchmarks with its general purpose counterparts

    Similar works