Mosaic: An Application-Transparent Hardware-Software Cooperative Memory Manager for GPUs
Modern GPUs face a trade-off on how the page size used for memory management
affects address translation and demand paging. Support for multiple page sizes
can help relax the page size trade-off so that address translation and demand
paging optimizations work together synergistically. However, existing page
coalescing and splintering policies require costly base page migrations that
undermine the benefits multiple page sizes provide. In this paper, we observe
that GPGPU applications present an opportunity to support multiple page sizes
without costly data migration, as the applications perform most of their memory
allocation en masse (i.e., they allocate a large number of base pages at once).
We show that this en masse allocation allows us to create intelligent memory
allocation policies which ensure that base pages that are contiguous in virtual
memory are allocated to contiguous physical memory pages. As a result,
coalescing and splintering operations no longer need to migrate base pages.
We introduce Mosaic, a GPU memory manager that provides
application-transparent support for multiple page sizes. Mosaic uses base pages
to transfer data over the system I/O bus, and allocates physical memory in a
way that (1) preserves base page contiguity and (2) ensures that a large page
frame contains pages from only a single memory protection domain. This
mechanism allows the TLB to use large pages, reducing address translation
overhead. During data transfer, this mechanism enables the GPU to transfer only
the base pages that are needed by the application over the system I/O bus,
keeping demand paging overhead low.
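
The allocation idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Mosaic's actual implementation: the 4 KB base page and 2 MB large page sizes, the `ToyAllocator` class, its method names, and the use of an address-space id (`asid`) to stand in for a memory protection domain are all hypothetical choices made for illustration. The sketch shows that when virtually contiguous base pages are placed in order inside a frame owned by a single address space, checking whether the frame can be coalesced requires no data movement.

```python
BASE_PAGE = 4 * 1024              # assumed base page size in bytes
LARGE_PAGE = 2 * 1024 * 1024      # assumed large page size in bytes
PAGES_PER_FRAME = LARGE_PAGE // BASE_PAGE  # 512 base pages per large page frame


class ToyAllocator:
    """Hypothetical allocator in which each large page frame has one owner."""

    def __init__(self, num_frames):
        # Owner (address-space id) of each large page frame, or None if free.
        self.frame_owner = [None] * num_frames
        # (asid, virtual page number) -> physical page number
        self.page_table = {}

    def _claim_frame(self, asid):
        # Hand a whole free frame to a single address space.
        for frame, owner in enumerate(self.frame_owner):
            if owner is None:
                self.frame_owner[frame] = asid
                return frame
        raise MemoryError("no free large page frame")

    def allocate(self, asid, first_vpn, num_pages):
        """Map num_pages base pages at once, starting at a frame-aligned first_vpn."""
        assert first_vpn % PAGES_PER_FRAME == 0, "sketch assumes aligned requests"
        frame = None
        for i in range(num_pages):
            offset = i % PAGES_PER_FRAME
            if offset == 0:
                frame = self._claim_frame(asid)
            # Virtual contiguity is mirrored by physical contiguity in the frame.
            self.page_table[(asid, first_vpn + i)] = frame * PAGES_PER_FRAME + offset

    def can_coalesce(self, asid, first_vpn):
        """True if the base pages at first_vpn already fill one frame in order."""
        first_ppn = self.page_table.get((asid, first_vpn))
        if first_ppn is None or first_ppn % PAGES_PER_FRAME != 0:
            return False
        return all(
            self.page_table.get((asid, first_vpn + i)) == first_ppn + i
            for i in range(PAGES_PER_FRAME)
        )


alloc = ToyAllocator(num_frames=4)
alloc.allocate(asid=1, first_vpn=0, num_pages=PAGES_PER_FRAME + 10)
alloc.allocate(asid=2, first_vpn=0, num_pages=PAGES_PER_FRAME)
```

In this toy run, the first frame of address space 1 and the only frame of address space 2 are fully populated in order and could be promoted to large pages in place, while the partially filled second frame of address space 1 stays as base pages. No frame ever mixes pages from two address spaces.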
Techniques for Shared Resource Management in Systems with Throughput Processors
The continued growth of the computational capability of throughput processors
has made throughput processors the platform of choice for a wide variety of
high performance computing applications. Graphics Processing Units (GPUs) are a
prime example of throughput processors that can deliver high performance for
applications ranging from typical graphics applications to general-purpose data
parallel (GPGPU) applications. However, this success has been accompanied by
new performance bottlenecks throughout the memory hierarchy of GPU-based
systems. We identify and eliminate performance bottlenecks caused by major
sources of interference throughout the memory hierarchy.
We introduce changes to the memory hierarchy for systems with GPUs that allow
the memory hierarchy to be aware of both CPU and GPU applications'
characteristics. We introduce mechanisms to dynamically analyze different
applications' characteristics and propose four major changes throughout the
memory hierarchy. We propose changes to the cache management and memory
scheduling mechanisms to mitigate intra-application interference in GPGPU
applications. We propose changes to the memory controller design and its
scheduling policy to mitigate inter-application interference in heterogeneous
CPU-GPU systems. We redesign the MMU and the memory hierarchy in GPUs to be
aware of address-translation data in order to mitigate the inter-address-space
interference. We introduce a hardware-software cooperative technique that
modifies the memory allocation policy to enable large page support in order to
further reduce the inter-address-space interference at the shared Translation
Lookaside Buffer (TLB). Our evaluations show that the GPU-aware cache and
memory management techniques proposed in this dissertation are effective at
mitigating the interference caused by GPUs on current and future GPU-based
systems.
Comment: PhD thesis