A Survey of Techniques for Architecting TLBs
A translation lookaside buffer (TLB) caches virtual-to-physical address translation information and is used in systems ranging from embedded devices to high-end servers. Since the TLB is accessed very frequently and a TLB miss is extremely costly, prudent management of the TLB is important for improving the performance and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and distinctions. We believe that this paper will be useful for chip designers, computer architects, and system engineers.
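To ground the survey's terminology, here is a minimal sketch of what a TLB does: cache recent VPN-to-PFN translations so that most accesses avoid a page-table walk. The capacity, page size, and fully associative LRU organization below are illustrative assumptions, not details from the paper.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB pages

class TLB:
    """Toy fully associative TLB with LRU replacement (illustrative only)."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()  # VPN -> PFN, ordered by recency

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:                   # TLB hit: fast path
            self.entries.move_to_end(vpn)
            pfn = self.entries[vpn]
        else:                                     # TLB miss: costly walk
            pfn = page_table[vpn]
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict LRU entry
            self.entries[vpn] = pfn
        return pfn * PAGE_SIZE + offset

# Hypothetical page table (VPN -> PFN) for demonstration.
page_table = {0: 7, 1: 3}
tlb = TLB(capacity=2)
paddr = tlb.translate(0x10, page_table)  # first access misses, then is cached
```

Real TLBs are of course hardware structures with set-associative organizations and more elaborate replacement policies; the survey's techniques are variations on exactly these design axes.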
Software management techniques for translation lookaside buffers
Thesis (M.S.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (leaves 67-70). By Kavita Bala.
Utopia: Fast and Efficient Address Translation via Hybrid Restrictive & Flexible Virtual-to-Physical Address Mappings
Conventional virtual memory (VM) frameworks enable a virtual address to
flexibly map to any physical address. This flexibility necessitates large data
structures to store virtual-to-physical mappings, which leads to high address
translation latency and large translation-induced interference in the memory
hierarchy. On the other hand, restricting the address mapping so that a virtual
address can only map to a specific set of physical addresses can significantly
reduce address translation overheads by using compact and efficient translation
structures. However, restricting the address mapping flexibility across the
entire main memory severely limits data sharing across different processes and
increases data accesses to the swap space of the storage device, even in the
presence of free memory. We propose Utopia, a new hybrid virtual-to-physical
address mapping scheme that allows both flexible and restrictive hash-based
address mapping schemes to harmoniously co-exist in the system. The key idea of
Utopia is to manage physical memory using two types of physical memory
segments: restrictive and flexible segments. A restrictive segment uses a
restrictive, hash-based address mapping scheme that maps virtual addresses to
only a specific set of physical addresses and enables faster address
translation using compact translation structures. A flexible segment employs
the conventional fully-flexible address mapping scheme. By mapping data to a
restrictive segment, Utopia enables faster address translation with lower
translation-induced interference. Utopia improves performance by 24% in a
single-core system over the baseline system, whereas the best prior
state-of-the-art contiguity-aware translation scheme improves performance by
13%.
Comment: To appear in the 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023.
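The core of the restrictive mapping can be illustrated with a small sketch: a virtual page hashes to one set of candidate frames, so the translation structure only needs to record which way within the set was chosen, not a full frame number. The set count, associativity, and hash below are illustrative assumptions, not Utopia's actual parameters.

```python
# Sketch of a restrictive, hash-based virtual-to-physical mapping in the
# spirit of Utopia's restrictive segments (parameters are assumptions).

NUM_SETS = 4
WAYS = 2  # each virtual page may map to only WAYS candidate frames

def candidate_frames(vpn):
    """A VPN hashes to one set; its frame must be one of that set's WAYS slots."""
    s = vpn % NUM_SETS  # toy hash: low bits of the VPN
    return [s * WAYS + w for w in range(WAYS)]

# Translation then needs only a per-page "way" index (1 bit here) instead of
# a full physical frame number, which is why the structure can be compact.
way_of = {5: 1}  # hypothetical placement: VPN 5 lives in way 1 of its set
frame = candidate_frames(5)[way_of[5]]
```

A flexible segment, by contrast, would let VPN 5 occupy any frame, requiring a full page-table entry per page; the hybrid scheme lets each piece of data use whichever trade-off suits it.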
Efficient fine-grained virtual memory
Virtual memory in modern computer systems provides a single abstraction of the memory hierarchy.
By hiding fragmentation and overlays of physical memory, virtual memory frees applications from managing physical memory and improves programmability.
However, virtual memory often introduces noticeable overhead.
State-of-the-art systems use paged virtual memory that maps virtual addresses to physical addresses at page granularity (typically 4 KiB). This mapping is stored in a page table. Before accessing physically addressed memory, the page table is accessed to translate virtual addresses to physical addresses. Research shows that the overhead of accessing the page table can even exceed the execution time for some important applications.
In addition, this fine-grained mapping changes the access patterns between virtual and physical address spaces, complicating many architectural techniques, such as caches and prefetchers.
In this dissertation, I propose architecture mechanisms to reduce the overhead of accessing and managing fine-grained virtual memory without compromising existing benefits.
There are three main contributions in this dissertation.
First, I investigate the impact of address translation on caches. I examine the restrictions that fine-grained paging places on virtually indexed, physically tagged (VIPT) caches and conclude that these restrictions may lead to sub-optimal cache designs.
I introduce a novel cache strategy, speculatively indexed, physically tagged (SIPT) to enable flexible cache indexing under fine-grained page mapping.
SIPT speculates on the value of a few additional index bits (1-3 in our experiments) to access the cache before translation, and then verifies that the physical tag matches after translation.
Exploiting the fact that a simple relation generally exists between virtual and physical addresses, because memory allocators often exhibit contiguity, I also propose low-cost mechanisms to predict and correct potential mis-speculations.
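The verification step in such a scheme reduces to checking the speculated bits. A minimal sketch, assuming 4 KiB pages, 64 B cache lines, and 2 speculated bits (all illustrative values, not the dissertation's exact configuration):

```python
PAGE_BITS = 12        # assumed 4 KiB pages
EXTRA_INDEX_BITS = 2  # speculated index bits above the page offset

def speculative_index(vaddr, num_set_bits):
    """Cache set index drawn from virtual bits, including EXTRA_INDEX_BITS
    that are not translation-invariant and so must be verified later."""
    return (vaddr >> 6) & ((1 << num_set_bits) - 1)  # assumed 64 B lines

def speculation_correct(vaddr, paddr):
    """Speculation holds iff physical and virtual addresses agree on the
    speculated bits just above the page offset."""
    mask = ((1 << EXTRA_INDEX_BITS) - 1) << PAGE_BITS
    return (vaddr & mask) == (paddr & mask)
```

When the allocator hands out physically contiguous memory, virtual and physical addresses tend to agree on these low translation-variant bits, which is why the speculation usually succeeds and the mis-speculation machinery can be cheap.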
Next, I focus on reducing the overhead of address translation for fine-grained virtual memory. I propose a novel architecture mechanism, Embedded Page Translation Information (EMPTI),
to provide general fine-grained page translation information on top of coarse-grained virtual memory.
EMPTI does so by speculating that a virtual address is mapped to a pre-determined physical location and then verifying the translation with a very-low-cost access to metadata embedded with data.
Coarse-grained virtual memory mechanisms (e.g., segmentation) are used to suggest the pre-determined physical location for each virtual page.
Overall, EMPTI achieves the benefits of low overhead translation while keeping the flexibility and programmability of fine-grained paging.
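The speculate-then-verify flow described above can be sketched as follows; the segment table, tag layout, and fallback path are hypothetical names for illustration, not the dissertation's actual structures.

```python
# Sketch of an EMPTI-style access: a coarse-grained segment predicts where a
# virtual page lives, and a small tag stored alongside the data verifies it.

SEG_BASE = {0: 100}  # hypothetical: segment 0's pages start at frame 100

def predicted_frame(seg_id, vpn_in_seg):
    """Coarse-grained (segment-style) prediction of the physical frame."""
    return SEG_BASE[seg_id] + vpn_in_seg

# Metadata embedded with the data: frame 103 records that it holds page 3.
embedded_tag = {103: 3}

def access(seg_id, vpn_in_seg, fallback_page_table):
    frame = predicted_frame(seg_id, vpn_in_seg)
    if embedded_tag.get(frame) == vpn_in_seg:
        return frame  # speculation verified by the cheap embedded check
    # Prediction wrong (page was remapped): fall back to a full walk.
    return fallback_page_table[(seg_id, vpn_in_seg)]
```

The point of the design is that the common case needs no page-table access at all, while pages that have migrated away from their predicted location still translate correctly through the fallback.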
Finally, I improve the efficiency of metadata caching based on the fact that memory mapping contiguity generally exists beyond a page boundary.
In state-of-the-art architectures, caches treat PTEs (page table entries) as regular data. Although this is simple and straightforward,
it fails to maximize the storage efficiency of metadata.
Each page in a contiguously mapped region costs a full 8-byte PTE, even though the delta between virtual and physical addresses remains the same and most metadata are identical.
I propose a novel microarchitectural mechanism that expands the effective PTE storage in the last-level cache (LLC) and reduces the number of page-walk accesses that miss the LLC.
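The contiguity observation behind this contribution is easy to make concrete: within a contiguously mapped run, PFN minus VPN is a single shared delta, so one small record can stand in for many 8-byte PTEs. A minimal sketch (run-length encoding is my illustration of the observation, not the dissertation's actual LLC mechanism):

```python
def compress_ptes(ptes):
    """ptes: dict mapping VPN -> PFN.
    Returns runs of [start_vpn, length, delta], where delta = PFN - VPN is
    constant across each contiguously mapped run."""
    runs = []
    for vpn in sorted(ptes):
        delta = ptes[vpn] - vpn
        if runs and runs[-1][0] + runs[-1][1] == vpn and runs[-1][2] == delta:
            runs[-1][1] += 1  # extend the current contiguous run
        else:
            runs.append([vpn, 1, delta])
    return runs

# Three contiguous pages collapse to one record; the isolated page gets its own.
runs = compress_ptes({0: 10, 1: 11, 2: 12, 5: 40})
```

Storing runs instead of individual PTEs is what lets a fixed amount of cache capacity cover far more of the address space, cutting page walks that would otherwise miss the LLC.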
Simulation of Address Translation Techniques
As the memory footprints of modern compute workloads continue to grow [1], pressure on the memory hierarchy increases and address translation plays an increasingly important role in system performance. Translation Lookaside Buffers (TLBs) are vital to the performance of modern virtual memory systems. They reduce the need for slow and expensive page walks by
caching the most recent virtual-to-physical address translations. We analyze how well the cost
of the page walk can be approximated in a five level memory hierarchy, and how simple and
hypothetical optimizations are able to affect the memory system performance.
Initially we compare the performance of a realistic page walker to a fixed page-walk penalty. This allows future work to assume a demonstrably reasonable constant value in experimentation, rather than relying on intuition, saving the additional time and energy of simulating a page walk. A suggested fixed value is put forward, along with an analysis of its variability across workloads and its limitations.
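The fixed-penalty approximation amounts to a simple expected-cost model, which a short calculation makes concrete; all numbers below are illustrative assumptions, not values measured in this work.

```python
def translation_cost(tlb_hit_rate, hit_cycles, fixed_walk_cycles):
    """Expected translation cycles per access under a fixed walk penalty."""
    return tlb_hit_rate * hit_cycles + (1 - tlb_hit_rate) * fixed_walk_cycles

# Hypothetical numbers: a 99% TLB hit rate, 1-cycle hits, 100-cycle fixed walk.
cost = translation_cost(0.99, 1, 100)  # 0.99 + 0.01 * 100 cycles per access
```

The validity of replacing a simulated walker with such a constant hinges on how tightly real walk latencies cluster around the chosen value, which is exactly the variability question the analysis addresses.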
Making use of this fixed page-walk penalty, we also look at the effect of a simple TLB optimization: doubling the available resources. This allows us to assess the effect of the TLB on memory system performance and to discuss both what a future optimization may look like and what performance can reasonably be expected and hoped for.
We analyze one potential in-TLB optimization, CHiRP [2], which seeks a replacement policy for the TLB that is more appropriate and better optimized for the structure than least-recently-used (LRU). We analyze the structure of the policy and compare the results of the CHiRP work against our hypothetical performance improvements. A strategy related to prefetching is also analyzed: ASAP [3], which prefetches within, and relevant only to, a particular page walk.