A Survey of Techniques for Architecting TLBs
A translation lookaside buffer (TLB) caches virtual-to-physical address translation information and is used in systems ranging from embedded devices to high-end servers. Since the TLB is accessed very frequently and a TLB miss is extremely costly, prudent management of the TLB is important for improving the performance and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and distinctions. We believe that this paper will be useful for chip designers, computer architects, and system engineers.
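To ground the survey's terminology, here is a minimal sketch of what a TLB does: cache recent VPN-to-PFN translations so that most accesses avoid a page-table walk. The capacity, page size, and fully associative LRU organization below are illustrative assumptions, not details from the paper.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB pages

class TLB:
    """Toy fully associative TLB with LRU replacement (illustrative only)."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()  # VPN -> PFN, ordered by recency

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:                   # TLB hit: fast path
            self.entries.move_to_end(vpn)
            pfn = self.entries[vpn]
        else:                                     # TLB miss: costly walk
            pfn = page_table[vpn]
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict LRU entry
            self.entries[vpn] = pfn
        return pfn * PAGE_SIZE + offset

# Hypothetical page table (VPN -> PFN) for demonstration.
page_table = {0: 7, 1: 3}
tlb = TLB(capacity=2)
paddr = tlb.translate(0x10, page_table)  # first access misses, then is cached
```

Real TLBs are of course hardware structures with set-associative organizations and more elaborate replacement policies; the survey's techniques are variations on exactly these design axes.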
Software management techniques for translation lookaside buffers
Thesis (M.S.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (leaves 67-70). By Kavita Bala.
Utopia: Fast and Efficient Address Translation via Hybrid Restrictive & Flexible Virtual-to-Physical Address Mappings
Conventional virtual memory (VM) frameworks enable a virtual address to
flexibly map to any physical address. This flexibility necessitates large data
structures to store virtual-to-physical mappings, which leads to high address
translation latency and large translation-induced interference in the memory
hierarchy. On the other hand, restricting the address mapping so that a virtual
address can only map to a specific set of physical addresses can significantly
reduce address translation overheads by using compact and efficient translation
structures. However, restricting the address mapping flexibility across the
entire main memory severely limits data sharing across different processes and
increases data accesses to the swap space of the storage device, even in the
presence of free memory. We propose Utopia, a new hybrid virtual-to-physical
address mapping scheme that allows both flexible and restrictive hash-based
address mapping schemes to harmoniously co-exist in the system. The key idea of
Utopia is to manage physical memory using two types of physical memory
segments: restrictive and flexible segments. A restrictive segment uses a
restrictive, hash-based address mapping scheme that maps virtual addresses to
only a specific set of physical addresses and enables faster address
translation using compact translation structures. A flexible segment employs
the conventional fully-flexible address mapping scheme. By mapping data to a
restrictive segment, Utopia enables faster address translation with lower
translation-induced interference. Utopia improves performance by 24% in a
single-core system over the baseline system, whereas the best prior
state-of-the-art contiguity-aware translation scheme improves performance by
13%.
Comment: To appear in the 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023.
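The core of the restrictive mapping can be illustrated with a small sketch: a virtual page hashes to one set of candidate frames, so the translation structure only needs to record which way within the set was chosen, not a full frame number. The set count, associativity, and hash below are illustrative assumptions, not Utopia's actual parameters.

```python
# Sketch of a restrictive, hash-based virtual-to-physical mapping in the
# spirit of Utopia's restrictive segments (parameters are assumptions).

NUM_SETS = 4
WAYS = 2  # each virtual page may map to only WAYS candidate frames

def candidate_frames(vpn):
    """A VPN hashes to one set; its frame must be one of that set's WAYS slots."""
    s = vpn % NUM_SETS  # toy hash: low bits of the VPN
    return [s * WAYS + w for w in range(WAYS)]

# Translation then needs only a per-page "way" index (1 bit here) instead of
# a full physical frame number, which is why the structure can be compact.
way_of = {5: 1}  # hypothetical placement: VPN 5 lives in way 1 of its set
frame = candidate_frames(5)[way_of[5]]
```

A flexible segment, by contrast, would let VPN 5 occupy any frame, requiring a full page-table entry per page; the hybrid scheme lets each piece of data use whichever trade-off suits it.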
Efficient fine-grained virtual memory
Virtual memory in modern computer systems provides a single abstraction of the memory hierarchy.
By hiding fragmentation and overlays of physical memory, virtual memory frees applications from managing physical memory and improves programmability.
However, virtual memory often introduces noticeable overhead.
State-of-the-art systems use paged virtual memory that maps virtual addresses to physical addresses at page granularity (typically 4 KiB). This mapping is stored in a page table. Before accessing physically addressed memory, the page table is accessed to translate virtual addresses to physical addresses. Research shows that the overhead of accessing the page table can even exceed the execution time for some important applications.
In addition, this fine-grained mapping changes the access patterns between virtual and physical address spaces, complicating many architectural techniques, such as caches and prefetchers.
In this dissertation, I propose architecture mechanisms to reduce the overhead of accessing and managing fine-grained virtual memory without compromising existing benefits.
There are three main contributions in this dissertation.
First, I investigate the impact of address translation on caches. I examine the restrictions that fine-grained paging places on virtually indexed, physically tagged (VIPT) caches and conclude that these restrictions may lead to sub-optimal cache designs.
I introduce a novel cache strategy, speculatively indexed, physically tagged (SIPT) to enable flexible cache indexing under fine-grained page mapping.
SIPT speculates on the value of a few additional index bits (1-3 in our experiments) to access the cache before translation, and then verifies that the physical tag matches after translation.
Exploiting the fact that a simple relation generally exists between virtual and physical addresses, because memory allocators often exhibit contiguity, I also propose low-cost mechanisms to predict and correct potential mis-speculations.
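The verification step in such a scheme reduces to checking the speculated bits. A minimal sketch, assuming 4 KiB pages, 64 B cache lines, and 2 speculated bits (all illustrative values, not the dissertation's exact configuration):

```python
PAGE_BITS = 12        # assumed 4 KiB pages
EXTRA_INDEX_BITS = 2  # speculated index bits above the page offset

def speculative_index(vaddr, num_set_bits):
    """Cache set index drawn from virtual bits, including EXTRA_INDEX_BITS
    that are not translation-invariant and so must be verified later."""
    return (vaddr >> 6) & ((1 << num_set_bits) - 1)  # assumed 64 B lines

def speculation_correct(vaddr, paddr):
    """Speculation holds iff physical and virtual addresses agree on the
    speculated bits just above the page offset."""
    mask = ((1 << EXTRA_INDEX_BITS) - 1) << PAGE_BITS
    return (vaddr & mask) == (paddr & mask)
```

When the allocator hands out physically contiguous memory, virtual and physical addresses tend to agree on these low translation-variant bits, which is why the speculation usually succeeds and the mis-speculation machinery can be cheap.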
Next, I focus on reducing the overhead of address translation for fine-grained virtual memory. I propose a novel architecture mechanism, Embedded Page Translation Information (EMPTI),
to provide general fine-grained page translation information on top of coarse-grained virtual memory.
EMPTI does so by speculating that a virtual address is mapped to a pre-determined physical location and then verifying the translation with a very-low-cost access to metadata embedded with data.
Coarse-grained virtual memory mechanisms (e.g., segmentation) are used to suggest the pre-determined physical location for each virtual page.
Overall, EMPTI achieves the benefits of low overhead translation while keeping the flexibility and programmability of fine-grained paging.
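The speculate-then-verify flow described above can be sketched as follows; the segment table, tag layout, and fallback path are hypothetical names for illustration, not the dissertation's actual structures.

```python
# Sketch of an EMPTI-style access: a coarse-grained segment predicts where a
# virtual page lives, and a small tag stored alongside the data verifies it.

SEG_BASE = {0: 100}  # hypothetical: segment 0's pages start at frame 100

def predicted_frame(seg_id, vpn_in_seg):
    """Coarse-grained (segment-style) prediction of the physical frame."""
    return SEG_BASE[seg_id] + vpn_in_seg

# Metadata embedded with the data: frame 103 records that it holds page 3.
embedded_tag = {103: 3}

def access(seg_id, vpn_in_seg, fallback_page_table):
    frame = predicted_frame(seg_id, vpn_in_seg)
    if embedded_tag.get(frame) == vpn_in_seg:
        return frame  # speculation verified by the cheap embedded check
    # Prediction wrong (page was remapped): fall back to a full walk.
    return fallback_page_table[(seg_id, vpn_in_seg)]
```

The point of the design is that the common case needs no page-table access at all, while pages that have migrated away from their predicted location still translate correctly through the fallback.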
Finally, I improve the efficiency of metadata caching based on the fact that memory mapping contiguity generally exists beyond a page boundary.
In state-of-the-art architectures, caches treat PTEs (page table entries) as regular data. Although this is simple and straightforward,
it fails to maximize the storage efficiency of metadata.
Each page in a contiguously mapped region costs a full 8-byte PTE, even though the delta between virtual and physical addresses remains the same and most metadata are identical.
I propose a novel microarchitectural mechanism that expands the effective PTE storage in the last-level cache (LLC) and reduces the number of page-walk accesses that miss the LLC.
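The contiguity observation behind this contribution is easy to make concrete: within a contiguously mapped run, PFN minus VPN is a single shared delta, so one small record can stand in for many 8-byte PTEs. A minimal sketch (run-length encoding is my illustration of the observation, not the dissertation's actual LLC mechanism):

```python
def compress_ptes(ptes):
    """ptes: dict mapping VPN -> PFN.
    Returns runs of [start_vpn, length, delta], where delta = PFN - VPN is
    constant across each contiguously mapped run."""
    runs = []
    for vpn in sorted(ptes):
        delta = ptes[vpn] - vpn
        if runs and runs[-1][0] + runs[-1][1] == vpn and runs[-1][2] == delta:
            runs[-1][1] += 1  # extend the current contiguous run
        else:
            runs.append([vpn, 1, delta])
    return runs

# Three contiguous pages collapse to one record; the isolated page gets its own.
runs = compress_ptes({0: 10, 1: 11, 2: 12, 5: 40})
```

Storing runs instead of individual PTEs is what lets a fixed amount of cache capacity cover far more of the address space, cutting page walks that would otherwise miss the LLC.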
Simulation of Address Translation Techniques
As the memory footprints of modern compute workloads continue to grow [1], pressure on the memory hierarchy increases and address translation plays an increasingly important role in system performance. Translation Lookaside Buffers (TLBs) are vital to the performance of modern virtual memory systems. They reduce the need for slow and expensive page walks by
caching the most recent virtual-to-physical address translations. We analyze how well the cost
of the page walk can be approximated in a five level memory hierarchy, and how simple and
hypothetical optimizations are able to affect the memory system performance.
Initially we compare the performance of a realistic page walker to a fixed page-walk penalty. This allows future work to assume a demonstrably reasonable constant value in experimentation, rather than relying on intuition, saving the additional time and energy of simulating a page walk. A suggested fixed value is put forward, along with an analysis of its variability across workloads and its limitations.
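The fixed-penalty approximation amounts to a simple expected-cost model, which a short calculation makes concrete; all numbers below are illustrative assumptions, not values measured in this work.

```python
def translation_cost(tlb_hit_rate, hit_cycles, fixed_walk_cycles):
    """Expected translation cycles per access under a fixed walk penalty."""
    return tlb_hit_rate * hit_cycles + (1 - tlb_hit_rate) * fixed_walk_cycles

# Hypothetical numbers: a 99% TLB hit rate, 1-cycle hits, 100-cycle fixed walk.
cost = translation_cost(0.99, 1, 100)  # 0.99 + 0.01 * 100 cycles per access
```

The validity of replacing a simulated walker with such a constant hinges on how tightly real walk latencies cluster around the chosen value, which is exactly the variability question the analysis addresses.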
Making use of this fixed page-walk penalty, we also look at the effect of a simple TLB optimization: doubling the available resources. This allows us to assess the effect of the TLB on memory system performance and to discuss both what a future optimization may look like and what performance can reasonably be expected and hoped for.
We analyze one potential in-TLB optimization, CHiRP [2], which seeks a replacement policy for the TLB that is more appropriate and better optimized for the structure than least-recently-used (LRU). We analyze the structure of the policy and compare the results of the CHiRP work against our hypothetical performance improvements. A strategy related to prefetching is also analyzed: ASAP [3], which prefetches within, and relevant only to, a particular page walk.