Introduction
Virtual memory (VM) is a fundamental abstraction of storage used by computer systems to support concurrent execution of processes. Each process is protected from the execution of other processes, and each process can view storage in a simplified, uniform manner.
Virtual memory defines a mapping function from one address space to another. Traditionally, that mapping is a single translation from a virtual address, local to the process, to a physical address that directly accesses storage. Trends in computing point to changes in how the address space is used. Object-oriented systems, mapped files, shared objects, and distributed computing all increase the size of the address space used by a process, encourage more sharing, and decrease the locality of the resulting virtual memory address stream. The performance of the translation structure is influenced by these changes. Later measurements quantify the very different behavior of simple program execution and more complex operating system execution.
Independent of the particular organization, every page table is simply a data structure designed primarily for efficient retrieval of a translation using the virtual address as a search key. Searching is a large and well-researched field.
... with this structure allow portions of the hierarchy to be unallocated by using validity bits in the higher levels. Some architectures provide a short-circuit approach that promotes leaf pages to a higher place in the hierarchy when the address space is sparsely used. Figure 1 shows an example table illustrating this mechanism. The root pointer is used to start the search.
Each index merges bits from the entry with more of the virtual address bits. The complete physical address is formed with the page offset bits in the virtual address and the physical page number in the leaf page.
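The walk through a forward-mapped table can be sketched as follows. This is a minimal illustration, not the mechanism of any particular architecture: the two-level layout, the 10-bit index width, the 4 Kbyte page size, and the name `translate` are all assumptions, and the short-circuit promotion of leaf pages is omitted. A missing dictionary key stands in for a cleared validity bit.

```python
PAGE_BITS = 12    # assumed 4 Kbyte pages
INDEX_BITS = 10   # assumed number of virtual address bits consumed per level
LEVELS = 2        # assumed two-level hierarchy (covers a 32-bit virtual address)


def translate(root, vaddr):
    """Walk the table from the root; return a physical address or None.

    Each level is a dict mapping an index to the next-level dict, except
    the last level, which maps an index to a physical page number.
    A missing key models an invalid (unallocated) entry.
    """
    table = root
    for level in range(LEVELS):
        # Select the slice of virtual address bits used at this level.
        shift = PAGE_BITS + INDEX_BITS * (LEVELS - 1 - level)
        index = (vaddr >> shift) & ((1 << INDEX_BITS) - 1)
        entry = table.get(index)
        if entry is None:
            return None  # validity check failed: this part of the hierarchy is unallocated
        table = entry    # next-level table, or the page number after the last level
    ppn = table
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    # Physical address = physical page number from the leaf + page offset bits.
    return (ppn << PAGE_BITS) | offset


# Hypothetical mapping: first-level index 1, second-level index 2 -> page 0x55.
root = {1: {2: 0x55}}
vaddr = (1 << 22) | (2 << 12) | 0x034
paddr = translate(root, vaddr)
```

Because only the populated first-level slot has a second-level table behind it, the unused portions of the address space cost no table storage in this sketch.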
The forward-mapped table is generally a per-process table. Some control register holds a pointer to the first level of the ...
For example, to map a 32 Mbyte physical memory system with 4 Kbyte pages, a HAT of 16K entries is used to index an 8K-entry IPT. Assuming 32-bit physical addresses, 64-bit virtual addresses can be nicely packed into a 16-byte entry, for an overhead of (8K × 16 bytes + 16K × 4 bytes) / 32 Mbytes ≈ 0.6%. Figure 3 shows the structure of this table.
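The arithmetic in this example can be checked with a short calculation. The sketch below assumes that the overhead is the combined size of the IPT and the HAT divided by the size of physical memory, which is one reading of the figures given; the function name `ipt_overhead` and its parameters are hypothetical.

```python
KB = 1 << 10
MB = 1 << 20


def ipt_overhead(phys_bytes, page_bytes, ipt_entry_bytes, hat_entries, hat_entry_bytes):
    """Return (IPT entry count, total table bytes, table bytes / physical bytes)."""
    # An inverted page table has one entry per physical page frame.
    ipt_entries = phys_bytes // page_bytes
    table_bytes = ipt_entries * ipt_entry_bytes + hat_entries * hat_entry_bytes
    return ipt_entries, table_bytes, table_bytes / phys_bytes


# Figures from the example: 32 Mbyte memory, 4 Kbyte pages,
# 16-byte IPT entries, and a 16K-entry HAT of 4-byte entries.
entries, size, fraction = ipt_overhead(32 * MB, 4 * KB, 16, 16 * 1024, 4)
print(entries, size // KB, round(fraction * 100, 2))
```

Under these assumptions the IPT has 8K entries, the two tables together occupy 192 Kbytes, and the overhead comes to roughly 0.6% of physical memory.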
Hashed Page Translation
Load virtual address tag1 from page directory word1.
Compare faulting address with virtual address tag. If not equal, load Next Pde index (word0) and goto step 3.
Load the virtual address tag2 from page directory entry word2.
Compare faulting address with virtual address tag. If not equal, load Next Pde index (word0) and goto step 3.
Load protection fields (word3), and check reference bit.
Insert address, rpn, and protection information into the hardware TLB.
Return from interrupt.
Load protection fields (word1), and check reference bit.
Load the rpn.
Insert address, rpn, and protection information into TLB.
Return from interrupt.
One or two four-word entries are contained in one 32-byte cache line. The preceding data structure was simulated with several variations; two are described in detail:
• 16-byte entries, each containing one translation.
• 32-byte entries, each containing two independent 16-byte entries which are checked in parallel or serially for a match (a 2-way associative HPT).
Each of these was evaluated based on the hardware costs and 
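The lookup performed by the miss handler steps above can be sketched for the one-translation-per-entry case. This is an illustration under stated assumptions, not the simulated implementation: the class and field names (`HptEntry`, `HashedPageTable`, `next_index`), the modulo hash, and the way overflow entries are appended to the same array are all hypothetical. The tag compare and the chain step correspond to the "compare faulting address" and "load Next Pde index" steps; a real handler would then insert the result into the TLB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HptEntry:
    tag: int                           # virtual page number stored as the tag
    rpn: int                           # real (physical) page number
    prot: int                          # protection bits
    next_index: Optional[int] = None   # index of the next entry in the collision chain


class HashedPageTable:
    def __init__(self, buckets):
        self.buckets = buckets
        # The first `buckets` slots are the front buckets; overflow entries follow.
        self.entries = [None] * buckets

    def _hash(self, vpn):
        return vpn % self.buckets      # placeholder hash, chosen only for illustration

    def insert(self, vpn, rpn, prot):
        """Add a translation (assumes vpn is not already present)."""
        new = HptEntry(vpn, rpn, prot)
        i = self._hash(vpn)
        if self.entries[i] is None:
            self.entries[i] = new
            return
        while self.entries[i].next_index is not None:
            i = self.entries[i].next_index
        self.entries.append(new)       # collision: link a new overflow entry
        self.entries[i].next_index = len(self.entries) - 1

    def lookup(self, vpn):
        """Return (rpn, prot, entries_checked), or None if no translation exists."""
        i = self._hash(vpn)
        checked = 0
        while i is not None and self.entries[i] is not None:
            entry = self.entries[i]
            checked += 1
            if entry.tag == vpn:       # tag matches the faulting address
                return entry.rpn, entry.prot, checked
            i = entry.next_index       # follow the chain to the next entry
        return None                    # fall through to the page fault path


hpt = HashedPageTable(4)
hpt.insert(1, 100, 0b101)
hpt.insert(5, 200, 0b001)              # collides with vpn 1 in this 4-bucket table
```

The third element of the result counts how many entries were examined, which is the quantity that separates a front-bucket hit from a walk down the chain.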
Measurements
The HPT analysis suggests that it will perform uniformly better than the inverted ...
The most common data sharing is by the instruction segment. Some data sharing occurs in the multi-user benchmarks.
The original IPT structure was not measured with the two-step process of tracing and simulation, since the benefits of using an HPT over the IPT had already been demonstrated with prototype software in the lab. At the time this paper was written, resource constraints prevented rerunning the traces against just the IPT model. Instead, the IPT data is generated by using an equivalent-sized HPT's first-bucket cache hit rate as an approximation to the IPT's hash anchor ...
From the graph, the hit rate into the front bucket is reasonably high. As expected, the larger the table, the more likely the first entry holds the desired translation. A 2-way associative HPT achieves a higher hit rate than the 1-way. However, as Graph 4 demonstrates, the increased hit rate is not enough to offset the extra cycles spent in searching the two entries in series.
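The trade-off between a higher front-bucket hit rate and the cost of a serial search can be made concrete with a simple expected-cost model. Everything in this sketch is an assumption made for illustration: the model itself, the function name `expected_cycles`, and all of the probabilities and cycle counts, none of which are taken from the measurements.

```python
def expected_cycles(p_first, p_second, ways, probe_cycles, chain_cycles):
    """Expected cycles to find a translation, under a simple illustrative model.

    p_first / p_second: probability that the translation is in the first or
    second entry of the front bucket (p_second is 0 for a 1-way table).
    ways: entries searched serially in the front bucket before chaining.
    probe_cycles: cost of checking one entry.
    chain_cycles: extra cost of walking the collision chain on a front-bucket miss.
    """
    p_chain = 1.0 - p_first - p_second
    return (p_first * probe_cycles
            + p_second * 2 * probe_cycles
            + p_chain * (ways * probe_cycles + chain_cycles))


# Hypothetical inputs: the 2-way table has the higher front-bucket hit rate
# (0.93 versus 0.90) but pays for a second serial probe on many lookups.
one_way = expected_cycles(0.90, 0.00, 1, probe_cycles=5, chain_cycles=20)
two_way = expected_cycles(0.60, 0.33, 2, probe_cycles=5, chain_cycles=20)
```

With these made-up inputs the serial 2-way search comes out slower on average despite its higher hit rate, which is the shape of the effect described above; different inputs can reverse the comparison.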
Graphs 6, 7, and 8: front bucket miss rate. Series: ASM-HW-HPT.25X-1W, ASM-HW-HPT.5X-1W, ASM-HW-HPT1X-1W, ASM-HW-HPT2X-1W, UX-HW-HPT2X-1W, UX-HW-HPT4X-1W, UX-HW-HPT2X-2W.
