Search CORE

89 research outputs found

Scalability of microkernel-based systems

Author: Uhlig Volkmar
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2005
Field of study

Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources

Author: Bartolini Davide-Basilio
Bera Rahul
Bostanci F. Nisa
Kanellopoulos Konstantinos
Kumar Rakesh
Mutlu Onur
Nam Hong Chul
Sadrosadati Mohammad
Publication venue
Publication date: 13/10/2023
Field of study

Address translation is a performance bottleneck in data-intensive workloads due to large datasets and irregular access patterns that lead to frequent high-latency page table walks (PTWs). PTWs can be reduced by using (i) large hardware TLBs or (ii) large software-managed TLBs. Unfortunately, both solutions have significant drawbacks: increased access latency, power and area (for hardware TLBs), and costly memory accesses, the need for large contiguous memory blocks, and complex OS modifications (for software-managed TLBs). We present Victima, a new software-transparent mechanism that drastically increases the translation reach of the processor by leveraging the underutilized resources of the cache hierarchy. The key idea of Victima is to repurpose L2 cache blocks to store clusters of TLB entries, thereby providing an additional low-latency and high-capacity component that backs up the last-level TLB and thus reduces PTWs. Victima has two main components. First, a PTW cost predictor (PTW-CP) identifies costly-to-translate addresses based on the frequency and cost of the PTWs they lead to. Second, a TLB-aware cache replacement policy prioritizes keeping TLB entries in the cache hierarchy by considering (i) the translation pressure (e.g., last-level TLB miss rate) and (ii) the reuse characteristics of the TLB entries. Our evaluation results show that in native (virtualized) execution environments Victima improves average end-to-end application performance by 7.4% (28.7%) over the baseline four-level radix-tree-based page table design and by 6.2% (20.1%) over a state-of-the-art software-managed TLB, across 11 diverse data-intensive workloads. Victima (i) is effective in both native and virtualized environments, (ii) is completely transparent to application and system software, and (iii) incurs very small area and power overheads on a modern high-end CPU.Comment: To appear in 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 202

arXiv.org e-Print Archive

Design of Swap Space in Virtual Memory Management System

Author: Amandeep Singh
Gulshan Goyal
Jasneet Kaur
Ȧ
Publication venue
Publication date: 02/04/2020
Field of study

Abstrac

CiteSeerX

A Survey of Techniques for Architecting TLBs

Author: Mittal Sparsh
Publication venue: 'Wiley'
Publication date: 01/01/2016
Field of study

“Translation lookaside buffer” (TLB) caches virtual to physical address translation information and is used in systems ranging from embedded devices to high-end servers. Since TLB is accessed very frequently and a TLB miss is extremely costly, prudent management of TLB is important for improving performance and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and distinctions. We believe that this paper will be useful for chip designers, computer architects and system engineers

Research Archive of Indian Institute of Technology Hyderabad

Utopia: Fast and Efficient Address Translation via Hybrid Restrictive & Flexible Virtual-to-Physical Address Mappings

Author: Ausavarungnirun Rachata
Bera Rahul
Bostanci Nisa
Firtina Can
Hajinazar Nastaran
Kanellopoulos Konstantinos
Kumar Rakesh
Mutlu Onur
Sadrosadati Mohammad
Stojiljkovic Kosta
Vijaykumar Nandita
Publication venue
Publication date: 06/10/2023
Field of study

Conventional virtual memory (VM) frameworks enable a virtual address to flexibly map to any physical address. This flexibility necessitates large data structures to store virtual-to-physical mappings, which leads to high address translation latency and large translation-induced interference in the memory hierarchy. On the other hand, restricting the address mapping so that a virtual address can only map to a specific set of physical addresses can significantly reduce address translation overheads by using compact and efficient translation structures. However, restricting the address mapping flexibility across the entire main memory severely limits data sharing across different processes and increases data accesses to the swap space of the storage device, even in the presence of free memory. We propose Utopia, a new hybrid virtual-to-physical address mapping scheme that allows both flexible and restrictive hash-based address mapping schemes to harmoniously co-exist in the system. The key idea of Utopia is to manage physical memory using two types of physical memory segments: restrictive and flexible segments. A restrictive segment uses a restrictive, hash-based address mapping scheme that maps virtual addresses to only a specific set of physical addresses and enables faster address translation using compact translation structures. A flexible segment employs the conventional fully-flexible address mapping scheme. By mapping data to a restrictive segment, Utopia enables faster address translation with lower translation-induced interference. Utopia improves performance by 24% in a single-core system over the baseline system, whereas the best prior state-of-the-art contiguity-aware translation scheme improves performance by 13%.Comment: To appear in 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 202

arXiv.org e-Print Archive

Operating System Kernels on Multi-core Architectures

Author: Almatary Hesham
Publication venue: University of York
Publication date: 15/01/2016
Field of study

Operating System (OS) kernels have been under research and development for decades, mainly assuming single processor and distributed hardware systems. With the recent rise of multi-core chips that may incorporate a network on chip (NoC), new challenges have appeared that were not considered before. Given that a complete multi-core system that works on a single system on chip (SoC) is now the normal case, different cores on a single SoC may share other physical resources and data. This new sharing scheme on a SoC affects crucial aspects of an overall system like correctness, performance, predictability, scalability and security. Both hardware and OSs to flexibly cooperate in order to provide solutions for such challenges. SoC mimics the internet somehow now, with different cores acting as computer nodes, and the network medium is given in an advanced digital fabrics like buses or NoCs, that are a current research area. However, OSs are still assuming some (hardware) features like single physical memory and memory sharing for inter-process communication, page-based protection, cache operations, even when evolving from uniprocessor to multi-core processors. Such features not only may degrade performance and other system aspects, but also some of them make no sense for a multi-core SoC, and introduce some barriers and limitations. While new OS research is considering different kernel designs to cope up with multi-core systems, they are still limited by the current commercial hardware architectures. The objective of this thesis is to assess different kernel designs and implementations on multi-core hardware architectures. Part of the contributions of the thesis is porting RTEMS (RTOS) and seL4 microkernel to Epiphany and RISC-V hardware architectures respectively, trading-off the design and implementation decisions. This hands-on experience gave a better understanding of the real-world challenges regarding kernel designs and implementations

White Rose E-theses Online

Software-Managed Address Translation

Author: Jacob Bruce
Mudge Trevor
Publication venue
Publication date: 01/01/1997
Field of study

In this paper we explore software-managed address translation. The purpose of the study is to specify the memory management design for a high clock-rate PowerPC implementation in which a simple design is a prerequisite for a fast clock and a short design cycle. We show that software-managed address translation is just as efficient as hardware- managed address translation, and it is much more flexible. Operating systems such as OSF/1 and Mach charge between 0.10 and 0.28 cycles per instruction (CPI) for address translation using dedicated memory-management hardware. Software-managed translation requires 0.05 CPI. Mechanisms to support such features as shared memory, superpages, sub-page protection, and sparse address spaces can be defined completely in software, allowing much more flexibility than in hardware-defined mechanisms

CiteSeerX

Digital Repository at the University of Maryland

Recommended from our members

A Secure and Formally Verified Commodity Multiprocessor Hypervisor

Author: Li Shih-Wei
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2021
Field of study

Commodity hypervisors are widely deployed to support virtual machines on multiprocessor server hardware. Modern hypervisors are complex and often integrated with an operating system kernel, posing a significant security risk as writing large, multiprocessor systems software is error-prone. Attackers that successfully exploit hypervisor vulnerabilities may gain unfettered access to virtual machine data and compromise the confidentiality and integrity of virtual machine data. Theoretically, formal verification offers a solution to this problem, by proving that the hypervisor implementation contains no vulnerabilities and protects virtual machine data under all circumstances. However, it remains unknown how one might feasibly verify the entire codebase of a complex, multiprocessor commodity system. My thesis is that modest changes to a commodity system can reduce the required proof effort such that it becomes possible to verify the security properties of the entire system. This dissertation introduces microverification, a new approach for formally verifying the security properties of commodity systems. Microverification reduces the proof effort for a commodity system by retrofitting the system into a small core and a set of untrusted services, thus making it possible to reason about properties of the entire system by verifying the core alone. To verify the multiprocessor hypervisor core, we introduce security-preserving layers to modularize the proof without hiding information leakage so we can prove each layer of the implementation refines its specification, and the top layer specification is refined by all layers of the core implementation. To verify commodity hypervisor features that require dynamically changing information flow, we incorporate data oracles to mask intentional information flow. We can then prove noninterference at the top layer specification and guarantee the resulting security properties hold for the entire hypervisor implementation. Using microverification, we retrofitted the Linux KVM hypervisor with only modest modifications to its codebase. Using Coq, we proved that the hypervisor protects the confidentiality and integrity of VM data, including correctly managing tagged TLBs, shared multi-level page tables, and caches. Our work is the first machine-checked security proof for a commodity multiprocessor hypervisor. Experimental results with real application workloads demonstrate that verified KVM retains KVM’s functionality and performance

Columbia University Academic Commons

In-Line Interrupt Handling and Lock-Up Free Translation Lookaside Buffers (TLBs)

Author: Jacob Bruce
Jaleel Aamer
Publication venue
Publication date: 01/05/2006
Field of study

The effects of the general-purpose precise interrupt mechanisms in use for the past few decades have received very little attention. When modern out-of-order processors handle interrupts precisely, they typically begin by flushing the pipeline to make the CPU available to execute handler instructions. In doing so, the CPU ends up flushing many instructions that have been brought in to the reorder buffer. In particular, these instructions may have reached a very deep stage in the pipeline—representing significant work that is wasted. In addition, an overhead of several cycles and wastage of energy (per exception detected) can be expected in refetching and reexecuting the instructions flushed. This paper concentrates on improving the performance of precisely handling software managed translation look-aside buffer (TLB) interrupts, one of the most frequently occurring interrupts. The paper presents a novel method of in-lining the interrupt handler within the reorder buffer. Since the first level interrupt-handlers of TLBs are usually small, they could potentially fit in the reorder buffer along with the user-level code already there. In doing so, the instructions that would otherwise be flushed from the pipe need not be refetched and reexecuted. Additionally, it allows for instructions independent of the exceptional instruction to continue to execute in parallel with the handler code. By in-lining the TLB interrupt handler, this provides lock-up free TLBs. This paper proposes the prepend and append schemes of in-lining the interrupt handler into the available reorder buffer space. The two schemes are implemented on a performance model of the Alpha 21264 processor built by Alpha designers at the Palo Alto Design Center (PADC), California. We compare the overhead and performance impact of handling TLB interrupts by the traditional scheme, the append in-lined scheme, and the prepend in-lined scheme. For small, medium, and large memory footprints, the overhead is quantified by comparing the number and pipeline state of instructions flushed, the energy savings, and the performance improvements. We find that lock-up free TLBs reduce the overhead of refetching and reexecuting the instructions flushed by 30-95 percent, reduce the execution time by 5-25 percent, and also reduce the energy wasted by 30-90 percent

Digital Repository at the University of Maryland