61 research outputs found

    A Survey of Techniques for Architecting TLBs

    Get PDF
    “Translation lookaside buffer” (TLB) caches virtual to physical address translation information and is used in systems ranging from embedded devices to high-end servers. Since TLB is accessed very frequently and a TLB miss is extremely costly, prudent management of TLB is important for improving performance and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and distinctions. We believe that this paper will be useful for chip designers, computer architects and system engineers

    Address spaces and virtual memory : specification, implementation, and correctness

    Get PDF
    In modern operating systems tasks operate concurrently on a logical memory. Address spaces control access rights to and the sharing of that memory. They are associated with tasks and manipulated dynamically by memory management operations of the operating system. For cost reasons, logical memory and address spaces are not implemented directly but simulated. The contents of the logical memory are placed in two different memories, the main and the swap memory. Tasks access their address space by using an architecturally defined address translation mechanism, which is implemented by the memory management unit (MMU) and optimized with a translation look-aside buffer (TLB). This mechanism either redirects a memory access to some main memory location or generates a page fault exception resulting in a call to the page fault handler, a low-level operating system procedure. This construction is correct iff it is transparent to the tasks, so that they behave as if they would operate directly on the logical memory under control of their address spaces. We call the formalization of this correctness criterion a virtual memory simulation theorem. In our thesis we formulate and prove such a theorem for an abstract multiprocessor. We apply the theorem to a concrete implementation, a VAMP [BJK+03] with a singlelevel address translation mechanism and an exemplary page fault handler. We show how to extend the architecture and proofs to support TLBs, multi-level translation, and multiprocessing.In modernen Betriebssystemen operieren Programme nebenläufig auf einem logischen Speicher. Der Zugriff auf diesen Speicher und seine gemeinsame Nutzung wird durch Adressräume geregelt. Diese sind den Programmen zugeordnet und können durch Speicherverwaltungsoperationen des Betriebssystems dynamisch manipuliert werden. Logischer Speicher und Adressräume werden aus Kostengründen nicht direkt implementiert sondern simuliert. Hierbei verteilen sich die Inhalte des logischen Speichers auf zwei verschiedene Speicher, den Haupt- und den Auslagerungsspeicher. Zugriff auf ihren Adressraum wird den Programmen nur unter Nutzung eines durch die Rechnerarchitektur definierten Adressübersetzungsmechanismus gewährt, der durch die Memory Management Unit (MMU) und den Translation Look-Aside Buffer (TLB) implementiert wird. Dieser Mechanismus lenkt einen Zugriff entweder auf eine Hauptspeicheradresse um, oder er erzeugt einen Seitenfehler, der den Aufruf der Seitenfehlerbehandlung, eines hardware-nahen Betriebssystemteils, einleitet. Diese Konstruktion ist korrekt, wenn sie für die Programme transparent ist, das heißt, wenn diese sich mit ihr so verhalten, als griffen sie direkt auf den logischen Speicher unter Kontrolle ihrer Adressräume zu. Die Formalisierung dieser Korrektheitsaussage heißt Simulationssatz für virtuellen Speicher. In der vorliegenden Arbeit formulieren und beweisen wir einen derartigen Satz für ein abstraktes Mehrprozessorsystem. Wir wenden ihn auf eine konkrete Implementierung an, den VAMP [BJK+03] mit einem einstufigen Adressübersetzungsmechanismus und einer exemplarischen Seitenfehlerbehandlung. Wir zeigen, wie Rechnerarchitektur und Korrektheitsbeweise für die Unterstützung von TLBs, mehrstufiger Übersetzung und Mehrprozessorbetrieb erweitert werden können

    Principled Elimination of Microarchitectural Timing Channels through Operating-System Enforced Time Protection

    Full text link
    Microarchitectural timing channels exploit resource contentions on a shared hardware platform to cause information leakage through timing variance. These channels threaten system security by providing unauthorised information flow in violation of the system’s security policy. Present operating systems lack the means for systematic prevention of such channels. To address this problem, we propose time protection as an operating system (OS) abstraction, which provides mandatory temporal isolation analogous to the spatial isolation provided by the established memory protection abstraction. In order to fully understand microarchitectural timing channels, we first study all published microarchitectural timing attacks, their countermeasures and analyse the underlying causes. Then we define two application scenarios, a confinement scenario and a cloud scenario, which between them represent a large class of security-critical use cases, and aim to develop a solution that supports both. Our study identifies competition for limited hardware resources as the underlying cause for microarchitectural timing channels. From this we derive the requirement that proper isolation requires that all shared resources must be partitioned, either spatially or temporally (time-shared). We then analyse a number of recent processors across two instruction-set architectures (ISAs), x86 and Arm, for their support for such partitioning. We discover that all examined processors exhibit hardware state that cannot be partitioned by architected means, meaning that they all have uncloseable channels.We define the requirements hardware must satisfy for timing-channel prevention, and propose an augmented ISA as a new, security-oriented hardware-software contract. Assuming conforming hardware, we then define the requirements that OS-provided time protection must satisfy. We propose a concrete design of time protection, consisting of a set of policy-free mechanisms, and present an implementation in the seL4 microkernel. We evaluate the efficacy and efficiency of the implementation, and show that it is highly effective at closing timing channels, to the degree supported by the underlying hardware. We also find that the performance overheads are small to negligible. We can conclude that principled prevention of timing channels is possible though mandatory, black-box enforcement by the OS, subject to hardware manufacturers providing mechanisms for scrubbing all shared microarchitectural state
    corecore