257 research outputs found
Predictable Shared Memory Resources for Multi-Core Real-Time Systems
A major challenge in multi-core real-time systems is interference on the hardware components shared among cores. Examples of these shared components include buses, on-chip caches, and off-chip dynamic random access memories (DRAMs). The problem arises because different cores in the system interfere with each other while competing to access the shared hardware components. It is a challenging problem for real-time systems because operations of one core affect the temporal behaviour of other cores, which complicates the timing analysis of the system. We address this problem by making the following contributions. 1) For shared buses, we propose CArb, a predictable and criticality-aware arbiter, which provides guaranteed and differential service to tasks based on their requirements. In addition, we utilize CArb to mitigate overheads resulting from the system switching among different modes. 2) For the cache hierarchy, we address the problem of maintaining cache coherence in multi-core real-time systems by modifying current coherence protocols such that data sharing is viable for real-time systems in a manner amenable to timing analysis. The proposed solution provides performance improvements, does not impose any scheduling restrictions, and does not require any source-code modifications. 3) At the shared DRAM level, we propose PMC, a programmable memory controller that provides latency guarantees for running tasks upon accessing the off-chip DRAM, while assigning differential memory services to tasks based on their bandwidth and latency requirements.
In addition to PMC, we conduct a latency-based analysis on DRAM memory controllers (MCs). Our analysis provides both best-case and worst-case bounds on the latency that any request suffers upon accessing the DRAM. The analysis comprehensively covers all possible interactions of successive requests considering all possible DRAM states. Finally, we formally model request interrelations and DRAM command interactions. We use these models to develop an automated
validation framework along with benchmark suites to validate and evaluate PMC and any other MC, which we release as an open-source tool.
Cross-Layer Pathfinding for Off-Chip Interconnects
Off-chip interconnects for integrated circuits (ICs) today induce a diverse design space, spanning many different applications that require transmission of data at various bandwidths, latencies and link lengths. Off-chip interconnect design solutions are also variously sensitive to system performance, power and cost metrics, while also having a strong impact on these metrics. The costs associated with off-chip interconnects include die area, package (PKG) and printed circuit board (PCB) area, technology and bill of materials (BOM). Choices made regarding off-chip interconnects are fundamental to product definition, architecture, design implementation and technology enablement. Given their cross-layer impact, it is imperative that a cross-layer approach be employed to architect and analyze off-chip interconnects up front, so that a top-down design flow can comprehend the cross-layer impacts and correctly assess the system performance, power and cost tradeoffs for off-chip interconnects. Chip architects are not exposed to all the tradeoffs at the physical and circuit implementation or technology layers, and often lack the tools to accurately assess off-chip interconnects. Furthermore, the collaterals needed for a detailed analysis are often lacking when the chip is architected; these include circuit design and layout, PKG and PCB layout, and physical floorplan and implementation. To address the need for a framework that enables architects to assess the system-level impact of off-chip interconnects, this thesis presents power-area-timing (PAT) models for off-chip interconnects, optimization and planning tools with the appropriate abstraction using these PAT models, and die/PKG/PCB co-design methods that help expose the off-chip interconnect cross-layer metrics to the die/PKG/PCB design flows. 
Together, these models, tools and methods enable cross-layer optimization that allows for a top-down definition and exploration of the design space and helps converge on the correct off-chip interconnect implementation and technology choice. The tools presented cover off-chip memory interfaces for mobile and server products, silicon photonic interfaces, 2.5D silicon interposers and 3D through-silicon vias (TSVs). The goal of the cross-layer framework is to assess the key metrics of the interconnect (such as timing, latency, active/idle/sleep power, and area/cost) at an appropriate level of abstraction across the layers of the design flow. In addition to signal interconnect, this thesis also explores the need for such cross-layer pathfinding for power distribution networks (PDN), where the system-on-chip (SoC) floorplan and pinmap must be optimized before the collateral layouts for PDN analysis are ready. Altogether, the developed cross-layer pathfinding methodology for off-chip interconnects enables more rapid and thorough exploration of a vast design space of off-chip parallel and serial links, inter-die and inter-chiplet links and silicon photonics. Such exploration will pave the way for off-chip interconnect technology enablement that is optimized for system needs. The basis of the framework can be extended to cover other interconnect technologies as well, since it fundamentally relates to system-level metrics that are common to all off-chip interconnects.
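Architecture and Performance Analysis of a Multi-Generation SDRAM Controller for Mixed-Criticality Systems (Architektur- und Leistungsanalyse eines Mehrgenerationen-SDRAM-Controllers für gemischte Kritikalitätssysteme)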
Due to their high density and low cost, DDR SDRAMs are the prevailing choice for implementing the main memory of a computer system.
Nevertheless, these benefits come at the cost of a complex two-stage access protocol, which ultimately means that the time required to serve a memory request depends on the history of previous requests.
Put differently, DDR SDRAMs are a stateful resource, which further complicates the implementation of mixed-criticality systems, because different levels of criticality have conflicting needs.
The main goal of this dissertation is to design a controller that leverages the state of DDR SDRAMs in a mixed-criticality environment.
More specifically, the controller should provide good average performance for best-effort requestors without compromising timing guarantees for critical requestors.
In that regard, this dissertation first identifies two challenges of growing relevance for the design of memory controllers for the mixed-criticality domain.
The first challenge is the data bus turnaround time.
The second challenge is the rank-to-rank switching time, which only affects multi-rank modules.
After pinpointing these two challenges, this dissertation proposes an SDRAM controller to tackle them.
The proposed controller bundles read and write operations in their corresponding ranks, thus minimizing the number of data bus turnarounds and rank switching events.
As a consequence, the average performance of the controller is improved.
However, the bundling is carefully designed so that real-time guarantees for critical requestors can be extracted.
Moreover, both the operation of the controller and the corresponding analysis of its temporal properties are described in terms of a generation-independent notation.
This is a desirable feature because different SDRAM generations have different architectural features and, possibly, timing constraints.
Finally, an extensive comparison with the related work is performed.
Furthermore, trends in worst-case latency over DDR SDRAMs from different speed bins and generations are presented and thoroughly discussed.
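The effect of bundling described in this abstract can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which a request is just a (rank, direction) pair, and it uses a hypothetical ordering (all reads before all writes, grouped by rank within each direction). The function names and the ordering are illustrative assumptions, not the controller proposed in the dissertation.

```python
# Minimal sketch (assumed model): a request is a (rank, direction) tuple,
# where direction is "R" (read) or "W" (write).

def count_switches(order):
    """Return (data-bus turnarounds, rank switches) for a service order."""
    pairs = list(zip(order, order[1:]))
    turnarounds = sum(1 for a, b in pairs if a[1] != b[1])
    rank_switches = sum(1 for a, b in pairs if a[0] != b[0])
    return turnarounds, rank_switches

def bundle(requests):
    """Hypothetical bundling: group by direction first, then by rank."""
    return sorted(requests, key=lambda r: (r[1], r[0]))

# Requests in arrival (FIFO) order, alternating between two ranks.
fifo = [(0, "R"), (1, "W"), (0, "W"), (1, "R"), (0, "R"), (1, "W")]

fifo_cost = count_switches(fifo)             # (3, 5)
bundled_cost = count_switches(bundle(fifo))  # (1, 3)
```

In this toy trace, reordering reduces the number of turnarounds from 3 to 1 and the number of rank switches from 5 to 3. A real controller must additionally bound how long any critical request can be delayed by the reordering, which this sketch does not model.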
Deterministic Memory Abstraction and Supporting Multicore System Architecture
Poor time predictability of multicore processors has been a long-standing challenge in the real-time systems community. In this paper, we make a case that a fundamental problem that prevents efficient and predictable real-time computing on multicore is the lack of a proper memory abstraction to express memory criticality, which cuts across various layers of the system: the application, OS, and hardware. We therefore propose a new holistic resource management approach driven by a new memory abstraction, which we call Deterministic Memory. The key characteristic of deterministic memory is that the platform (the OS and hardware) guarantees small and tightly bounded worst-case memory access timing. In contrast, we call the conventional memory abstraction best-effort memory, for which only highly pessimistic worst-case bounds can be achieved. We propose to utilize both abstractions to achieve high time predictability without significantly sacrificing performance. We present deterministic memory-aware OS and architecture designs, including an OS-level page allocator, a hardware-level cache, and DRAM controller designs. We implement the proposed OS and architecture extensions in Linux and the gem5 simulator. Our evaluation results, using a set of synthetic and real-world benchmarks, demonstrate the feasibility and effectiveness of our approach.
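The split between the two abstractions can be pictured with a small sketch of a page allocator. It assumes, purely for illustration, that deterministic pages come from frames reserved per core and best-effort pages come from a pool shared by all cores. The class `PagePool`, its fields, and the frame numbers are hypothetical and are not taken from the paper's Linux implementation.

```python
from dataclasses import dataclass

@dataclass
class PagePool:
    """Toy allocator (assumed design): per-core reserved frames plus a shared pool."""
    private: dict  # core id -> list of free frame numbers reserved for that core
    shared: list   # free frame numbers usable by any core

    def alloc(self, core, deterministic):
        """Return (frame, kind) for a page request issued by `core`."""
        if deterministic:
            frames = self.private[core]
            if not frames:
                raise MemoryError(f"no deterministic frames left for core {core}")
            return frames.pop(), "deterministic"
        if not self.shared:
            raise MemoryError("no best-effort frames left")
        return self.shared.pop(), "best-effort"

pool = PagePool(private={0: [10, 11], 1: [20]}, shared=[100, 101])
critical_page = pool.alloc(core=0, deterministic=True)   # served from core 0's reserve
ordinary_page = pool.alloc(core=1, deterministic=False)  # served from the shared pool
```

The point of the sketch is only that the application states which memory is timing-critical, and the platform then serves the two kinds of request from differently managed resources.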
Technical Report: Designing High-Performance Real-Time SDRAM Controllers for Many-Core Systems (Revision 1.0)
Open-row real-time SDRAM controllers have recently been identified as an interesting approach to effectively exploit the wide SDRAM data buses often present in many-core platforms. However, their evaluation has mostly targeted specific DDR generations. This is problematic, as every new DDR generation introduces new architectural features and/or timing constraints. In this article, we address this challenge. More specifically, we propose a multi-generation open-row real-time SDRAM controller architecture. Furthermore, we examine the trends in worst-case latency over modules from DDR2, DDR3 and DDR4 SDRAMs.
DuoMC: Tight DRAM Latency Bounds with Shared Banks and Near-COTS Performance
© Reza Mirosanlou, Mohamed Hassan, and Rodolfo Pellizzoni | ACM, 2021. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in The International Symposium on Memory Systems, https://doi.org/10.1145/3488423.3488431
DRAM memory controllers (MCs) in COTS systems are designed primarily for average performance, offering no worst-case guarantees, while real-time MCs provide timing guarantees at the cost of a significant average performance degradation. For this reason, hardware vendors have been reluctant to integrate real-time solutions in high-performance platforms. In this paper, we overcome this performance-predictability trade-off by introducing DuoMC, a novel memory controller that augments COTS MCs with a real-time scheduler and run-time monitoring to provide predictability guarantees. Leveraging the fact that the resource is rarely overloaded, DuoMC allows the system to enjoy the high performance of the conventional MC most of the time, while switching to the real-time scheduler only when timing guarantees risk being violated, which rarely occurs. In addition, unlike most existing real-time MCs, DuoMC enables the utilization of both private and shared DRAM banks among cores to facilitate communication among tasks. We evaluate DuoMC using a cycle-accurate multi-core simulator. Results show that DuoMC can provide latency guarantees that are better than or comparable to those of state-of-the-art real-time MCs, with limited performance loss (only 8% in the worst scenario) compared to the COTS MC.
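The mode switch described in this abstract can be sketched as a simple run-time check. The rule below is an assumption made for illustration: it switches to the real-time scheduler once the time a request has already waited, plus the worst-case time the real-time scheduler would still need, reaches the request's latency bound. The function name, parameters, and the rule itself are hypothetical and are not DuoMC's actual monitoring logic.

```python
def select_scheduler(waited_cycles, rt_worst_case_cycles, latency_bound_cycles):
    """Pick a scheduling mode for the oldest pending request (assumed rule).

    waited_cycles:        cycles the request has already spent waiting
    rt_worst_case_cycles: worst-case cycles needed if the real-time scheduler takes over now
    latency_bound_cycles: latency bound that must not be exceeded
    """
    if waited_cycles + rt_worst_case_cycles >= latency_bound_cycles:
        return "real-time"
    return "high-performance"

relaxed = select_scheduler(waited_cycles=10, rt_worst_case_cycles=50, latency_bound_cycles=100)
urgent = select_scheduler(waited_cycles=60, rt_worst_case_cycles=50, latency_bound_cycles=100)
```

Under this assumed rule, the conventional scheduler keeps control while there is slack, and the real-time scheduler takes over only when the bound could otherwise be missed.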