Search CORE

108 research outputs found

Rank-switching, Open-row DRAM Controller for Mixed-Critical Real-Time Systems

Author: Krishnapillai Yogen
Publication venue: 'University of Waterloo'
Publication date: 14/01/2015
Field of study

In this thesis, we present a rank-switching open-row DRAM controller for mixed critical real time systems. This memory controller is optimized for multi-requestor and multi-rank memory systems. The key to improved performance is an innovative rank-switching mechanism which hides the latency of write to read transitions in DRAM devices without requiring unpredictable request reordering. We further employ open-row policy to take advantage of the data caching mechanism (row buffering) in each device. We choose the bank privatization scheme where each requestor is assigned its own private bank or set of banks. This private bank mapping guarantees that each requestor has its own row buffers and cannot be interfered by other requestors. The proposed memory controller design allows maximum of thirty two requestors at a time targeting either two or four ranks. This controller provides complete timing isolation between critical and non-critical applications and allows for compositional timing analysis over number of requestors and memory ranks in the system. We designed both the front end logic for the command generation and back end logic for the DRAM timing constraint check and arbitration utilizing the rank switching techniques. The complete design is implemented and synthesized using Verilog RTL and finally, we evaluated performance using various benchmarks. Our proposed memory controller offers significantly lower worst case latency bounds for critical real-time applications and supports average throughput for non-critical real-time applications compared to existing real time memory controllers in the literature

University of Waterloo's Institutional Repository

A Composable Worst Case Latency Analysis for Multi-Rank DRAM Devices under Open Row Policy

Author: Guo Danlu
Pellizzoni Rodolfo
Wu Zheng Pei
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/04/2017
Field of study

The final publication is available at Springer via http://dx.doi.org/10.1007/s11241-016-9253-4As multi-core systems are becoming more popular in real-time embedded systems, strict timing requirements for accessing shared resources must be met. In particular, a detailed latency analysis for Double Data Rate Dynamic RAM (DDR DRAM) is highly desirable. Several researchers have proposed predictable memory controllers to provide guaranteed memory access latency. However, the performance of such controllers sharply decreases as DDR devices become faster and the width of memory buses is increased. High-performance Commercial-Off-The-Shelf (COTS) memory controllers in general-purpose systems employ open row policy to improve average case access latencies and memory throughput, but the use of such policy is not compatible with existing real-time controllers. In this article, we present a new memory controller design together with a novel, composable worst case analysis for DDR DRAM that provides improved latency bounds compared to existing works by explicitly modeling the DRAM state. In particular, our approach scales better with increasing memory speed by predictably taking advantage of shorter latency for access to open DRAM rows. Furthermore, it can be applied to multi-rank devices, which allow for increased access parallelism. We evaluate our approach based on worst case analysis bounds and simulation results, using both synthetic tasks and a set of realistic benchmarks. In particular, benchmark evaluations show up to 45% improvement in worst case task execution time compared to a competing predictable memory controller for a system with 16 requestors and one rank.NSERC DG || 402369-2011 CMC Microsystem

University of Waterloo's Institutional Repository

Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems

Author: Yun Heechul
Publication venue
Publication date: 25/07/2014
Field of study

In modern Commercial Off-The-Shelf (COTS) multicore systems, each core can generate many parallel memory requests at a time. The processing of these parallel requests in the DRAM controller greatly affects the memory interference delay experienced by running tasks on the platform. In this paper, we model a modern COTS multicore system which has a nonblocking last-level cache (LLC) and a DRAM controller that prioritizes reads over writes. To minimize interference, we focus on LLC and DRAM bank partitioned systems. Based on the model, we propose an analysis that computes a safe upper bound for the worst-case memory interference delay. We validated our analysis on a real COTS multicore platform with a set of carefully designed synthetic benchmarks as well as SPEC2006 benchmarks. Evaluation results show that our analysis is more accurately capture the worst-case memory interference delay and provides safer upper bounds compared to a recently proposed analysis which significantly under-estimate the delay.Comment: Technical Repor

arXiv.org e-Print Archive

CiteSeerX

Adaptive Dual-Mode Arbitration for High-Performance Real-Time Embedded Systems

Author: Mirosanlou Reza
Publication venue: 'University of Waterloo'
Publication date: 17/12/2021
Field of study

Multi-core platforms can deliver substantial computational power together with minimum costs, compact size, weight, and power usage. However, multi-core architectures are shaking the very foundation of modern real-time systems, i.e. deriving the Worst-Case Execution Time (WCET) of the tasks. Modern embedded systems such as those deployed in the automotive and avionic fields face two difficult-to-resolve conflicting requirements due to the interference problem on the shared hardware components amongst cores: delivering high average-case performance and providing tight WCET. This challenge exists in different shared hardware resources including on-chip shared cache, hardware prefetchers, buses, and memory controller. The problem is mainly because various cores in the system interfere with each other while competing to access the aforementioned hardware components. While dedicated real-time controllers provide timing guarantees, they do so at the cost of significantly degrading system performance. This dissertation overcomes this trade-off by introducing Duetto, a general hardware resource management paradigm that pairs a real-time arbiter with a high-performance arbiter and a latency estimator module. Based on the observation that the resource is rarely overloaded, Duetto executes the high-performance arbiter most of the time, switching to the real-time arbiter only in the rare cases when the latency estimator deems that timing guarantees risk being violated. In this thesis, the Duetto paradigm is realized for different shared hardware resources. In the first part, I demonstrate Duetto on the case study of a multi-bank on-chip memory and discuss the foundation of the methodology. The methodology is concerned about designing the real-time arbiter in such a way that it is compatible with Duetto, deriving latency analysis, and designing the latency estimator module. In the second part, this thesis addresses the trade-off between maintaining cache coherence in multi-core real-time systems and improving average-case performance by proposing a novel coherency arbiter infrastructure and employing it in the context of Duetto. This is achieved by precisely engineering the multi-core hardware architecture and its underlying interconnect infrastructure such that data sharing is feasible for real-time systems in a manner amenable for timing analysis. The proposed solution provides near-to Commercial-Off-The-Shelf (COTS) performance and does not impose any coherency protocol modifications. The third part of this dissertation proposes DuoMC by applying Duetto to off-chip Memory Controller (MC) which is crucial since Dynamic Random-Access Memory (DRAM) main memory is one of the most complex shared resources in multi-core architectures and it is one of the critical bottlenecks both from latency as well as performance perspectives. As part of the MC evaluation, we release MCsim, an open-source, cycle-accurate simulator for memory controllers

University of Waterloo's Institutional Repository

DuoMC: Tight DRAM Latency Bounds with Shared Banks and Near-COTS Performance

Author: Hassan Mohamed
Mirosanlou Reza
Pellizzoni Rodolfo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/09/2021
Field of study

© {Reza Mirosanlou, Mohamed Hassan, and Rodolfo Pellizzoni | ACM} {2021}. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in {In The International Symposium on Memory Systems }, https://doi.org/10.1145/3488423.3488431DRAM memory controllers (MCs) in COTS systems are designed primarily for average performance, offering no worst-case guarantees, while real-time MCs provide timing guarantees at the cost of a significant average performance degradation. For this reason, hardware vendors have been reluctant to integrate real-time solutions in high-performance platforms. In this paper, we overcome this performance-predictability trade-off by introducing DuoMC, a novel memory controller that promotes to augment COTS MCs with a real-time scheduler and run-time monitoring to provide predictability guarantees. Leveraging the fact that the resource is barely overloaded, DuoMC allows the system to enjoy the high performance of the conventional MC most of the time, while switching to the real-time scheduler only when timing guarantees risk being violated, which rarely occurs. In addition, unlike most existing real-time MCs, DuoMC enables the utilization of both private and shared DRAM banks among cores to facilitate communication among tasks. We evaluate DuoMC using a cycle-accurate multi-core simulator. Results show that DuoMC can provide better or comparable latency guarantees than state-of-the-art real-time MCs with limited performance loss (only 8% in the worst scenario) compared to the COTS MC

University of Waterloo's Institutional Repository

Worst Case Analysis of DRAM Latency in Hard Real Time Systems

Author: Wu Zheng Pei
Publication venue: 'University of Waterloo'
Publication date: 17/12/2013
Field of study

As multi-core systems are becoming more popular in real time embedded systems, strict timing requirements for accessing shared resources must be met. In particular, a detailed latency analysis for Double Data Rate Dynamic RAM (DDR DRAM) is highly desirable. Several researchers have proposed predictable memory controllers to provide guaranteed memory access latency. However, the performance of such controllers sharply decreases as DDR devices become faster and the width of memory buses is increased. Therefore, a novel and composable approach is proposed that provides improved latency bounds compared to existing works by explicitly modeling the DRAM state. In particular, this new approach scales better with increasing number of cores and memory speed. Benchmark evaluation results show up to a 45% improvement in the worst case task execution time compared to a competing predictable memory controller for a system with 16 cores

University of Waterloo's Institutional Repository

Predictable Shared Memory Resources for Multi-Core Real-Time Systems

Author: Hassan Mohamed
Publication venue: 'University of Waterloo'
Publication date: 23/03/2017
Field of study

A major challenge in multi-core real-time systems is the interference problem on the shared hardware components amongst cores. Examples of these shared components include buses, on-chip caches, and off-chip dynamic random access memories (DRAMs). The problem arises because different cores in the system interfere with each other, while competing to access the shared hardware components. It is a challenging problem for real-time systems because operations of one core affect the temporal behaviour of other cores, which complicates the timing analysis of the system. We address this problem by making the following contributions. 1) For shared buses, we propose CArb, a predictable and criticality-aware arbiter, which provides guaranteed and differential service to tasks based on their requirements. In addition, we utilize CArb to mitigate overheads resulting from system switching among different modes. 2) For the cache hierarchy, we address the problem of maintaining cache coherence in multi-core real-time systems by modifying current coherence protocols such that data sharing is viable for real-time systems in a manner amenable for timing analysis. The proposed solution provides performance improvements, does not impose any scheduling restrictions, and does not require any source-code modifications. 3) At the shared DRAM level, we propose PMC, a programmable memory controller that provides latency guarantees for running tasks upon accessing the off-chip DRAM, while assigning differential memory services to tasks based on their bandwidth and latency requirements. In addition to PMC, we conduct a latency-based analysis on DRAM memory controllers (MCs). Our analysis provides both best-case and worst-case bounds on the latency that any request suffers upon accessing the DRAM. The analysis comprehensively covers all possible interactions of successive requests considering all possible DRAM states. Finally, we formally model request interrelations and DRAM command interactions. We use these models to develop an automated validation framework along with benchmark suites to validate and evaluate PMC and any other MC, which we release as an open-source tool

University of Waterloo's Institutional Repository

A Comprehensive Study of DRAM Controllers in Real-Time Systems

Author: Guo Danlu
Publication venue: 'University of Waterloo'
Publication date: 22/12/2016
Field of study

The DRAM main memory is a critical component and a performance bottleneck of almost all computing systems. Since the DRAM is a shared memory resource on multi-core plat- forms, all cores contend for the memory bandwidth. Therefore, there is a keen interest in the real-time community to design predictable DRAM controllers to provide a low memory access latency bound to meet the strict timing requirement of real-time applications. Due to the lack of generalization of publicly available DRAM controller models in full-system and DRAM device simulators, researchers often design in-house simulator to validate their designs. An extensible cycle-accurate DRAM controller simulation frame- work is developed to simplify the process of validating new DRAM controller designs. To prove the extensibility and reusability of the framework, ten state-of-the-art predictable DRAM controllers are implemented in the framework with less than 200 lines of new code. With the help of the framework, a comprehensive evaluation of state-of-the-art pre- dictable DRAM controllers is performed analytically and experimentally to show the im- pact of different system parameters. This extensive evaluation allows researchers to assess the contribution of state-of-the-art DRAM controller approaches. At last, a novel DRAM controller with request reordering technique is proposed to provide a configurable trade-off between latency bound and bandwidth in mixed-critical systems. Compared to the state-of-the-art DRAM controller, there is a balance point between the two designs which depends on the locality of the task under analysis and the DRAM device used in the system

University of Waterloo's Institutional Repository

Predictable and composable system-on-chip memory controllers

Author: Akesson K.B.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2010
Field of study

Contemporary System-on-Chip (SoC) become more and more complex, as increasing integration results in a larger number of concurrently executing applications. These applications consist of tasks that are mapped on heterogeneous multi-processor platforms with distributed memory hierarchies, where SRAMs and SDRAMs are shared by a variety of arbiters. Some applications have real-time requirements, meaning that they must perform a particular computation before a deadline to guarantee functional correctness, or to prevent quality degradation. Mapping the applications on the platform such that all real-time requirements are satisfied is very challenging. The number of possible mappings of tasks to processing elements and data structures to memories may be large, and appropriate configuration settings must be determined once the mapping is chosen. Verifying that a particular mapping satisfies all application requirements is typically done by system-level simulation. However, resource sharing causes interference between applications, making their temporal behaviors inter-dependent. All concurrently executing applications must hence be verified together, causing the verification complexity of the system to increase exponentially with the number of applications. Together these factors contribute to making the integration and verification process a dominant part of SoC development, both in terms of time and money. Predictable and composable systems are proposed to manage the increasing verification complexity. Predictable systems provide lower bounds on application performance, while applications in composable systems are completely isolated and cannot affect each other’s temporal behavior by even a single clock cycle. Predictable systems enable formal verification that covers all possible interactions with the platform. However, this assumes that the behavior of an application is captured in a performance model, which is not the case for many applications. Composability offers a complementary verification approach by letting these applications be verified independently by simulation with linear verification complexity. A limitation of current predictable and composable systems is that there are no memory controllers supporting the concepts in a general way. Current SRAM controllers can be shared in a predictable way with a variety of arbiters, but are only composable if statically scheduled or shared using time-division multiplexing. Existing SDRAM controllers are not composable, and are either unpredictable or limited to applications that are statically scheduled. This thesis addresses the limitations of current predictable and composable systems by proposing a general predictable and composable memory controller, thereby addressing the mapping and verification problem in embedded systems. The proposed memory controller is divided into a front-end and a back-end. The back-end is specific for DDR2/DDR3 SDRAM and makes the memory behave in a predictable manner using precomputed memory patterns that are dynamically combined at run time. The front-end contains buffering and an arbiter in the class of Latency-Rate (LR) servers, which is a class with many well-known predictable arbiters. We extend this class with a Credit-Controlled Static-Priority (CCSP) arbiter that is developed specifically for shared resources with latency-critical requestors and high loads, such as memories. Three key features of CCSP are: 1) It accommodates latency-critical requestors with low bandwidth requirements without wasting bandwidth. 2) Over-allocated bandwidth can be made negligible at an increased area cost, without affecting latency. 3) It has a small implementation that runs fast enough to keep up with most DDR2/DDR3 memories. The proposed front-end is general and can be used with other predictable resources, such as SRAM controllers. The proposed memory controller hence supports multiple arbiter and memory types, thus addressing the diversity in modern SoCs. The combination of front-end and predictable memory behaves like a LR server, which is the shared resource abstraction used in this work. In essence, a LR server guarantees a requestor a minimum bandwidth and a maximum latency, enabling formal verification of real-time requirements. The LR server model is compatible with several commonly used formal analysis frameworks, such as network calculus and data-flow analysis. Our memory controller hence allows any combination of predictable memory and LR arbiter to be used transparently for formal verification of applications with any of these frameworks. Sharing a predictable memory at run-time results in interference between requestors, making the memory controller non-composable. This is addressed by adding a Delay Block to the front-end that delays all signals sent from the front-end to a requestor to always emulate worst-case interference. This makes requestors unable to affect each other’s temporal behavior, which is sufficient to guarantee composability on the level of applications. Our predictable memory controller hence offers composable service with a variety of memory and arbiter types, which widely extends the scope of composable platforms. Another benefit of this approach is that it enables composable service to be dynamically enabled and disabled, enabling requestors that do not require composable service to use slack bandwidth to improve performance. The predictable and composable memory controller is supported by a configuration flow that automatically computes memory patterns and arbiter settings to satisfy given bandwidth and latency requirements. The flow uses abstraction to separate the configuration of the memory and the arbiter, enabling settings to be computed in a streamlined fashion for all supported memories and arbiters

Repository TU/e

Pure OAI Repository