207 research outputs found

    Mixed-Criticality Scheduling with Dynamic Redistribution of Shared Cache

    The design of mixed-criticality systems often involves painful tradeoffs between safety guarantees and performance. However, the use of more detailed architectural models in the design and analysis of scheduling arrangements for mixed-criticality systems can provide not only greater confidence in the analysis but also opportunities for better performance. Motivated by this view, we propose an extension of Vestal's model for mixed-criticality multicore systems that (i) accounts for the per-task partitioning of the last-level cache and (ii) supports the dynamic reassignment, for better schedulability, of cache portions initially reserved for lower-criticality tasks to the higher-criticality tasks when the system switches to high-criticality mode. To this model, we apply partitioned EDF scheduling with Ekberg and Yi's deadline-scaling technique. Our schedulability analysis and scale-factor calculation are cognisant of the cache resources assigned to each task, using WCET estimates that take these resources into account. The analysis is hence able to leverage the dynamic reconfiguration of the cache partitioning at mode change for better performance in terms of provable schedulability. We also propose heuristics for partitioning the cache in low- and high-criticality mode that promote schedulability. Our experiments with synthetic task sets indicate tangible improvements in schedulability compared to a baseline cache-aware arrangement in which no cache resources are redistributed from low- to high-criticality tasks in the event of a mode change.
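
To make the idea of cache redistribution at mode change concrete, the following is a minimal sketch rather than the paper's analysis. It assumes implicit deadlines, a single core, and that low-criticality tasks no longer run in high-criticality mode, and it substitutes the plain EDF utilization bound for the deadline-scaling analysis, so it says nothing about the mode-switch transition itself. The task names, periods, and per-way WCET tables are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    period: float
    high_crit: bool
    hi_wcet_by_ways: dict  # hypothetical table: cache ways -> WCET estimate


def hi_mode_utilization(tasks, ways):
    """Utilization of the high-criticality tasks for a given way assignment."""
    return sum(
        t.hi_wcet_by_ways[ways[t.name]] / t.period for t in tasks if t.high_crit
    )


# Invented task set: two high-criticality tasks and one low-criticality task.
tasks = [
    Task("ctrl", 10.0, True, {2: 6.0, 4: 4.0}),
    Task("nav", 20.0, True, {2: 10.0, 4: 7.0}),
    Task("log", 20.0, False, {4: 5.0}),
]

# Baseline: the cache partition stays as it was in low-criticality mode.
static_ways = {"ctrl": 2, "nav": 2, "log": 4}
# Redistribution: the ways reserved for "log" go to the high-criticality tasks.
redistributed_ways = {"ctrl": 4, "nav": 4, "log": 0}

u_static = hi_mode_utilization(tasks, static_ways)
u_redistributed = hi_mode_utilization(tasks, redistributed_ways)

# EDF with implicit deadlines on one core is schedulable iff utilization <= 1.
print(f"static: {u_static:.2f}, redistributed: {u_redistributed:.2f}")
```

With these made-up numbers the high-criticality tasks exceed the utilization bound under the static partition and fit under it once the extra ways shorten their WCET estimates.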

    Achieving Performance in Networks-on-Chip for Real-Time Systems (Erreichen von Performance in Netzwerken-On-Chip für Echtzeitsysteme)

    In many new applications, such as automated driving, high performance requirements have reached safety-critical real-time systems. Consequently, Networks-on-Chip (NoCs) must efficiently host new sets of highly dynamic workloads, e.g., high-resolution sensor fusion and data processing, or autonomous decision making combined with machine learning. The static platform management used in current safety-critical systems is no longer sufficient to provide the needed level of service. Dynamic platform management could meet the challenge, but it usually lacks the predictability and simplicity necessary for certification of safety and real-time properties. In this work, we propose a novel, global and dynamic arbitration for NoCs with real-time QoS requirements. The mechanism decouples admission control from arbitration in routers, thereby simplifying dynamic adaptation and real-time analysis. Consequently, the proposed solution allows the deployment of sophisticated contract-based QoS provisioning without introducing the complicated and hard-to-maintain schemes known from the frequently applied static arbiters. The presented work introduces an overlay network that synchronizes transmissions using arbitration units called Resource Managers (RMs), which allows global and work-conserving scheduling. The description of resource allocation strategies is supplemented by a protocol design and a verification methodology, bringing adaptive control to NoC communication in setups with different QoS requirements and traffic classes. To this end, a formal worst-case timing analysis of the mechanism is proposed, which demonstrates that this solution not only achieves higher performance in simulation but, even more importantly, consistently reaches smaller formally guaranteed worst-case latencies than other strategies for realistic levels of system utilization. The approach is not limited to a specific network architecture or topology, as the mechanism does not require modifications of routers and can therefore be used together with the majority of existing manycore systems. The evaluation used generic performance-optimized router designs as well as two systems-on-chip focused on real-time deployments. The results confirmed that the proposed approach exhibits significantly higher average performance in simulation and execution.

    Low-Overhead Online Assessment of Timely Progress as a System Commodity

    A Survey of Timing Verification Techniques for Multi-Core Real-Time Systems

    This survey provides an overview of the scientific literature on timing verification techniques for multi-core real-time systems. It reviews the key results in the field from its origins around 2006 to the latest research published up to the end of 2018. The survey highlights the key issues involved in providing guarantees of timing correctness for multi-core systems. A detailed review is provided covering four main categories: full integration, temporal isolation, integrating interference effects into schedulability analysis, and mapping and allocation. The survey concludes with a discussion of the advantages and disadvantages of these different approaches, identifying open issues, key challenges, and possible directions for future research.

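    Timing in Technical Safety Requirements for System Designs with Heterogeneous Criticality Requirements (Timing in Technischen Sicherheitsanforderungen für Systementwürfe mit heterogenen Kritikalitätsanforderungen)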

    Traditionally, timing requirements as (technical) safety requirements have been avoided through clever functional designs. New vehicle automation concepts and other applications, however, make this harder or even impossible and challenge design automation for cyber-physical systems to provide a solution. This thesis takes up this challenge by introducing cross-layer dependency analysis to relate timing dependencies in the bounded execution time (BET) model to the functional model of the artifact. In doing so, the analysis is able to reveal where timing dependencies may violate freedom-from-interference requirements on the functional layer and other intermediate model layers. For design automation, this leaves the challenge of how such dependencies can be avoided, or at least bounded, such that the design is feasible. The results are synthesis strategies for implementation requirements and a system-level placement strategy for run-time measures to avoid potentially catastrophic consequences of timing dependencies that are not eliminated from the design. Their applicability is shown in experiments and case studies. However, all the proposed run-time measures, as well as very strict implementation requirements, become ever more expensive in terms of design effort for contemporary embedded systems due to the systems' complexity. Hence, the second part of this thesis reflects on the design aspect rather than the analysis aspect of embedded systems and proposes a timing-predictable design paradigm based on System-Level Logical Execution Time (SL-LET). Leveraging a timing-design model in SL-LET, the proposed methods from the first part can now be applied to improve the quality of a design: timing error handling can now be separated from the run-time methods and from the implementation requirements intended to guarantee them. The thesis therefore introduces timing diversity as a timing-predictable execution theme that handles timing errors without having to deal with them in the implemented application. An automotive 3D-perception case study demonstrates the applicability of timing diversity to ensure predictable end-to-end timing while masking certain types of timing errors.

    DuoMC: Tight DRAM Latency Bounds with Shared Banks and Near-COTS Performance

    © Reza Mirosanlou, Mohamed Hassan, and Rodolfo Pellizzoni | ACM 2021. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in The International Symposium on Memory Systems, https://doi.org/10.1145/3488423.3488431
    DRAM memory controllers (MCs) in COTS systems are designed primarily for average performance, offering no worst-case guarantees, while real-time MCs provide timing guarantees at the cost of a significant average performance degradation. For this reason, hardware vendors have been reluctant to integrate real-time solutions in high-performance platforms. In this paper, we overcome this performance-predictability trade-off by introducing DuoMC, a novel memory controller that augments COTS MCs with a real-time scheduler and run-time monitoring to provide predictability guarantees. Leveraging the fact that the resource is rarely overloaded, DuoMC allows the system to enjoy the high performance of the conventional MC most of the time, while switching to the real-time scheduler only when timing guarantees risk being violated, which rarely occurs. In addition, unlike most existing real-time MCs, DuoMC enables the use of both private and shared DRAM banks among cores to facilitate communication among tasks. We evaluate DuoMC using a cycle-accurate multi-core simulator. Results show that DuoMC can provide latency guarantees better than or comparable to those of state-of-the-art real-time MCs with limited performance loss (only 8% in the worst scenario) compared to the COTS MC.
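
The switching idea in this abstract can be pictured with a small sketch. This is a hypothetical monitoring rule, not DuoMC's actual logic: the function name, the slack comparison, and all cycle counts are assumptions chosen only to show how a run-time monitor might fall back to a real-time scheduler before a latency bound is at risk.

```python
def pick_scheduler(elapsed, latency_bound, rt_worst_case):
    """Hypothetical rule: stay on the conventional scheduler while enough
    slack remains for the real-time scheduler to still meet the bound."""
    slack = latency_bound - elapsed
    return "conventional" if slack > rt_worst_case else "real-time"


# One pending request with an assumed 200-cycle latency bound; the real-time
# scheduler is assumed to need at most 80 cycles once it takes over.
trace = [
    pick_scheduler(elapsed, latency_bound=200, rt_worst_case=80)
    for elapsed in (0, 60, 120, 150)
]
print(trace)
```

Under these assumed numbers the request is served by the conventional scheduler until its remaining slack drops to the real-time worst case, at which point the monitor switches modes.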

    Design And Analysis Of Memory Management Techniques For Next-Generation Gpus

    Graphics Processing Unit (GPU)-based architectures have become the default accelerator choice for a large number of data-parallel applications because they are able to provide high compute throughput at a competitive power budget. Unlike CPUs, which typically have limited multi-threading capability, GPUs execute large numbers of threads concurrently to achieve high thread-level parallelism (TLP). Because the computation of each thread requires its corresponding data to be loaded from or stored to memory, the key to supporting the high TLP of GPUs lies in the high bandwidth provided by the GPU memory system. However, with the continuous scaling of GPUs, the challenges of designing an efficient GPU memory system have become two-fold. On one hand, to keep the growing compute and memory resources highly utilized, co-locating two or more kernels in the GPU has become an inevitable trend. One of the major roadblocks in achieving the maximum benefits of multi-application execution is the difficulty of designing mechanisms that can efficiently and fairly manage application interference in the shared caches and the main memory. On the other hand, to maintain the continuous scaling of GPU performance, the increasing energy consumption of the memory system has become a major problem because of its limited power budget. This limitation of the GPU memory energy restricts its maximum theoretical bandwidth and in turn limits the overall throughput. To address the aforementioned challenges, this dissertation proposes three different approaches. First, this dissertation shows that high efficiency and fairness can be achieved for GPU multi-programming with novel TLP management techniques. We propose a new metric, effective bandwidth (EB), to accurately estimate the shared resources in the GPU memory hierarchy. We also propose a pattern-based searching scheme (PBS) that can quickly and accurately achieve efficiency or fairness by managing the TLP of each application. Second, to reduce data movement and improve GPU throughput, this dissertation develops the Address-Stride Assisted Approximate Value Predictor (ASAP) for GPUs. We show that by utilizing the correlation between address stride and value stride present in GPGPU applications, significant data movement reduction and throughput improvement can be achieved at a much lower application quality loss and hardware overhead. ASAP achieves this by predicting load values if it detects strides in their corresponding addresses. Third, this dissertation shows that GPU memory energy can be significantly reduced by utilizing novel memory scheduling techniques. We propose a lazy memory scheduler which significantly improves the row buffer locality of GPU memory by leveraging the latency and error tolerance of GPGPU applications. Finally, our new work targets data movement reduction with flexible data precisions. We present initial results to motivate novel data types and architectural support to dynamically reduce the data size transferred per memory operation. Altogether, this dissertation develops several innovative techniques to improve the GPU memory system efficiency, which are necessary for enabling the development of next-generation GPUs.
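
The stride-based prediction mentioned for ASAP can be illustrated with a toy software model. This is only a sketch of the general idea (predict a load value when both the address stride and the value stride have repeated), not the dissertation's hardware design; the class name, the two-observation confidence rule, and the example addresses are assumptions.

```python
class StrideValuePredictor:
    """Toy model: predict the next load value when the last two loads showed
    the same non-zero address stride and the same value stride."""

    def __init__(self):
        self.last_addr = None
        self.last_val = None
        self.addr_stride = None
        self.val_stride = None
        self.confident = False

    def observe(self, addr, val):
        """Record an actual load and update the stride state."""
        if self.last_addr is not None:
            a = addr - self.last_addr
            v = val - self.last_val
            # Confident only if both strides repeat what was seen last time.
            self.confident = a != 0 and a == self.addr_stride and v == self.val_stride
            self.addr_stride, self.val_stride = a, v
        self.last_addr, self.last_val = addr, val

    def predict(self, addr):
        """Return an approximate value, or None if the load must go to memory."""
        if self.confident and addr - self.last_addr == self.addr_stride:
            return self.last_val + self.val_stride
        return None


predictor = StrideValuePredictor()
for addr, val in [(100, 10), (104, 12), (108, 14)]:
    predictor.observe(addr, val)

on_stride = predictor.predict(112)   # address continues the stride
off_stride = predictor.predict(120)  # address breaks the stride
print(on_stride, off_stride)
```

A predicted value lets the load skip the memory access at the cost of possible approximation error, which is why the abstract frames the benefit against application quality loss.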