50 research outputs found

    Erreichen von Performance in Netzwerken-On-Chip für Echtzeitsysteme

    Get PDF
    In many new applications, such as in automatic driving, high performance requirements have reached safety critical real-time systems. Consequently, Networks-on-Chip (NoCs) must efficiently host new sets of highly dynamic workloads e.g., high resolution sensor fusion and data processing, autonomous decision’s making combined with machine learning. The static platform management, as used in current safety critical systems, is no more sufficient to provide the needed level of service. A dynamic platform management could meet the challenge, but it usually suffers from a lack of predictability and the simplicity necessary for certification of safety and real-time properties. In this work, we propose a novel, global and dynamic arbitration for NoCs with real-time QoS requirements. The mechanism decouples the admission control from arbitration in routers thereby simplifying a dynamic adaptation and real-time analysis. Consequently, the proposed solution allows the deployment of a sophisticated contract-based QoS provisioning without introducing complicated and hard to maintain schemes, known from the frequently applied static arbiters. The presented work introduces an overlay network to synchronize transmissions using arbitration units called Resource Managers (RMs), which allows global and work-conserving scheduling. The description of resource allocation strategies is supplemented by protocol design and verification methodology bringing adaptive control to NoC communication in setups with different QoS requirements and traffic classes. For doing that, a formal worst-case timing analysis for the mechanism has been proposed which demonstrates that this solution not only exposes higher performance in simulation but, even more importantly, consistently reaches smaller formally guaranteed worst-case latencies than other strategies for realistic levels of system's utilization. The approach is not limited to a specific network architecture or topology as the mechanism does not require modifications of routers and therefore can be used together with the majority of existing manycore systems. Indeed, the evaluation followed using the generic performance optimized router designs, as well as two systems-on-chip focused on real-time deployments. The results confirmed that the proposed approach proves to exhibit significantly higher average performance in simulation and execution.In vielen neuen sicherheitskritische Anwendungen, wie z.B. dem automatisierten Fahren, werden große Anforderungen an die Leistung von Echtzeitsysteme gestellt. Daher müssen Networks-on-Chip (NoCs) neue, hochdynamische Workloads wie z.B. hochauflösende Sensorfusion und Datenverarbeitung oder autonome Entscheidungsfindung kombiniert mit maschineller Lernen, effizient auf einem System unterbringen. Die Steuerung der zugrunde liegenden NoC-Architektur, muss die Systemsicherheit vor Fehlern, resultierend aus dem dynamischen Verhalten des Systems schützen und gleichzeitig die geforderte Performance bereitstellen. In dieser Arbeit schlagen wir eine neuartige, globale und dynamische Steuerung für NoCs mit Echtzeit QoS Anforderungen vor. Das Schema entkoppelt die Zutrittskontrolle von der Arbitrierung in Routern. Hierdurch wird eine dynamische Anpassung ermöglicht und die Echtzeitanalyse vereinfacht. Der Einsatz einer ausgefeilten vertragsbasierten Ressourcen-Zuweisung wird so ermöglicht, ohne komplexe und schwer wartbare Mechanismen, welche bereits aus dem statischen Plattformmanagement bekannt sind einzuführen. Diese Arbeit stellt ein übergelagertes Netzwerk vor, welches Übertragungen mit Hilfe von Arbitrierungseinheiten, den so genannten Resource Managern (RMs), synchronisiert. Dieses überlagerte Netzwerk ermöglicht eine globale und lasterhaltende Steuerung. Die Beschreibung verschiedener Ressourcenzuweisungstrategien wird ergänzt durch ein Protokolldesign und Methoden zur Verifikation der adaptiven NoC Steuerung mit unterschiedlichen QoS Anforderungen und Verkehrsklassen. Hierfür wird eine formale Worst Case Timing Analyse präsentiert, welche das vorgestellte Verfahren abbildet. Die Resultate bestätitgen, dass die präsentierte Lösung nicht nur eine höhere Performance in der Simulation bietet, sondern auch formal kleinere Worst-Case Latenzen für realistische Systemauslastungen als andere Strategien garantiert. Der vorgestellte Ansatz ist nicht auf eine bestimmte Netzwerkarchitektur oder Topologie beschränkt, da der Mechanismus keine Änderungen an den unterliegenden Routern erfordert und kann daher zusammen mit bestehenden Manycore-Systemen eingesetzt werden. Die Evaluierung erfolgte auf Basis eines leistungsoptimierten Router-Designs sowie zwei auf Echtzeit-Anwendungen fokusierten Platformen. Die Ergebnisse bestätigten, dass der vorgeschlagene Ansatz im Durchschnitt eine deutlich höhere Leistung in der Simulation und Ausführung liefert

    Composition and synchronization of real-time components upon one processor

    Get PDF
    Many industrial systems have various hardware and software functions for controlling mechanics. If these functions act independently, as they do in legacy situations, their overall performance is not optimal. There is a trend towards optimizing the overall system performance and creating a synergy between the different functions in a system, which is achieved by replacing more and more dedicated, single-function hardware by software components running on programmable platforms. This increases the re-usability of the functions, but their synergy requires also that (parts of) the multiple software functions share the same embedded platform. In this work, we look at the composition of inter-dependent software functions on a shared platform from a timing perspective. We consider platforms comprised of one preemptive processor resource and, optionally, multiple non-preemptive resources. Each function is implemented by a set of tasks; the group of tasks of a function that executes on the same processor, along with its scheduler, is called a component. The tasks of a component typically have hard timing constraints. Fulfilling these timing constraints of a component requires analysis. Looking at a single function, co-operative scheduling of the tasks within a component has already proven to be a powerful tool to make the implementation of a function more predictable. For example, co-operative scheduling can accelerate the execution of a task (making it easier to satisfy timing constraints), it can reduce the cost of arbitrary preemptions (leading to more realistic execution-time estimates) and it can guarantee access to other resources without the need for arbitration by other protocols. Since timeliness is an important functional requirement, (re-)use of a component for composition and integration on a platform must deal with timing. To enable us to analyze and specify the timing requirements of a particular component in isolation from other components, we reserve and enforce the availability of all its specified resources during run-time. The real-time systems community has proposed hierarchical scheduling frameworks (HSFs) to implement this isolation between components. After admitting a component on a shared platform, a component in an HSF keeps meeting its timing constraints as long as it behaves as specified. If it violates its specification, it may be penalized, but other components are temporally isolated from the malignant effects. A component in an HSF is said to execute on a virtual platform with a dedicated processor at a speed proportional to its reserved processor supply. Three effects disturb this point of view. Firstly, processor time is supplied discontinuously. Secondly, the actual processor is faster. Thirdly, the HSF no longer guarantees the isolation of an individual component when two arbitrary components violate their specification during access to non-preemptive resources, even when access is arbitrated via well-defined real-time protocols. The scientific contributions of this work focus on these three issues. Our solutions to these issues cover the system design from component requirements to run-time allocation. Firstly, we present a novel scheduling method that enables us to integrate the component into an HSF. It guarantees that each integrated component executes its tasks exactly in the same order regardless of a continuous or a discontinuous supply of processor time. Using our method, the component executes on a virtual platform and it only experiences that the processor speed is different from the actual processor speed. As a result, we can focus on the traditional scheduling problem of meeting deadline constraints of tasks on a uni-processor platform. For such platforms, we show how scheduling tasks co-operatively within a component helps to meet the deadlines of this component. We compare the strength of these cooperative scheduling techniques to theoretically optimal schedulers. Secondly, we standardize the way of computing the resource requirements of a component, even in the presence of non-preemptive resources. We can therefore apply the same timing analysis to the components in an HSF as to the tasks inside, regardless of their scheduling or their protocol being used for non-preemptive resources. This increases the re-usability of the timing analysis of components. We also make non-preemptive resources transparent during the development cycle of a component, i.e., the developer of a component can be unaware of the actual protocol being used in an HSF. Components can therefore be unaware that access to non-preemptive resources requires arbitration. Finally, we complement the existing real-time protocols for arbitrating access to non-preemptive resources with mechanisms to confine temporal faults to those components in the HSF that share the same non-preemptive resources. We compare the overheads of sharing non-preemptive resources between components with and without mechanisms for confinement of temporal faults. We do this by means of experiments within an HSF-enabled real-time operating system

    QoS-aware predictive workflow scheduling

    Full text link
    This research places the basis of QoS-aware predictive workflow scheduling. This research novel contributions will open up prospects for future research in handling complex big workflow applications with high uncertainty and dynamism. The results from the proposed workflow scheduling algorithm shows significant improvement in terms of the performance and reliability of the workflow applications

    Improving the end-to-end latency of datacenter applications using coordination across application components

    Get PDF
    To handle millions of user requests every second and process hundreds of terabytes of data each day, many organizations have turned to large datacenter-scale computing systems. The applications running in these datacenters consist of a multitude of dependent logical components or stages which perform specific functionality. These stages are connected to form a directed acyclic graph (DAG), with edges representing input-output dependencies. Each stage can run over tens to thousands of machines, and involves multiple cluster sub-systems such as storage, network and compute. The scale and complexity of these applications can lead to significant delays in their end-to-end latency. However, the organizations running these applications have strict requirements on this latency as it directly affects their revenue and operational costs. Addressing this problem, the goal of this dissertation is to develop scheduling and resource allocation techniques to optimize for the end-to-end latency of datacenter applications. The key idea behind these techniques is to utilize coordination between different application components, allowing us to efficiently allocate cluster resources. In particular, we develop planning algorithms that coordinate the storage and compute sub-systems in datacenters to determine how many resources should be allocated to each stage in an application along with where in the cluster should they be allocated, to meet application requirements (e.g., completion time goals, minimize average completion time etc.). To further speed up applications at runtime, we develop a few latency reduction techniques: reissuing laggards elsewhere in the cluster, returning partial results and speeding up laggards by giving them extra resources. We perform a global optimization to coordinate across all the stages in an application DAG and determine which of these techniques works best for each stage, while ensuring that the cost incurred by these techniques is within a given end-to-end budget. We use application characteristics to predict and determine how resources should be allocated to different application components to meet the end-to-end latency requirements. We evaluate our techniques on two different kinds of datacenter applications: (a) web services, and (b) data analytics. With large-scale simulations and an implementation in Apache Yarn (Hadoop 2.0), we use workloads derived from production traces to show that our techniques can achieve more than 50% reduction in the 99th percentile latency of web services and up to 56% reduction in the median latency of data analytics jobs
    corecore