41 research outputs found

    Sharing GPUs for Real-Time Autonomous-Driving Systems

    Autonomous vehicles at mass-market scales are on the horizon. Cameras are the least expensive among common sensor types and can preserve features such as color and texture that other sensors cannot. Therefore, realizing full autonomy in vehicles at a reasonable cost is expected to entail computer-vision techniques. These computer-vision applications require massive parallelism provided by the underlying shared accelerators, such as graphics processing units, or GPUs, to function “in real time.” However, when computer-vision researchers and GPU vendors refer to “real time,” they usually mean “real fast”; in contrast, certifiable automotive systems must be “real time” in the sense of being predictable. This dissertation addresses the challenging problem of how GPUs can be shared predictably and efficiently for real-time autonomous-driving systems. We tackle this challenge in four steps. First, we investigate NVIDIA GPUs with respect to scheduling, synchronization, and execution. We conduct an extensive set of experiments to infer NVIDIA GPU scheduling rules, which are unfortunately undisclosed by NVIDIA and are beyond access owing to their closed-source software stack. We also expose a list of pitfalls pertaining to CPU-GPU synchronization that can result in unbounded response times of GPU-using applications. Lastly, we examine a fundamental trade-off for designing real-time tasks under different execution options. Overall, our investigation provides an essential understanding of NVIDIA GPUs, allowing us to further model and analyze GPU tasks. Second, we develop a new model and conduct schedulability analysis for GPU tasks. We extend the well-studied sporadic task model with additional parameters that characterize the parallel execution of GPU tasks. We show that NVIDIA scheduling rules are subject to fundamental capacity loss, which implies a necessary total utilization bound. 
We derive response-time bounds for GPU task systems that satisfy our schedulability conditions. Third, we address an industrial challenge of supplying the throughput performance of computer-vision frameworks to support adequate coverage and redundancy offered by an array of cameras. We re-think the design of convolutional neural network (CNN) software to better utilize hardware resources and achieve increased throughput (number of simultaneous camera streams) without any appreciable increase in per-frame latency (camera to CNN output) or reduction of per-stream accuracy. Fourth, we apply our analysis to a finer-grained graph scheduling of a computer-vision standard, OpenVX, which explicitly targets embedded and real-time systems. We evaluate both the analytical and empirical real-time performance of our approach. Doctor of Philosophy

    On the design and implementation of a cache-aware soft real-time scheduler for multicore platforms

    Real-time systems are those for which timing constraints must be satisfied. In this dissertation, research on multiprocessor real-time systems is extended to support multicore platforms, which contain multiple processing cores on a single chip. Specifically, this dissertation focuses on designing a cache-aware real-time scheduler to reduce shared cache miss rates, and increase the level of shared cache reuse, on multicore platforms when timing constraints must be satisfied. This scheduler, implemented in Linux, employs: (1) a scheduling method for real-time workloads that satisfies timing constraints while making scheduling choices that reduce shared cache miss rates; and (2) a profiler that quantitatively approximates the cache impact of every task during its execution. In experiments, it is shown that the proposed cache-aware scheduler can result in significantly reduced shared cache miss rates over other approaches. This is especially true when sufficient hardware support is provided, primarily in the form of cache-related performance monitoring features. It is also shown that scheduler-related overheads are comparable to other scheduling approaches, and therefore overheads would not be expected to offset any reduction in cache miss rate. Finally, in experiments involving a multimedia server workload, it was found that the use of the proposed cache-aware scheduler allowed the size of the workload to be increased. Prior work in the area of cache-aware scheduling for multicore platforms has not addressed support for real-time workloads, and prior work in the area of real-time scheduling has not addressed shared caches on multicore platforms. For real-time workloads running on multicore platforms, a decrease in shared cache miss rates can result in a corresponding decrease in execution times, which may allow a larger real-time workload to be supported, or hardware requirements (or costs) to be reduced. 
As multicore platforms are becoming ubiquitous in many domains, including those in which real-time constraints must be satisfied, cache-aware scheduling approaches such as that presented in this dissertation are of growing importance. If the chip manufacturing industry continues to adhere to the multicore paradigm (which is likely, given current projections), then such approaches should remain relevant as processors evolve.

    Real-Time Scheduling for GPUs with Applications in Advanced Automotive Systems

    Self-driving cars, once constrained to closed test tracks, are beginning to drive alongside human drivers on public roads. Loss of life or property may result if the computing systems of automated vehicles fail to respond to events at the right moment. We call such systems that must satisfy precise timing constraints “real-time systems.” Since the 1960s, researchers have developed algorithms and analytical techniques used in the development of real-time systems; however, this body of knowledge primarily applies to traditional CPU-based platforms. Unfortunately, traditional platforms cannot meet the computational requirements of self-driving cars without exceeding the power and cost constraints of commercially viable vehicles. We argue that modern graphics processing units, or GPUs, represent a feasible alternative, but new algorithms and analytical techniques must be developed in order to integrate these uniquely constrained processors into a real-time system. The goal of the research presented in this dissertation is to discover and remedy the issues that prevent the use of GPUs in real-time systems. To overcome these issues, we design and implement a real-time multi-GPU scheduler, called GPUSync. GPUSync tightly controls access to a GPU’s computational and DMA processors, enabling simultaneous use despite potential limitations in GPU hardware. GPUSync enables tasks to migrate among GPUs, allowing new classes of real-time multi-GPU computing platforms. GPUSync employs heuristics to guide scheduling decisions to improve system efficiency without risking violations in real-time constraints. GPUSync may be paired with a wide variety of common real-time CPU schedulers. GPUSync supports closed-source GPU runtimes and drivers without loss in functionality. We evaluate GPUSync with both analytical and runtime experiments. In our analytical experiments, we model and evaluate over fifty configurations of GPUSync. 
We determine which configurations support the greatest computational capacity while maintaining real-time constraints. In our runtime experiments, we execute computer vision programs similar to those found in automated vehicles, with and without GPUSync. Our results demonstrate that GPUSync greatly reduces jitter in video processing. Research into real-time systems with GPUs is a new area of study. Although there is prior work on such systems, no other GPU scheduling framework is as comprehensive and flexible as GPUSync. Doctor of Philosophy

    Towards a centralized multicore automotive system

    Today’s automotive systems are inundated with embedded electronics to host chassis, powertrain, infotainment, advanced driver assistance systems, and other modern vehicle functions. As many as 100 embedded microcontrollers execute hundreds of millions of lines of code in a single vehicle. To control the increasing complexity in vehicle electronics and services, automakers are planning to consolidate different on-board automotive functions as software tasks on centralized multicore hardware platforms. However, these vehicle software services have different and contrasting timing, safety, and security requirements. Existing vehicle operating systems are ill-equipped to provide all the required service guarantees on a single machine. A centralized automotive system aims to tackle this by assigning software tasks to multiple criticality domains or levels according to their consequences of failures, or international safety standards like ISO 26262. This research investigates several emerging challenges in time-critical systems for a centralized multicore automotive platform and proposes a novel vehicle operating system framework to address them. This thesis first introduces an integrated vehicle management system (VMS), called DriveOS™, for a PC-class multicore hardware platform. Its separation kernel design enables temporal and spatial isolation among critical and non-critical vehicle services in different domains on the same machine. Time- and safety-critical vehicle functions are implemented in a sandboxed Real-time Operating System (OS) domain, and non-critical software is developed in a sandboxed general-purpose OS (e.g., Linux, Android) domain. To leverage the advantages of model-driven vehicle function development, DriveOS provides a multi-domain application framework in Simulink. This thesis also presents a real-time task pipeline scheduling algorithm in multiprocessors for communication between connected vehicle services with end-to-end guarantees. 
The benefits and performance of the overall automotive system framework are demonstrated with hardware-in-the-loop testing using real-world applications, car datasets and simulated benchmarks, and with an early-stage deployment in a production-grade luxury electric vehicle.

    SCHEDULING REAL-TIME GRAPH-BASED WORKLOADS

    Developments in the semiconductor industry in the previous decades have made possible computing platforms with very large computing capacities that, in turn, have stimulated the rapid progress of computationally intensive computer vision (CV) algorithms with highly parallelizable structure (often represented as graphs). Applications using such algorithms are the foundation for the transformation of semi-autonomous systems (e.g., advanced driver-assist systems) to future fully-autonomous systems (e.g., self-driving cars). Enabling mass-produced safety-critical systems with full autonomy requires real-time execution guarantees as a part of system certification. Since multiple CV applications may need to share the same hardware platform due to size, weight, power, and cost constraints, system component isolation is necessary to avoid explosive interference growth that breaks all execution guarantees. Existing software certification processes achieve component isolation through time partitioning, which can be broken by accelerator usage, which is essential for high-efficacy CV algorithms. The goal of this dissertation is to make a first step towards providing real-time guarantees for safety-critical systems by analyzing the scheduling of highly parallel accelerator-using workloads isolated in system components. The specific contributions are threefold. First, a general method for graph-based workloads’ response-time-bound reduction through graph structure modifications is introduced, leading to significant response-time-bound reductions. Second, a generalized real-time task model is introduced that enables real-time response-time bounds for a wider range of graph-based workloads. A proposed response-time analysis for the introduced model accounts for potential accelerator usage within tasks. Third, a scheduling approach for graph-based workloads in a single system component is proposed that ensures the temporal isolation of system components. 
A response-time analysis for workloads with accelerator usage is presented alongside a non-mandatory schedulability-improvement step. This approach can help to enable component-wise certification in the considered systems. Doctor of Philosophy

    Complex scheduling models and analyses for property-based real-time embedded systems

    Modern multicore architectures and parallel applications pose a significant challenge to worst-case-centric real-time system verification and design efforts. The model and parameter uncertainty involved challenges the fidelity of formal real-time analyses, which are mostly based on exact model assumptions. In this dissertation, various approaches that can accept parameter and model uncertainty are presented. In an attempt to improve predictability in worst-case-centric analyses, timing-predictable protocols are explored for parallel task scheduling on multiprocessors and network-on-chip arbitration. A novel scheduling algorithm, called stationary rigid gang scheduling, for gang tasks on multiprocessors is proposed. For fixed-priority wormhole-switched networks-on-chip, a more restrictive family of transmission protocols, called simultaneous progression switching protocols, is proposed with predictability-enhancing properties. Moreover, hierarchical scheduling for parallel DAG tasks under parameter uncertainty is studied to achieve temporal and spatial isolation. Fault tolerance as a supplementary reliability aspect of real-time systems is examined in the presence of dynamic external causes of faults. Using various job variants, which trade increased execution-time demand for increased error protection, a state-based policy selection strategy is proposed, which provably assures an acceptable quality-of-service (QoS). Lastly, the temporal misalignment of sensor data in sensor fusion applications in cyber-physical systems is examined. A modular analysis based on minimal properties to obtain an upper bound on the maximal sensor data time-stamp difference is proposed.

    Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing

    The availability of manycore computing platforms enables a wide variety of technical solutions for systems across the embedded, high-performance and cloud computing domains. However, large-scale manycore systems are notoriously hard to optimise. Choices regarding resource allocation alone can account for wide variability in timeliness and energy dissipation (up to several orders of magnitude). Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing covers dynamic resource allocation heuristics for manycore systems, aiming to provide appropriate guarantees on performance and energy efficiency. It addresses different types of systems, aiming to harmonise the approaches to dynamic allocation across the complete spectrum, from systems with little flexibility and strict real-time guarantees all the way to highly dynamic systems with soft performance requirements. Technical topics presented in the book include: Load and Resource Models; Admission Control; Feedback-based Allocation and Optimisation; Search-based Allocation Heuristics; Distributed Allocation based on Swarm Intelligence; Value-Based Allocation. Each of the topics is illustrated with examples based on realistic computational platforms such as Network-on-Chip manycore processors, grids and private cloud environments. Note.-- EUR 6,000 BPC fee funded by the EC FP7 Post-Grant Open Access Pilot

    Real-Time Stream Processing in Embedded Systems

    Modern real-time embedded systems often involve computationally intensive data processing algorithms to meet their application requirements. As a result, there has been an increase in the use of multiprocessor platforms. The stream processing programming model aims to facilitate the construction of concurrent data processing programs to exploit the parallelism available on these architectures. However, most current stream processing frameworks or languages are not designed for use in real-time systems, let alone systems that might also have hard real-time control algorithms. This thesis contends that a generic architecture of a real-time stream processing infrastructure can be created to support predictable processing of both batched and live streaming data sources, and integrated with hard real-time control algorithms. The thesis first reviews relevant stream processing techniques and identifies the open issues. Then a real-time stream processing task model and an architecture for supporting that model are proposed. An approach to the integration of stream processing tasks into a real-time environment that also has hard real-time components is presented. Data is processed in parallel using execution-time servers allocated to each core. An algorithm is presented for selecting the parameters of the servers that maximises their capacities (within an overall deadline) and ensures that hard real-time components remain schedulable. Response-time analysis is derived to guarantee that the real-time requirements (deadlines for batched data processing, and latency for each data item for live data) for the stream processing activity are met. A framework, called SPRY, is implemented to support the proposed real-time stream processing architecture. The framework supports fully-partitioned applications that are scheduled using fixed priority-based scheduling techniques. A case study based on a modified Generic Avionics Platform is given to demonstrate the overall approach. 
Finally, the evaluation shows that the presented approach provides better schedulability than alternative approaches.