333 research outputs found

    Deploying Hard Real-Time Control Software on Chip-Multiprocessors

    Get PDF
    Abstract—Deploying real-time control systems software on multiprocessors requires distributing tasks on multiple processing nodes and coordinating their executions using a protocol. One such protocol is the discrete-event (DE) model of computation. In this paper, we investigate distributed discrete-event (DE) with null-message protocol (NMP) on a multicore system for real-time control software. We illustrate analytically and experimentally that even with the null-message deadlock avoidance scheme in the protocol, the system can deadlock due to inter-core message dependencies. We identify two central reasons for such deadlocks: 1) the lack of an upper-bound on packet transmission rates and processing capability, and 2) an unknown upper-bound on the communication network delay. To address these, we propose using architectural features such as timing control and real-time network-on-chips to prevent such message-dependent deadlocks. We employ these architectural techniques in conjunction with a distributed DE strategy called PTIDES for an illustrative car wash station example and later follow it with a more realistic tunnelling ball device application

    A generalized software framework for accurate and efficient management of performance goals

    Get PDF
    A number of techniques have been proposed to provide runtime performance guarantees while minimizing power consumption. One drawback of existing approaches is that they work only on a fixed set of components (or actuators) that must be specified at design time. If new components become available, these management systems must be redesigned and reimplemented. In this paper, we propose PTRADE, a novel performance management framework that is general with respect to the components it manages. PTRADE can be deployed to work on a new system with different components without redesign and reimplementation. PTRADE's generality is demonstrated through the management of performance goals for a variety of benchmarks on two different Linux/x86 systems and a simulated 128-core system, each with different components governing power and performance tradeoffs. Our experimental results show that PTRADE provides generality while meeting performance goals with low error and close to optimal power consumption.United States. Defense Advanced Research Projects Agency. The Ubiquitous High Performance Computing Progra

    Generating and Exploiting Deep Learning Variants to Increase Heterogeneous Resource Utilization in the NVIDIA Xavier

    Get PDF
    Deep learning-based solutions and, in particular, deep neural networks (DNNs) are at the heart of several functionalities in critical-real time embedded systems (CRTES) from vision-based perception (object detection and tracking) systems to trajectory planning. As a result, several DNN instances simultaneously run at any time on the same computing platform. However, while modern GPUs offer a variety of computing elements (e.g. CPUs, GPUs, and specific accelerators) in which those DNN tasks can be executed depending on their computational requirements and temporal constraints, current DNNs are mainly programmed to exploit one of them, namely, regular cores in the GPU. This creates resource imbalance and under-utilization of GPU resources when executing several DNN instances, causing an increase in DNN tasks\u27 execution time requirements. In this paper, (a) we develop different variants (implementations) of well-known DNN libraries used in the Apollo Autonomous Driving (AD) software for each of the computing elements of the latest NVIDIA Xavier SoC. Each variant can be configured to balance resource requirements and performance: the regular CPU core implementation that can run on 2, 4, and 6 cores; the GPU regular and Tensor core variants that can run in 4 or 8 GPU\u27s Streaming Multiprocessors (SM); and 1 or 2 NVIDIA\u27s Deep Learning Accelerators (NVDLA); (b) we show that each particular variant/configuration offers a different resource utilization/performance point; finally, (c) we show how those heterogeneous computing elements can be exploited by a static scheduler to sustain the execution of multiple and diverse DNN variants on the same platform
    • …
    corecore