
    On-Device Deep Learning Inference for System-on-Chip (SoC) Architectures

    As machine learning becomes ubiquitous, the need to deploy models on real-time, embedded systems will become increasingly critical. This is especially true for deep learning solutions, whose large models pose significant challenges for resource-constrained target architectures at the “edge”. The realization of machine learning, and of deep learning in particular, is being driven by the availability of specialized hardware, such as system-on-chip solutions, which alleviate some of these constraints. Equally important, however, are the operating systems that run on this hardware, and specifically the ability to leverage commercial real-time operating systems which, unlike general-purpose operating systems such as Linux, can provide the low-latency, deterministic execution required for embedded, and potentially safety-critical, applications at the edge. Despite this, studies considering the integration of real-time operating systems, specialized hardware, and machine learning/deep learning algorithms remain limited. In particular, better mechanisms for real-time scheduling in the context of machine learning applications will prove critical as these technologies move to the edge. To address some of these challenges, we present a resource management framework designed to provide a dynamic, on-device approach to the allocation and scheduling of limited resources in a real-time processing environment. Such mechanisms are necessary to support the deterministic behavior required by the control components contained in the edge nodes. To validate the effectiveness of our approach, we applied rigorous schedulability analysis to a large set of randomly generated simulated task sets and then verified that the most time-critical applications, such as the control tasks, maintained low-latency, deterministic behavior even during off-nominal conditions. The practicality of our scheduling framework was demonstrated by integrating it into a commercial real-time operating system (VxWorks) and then running a typical deep learning image processing application to perform simple object detection. The results indicate that our proposed resource management framework can be leveraged to facilitate the integration of machine learning algorithms with real-time operating systems and embedded platforms, including widely used, industry-standard real-time operating systems.
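    The abstract does not detail the schedulability analysis applied to the randomly generated task sets. As a minimal sketch, assuming a fixed-priority, rate-monotonic setting, the classical Liu and Layland utilization bound gives a sufficient (not necessary) test; the task-set generator below is a simplified stand-in for generators such as UUniFast, and all names are illustrative rather than taken from the paper.

```python
import random

def rm_utilization_bound(n: int) -> float:
    """Liu & Layland bound: n periodic tasks are rate-monotonic
    schedulable if total utilization <= n * (2^(1/n) - 1)."""
    return n * (2 ** (1.0 / n) - 1)

def is_schedulable(tasks: list[tuple[float, float]]) -> bool:
    """tasks: (worst-case execution time, period) pairs.
    Sufficient utilization-bound test only; exact response-time
    analysis would be needed for a necessary condition."""
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rm_utilization_bound(len(tasks))

def random_task_set(n: int, target_utilization: float):
    """Generate n tasks whose utilizations sum to the target;
    a simplified stand-in for generators such as UUniFast."""
    shares = [random.random() for _ in range(n)]
    scale = target_utilization / sum(shares)
    tasks = []
    for share in shares:
        period = random.uniform(10.0, 1000.0)           # e.g. milliseconds
        tasks.append((share * scale * period, period))  # C_i = U_i * T_i
    return tasks

if __name__ == "__main__":
    task_set = random_task_set(n=8, target_utilization=0.65)
    print("schedulable (sufficient test):", is_schedulable(task_set))
```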

    Flexible Scheduling in Middleware for Distributed Rate-Based Real-Time Applications - Doctoral Dissertation, May 2002

    Distributed rate-based real-time systems, such as process control and avionics mission computing systems, have traditionally been scheduled statically. Static scheduling provides assurance of schedulability prior to run-time and incurs low run-time overhead. However, static scheduling is brittle in the face of unanticipated overload, and treats invocation-to-invocation variations in resource requirements inflexibly. As a consequence, processing resources are often under-utilized in the average case, and the resulting systems are hard to adapt to meet new real-time processing requirements. Dynamic scheduling offers relief from the limitations of static scheduling. However, dynamic scheduling often has a high run-time cost because certain decisions are enforced on-line. Furthermore, under conditions of overload, tasks can be scheduled dynamically that may never be dispatched, or that upon dispatch would miss their deadlines. We review the implications of these factors for rate-based distributed systems, and posit the necessity to combine static and dynamic approaches to exploit the strengths and compensate for the weaknesses of either approach in isolation. We present a general hybrid approach to real-time scheduling and dispatching in middleware that can employ both static and dynamic components. This approach provides (1) feasibility assurance for the most critical tasks, (2) the ability to extend this assurance incrementally to operations in successively lower criticality equivalence classes, (3) the ability to trade off bounds on feasible utilization and dispatching overhead in cases where, for example, execution jitter is a factor or rates are not harmonically related, and (4) overall flexibility to make better use of scarce computing resources and to enforce a wider range of application-specified execution requirements. This approach also meets additional constraints of an increasingly important class of rate-based systems, those with requirements for robust management of real-time performance in the face of rapidly and widely changing operating conditions. To support these requirements, we present a middleware framework that implements the hybrid scheduling and dispatching approach described above, and also provides support for (1) adaptive re-scheduling of operations at run-time and (2) reflective alternation among several scheduling strategies to improve real-time performance in the face of changing operating conditions. Adaptive re-scheduling must be performed whenever operating conditions exceed the ability of the scheduling and dispatching infrastructure to meet the critical real-time requirements of the system under the currently specified rates and execution times of operations. Adaptive re-scheduling relies on the ability to change the rates of execution of at least some operations, and may occur under the control of a higher-level middleware resource manager. Different rates of execution may be specified under different operating conditions, and the number of such possible combinations may be arbitrarily large. Furthermore, adaptive re-scheduling may in turn require notification of rate-sensitive application components. It is therefore desirable to handle variations in operating conditions entirely within the scheduling and dispatching infrastructure when possible. 
A rate-based distributed real-time application, or a higher-level resource manager, could thus fall back on adaptive re-scheduling only when it cannot achieve acceptable real-time performance through self-adaptation. Reflective alternation among scheduling heuristics offers a way to tune real-time performance internally, and we offer foundational support for this approach. In particular, run-time observable information, such as that provided by our metrics-feedback framework, makes it possible to detect that a given current scheduling heuristic is underperforming the level of service another could provide; this forms the basis for guided adaptation. Furthermore, we present empirical results for our framework in a realistic avionics mission computing environment. This dissertation makes five contributions in support of flexible and adaptive scheduling and dispatching in middleware. First, we provide a middleware scheduling framework that supports arbitrary and fine-grained composition of static and dynamic scheduling, to assure critical timeliness constraints while improving noncritical performance under a range of conditions. Second, we provide a flexible dispatching infrastructure framework composed of fine-grained primitives, and describe how appropriate configurations can be generated automatically based on the output of the scheduling framework. Third, we describe algorithms to reduce the overhead and duration of adaptive re-scheduling, based on sorting for rate selection and priority assignment. Fourth, we provide timely and efficient performance information through an optimized metrics-feedback framework, to support higher-level reflection and adaptation decisions. Fifth, we present the results of empirical studies to quantify and evaluate the performance of alternative canonical scheduling heuristics, across a range of load and load jitter conditions. These studies were conducted within an avionics mission computing application framework running on realistic middleware and embedded hardware. The results obtained from these studies (1) demonstrate the potential benefits of reflective alternation among distinct scheduling heuristics at run-time, and (2) suggest performance factors of interest for future work on adaptive control policies and mechanisms using this framework.
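    The hybrid static/dynamic composition and the sorting-based priority assignment named in the third contribution can be sketched together. Assuming (as a plausible reading, not the dissertation's actual middleware API) a rate-monotonic static tier for critical operations and an earliest-deadline-first (EDF) dynamic tier that absorbs residual capacity; all class and method names below are illustrative.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Operation:
    name: str
    period: float     # invocation period; shorter period = higher rate
    deadline: float   # absolute deadline of the pending invocation
    critical: bool    # True -> statically scheduled tier

def assign_rms_priorities(critical_ops):
    """Static tier: rate-monotonic priority assignment via a single
    sort on period (shorter period -> higher priority, i.e. lower
    numeric value), echoing the sorting-based approach above."""
    ranked = sorted(critical_ops, key=lambda op: op.period)
    return {op.name: prio for prio, op in enumerate(ranked)}

class HybridDispatcher:
    """Two-tier dispatcher: critical operations run at fixed static
    priorities; non-critical operations are ordered dynamically by EDF."""

    def __init__(self, critical_ops):
        self.static_prio = assign_rms_priorities(critical_ops)
        self.static_ready = []   # min-heap of (priority, name)
        self.dynamic_ready = []  # min-heap of (deadline, name)

    def release(self, op: Operation) -> None:
        """Called when an invocation of `op` becomes ready to run."""
        if op.critical:
            heapq.heappush(self.static_ready, (self.static_prio[op.name], op.name))
        else:
            heapq.heappush(self.dynamic_ready, (op.deadline, op.name))

    def next_operation(self):
        """The static tier is always served first, preserving its
        feasibility assurance; the EDF tier uses the remaining slack."""
        if self.static_ready:
            return heapq.heappop(self.static_ready)[1]
        if self.dynamic_ready:
            return heapq.heappop(self.dynamic_ready)[1]
        return None

if __name__ == "__main__":
    ops = [Operation("nav", 25.0, 25.0, True),
           Operation("hud", 50.0, 50.0, True),
           Operation("log", 0.0, 120.0, False)]
    d = HybridDispatcher([op for op in ops if op.critical])
    for op in ops:
        d.release(op)
    print([d.next_operation() for _ in range(3)])  # ['nav', 'hud', 'log']
```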

    Recognizing Risk in Human Capital Investments: A Real Options Approach to Strategic Human Resource Management

    An issue that has not yet been explored in the field of strategic human resource management (SHRM) is that of managing the ‘risks’ involved in the firm’s human capital management. We address this issue using the real options theory framework. We argue that certain HR practices manage risk and generate opportunities for the firm by creating ‘options’ for its human capital management. These HR options help ensure stability of returns from human capital and thus sustain competitive advantage. Different types of HR options, and the role of certain HR practices in the creation of these options, are discussed.

    Service Boosters: Library Operating Systems for the Datacenter

    Cloud applications are taking an increasingly important place in our technology and economic landscape. Consequently, they are subject to stringent performance requirements. High tail latency — percentiles at the tail of the response time distribution — is a threat to these requirements. As few as 0.01% of requests being slow in one microservice can significantly degrade performance for the entire application. The conventional wisdom is that application-awareness is crucial to designing optimized performance management systems, but comes at the cost of maneuverability. Consequently, existing execution environments are often general-purpose and ignore important application features such as the architecture of request processing pipelines or the type of requests being served. These one-size-fits-all solutions are missing crucial information needed to identify and remove sources of high tail latency. This thesis aims to develop a lightweight execution environment that exploits application semantics to optimize tail performance for cloud services. This system, dubbed Service Boosters, is a library operating system exposing application structure and semantics to the underlying resource management stack. Using Service Boosters, programmers use a generic programming model to build, declare, and annotate their request processing pipeline, while performance engineers can program advanced management strategies. Using Service Boosters, I present three systems, FineLame, Perséphone, and DeDoS, that exploit application awareness to provide real-time anomaly detection, tail-tolerant RPC scheduling, and resource harvesting. FineLame leverages awareness of the request processing pipeline to deploy monitoring and anomaly detection probes. Using these, FineLame can detect abnormal requests in flight whenever they depart from the expected behavior, and alerts other resource management modules. Perséphone exploits an understanding of request types to dynamically allocate resources to each type and prevent pathological head-of-line blocking from heavy-tailed workloads, without the need for interrupts. Perséphone is a low-overhead solution well suited for microsecond-scale workloads. Finally, DeDoS can identify overloaded components and dynamically scale them, harvesting only the resources needed to quench the overload. Service Boosters is a powerful framework for handling tail latency in the datacenter. Service Boosters clearly separates the roles of application development and performance engineering, proposing a general-purpose application programming model while enabling the development of specialized resource management modules such as Perséphone and DeDoS.
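    Perséphone's published scheduler is more elaborate than the abstract can convey, but the core idea it describes, type-aware queues with workers reserved for short requests so heavy-tailed work cannot cause head-of-line blocking, can be sketched as follows. The two-type split and every name below are simplifying assumptions for illustration, not the thesis's actual interface.

```python
from collections import deque

class TypedRequestScheduler:
    """Sketch of type-aware dispatch: one FIFO per request type, with
    some workers reserved for short requests so long, heavy-tailed
    requests cannot delay them behind a single shared queue."""

    def __init__(self, workers: int, reserved_for_short: int):
        self.queues = {"short": deque(), "long": deque()}
        self.workers = workers
        self.reserved = reserved_for_short
        self.busy_long = 0  # workers currently running long requests

    def enqueue(self, request, kind: str) -> None:
        self.queues[kind].append(request)

    def dispatch(self):
        """Called by a free worker. Short requests may run on any
        worker; long requests may only occupy the unreserved ones."""
        if self.queues["short"]:
            return self.queues["short"].popleft(), "short"
        if self.queues["long"] and self.busy_long < self.workers - self.reserved:
            self.busy_long += 1
            return self.queues["long"].popleft(), "long"
        return None, None

    def complete(self, kind: str) -> None:
        """Called when a worker finishes a request."""
        if kind == "long":
            self.busy_long -= 1

if __name__ == "__main__":
    sched = TypedRequestScheduler(workers=4, reserved_for_short=1)
    sched.enqueue("GET /item", "short")
    sched.enqueue("SCAN /table", "long")
    print(sched.dispatch())  # ('GET /item', 'short')
    print(sched.dispatch())  # ('SCAN /table', 'long')
```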