47 research outputs found

    Adaptive partitioning of real-time tasks on multiple processors

    Get PDF
    This paper presents a new algorithm for scheduling real-time tasks on multiprocessor/multicore systems. This new algorithm is based on combining EDF scheduling with a migration strategy that moves tasks only when needed. It has been evaluated through an extensive set of simulations that showed good performance when compared with global or partitioned EDF: a worst-case utilisation bound similar to partitioned EDF for hard real-time tasks, and a tardiness bound similar to global EDF for soft real-time tasks. Therefore, the proposed scheduler is effective for dealing with both soft and hard real-time workloads

    Space sharing job scheduling policies for parallel computers

    Get PDF
    The distinguishing characteristic of space sharing parallel job scheduling policies is that applications are allocated non-overlapping processor subsets. The interference among jobs is reduced, the synchronization delays and message latencies can be predictable, and distinct processors may be allocated to cooperating processes so as to avoid the overhead of context switches associated with traditional time-multiplexing;The processor allocation strategy, the job selection criteria, and workload characteristics are fundamental factors that influence system performance under space sharing. Allocation can be static or dynamic. The processor subset allocated to an application is fixed under static space sharing, whereas it can change during execution under dynamic space sharing. Static allocation can produce more predictable run times, permits a wide range of compiler optimizations (e.g., static data distribution and binding), and avoids the processor releases and reallocations associated with dynamic allocation. Its major problem is that it can induce high processor fragmentation;In this dissertation, alternative static and dynamic space sharing policies that differ in the allocation discipline and the job selection criteria are studied. The results show that significantly superior performance can be achieved under static space sharing if applications can be folded (i.e., allocated fewer processors than they requested). Folding typically increases program efficiency and can reduce processor fragmentation. Policies that increase folding with the system load are proposed and compared to schemes that use unconstrained folding, no folding, and fixed maximum folding factors. The adaptive policies produced higher and more stable system utilization, significantly shorter mean response times, and good fairness curves. However, unconstrained folding resulted in considerably more severe processor fragmentation than no folding. Its advantage is that it exploits the efficiency improvement that typically results when an application is allocated fewer processors. Consequently, it can produce shorter mean response times than no folding under medium to heavy loads;Also because of this efficiency improvement, dynamic policies that reduce waiting times by executing a large number of jobs simultaneously are more promising than schemes that limit the number of active jobs. However, limiting the number of active applications can be the superior approach when folding does not improve application efficiency

    Complex scheduling models and analyses for property-based real-time embedded systems

    Get PDF
    Modern multi core architectures and parallel applications pose a significant challenge to the worst-case centric real-time system verification and design efforts. The involved model and parameter uncertainty contest the fidelity of formal real-time analyses, which are mostly based on exact model assumptions. In this dissertation, various approaches that can accept parameter and model uncertainty are presented. In an attempt to improve predictability in worst-case centric analyses, the exploration of timing predictable protocols are examined for parallel task scheduling on multiprocessors and network-on-chip arbitration. A novel scheduling algorithm, called stationary rigid gang scheduling, for gang tasks on multiprocessors is proposed. In regard to fixed-priority wormhole-switched network-on-chips, a more restrictive family of transmission protocols called simultaneous progression switching protocols is proposed with predictability enhancing properties. Moreover, hierarchical scheduling for parallel DAG tasks under parameter uncertainty is studied to achieve temporal- and spatial isolation. Fault-tolerance as a supplementary reliability aspect of real-time systems is examined, in spite of dynamic external causes of fault. Using various job variants, which trade off increased execution time demand with increased error protection, a state-based policy selection strategy is proposed, which provably assures an acceptable quality-of-service (QoS). Lastly, the temporal misalignment of sensor data in sensor fusion applications in cyber-physical systems is examined. A modular analysis based on minimal properties to obtain an upper-bound for the maximal sensor data time-stamp difference is proposed

    REQUIREMENT- AWARE STRATEGIES FOR SCHEDULING MULTIPLE DIVISIBLE LOADS IN CLUSTER ENVIRONMENTS

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Control techniques for thermal-aware energy-efficient real time multiprocessor scheduling

    Get PDF
    La utilización de microprocesadores multinúcleo no sólo es atractiva para la industria sino que en muchos ámbitos es la única opción. La planificación tiempo real sobre estas plataformas es mucho más compleja que sobre monoprocesadores y en general empeoran el problema de sobre-diseño, llevando a la utilización de muchos más procesadores /núcleos de los necesarios. Se han propuesto algoritmos basados en planificación fluida que optimizan la utilización de los procesadores, pero hasta el momento presentan en general inconvenientes que los alejan de su aplicación práctica, no siendo el menor el elevado número de cambios de contexto y migraciones.Esta tesis parte de la hipótesis de que es posible diseñar algoritmos basados en planificación fluida, que optimizan la utilización de los procesadores, cumpliendo restricciones temporales, térmicas y energéticas, con un bajo número de cambios de contexto y migraciones, y compatibles tanto con la generación fuera de línea de ejecutivos cíclicos atractivos para la industria, como de planificadores que integran técnicas de control en tiempo de ejecución que permiten la gestión eficiente tanto de tareas aperiódicas como de desviaciones paramétricas o pequeñas perturbaciones.A este respecto, esta tesis contribuye con varias soluciones. En primer lugar, mejora una metodología de modelo que representa todas las dimensiones del problema bajo un único formalismo (Redes de Petri Continuas Temporizadas). En segundo lugar, propone un método de generación de un ejecutivo cíclico, calculado en ciclos de procesador, para un conjunto de tareas tiempo real duro sobre multiprocesadores que optimiza la utilización de los núcleos de procesamiento respetando también restricciones térmicas y de energía, sobre la base de una planificación fluida. Considerar la sobrecarga derivada del número de cambios de contexto y migraciones en un ejecutivo cíclico plantea un dilema de causalidad: el número de cambios de contexto (y en consecuencia su sobrecarga) no se conoce hasta generar el ejecutivo cíclico, pero dicho número no se puede minimizar hasta que se ha calculado. La tesis propone una solución a este dilema mediante un método iterativo de convergencia demostrada que logra minimizar la sobrecarga mencionada.En definitiva, la tesis consigue explotar la idea de planificación fluida para maximizar la utilización (donde maximizar la utilización es un gran problema en la industria) generando un sencillo ejecutivo cíclico de mínima sobrecarga (ya que la sobrecarga implica un gran problema de los planificadores basados en planificación fluida).Finalmente, se propone un método para utilizar las referencias de la planificación fuera de línea establecida en el ejecutivo cíclico para su seguimiento por parte de un controlador de frecuencia en línea, de modo que se pueden afrontar pequeñas perturbaciones y variaciones paramétricas, integrando la gestión de tareas aperiódicas (tiempo real blando) mientras se asegura la integridad de la ejecución del conjunto de tiempo real duro.Estas aportaciones constituyen una novedad en el campo, refrendada por las publicaciones derivadas de este trabajo de tesis.<br /

    Scaling Up Concurrent Analytical Workloads on Multi-Core Servers

    Get PDF
    Today, an ever-increasing number of researchers, businesses, and data scientists collect and analyze massive amounts of data in database systems. The database system needs to process the resulting highly concurrent analytical workloads by exploiting modern multi-socket multi-core processor systems with non-uniform memory access (NUMA) architectures and increasing memory sizes. Conventional execution engines, however, are not designed for many cores, and neither scale nor perform efficiently on modern multi-core NUMA architectures. Firstly, their query-centric approach, where each query is optimized and evaluated independently, can result in unnecessary contention for hardware resources due to redundant work found across queries in highly concurrent workloads. Secondly, they are unaware of the non-uniform memory access costs and the underlying hardware topology, incurring unnecessarily expensive memory accesses and bandwidth saturation. In this thesis, we show how these scalability and performance impediments can be solved by exploiting sharing among concurrent queries and incorporating NUMA-aware adaptive task scheduling and data placement strategies in the execution engine. Regarding sharing, we identify and categorize state-of-the-art techniques for sharing data and work across concurrent queries at run-time into two categories: reactive sharing, which shares intermediate results across common query sub-plans, and proactive sharing, which builds a global query plan with shared operators to evaluate queries. We integrate the original research prototypes that introduce reactive and proactive sharing, perform a sensitivity analysis, and show how and when each technique benefits performance. Our most significant finding is that reactive and proactive sharing can be combined to exploit the advantages of both sharing techniques for highly concurrent analytical workloads. Regarding NUMA-awareness, we identify, implement, and compare various combinations of task scheduling and data placement strategies under a diverse set of highly concurrent analytical workloads. We develop a prototype based on a commercial main-memory column-store database system. Our most significant finding is that there is no single strategy for task scheduling and data placement that is best for all workloads. In specific, inter-socket stealing of memory-intensive tasks can hurt overall performance, and unnecessary partitioning of data across sockets involves an overhead. For this reason, we implement algorithms that adapt task scheduling and data placement to the workload at run-time. Our experiments show that both sharing and NUMA-awareness can significantly improve the performance and scalability of highly concurrent analytical workloads on modern multi-core servers. Thus, we argue that sharing and NUMA-awareness are key factors for supporting faster processing of big data analytical applications, fully exploiting the hardware resources of modern multi-core servers, and for more responsive user experience

    Characterization of real-time computers

    Get PDF
    A real-time system consists of a computer controller and controlled processes. Despite the synergistic relationship between these two components, they have been traditionally designed and analyzed independently of and separately from each other; namely, computer controllers by computer scientists/engineers and controlled processes by control scientists. As a remedy for this problem, in this report real-time computers are characterized by performance measures based on computer controller response time that are: (1) congruent to the real-time applications, (2) able to offer an objective comparison of rival computer systems, and (3) experimentally measurable/determinable. These measures, unlike others, provide the real-time computer controller with a natural link to controlled processes. In order to demonstrate their utility and power, these measures are first determined for example controlled processes on the basis of control performance functionals. They are then used for two important real-time multiprocessor design applications - the number-power tradeoff and fault-masking and synchronization

    New contributions to spatial partitioning and parallel global illumination algorithms

    Get PDF
    Diese Dissertation ist an der Schnittstelle zweier Disziplinen der Informatik angesiedelt: Computergrafik (Globale Beleuchtung) und Paralleles Rechnen (Dynamisches Partitionieren). Einerseits wird der Hierarchische Radiosity Algorithmus (HRA) - ein berühmter und effizienter Algorithmus zur globalen Beleuchtungssimulation - bzgl. seiner Parallelisierungsfähigkeit untersucht. Andererseits wird ein Werkzeug aus der Gattung der orthogonalen rekursiven Zweiteilungsverfahren zur dynamischen Partitionierung räumlich abgebildeter Aufgaben entwickelt sowie theoretisch und experimentell analysiert. Der HRA ist eine spezielle Instanz von Algorithmen, die als eine Ansammlung von räumlich abgebildeten Aufgaben formuliert werden können. Als Beweis der Praktikabilität unseres Werkzeugs wenden wir das Werkzeug auf den HRA an und beobachten ein gut skalierbares Verhalten und nützliche Werte bzgl. der Steigerung der Berechnungsgeschwindigkeit.This thesis resides around the interface of two disciplines in computer science: computer graphics (global illumination) and parallel computing (dynamic partitioning). On the one hand the hierarchical radiosity algorithm (HRA) - a famous and efficient global illumination algorithm - is examined with respect to its capability of being parallelized. On the other hand a dynamic orthogonal recursive bisection tool for the dynamic partitioning of spatially mapped tasks is developped and analyzed theoretically and experimentally. The HRA is a special instance of algorithms that can be formulated as a collection of spatially mapped tasks. As a proof of practicability of our tool we apply the tool to the HRA and observe a well scalable behaviour and useful speedup values
    corecore