2,210 research outputs found

    Scheduling with communication for multiprocessor computation

    Multiprocessor scheduling is concerned with planning the execution of computer programs on a parallel computer. A computer program can be viewed as a collection of instructions grouped into tasks. A parallel computer is a computer with multiple processors connected by a communication network. Each processor can execute tasks of a computer program. During the execution of a computer program on a parallel computer, each task is executed exactly once. In general, the tasks of a computer program cannot be executed in an arbitrary order: the result of one task may be needed to execute another. Such tasks are called data-dependent. The data dependencies define the structure of the computer program: if task u2 needs the result of task u1, then u2 can only be executed after u1 has completed. If no data dependency exists between two tasks, they can be executed in any order or simultaneously. If two data-dependent tasks u1 and u2 are executed on different processors, the result of u1 must be transferred to the processor executing u2. This transport of information is called communication. The result of u1 can be transferred to another processor by sending messages through the communication network. A schedule specifies, for each task, which processor executes it and at what time. The goal of multiprocessor scheduling is to construct a schedule of the shortest possible duration, taking into account the communication caused by the data dependencies between the tasks.
The duration of a schedule is determined to a large extent by the amount of communication in the schedule: the duration of a schedule can increase because a processor is unable to execute tasks for a long time while it waits for the result of a task executed on another processor. Because the way the processors of a parallel computer communicate differs from computer to computer, it is extremely difficult to efficiently construct good schedules for a computer program on a parallel computer. Therefore, a model of a parallel computer is generally used instead of an actual parallel computer. Such a model is called a parallel computation model. In a parallel computation model one can concentrate on those aspects of communication that have a large influence on the quality of a schedule, which makes it possible to understand these aspects better. This thesis considers two parallel computation models: the UCT model and the LogP model. The UCT model focuses on studying one aspect of communication: a time delay needed to transport results between processors. The LogP model takes several aspects of communication into account: with a suitably chosen setting of its parameters L, o, g and P, the LogP model can model the communication of many parallel computers. Communication in the UCT model works as follows. If task u2 needs the result of task u1 and these tasks are executed on different processors, there must be a delay of at least one time step between the time at which u1 completes and the time at which u2 starts. This delay is needed to send the result of u1 to the processor executing u2. If u1 and u2 are executed on the same processor, the result of u1 is already available on the right processor and no delay is needed.
In that case, u2 can be executed immediately after u1. Communication in the LogP model is far more complicated. Consider again two data-dependent tasks u1 and u2 that are executed on different processors, and assume that the result of u1 must be transported to the processor executing u2. In many cases the result of a task cannot be transported in a single message; several messages are needed, all of which must be sent to the processor executing u2. Sending one message costs o time steps on the processor executing u1; receiving it costs o time steps on the processor executing u2. In addition, each processor can send or receive at most one message in every g consecutive time steps, and there is a delay of exactly L time steps between the sending and the receiving of a message. The first part of this thesis (Chapters 3, 4, 5, 6 and 7) describes algorithms that efficiently construct schedules in the UCT model. Chapter 4 describes an algorithm that constructs good schedules for arbitrary computer programs; for computer programs with an outforest structure, this algorithm constructs optimal schedules. Chapter 5 presents algorithms that construct good schedules for computer programs with an inforest structure. The algorithms described in Chapters 6 and 7 construct optimal schedules for computer programs in which the maximum number of pairwise data-independent tasks is small, and for computer programs with an interval-order structure. The second part of this thesis (Chapters 8, 9, 10 and 11) is concerned with scheduling in the LogP model.
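The LogP cost rules above (send overhead o, receive overhead o, gap g, latency L) determine when a batch of messages from one processor arrives at another. A minimal sketch of that arithmetic, assuming the messages are sent back to back and that the gap dominates the overhead (the function name and parameters are illustrative, not from the thesis):

```python
def logp_arrival_times(k, L, o, g):
    """Return the time at which each of k back-to-back messages is fully
    received under the LogP rules: each send/receive costs o time steps,
    consecutive sends are at least g steps apart, and every message is in
    transit for exactly L steps."""
    gap = max(g, o)  # a new send cannot start before the previous overhead ends
    arrivals = []
    for i in range(k):
        send_start = i * gap           # i-th send begins after i gaps
        send_done = send_start + o     # sender is busy for o steps
        recv_done = send_done + L + o  # L steps in flight, then o steps to receive
        arrivals.append(recv_done)
    return arrivals

# With L=4, o=1, g=2, three messages are received at times 6, 8 and 10.
print(logp_arrival_times(3, L=4, o=1, g=2))
```

Note how the receiver is also kept busy: consecutive arrivals are spaced by the same gap, consistent with the rule that a processor receives at most one message per g steps.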
In Chapters 9 and 10 we prove that constructing optimal schedules for computer programs with a very simple tree structure (a send-graph or receive-graph structure) is unlikely to be possible in an efficient way. These chapters describe efficient algorithms that construct good (but not necessarily optimal) schedules for computer programs with such a structure. In Chapter 11, decomposition algorithms are used to efficiently construct schedules for computer programs with a general tree structure. It turns out that optimal schedules in the UCT model can be constructed efficiently if the structure of the computer programs is simple (for example, computer programs with a tree structure). The simple nature of communication in the UCT model makes this possible. Hence the complexity of scheduling in the UCT model is determined mainly by the structure of the computer programs. In contrast, communication makes it hard to construct good schedules in the LogP model, even if the structure of the computer programs is very simple (for example, computer programs with a send-graph structure). This shows that the complexity of scheduling in the LogP model is determined to a large extent by the complicated form of communication in this model.
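The effect of the UCT model's unit communication delay can be seen in a simple greedy list scheduler: a task may start one step after a predecessor finishes, but only if they run on different processors. The sketch below is an illustrative baseline, not one of the thesis's algorithms:

```python
def uct_list_schedule(tasks, preds, num_procs):
    """Greedy UCT schedule: tasks is a topologically ordered list of
    unit-time tasks, preds maps a task to its data-dependent predecessors.
    Returns {task: (processor, start_time)}."""
    proc_free = [0] * num_procs            # time each processor becomes idle
    placement = {}
    for t in tasks:
        best = None
        for p in range(num_procs):
            earliest = proc_free[p]
            for u in preds.get(t, []):
                up, ustart = placement[u]
                finish = ustart + 1        # unit execution time
                # unit communication delay only across processors
                earliest = max(earliest, finish if up == p else finish + 1)
            if best is None or earliest < best[1]:
                best = (p, earliest)
        placement[t] = best
        proc_free[best[0]] = best[1] + 1
    return placement

# Diamond DAG a -> b, a -> c, (b, c) -> d on two processors: keeping all four
# tasks on one processor avoids the communication delay entirely.
sched = uct_list_schedule(["a", "b", "c", "d"],
                          {"b": ["a"], "c": ["a"], "d": ["b", "c"]}, 2)
print(sched)
```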

    Generalizing List Scheduling for Stochastic Soft Real-time Parallel Applications

    Advanced architecture processors provide features such as caches and branch prediction that result in improved, but variable, execution time of software. Hard real-time systems require tasks to complete within timing constraints. Consequently, hard real-time systems are typically designed conservatively through the use of tasks' worst-case execution times (WCET) in order to compute deterministic schedules that guarantee tasks' execution within given time constraints. This use of pessimistic execution time assumptions provides real-time guarantees at the cost of decreased performance and resource utilization. In soft real-time systems, however, meeting deadlines is not an absolute requirement (i.e., missing a few deadlines does not severely degrade system performance or cause catastrophic failure). In such systems, a guaranteed minimum probability of completing by the deadline is sufficient. Therefore, there is considerable latitude in such systems for improving resource utilization and performance as compared with hard real-time systems, through the use of more realistic execution time assumptions. Given probability distribution functions (PDFs) representing tasks' execution time requirements, and tasks' communication and precedence requirements, represented as a directed acyclic graph (DAG), this dissertation proposes and investigates algorithms for constructing non-preemptive stochastic schedules. New PDF manipulation operators developed in this dissertation are used to compute tasks' start and completion time PDFs during schedule construction. PDFs of the schedules' completion times are also computed and used to systematically trade the probability of meeting end-to-end deadlines for schedule length and jitter in task completion times.
Because of the NP-hard nature of the non-preemptive DAG scheduling problem, the new stochastic scheduling algorithms extend traditional heuristic list scheduling and genetic list scheduling algorithms for DAGs by using PDFs instead of fixed time values for task execution requirements. The stochastic scheduling algorithms also account for delays caused by communication contention, typically ignored in prior DAG scheduling research. Extensive experimental results are used to demonstrate the efficacy of the new algorithms in constructing stochastic schedules. Results also show that through the use of the techniques developed in this dissertation, the probability of meeting deadlines can be usefully traded for performance and jitter in soft real-time systems.
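The PDF manipulation the abstract mentions reduces to two operators on discrete distributions: convolution (a task's completion time is its start time plus its execution time) and maximum (a task can start only when all predecessors have delivered their results). A minimal sketch with dict-based probability mass functions (this representation is an assumption, not the dissertation's):

```python
from itertools import product

def pdf_add(a, b):
    """Convolution: distribution of X + Y for independent X, Y."""
    out = {}
    for (x, px), (y, py) in product(a.items(), b.items()):
        out[x + y] = out.get(x + y, 0.0) + px * py
    return out

def pdf_max(a, b):
    """Distribution of max(X, Y) for independent X, Y."""
    out = {}
    for (x, px), (y, py) in product(a.items(), b.items()):
        m = max(x, y)
        out[m] = out.get(m, 0.0) + px * py
    return out

# Two predecessors each finish at time 2 or 3 with equal probability; the
# successor needs both results, then runs for exactly 1 time unit.
ready = pdf_max({2: 0.5, 3: 0.5}, {2: 0.5, 3: 0.5})
done = pdf_add(ready, {1: 1.0})
print(done)  # completion-time PMF of the successor
```

The max operator is what makes stochastic DAG scheduling non-trivial: even with identical predecessor PDFs, the join is skewed toward the later finishing time (here, done at time 4 with probability 0.75).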

    Parametric Schedulability Analysis of Fixed Priority Real-Time Distributed Systems

    Parametric analysis is a powerful tool for designing modern embedded systems, because it permits exploring the space of design parameters and checking the robustness of the system with respect to variations of some uncontrollable variable. In this paper, we address the problem of parametric schedulability analysis of distributed real-time systems scheduled by fixed priority. In particular, we propose two different approaches to parametric analysis: the first is a novel technique based on classical schedulability analysis, whereas the second is based on model checking of Parametric Timed Automata (PTA). The proposed analytic method extends existing sensitivity analysis for single processors to the case of a distributed system, supporting preemptive and non-preemptive scheduling, jitters and unconstrained deadlines. Parametric Timed Automata are used to model all possible behaviours of a distributed system, and therefore the resulting analysis is necessary and sufficient. Both techniques have been implemented in two software tools, and they have been compared with classical holistic analysis on two meaningful test cases. The results show that the analytic method provides results similar to classical holistic analysis in a very efficient way, whereas the PTA approach is slower but covers the entire space of solutions. Comment: Submitted to ECRTS 2013 (http://ecrts.eit.uni-kl.de/ecrts13)
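The classical fixed-priority schedulability analysis that the first approach builds on is the response-time recurrence R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j, iterated to a fixed point. A minimal single-processor sketch (variable names are illustrative; the paper's method extends this to distributed systems with jitter):

```python
import math

def response_time(C, T, i):
    """Worst-case response time of task i under preemptive fixed-priority
    scheduling; tasks 0..i-1 have higher priority. C and T are lists of
    worst-case execution times and periods. Returns None if the iteration
    exceeds the period (deadline assumed equal to period here)."""
    R = C[i]
    while True:
        interference = sum(math.ceil(R / T[j]) * C[j] for j in range(i))
        R_next = C[i] + interference
        if R_next == R:
            return R
        if R_next > T[i]:
            return None
        R = R_next

# Three tasks with (C, T) = (1, 4), (2, 6), (3, 13): all schedulable.
C, T = [1, 2, 3], [4, 6, 13]
print([response_time(C, T, i) for i in range(3)])
```

Parametric analysis asks the inverse question: for which values of, say, C[2] does `response_time` stay below the deadline, over the whole parameter space rather than a single point.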

    Energy-Centric Scheduling for Real-Time Systems

    Energy consumption is today an important design issue for all kinds of digital systems, and essential for battery-operated ones. An important fraction of this energy is dissipated by the processors running the application software. To reduce this energy consumption, one may, for instance, lower the processor clock frequency and supply voltage. This, however, might lead to a performance degradation of the whole system. In real-time systems, the crucial issue is timing, which is directly dependent on the system speed. Real-time scheduling and energy efficiency are therefore tightly connected issues, and they are addressed together in this work. Several scheduling approaches for low energy are described in the thesis, most of them targeting variable-speed processor architectures. At the task level, a novel speed scheduling algorithm for tasks with a probabilistic execution pattern is introduced and compared to an existing compile-time approach. For task graphs, a list-scheduling-based algorithm with an energy-sensitive priority is proposed. For task sets, off-line methods for computing the tasks' maximum required speeds are described, for both rate-monotonic and earliest-deadline-first scheduling. Also, a run-time speed optimization policy based on slack redistribution is proposed for rate-monotonic scheduling. Next, an energy-efficient extension of the earliest-deadline-first priority assignment policy is proposed, aimed at tasks with probabilistic execution times. Finally, scheduling is examined in conjunction with the assignment of tasks to processors, as part of various low-energy design flows. For some of the algorithms given in the thesis, energy measurements were carried out on a real hardware platform containing a variable-speed processor. The results confirm the validity of the initial assumptions and models used throughout the thesis. These experiments also show the efficiency of the newly introduced scheduling methods.
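For the off-line computation of maximum required speeds mentioned above, the earliest-deadline-first case has a well-known closed form: on a variable-speed processor, a periodic task set with implicit deadlines is schedulable by EDF at any constant normalized speed at or above its utilization. A minimal sketch of that computation (illustrative, not the thesis's exact method):

```python
def min_edf_speed(tasks):
    """tasks: list of (wcet, period) pairs, where wcet is the worst-case
    execution time at full speed. Returns the smallest constant normalized
    speed in (0, 1] at which EDF meets all implicit deadlines, i.e. the
    utilization of the task set."""
    return sum(c / t for c, t in tasks)

# One task needing 1 ms every 4 ms and one needing 2 ms every 8 ms at full
# speed: running constantly at half speed is just enough.
print(min_edf_speed([(1, 4), (2, 8)]))  # 0.5
```

Since dynamic power grows faster than linearly with speed, running at this minimum constant speed typically saves substantial energy compared with running at full speed and idling.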

    Constant bandwidth servers with constrained deadlines

    The Hard Constant Bandwidth Server (H-CBS) is a reservation-based scheduling algorithm often used to mix hard and soft real-time tasks on the same system. A number of variants of the H-CBS algorithm have been proposed in recent years, but all of them have been conceived for implicit server deadlines (i.e., equal to the server period). However, recent promising results on semi-partitioned scheduling, together with the demand for new functionality from the Linux community, create the need for a reservation algorithm that is able to work with constrained deadlines. This paper presents three novel H-CBS algorithms that support constrained deadlines. The three algorithms are formally analyzed, and their performance is compared through an extensive set of simulations.
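For context, the classic implicit-deadline CBS rules that the paper's variants generalize work roughly as follows: a server with budget Q and period P tracks a dynamic deadline; a fresh job gets a new deadline if the residual budget is not usable at the reserved bandwidth Q/P, and when the budget is exhausted it is recharged by Q and the deadline is postponed by P (in the hard variant the server is additionally suspended until the replenishment time). A minimal sketch of these standard rules only; the constrained-deadline algorithms in the paper modify them:

```python
class ConstantBandwidthServer:
    """Classic CBS with implicit deadline (deadline = period)."""
    def __init__(self, Q, P):
        self.Q, self.P = Q, P   # budget and period of the reservation
        self.budget = Q
        self.deadline = 0.0

    def job_arrival(self, now):
        # If the residual budget cannot be consumed before the current
        # deadline at bandwidth Q/P, generate a fresh deadline and budget.
        if self.budget > (self.deadline - now) * (self.Q / self.P):
            self.deadline = now + self.P
            self.budget = self.Q

    def consume(self, amount):
        self.budget -= amount
        if self.budget <= 0:    # exhausted: recharge and postpone; a hard
            self.budget += self.Q   # server would also suspend until the
            self.deadline += self.P # replenishment time

srv = ConstantBandwidthServer(Q=2, P=10)
srv.job_arrival(now=0.0)   # deadline becomes 10
srv.consume(2)             # budget exhausted: deadline postponed to 20
print(srv.deadline, srv.budget)
```

The postponement rule is what bounds the server's demand to bandwidth Q/P regardless of how greedy the served tasks are.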

    Real-Time Wireless Sensor-Actuator Networks for Cyber-Physical Systems

    A cyber-physical system (CPS) employs tight integration of, and coordination between, computational, networking, and physical elements. Wireless sensor-actuator networks provide a new communication technology for a broad range of CPS applications such as process control, smart manufacturing, and data center management. Sensing and control in these systems need to meet stringent real-time performance requirements on communication latency in challenging environments. There have been limited results on real-time scheduling theory for wireless sensor-actuator networks. Real-time transmission scheduling and analysis for wireless sensor-actuator networks require new methodologies to deal with the unique characteristics of wireless communication. Furthermore, the performance of a wireless control system involves intricate interactions between real-time communication and control. This thesis research tackles these challenges and makes a series of contributions to the theory and systems for wireless CPS. (1) We establish a new real-time scheduling theory for wireless sensor-actuator networks. (2) We develop a scheduling-control co-design approach for holistic optimization of control performance in a wireless control system. (3) We design and implement a wireless sensor-actuator network for CPS in data center power management. (4) We expand our research to develop scheduling algorithms and analyses for real-time parallel computing to support computation-intensive CPS.

    Real-time operating system support for multicore applications

    Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2014. Modern multicore platforms feature multiple levels of cache memory placed between the processor and main memory to hide the latency of ordinary memory systems. The primary goal of this cache hierarchy is to improve average execution time (at the cost of predictability). The uncontrolled use of the cache hierarchy by real-time tasks may impact the estimation of their worst-case execution times (WCET), especially when real-time tasks access a shared cache level, causing contention for shared cache lines and increasing the application execution time. This contention in the shared cache may lead to deadline misses, which is intolerable particularly for hard real-time (HRT) systems.
Shared cache partitioning is a well-known technique used in multicore real-time systems to isolate task workloads and to improve system predictability. Presently, the state-of-the-art studies that evaluate shared cache partitioning on multicore processors lack two key issues. First, the cache partitioning mechanism is typically implemented either in a simulated environment or in a general-purpose OS (GPOS), so the impact of kernel activities, such as interrupt handling and context switching, on the task partitions tends to be overlooked. Second, the evaluation is typically restricted to either a global or a partitioned scheduler, thereby failing to compare the performance of cache partitioning when tasks are scheduled by different schedulers. Furthermore, recent works have confirmed that OS implementation aspects, such as the choice of scheduling data structures and interrupt handling mechanisms, impact real-time schedulability as much as scheduling-theoretic aspects. However, these studies also used real-time patches applied to GPOSes, which affects the run-time overhead observed in these works and consequently the schedulability of real-time tasks. Additionally, current multicore scheduling algorithms do not consider scenarios where real-time tasks access the same cache lines due to true or false sharing, which also impacts the WCET. This thesis addresses these problems with cache partitioning techniques and multicore real-time scheduling algorithms as follows. First, real-time multicore support is designed and implemented on top of an embedded operating system designed from scratch. This support consists of several multicore real-time scheduling algorithms, such as global and partitioned EDF, and a cache partitioning mechanism based on page coloring. Second, a comparison is presented in terms of schedulability ratio, considering the run-time overheads of the implemented RTOS and of a GPOS patched with real-time extensions.
In some cases, global EDF considering the overhead of the RTOS is superior to partitioned EDF considering the overhead of the patched GPOS, which clearly shows how different OSs impact hard real-time schedulers. Third, an evaluation of the cache partitioning impact on partitioned, clustered, and global real-time schedulers is performed. The results indicate that a lightweight RTOS does not compromise real-time guarantees, and that shared cache partitioning behaves differently depending on the scheduler and on the tasks' working set size. Fourth, a task partitioning algorithm that assigns tasks to cores respecting their usage of cache partitions is proposed. The results show that by simply assigning tasks that share cache partitions to the same processor, it is possible to reduce the contention for shared cache lines and to provide HRT guarantees. Finally, a two-phase multicore scheduler that provides HRT and soft real-time (SRT) guarantees is proposed. It is shown that by using information from hardware performance counters at run time, the RTOS can detect when best-effort tasks interfere with real-time tasks in the shared cache, and can then prevent best-effort tasks from interfering with real-time tasks. The results also show that the assignment of exclusive partitions to HRT tasks, together with the two-phase multicore scheduler, provides HRT and SRT guarantees even when best-effort tasks share partitions with real-time tasks.
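The page-coloring mechanism used for cache partitioning above exploits the fact that in a physically indexed set-associative cache, a physical page maps to a fixed group of cache sets, so pages of different "colors" can never evict each other. A minimal sketch of the color computation (the cache geometry is illustrative):

```python
def page_color(phys_addr, cache_size, ways, page_size=4096):
    """Color of the physical page holding phys_addr in a physically
    indexed, set-associative cache. Pages with different colors occupy
    disjoint groups of cache sets."""
    # Number of distinct colors = (cache_size / ways) / page_size,
    # i.e. how many whole pages fit in one cache way.
    num_colors = cache_size // (ways * page_size)
    return (phys_addr // page_size) % num_colors

# A 2 MiB, 16-way shared L2 with 4 KiB pages yields 32 colors; giving a
# task only pages of "its" colors confines it to a slice of the cache.
print(page_color(0x12345000, cache_size=2 * 1024 * 1024, ways=16))
```

An OS implements this by constraining its physical page allocator: a task assigned colors {0, 1} only ever receives frames whose color is 0 or 1, which is exactly the isolation the schedulability experiments rely on.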

    Fast methods for scheduling with applications to real-time systems and large-scale, robotic manufacturing of aerospace structures

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 113-117). Across the aerospace and automotive manufacturing industries, there is a push to remove the cage around large, industrial robots and integrate right-sized, safe versions into the human labor force. By integrating robots into the labor force, humans can be freed to focus on value-added tasks (e.g. dexterous assembly) while the robots perform the non-value-added tasks (e.g. fetching parts). For this integration to be successful, the robots need the ability to reschedule their tasks online in response to unanticipated changes in the parameters of the manufacturing process. The problem of task allocation and scheduling is NP-hard. To achieve good scalability characteristics, prior approaches to autonomous task allocation and scheduling use decomposition and distributed techniques. These methods work well for domains such as UAV scheduling, where the temporospatial constraints can be decoupled or where low network bandwidth makes inter-agent communication difficult. However, the advantages of these methods are mitigated in the factory setting, where the temporospatial constraints are tightly inter-coupled because humans and robots work in close proximity, and where there is sufficient network bandwidth. In this thesis, I present a system, called Tercio, that solves large-scale scheduling problems by combining mixed-integer linear programming to perform the agent allocation with a real-time scheduling simulation to sequence the task set. Tercio generates near-optimal schedules for 10 agents and 500 work packages in less than 20 seconds on average and has been demonstrated in a multi-robot hardware test bed. My primary technical contributions are fast, near-optimal, real-time systems methods for scheduling and testing the schedulability of task sets.
I also present a pilot study that investigates what level of control Tercio should give human workers over their robotic teammates to maximize system efficiency and human satisfaction. By Matthew C. Gombolay. S.M.
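Tercio's split between allocation and sequencing can be illustrated with the second half alone: once a MILP has assigned tasks to agents, a fast dispatch simulation orders each agent's tasks. The sketch below uses an earliest-deadline dispatch rule as an illustrative stand-in for Tercio's actual sequencer:

```python
def sequence(assignments, durations, deadlines):
    """assignments: {agent: [task, ...]} produced by an allocation step.
    Dispatch each agent's tasks in earliest-deadline order; return
    {task: (start, finish)} and whether every deadline was met."""
    schedule, feasible = {}, True
    for agent, tasks in assignments.items():
        t = 0
        for task in sorted(tasks, key=lambda x: deadlines[x]):
            start, finish = t, t + durations[task]
            schedule[task] = (start, finish)
            feasible = feasible and finish <= deadlines[task]
            t = finish
    return schedule, feasible

durations = {"fetch": 2, "drill": 3, "inspect": 1}
deadlines = {"fetch": 4, "drill": 8, "inspect": 3}
sched, ok = sequence({"robot1": ["fetch", "drill"], "robot2": ["inspect"]},
                     durations, deadlines)
print(sched, ok)
```

Because the sequencing step is this cheap, the expensive MILP only has to decide who does what, which is what makes online rescheduling in response to factory-floor changes tractable.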