
    Real-time systems on multicore platforms: managing hardware resources for predictable execution

    Shared hardware resources in commodity multicore processors are subject to contention from co-running threads. The resulting interference can make the performance of individual applications highly variable. This is particularly problematic for real-time applications, which require predictable timing guarantees. It also leads to pessimistic estimates of the Worst-Case Execution Time (WCET) of each real-time application: more CPU time must be reserved, so fewer applications can be admitted to the system. Because the average execution time is usually far below the WCET, a significant share of the reserved CPU capacity is wasted. Previous work has attempted to partition the shared resources among either CPUs or processes to improve performance isolation; however, these approaches have not proven to be both efficient and effective. In this thesis, we propose several mechanisms and frameworks that manage the shared caches and memory buses on multicore platforms. First, we introduce a multicore real-time scheduling framework based on a foreground/background scheduling model. Combining real-time load balancing with background scheduling greatly improves CPU utilization. In addition, a memory bus management mechanism is implemented on top of the background scheduling, keeping bus contention under control while making use of otherwise unused CPU cycles. Cache partitioning is also studied in depth: we propose a cache-aware load balancing algorithm and a dynamic cache partitioning framework. Lastly, we describe a system architecture that integrates all of the above solutions. It tackles one of the toughest problems in OS innovation, legacy support, by converting existing OSes into libraries in a virtualized environment. Thus, within a single multicore platform, we benefit from both the fine-grained resource control of a real-time OS and the rich functionality of a general-purpose OS.
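    The abstract does not give the dispatching rule, so the following Python sketch only illustrates one way a foreground/background model with memory-bus control could behave on a single core. The class and method names, the earliest-deadline ordering of foreground tasks, and the per-period bus-access budget are all assumptions (loosely in the spirit of budget-based bus regulation), not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    deadline: float  # absolute deadline; only used to order foreground tasks
    realtime: bool   # True = foreground (real-time), False = background


class CoreDispatcher:
    """Hypothetical per-core dispatcher for a foreground/background model."""

    def __init__(self, bus_budget: int) -> None:
        self.bus_budget = bus_budget  # assumed: bus accesses allowed per period
        self.bus_used = 0

    def new_period(self) -> None:
        # Replenish the background bus budget at each regulation period.
        self.bus_used = 0

    def charge_background(self, accesses: int) -> None:
        # Charge bus accesses made by background tasks (e.g. read from a
        # hardware performance counter) against the current period's budget.
        self.bus_used += accesses

    def pick_next(self, ready: List[Task]) -> Optional[Task]:
        foreground = [t for t in ready if t.realtime]
        if foreground:
            # Assumption: foreground tasks are ordered by earliest deadline.
            return min(foreground, key=lambda t: t.deadline)
        background = [t for t in ready if not t.realtime]
        if background and self.bus_used < self.bus_budget:
            # Background work only fills idle cycles, and only while the
            # bus budget lasts, so it cannot flood the memory bus.
            return background[0]
        return None  # idle: nothing runnable, or background is throttled
```

    In this sketch background tasks never displace real-time ones; the budget only caps how much bus traffic background work may generate per period, and in this simplification it is checked at dispatch time rather than enforced continuously.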

    Emerging research directions in computer science: contributions from the young informatics faculty in Karlsruhe

    To build more human-friendly human-computer interfaces, such interfaces need the ability to perceive the user: his location, identity, activities and, in particular, his interaction with others and with the machine. Only with these perception capabilities can smart systems (for example, human-friendly robots or smart environments) become possible. In my research I am therefore focusing on the development of novel techniques for the visual perception of humans and their activities, in order to facilitate perceptive multimodal interfaces, humanoid robots and smart environments. My work includes research on person tracking, person identification, recognition of pointing gestures, estimation of head orientation and focus of attention, as well as audio-visual scene and activity analysis. Application areas are human-friendly humanoid robots, smart environments, content-based image and video analysis, as well as safety- and security-related applications. This article gives a brief overview of my ongoing research activities in these areas.

    Real-time operating system support for multicore applications

    Thesis (doctorate), Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2014.
Abstract: Modern multicore platforms feature multiple levels of cache memory between the processor and main memory to hide the latency of main memory. The primary goal of this cache hierarchy is to improve average execution time, at the cost of predictability. Uncontrolled use of the cache hierarchy by real-time tasks can distort the estimation of their worst-case execution times (WCET), especially when real-time tasks access a shared cache level: they contend for shared cache lines, which increases application execution time. This contention in the shared cache may lead to deadline misses, which are intolerable, particularly in hard real-time (HRT) systems.
Shared cache partitioning is a well-known technique used in multicore real-time systems to isolate task workloads and improve system predictability. However, state-of-the-art studies that evaluate shared cache partitioning on multicore processors have two key shortcomings. First, the cache partitioning mechanism is typically implemented either in a simulated environment or in a general-purpose OS (GPOS), so the impact of kernel activities, such as interrupt handlers and context switching, on the task partitions tends to be overlooked. Second, the evaluation is typically restricted to either a global or a partitioned scheduler, and thus fails to compare the performance of cache partitioning when tasks are scheduled by different schedulers. Furthermore, recent works have confirmed that OS implementation aspects, such as the choice of scheduling data structures and interrupt handling mechanisms, affect real-time schedulability as much as scheduling-theoretic aspects do. However, these studies also used GPOSes with real-time patches applied, which affects the run-time overhead they observed and, consequently, the schedulability of real-time tasks. Additionally, current multicore scheduling algorithms do not consider scenarios where real-time tasks access the same cache lines due to true or false sharing, which also affects the WCET. This thesis addresses these problems with cache partitioning techniques and multicore real-time scheduling algorithms as follows. First, real-time multicore support is designed and implemented on top of an embedded operating system designed from scratch. This support consists of several multicore real-time scheduling algorithms, such as global and partitioned EDF, and a cache partitioning mechanism based on page coloring. Second, schedulability ratios are compared, taking into account the run-time overhead of the implemented RTOS and that of a GPOS patched with real-time extensions.
In some cases, Global-EDF with the overhead of the RTOS outperforms Partitioned-EDF with the overhead of the patched GPOS, which clearly shows how different OSs affect hard real-time schedulers. Third, the impact of cache partitioning on partitioned, clustered, and global real-time schedulers is evaluated. The results indicate that a lightweight RTOS does not compromise real-time guarantees, and that shared cache partitioning behaves differently depending on the scheduler and on the tasks' working set sizes. Fourth, a task partitioning algorithm that assigns tasks to cores according to their usage of cache partitions is proposed. The results show that simply assigning tasks that share cache partitions to the same processor reduces contention for shared cache lines and makes it possible to provide HRT guarantees. Finally, a two-phase multicore scheduler that provides HRT and soft real-time (SRT) guarantees is proposed. Using information from hardware performance counters at run time, the RTOS can detect when best-effort tasks interfere with real-time tasks in the shared cache, and can then prevent best-effort tasks from interfering with real-time tasks. The results also show that assigning exclusive partitions to HRT tasks, together with the two-phase multicore scheduler, provides HRT and SRT guarantees even when best-effort tasks share partitions with real-time tasks.
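    The thesis's code is not reproduced here. The Python sketch below illustrates two ingredients named in this abstract under simplifying assumptions: computing page colors for a physically indexed, set-associative shared cache with plain modulo set indexing (last-level caches that hash addresses across slices, as on many Intel processors, need a different mapping), and a task-to-core assignment that keeps tasks sharing any cache color on the same core, checked against the partitioned-EDF utilization bound of 1 per core. The function names, the union-find grouping and the first-fit-decreasing heuristic are hypothetical choices, not the proposed algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def num_colors(cache_bytes: int, ways: int, page_bytes: int = 4096) -> int:
    # Number of colors = number of pages that fit in one cache way
    # (assumes simple modulo set indexing on physical addresses).
    return cache_bytes // (ways * page_bytes)


def page_color(paddr: int, colors: int, page_bytes: int = 4096) -> int:
    # The color is given by the set-index bits just above the page offset.
    return (paddr // page_bytes) % colors


def partition_tasks(
    tasks: Dict[str, Tuple[float, Set[int]]], cores: int
) -> Optional[Dict[str, int]]:
    """Hypothetical color-aware first-fit-decreasing task assignment.

    tasks maps name -> (utilization, set of cache colors used). Tasks that
    share a color are merged into one group and placed on the same core;
    each core must satisfy the partitioned-EDF bound sum(U) <= 1.
    """
    parent = {name: name for name in tasks}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner: Dict[int, str] = {}  # first task seen using each color
    for name, (_, colors) in tasks.items():
        for c in colors:
            if c in owner:
                parent[find(name)] = find(owner[c])  # union via shared color
            else:
                owner[c] = name

    groups: Dict[str, List[str]] = defaultdict(list)
    for name in tasks:
        groups[find(name)].append(name)

    def group_util(members: List[str]) -> float:
        return sum(tasks[t][0] for t in members)

    load = [0.0] * cores
    mapping: Dict[str, int] = {}
    # First-fit decreasing: place the heaviest groups first.
    for members in sorted(groups.values(), key=group_util, reverse=True):
        u = group_util(members)
        for core in range(cores):
            if load[core] + u <= 1.0 + 1e-9:  # small tolerance for float sums
                load[core] += u
                for t in members:
                    mapping[t] = core
                break
        else:
            return None  # this group fits on no core
    return mapping
```

    For example, a 2 MiB, 16-way cache with 4 KiB pages yields 32 colors. Grouping by shared colors is deliberately conservative: a chain of tasks that pairwise share colors becomes one group, which may fail to fit on any core even when a finer assignment would.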

    Holistic resource allocation for multicore real-time systems

    This paper presents CaM, a holistic cache and memory bandwidth resource allocation strategy for multicore real-time systems. CaM is designed for partitioned scheduling, where tasks are mapped onto cores and the shared cache and memory bandwidth resources are partitioned among cores to reduce interference from concurrent accesses. Based on our extension of LITMUS^RT with Intel's Cache Allocation Technology and MemGuard, we present an experimental evaluation of the relationship between the allocation of cache and memory bandwidth resources and a task's WCET. Our resource allocation strategy exploits this relationship to map tasks onto cores and to compute the resource allocation for each core. By grouping tasks with similar characteristics (in terms of resource demands) onto the same core, it enables the tasks on each core to fully utilize the assigned resources. In addition, based on how the tasks' execution times vary with their assigned resources, we can determine an allocation that maximizes schedulability under resource constraints. Extensive evaluations using real-world benchmarks show that CaM achieves near-optimal schedulability while being highly efficient, and that it substantially outperforms existing solutions.
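    The abstract does not spell out CaM's allocation algorithm. As a hedged illustration of the underlying idea, that a task's WCET depends on the cache and bandwidth share its core receives, the Python sketch below brute-forces per-core (cache ways, bandwidth units) allocations for an already-fixed task-to-core mapping and keeps the allocation with the lowest worst-case core utilization under the EDF bound. The `wcet` callable, the integer resource units and the min-max objective are assumptions; exhaustive search is only practical for a handful of cores and small resource totals.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

# Hypothetical WCET model: wcet(task_name, cache_ways, bw_units) -> time.
# In practice such values would come from measurements per allocation.
WcetFn = Callable[[str, int, int], float]
Core = List[Tuple[str, float]]  # (task name, period) pairs on one core
Alloc = Tuple[Tuple[int, int], ...]  # (ways, bw units) per core


def utilization(core: Core, ways: int, bw: int, wcet: WcetFn) -> float:
    # EDF utilization of one core under the given resource allocation.
    return sum(wcet(name, ways, bw) / period for name, period in core)


def allocate(cores: List[Core], total_ways: int, total_bw: int,
             wcet: WcetFn) -> Optional[Tuple[Alloc, float]]:
    """Try every per-core (ways, bw) split; each core gets at least one unit.

    Returns (allocation, worst core utilization) for the feasible split that
    minimizes the largest per-core utilization, or None if none is <= 1.
    """
    options = [(w, b) for w in range(1, total_ways + 1)
               for b in range(1, total_bw + 1)]
    best: Optional[Tuple[Alloc, float]] = None
    for combo in product(options, repeat=len(cores)):
        if sum(w for w, _ in combo) > total_ways:
            continue
        if sum(b for _, b in combo) > total_bw:
            continue
        worst = max(utilization(c, w, b, wcet)
                    for c, (w, b) in zip(cores, combo))
        if worst <= 1.0 and (best is None or worst < best[1]):
            best = (combo, worst)
    return best
```

    With one cache-sensitive task on core 0 and one bandwidth-sensitive task on core 1, the search gives most cache ways to core 0 and most bandwidth to core 1, which mirrors the abstract's point that matching resources to task characteristics improves schedulability.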

    AMC: Advanced Multi-accelerator Controller

    The rapid advancement of FPGA technology, its use of diverse architectural features, and the introduction of High Level Synthesis (HLS) tools have increased the data-level parallelism achievable on a chip. A generic FPGA-based HLS multi-accelerator system requires a microprocessor (master core) that manages memory and schedules the accelerators. In a real environment, such HLS multi-accelerator systems fall short of ideal performance because of memory bandwidth limitations. Such a system therefore needs a memory manager and a scheduler that improve performance by managing and scheduling the accelerators' memory access patterns efficiently. In this article, we propose the integration of an intelligent memory system and an efficient scheduler into the HLS-based multi-accelerator environment, called the Advanced Multi-accelerator Controller (AMC). The AMC system is implemented and tested on a Xilinx Virtex-5 ML505 evaluation FPGA board and evaluated with memory-intensive accelerators and High Performance Computing (HPC) applications. Its performance is compared against microprocessor-based systems integrated with an operating system. Results show that the AMC-based HLS multi-accelerator system achieves speedups of 10.4x and 7x over the MicroBlaze-based and Intel Core-based HLS multi-accelerator systems, respectively.
    Peer reviewed. Postprint (author's final draft).

    Exploring manycore architectures for next-generation HPC systems through the MANGO approach

    The Horizon 2020 MANGO project aims to explore deeply heterogeneous accelerators for use in High-Performance Computing systems running multiple applications with different Quality of Service (QoS) levels. The main goal of the project is to exploit customization to adapt computing resources to reach the desired QoS. For this purpose, it explores different but interrelated mechanisms across the architecture and system software. In particular, this paper focuses on runtime resource management, thermal management, and support for parallel programming, and introduces three applications on which the project's results will be validated.
    This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 671668.
    Flich Cardo, J.; Agosta, G.; Ampletzer, P.; Atienza-Alonso, D.; Brandolese, C.; Cappe, E.; Cilardo, A.... (2018). Exploring manycore architectures for next-generation HPC systems through the MANGO approach. Microprocessors and Microsystems. 61:154-170. https://doi.org/10.1016/j.micpro.2018.05.011