163 research outputs found

    Real-time operating system support for multicore applications

    Get PDF
    Tese (doutorado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2014Plataformas multiprocessadas atuais possuem diversos níveis da memória cache entre o processador e a memória principal para esconder a latência da hierarquia de memória. O principal objetivo da hierarquia de memória é melhorar o tempo médio de execução, ao custo da previsibilidade. O uso não controlado da hierarquia da cache pelas tarefas de tempo real impacta a estimativa dos seus piores tempos de execução, especialmente quando as tarefas de tempo real acessam os níveis da cache compartilhados. Tal acesso causa uma disputa pelas linhas da cache compartilhadas e aumenta o tempo de execução das aplicações. Além disso, essa disputa na cache compartilhada pode causar a perda de prazos, o que é intolerável em sistemas de tempo real críticos. O particionamento da memória cache compartilhada é uma técnica bastante utilizada em sistemas de tempo real multiprocessados para isolar as tarefas e melhorar a previsibilidade do sistema. Atualmente, os estudos que avaliam o particionamento da memória cache em multiprocessadores carecem de dois pontos fundamentais. Primeiro, o mecanismo de particionamento da cache é tipicamente implementado em um ambiente simulado ou em um sistema operacional de propósito geral. Consequentemente, o impacto das atividades realizados pelo núcleo do sistema operacional, tais como o tratamento de interrupções e troca de contexto, no particionamento das tarefas tende a ser negligenciado. Segundo, a avaliação é restrita a um escalonador global ou particionado, e assim não comparando o desempenho do particionamento da cache em diferentes estratégias de escalonamento. Ademais, trabalhos recentes confirmaram que aspectos da implementação do SO, tal como a estrutura de dados usada no escalonamento e os mecanismos de tratamento de interrupções, impactam a escalonabilidade das tarefas de tempo real tanto quanto os aspectos teóricos. Entretanto, tais estudos também usaram sistemas operacionais de propósito geral com extensões de tempo real, que afetamos sobre custos de tempo de execução observados e a escalonabilidade das tarefas de tempo real. Adicionalmente, os algoritmos de escalonamento tempo real para multiprocessadores atuais não consideram cenários onde tarefas de tempo real acessam as mesmas linhas da cache, o que dificulta a estimativa do pior tempo de execução. Esta pesquisa aborda os problemas supracitados com as estratégias de particionamento da cache e com os algoritmos de escalonamento tempo real multiprocessados da seguinte forma. Primeiro, uma infraestrutura de tempo real para multiprocessadores é projetada e implementada em um sistema operacional embarcado. A infraestrutura consiste em diversos algoritmos de escalonamento tempo real, tais como o EDF global e particionado, e um mecanismo de particionamento da cache usando a técnica de coloração de páginas. Segundo, é apresentada uma comparação em termos da taxa de escalonabilidade considerando o sobre custo de tempo de execução da infraestrutura criada e de um sistema operacional de propósito geral com extensões de tempo real. Em alguns casos, o EDF global considerando o sobre custo do sistema operacional embarcado possui uma melhor taxa de escalonabilidade do que o EDF particionado com o sobre custo do sistema operacional de propósito geral, mostrando claramente como diferentes sistemas operacionais influenciam os escalonadores de tempo real críticos em multiprocessadores. Terceiro, é realizada uma avaliação do impacto do particionamento da memória cache em diversos escalonadores de tempo real multiprocessados. Os resultados desta avaliação indicam que um sistema operacional "leve" não compromete as garantias de tempo real e que o particionamento da cache tem diferentes comportamentos dependendo do escalonador e do tamanho do conjunto de trabalho das tarefas. Quarto, é proposto um algoritmo de particionamento de tarefas que atribui as tarefas que compartilham partições ao mesmo processador. Os resultados mostram que essa técnica de particionamento de tarefas reduz a disputa pelas linhas da cache compartilhadas e provê garantias de tempo real para sistemas críticos. Finalmente, é proposto um escalonador de tempo real de duas fases para multiprocessadores. O escalonador usa informações coletadas durante o tempo de execução das tarefas através dos contadores de desempenho em hardware. Com base nos valores dos contadores, o escalonador detecta quando tarefas de melhor esforço o interferem com tarefas de tempo real na cache. Assim é possível impedir que tarefas de melhor esforço acessem as mesmas linhas da cache que tarefas de tempo real. O resultado desta estratégia de escalonamento é o atendimento dos prazos críticos e não críticos das tarefas de tempo real.Abstracts: Modern multicore platforms feature multiple levels of cache memory placed between the processor and main memory to hide the latency of ordinary memory systems. The primary goal of this cache hierarchy is to improve average execution time (at the cost of predictability). The uncontrolled use of the cache hierarchy by realtime tasks may impact the estimation of their worst-case execution times (WCET), specially when real-time tasks access a shared cache level, causing a contention for shared cache lines and increasing the application execution time. This contention in the shared cache may leadto deadline losses, which is intolerable particularly for hard real-time (HRT) systems. Shared cache partitioning is a well-known technique used in multicore real-time systems to isolate task workloads and to improve system predictability. Presently, the state-of-the-art studies that evaluate shared cache partitioning on multicore processors lack two key issues. First, the cache partitioning mechanism is typically implemented either in a simulated environment or in a general-purpose OS (GPOS), and so the impact of kernel activities, such as interrupt handlers and context switching, on the task partitions tend to be overlooked. Second, the evaluation is typically restricted to either a global or partitioned scheduler, thereby by falling to compare the performance of cache partitioning when tasks are scheduled by different schedulers. Furthermore, recent works have confirmed that OS implementation aspects, such as the choice of scheduling data structures and interrupt handling mechanisms, impact real-time schedulability as much as scheduling theoretic aspects. However, these studies also used real-time patches applied into GPOSes, which affects the run-time overhead observed in these works and consequently the schedulability of real-time tasks. Additionally, current multicore scheduling algorithms do not consider scenarios where real-time tasks access the same cache lines due to true or false sharing, which also impacts the WCET. This thesis addresses these aforementioned problems with cache partitioning techniques and multicore real-time scheduling algorithms as following. First, a real-time multicore support is designed and implemented on top of an embedded operating system designed from scratch. This support consists of several multicore real-time scheduling algorithms, such as global and partitioned EDF, and a cache partitioning mechanism based on page coloring. Second, it is presented a comparison in terms of schedulability ratio considering the run-time overhead of the implemented RTOS and a GPOS patched with real-time extensions. In some cases, Global-EDF considering the overhead of the RTOS is superior to Partitioned-EDF considering the overhead of the patched GPOS, which clearly shows how different OSs impact hard realtime schedulers. Third, an evaluation of the cache partitioning impacton partitioned, clustered, and global real-time schedulers is performed.The results indicate that a lightweight RTOS does not impact real-time tasks, and shared cache partitioning has different behavior depending on the scheduler and the task's working set size. Fourth, a task partitioning algorithm that assigns tasks to cores respecting their usage of cache partitions is proposed. The results show that by simply assigning tasks that shared cache partitions to the same processor, it is possible to reduce the contention for shared cache lines and to provideHRT guarantees. Finally, a two-phase multicore scheduler that provides HRT and soft real-time (SRT) guarantees is proposed. It is shown that by using information from hardware performance counters at run-time, the RTOS can detect when best-effort tasks interfere with real-time tasks in the shared cache. Then, the RTOS can prevent best effort tasks from interfering with real-time tasks. The results also show that the assignment of exclusive partitions to HRT tasks together with the two-phase multicore scheduler provides HRT and SRT guarantees, even when best-effort tasks share partitions with real-time tasks

    Scheduling and locking in multiprocessor real-time operating systems

    Get PDF
    With the widespread adoption of multicore architectures, multiprocessors are now a standard deployment platform for (soft) real-time applications. This dissertation addresses two questions fundamental to the design of multicore-ready real-time operating systems: (1) Which scheduling policies offer the greatest flexibility in satisfying temporal constraints; and (2) which locking algorithms should be used to avoid unpredictable delays? With regard to Question 1, LITMUSRT, a real-time extension of the Linux kernel, is presented and its design is discussed in detail. Notably, LITMUSRT implements link-based scheduling, a novel approach to controlling blocking due to non-preemptive sections. Each implemented scheduler (22 configurations in total) is evaluated under consideration of overheads on a 24-core Intel Xeon platform. The experiments show that partitioned earliest-deadline first (EDF) scheduling is generally preferable in a hard real-time setting, whereas global and clustered EDF scheduling are effective in a soft real-time setting. With regard to Question 2, real-time locking protocols are required to ensure that the maximum delay due to priority inversion can be bounded a priori. Several spinlock- and semaphore-based multiprocessor real-time locking protocols for mutual exclusion (mutex), reader-writer (RW) exclusion, and k-exclusion are proposed and analyzed. A new category of RW locks suited to worst-case analysis, termed phase-fair locks, is proposed and three efficient phase-fair spinlock implementations are provided (one with few atomic operations, one with low space requirements, and one with constant RMR complexity). Maximum priority-inversion blocking is proposed as a natural complexity measure for semaphore protocols. It is shown that there are two classes of schedulability analysis, namely suspension-oblivious and suspension-aware analysis, that yield two different lower bounds on blocking. Five asymptotically optimal locking protocols are designed and analyzed: a family of mutex, RW, and k-exclusion protocols for global, partitioned, and clustered scheduling that are asymptotically optimal in the suspension-oblivious case, and a mutex protocol for partitioned scheduling that is asymptotically optimal in the suspension-aware case. A LITMUSRT-based empirical evaluation is presented that shows these protocols to be practical

    Towards a distributed real-time system for future satellite applications

    Get PDF
    Thesis (MScEng)--University of Stellenbosch, 2003.ENGLISH ABSTRACT: The Linux operating system and shared Ethernet are alternative technologies with the potential to reduce both the development time and costs of satellites as well as the supporting infrastructure. Modular satellites, ground stations and rapid proto typing testbeds also have a common requirement for distributed real-time computation. The identified technologies were investigated to determine whether this requirement could also be met. Various real-time extensions and modifications are currently available for the Linux operating system. A suitable open source real-time extension called Real-Time Application Interface (RTAI) was selected for the implementation of an experimental distributed real-time system. Experimental results showed that the RTAI operating system could deliver deterministic realtime performance, but only in the absence of non-real-time load. Shared Ethernet is currently the most popular and widely used commercial networking technology. However, Ethernet wasn't developed to provide real-time performance. Several methods have been proposed in literature to modify Ethernet for real-time communications. A token passing protocol was found to be an effective and least intrusive solution. The Real-Time Token (RTToken) protocol was designed to guarantee predictable network access to communicating real-time tasks. The protocol passes a token between nodes in a predetermined order and nodes are assigned fixed token holding times. Experimental results proved that the protocol offered predictable network access with bounded jitter. An experimental distributed real-time system was implemented, which included the extension of the RTAI operating system with the RTToken protocol, as a loadable kernel module. Real-time tasks communicated using connectionless Internet protocols. The Real-Time networking (RTnet) subsystem of RTAI supported these protocols. Under collision-free conditions consistent transmission delays with bounded jitter was measured. The integrated RTToken protocol provided guaranteed and bounded network access to communicating real-time tasks, with limit overheads. Tests exhibited errors in some of the RTAI functionality. Overall the investigated technologies showed promise in being able to meet the distributed real-time requirements of various applications, including those found in the satellite environment.AFRIKAANSE OPSOMMING: Die Linux bedryfstelsel en gedeelde Ethernet is geïdentifiseer as potensiële tegnologieë vir satelliet bedryf wat besparings in koste en vinniger ontwikkeling te weeg kan bring. Modulêr ontwerpte satelliete, grondstasies en ontwikkeling platforms het 'n gemeenskaplike behoefte vir verspreide intydse verwerking. Verskillende tegnologieë is ondersoek om te bepaal of aan die vereiste ook voldoen kan word. Verskeie intydse uitbreidings en modifikasies is huidiglik beskikbaar vir die Linux bedryfstelsel. Die "Real-Time Application Interface" (RTAI) bedryfstelsel is geïdentifiseer as 'n geskikte intydse uitbreiding vir die implementering van 'n eksperimentele verspreide intydse stelsel. Eksperimentele resultate het getoon dat die RTAI bedryfstelsel deterministies en intyds kan opereer, maar dan moet dit geskied in die afwesigheid van 'n nie-intydse verwerkingslas. Gedeelde Ethernet is 'n kommersiële network tegnologie wat tans algemeen beskikbaar is. Die tegnologie is egter nie ontwerp vir intydse uitvoering nie. Verskeie metodes is in die literatuur voorgestelom Ethernet te modifiseer vir intydse kommunikasie. Hierdie ondersoek het getoon dat 'n teken-aangee protokol die mees effektiewe oplossing is en waarvan die implementering min inbreuk maak. Die "Real-Time Token" (RTToken) protokol is ontwerp om voorspelbare netwerk toegang tot kommunikerende intydse take te verseker. Die protokol stuur 'n teken tussen nodusse in 'n voorafbepaalde volgorde. Nodusse word ook vaste teken hou-tye geallokeer. Eksperimentele resultate het aangedui dat die protokol deterministiese netwerk toegang kan verseker met begrensde variasies. 'n Eksperimentele verspreide intydse stelsel is geïmplementeer. Dit het ingesluit die uitbreiding van die RTAI bedryfstelsel met die RTToken protokol; verpak as 'n laaibare bedryfstelsel module. Intydse take kan kommunikeer met verbindinglose protokolle wat deur die "Real-Time networking" (RTnet) substelsel van RTAI ondersteun word. Onder ideale toestande is konstante transmissie vertragings met begrensde variasies gemeet. Die integrasie van die RTToken protokol het botsinglose netwerk toegang aan kommunikerende take verseker, met beperkte oorhoofse koste as teenprestasie. Eksperimente het enkele foute in die funksionaliteit van RTAI uitgewys. In die algemeen het die voorgestelde tegnologieë getoon dat dit potensiaal het vir verskeie verspreide intydse toepassings in toekomstige satelliet en ook ander omgewings

    Análise de malware com suporte de hardware

    Get PDF
    Orientadores: Paulo Lício de Geus, André Ricardo Abed GrégioDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: O mundo atual é impulsionado pelo uso de sistemas computacionais, estando estes pre- sentes em todos aspectos da vida cotidiana. Portanto, o correto funcionamento destes é essencial para se assegurar a manutenção das possibilidades trazidas pelos desenvolvi- mentos tecnológicos. Contudo, garantir o correto funcionamento destes não é uma tarefa fácil, dado que indivíduos mal-intencionados tentam constantemente subvertê-los visando benefíciar a si próprios ou a terceiros. Os tipos mais comuns de subversão são os ataques por códigos maliciosos (malware), capazes de dar a um atacante controle total sobre uma máquina. O combate à ameaça trazida por malware baseia-se na análise dos artefatos coletados de forma a permitir resposta aos incidentes ocorridos e o desenvolvimento de contramedidas futuras. No entanto, atacantes têm se especializado em burlar sistemas de análise e assim manter suas operações ativas. Para este propósito, faz-se uso de uma série de técnicas denominadas de "anti-análise", capazes de impedir a inspeção direta dos códigos maliciosos. Dentre essas técnicas, destaca-se a evasão do processo de análise, na qual são empregadas exemplares capazes de detectar a presença de um sistema de análise para então esconder seu comportamento malicioso. Exemplares evasivos têm sido cada vez mais utilizados em ataques e seu impacto sobre a segurança de sistemas é considerá- vel, dado que análises antes feitas de forma automática passaram a exigir a supervisão de analistas humanos em busca de sinais de evasão, aumentando assim o custo de se manter um sistema protegido. As formas mais comuns de detecção de um ambiente de análise se dão através da detecção de: (i) código injetado, usado pelo analista para inspecionar a aplicação; (ii) máquinas virtuais, usadas em ambientes de análise por questões de escala; (iii) efeitos colaterais de execução, geralmente causados por emuladores, também usados por analistas. Para lidar com malware evasivo, analistas tem se valido de técnicas ditas transparentes, isto é, que não requerem injeção de código nem causam efeitos colaterais de execução. Um modo de se obter transparência em um processo de análise é contar com suporte do hardware. Desta forma, este trabalho versa sobre a aplicação do suporte de hardware para fins de análise de ameaças evasivas. No decorrer deste texto, apresenta-se uma avaliação das tecnologias existentes de suporte de hardware, dentre as quais máqui- nas virtuais de hardware, suporte de BIOS e monitores de performance. A avaliação crítica de tais tecnologias oferece uma base de comparação entre diferentes casos de uso. Além disso, são enumeradas lacunas de desenvolvimento existentes atualmente. Mais que isso, uma destas lacunas é preenchida neste trabalho pela proposição da expansão do uso dos monitores de performance para fins de monitoração de malware. Mais especificamente, é proposto o uso do monitor BTS para fins de construção de um tracer e um debugger. O framework proposto e desenvolvido neste trabalho é capaz, ainda, de lidar com ataques do tipo ROP, um dos mais utilizados atualmente para exploração de vulnerabilidades. A avaliação da solução demonstra que não há a introdução de efeitos colaterais, o que per- mite análises de forma transparente. Beneficiando-se desta característica, demonstramos a análise de aplicações protegidas e a identificação de técnicas de evasãoAbstract: Today¿s world is driven by the usage of computer systems, which are present in all aspects of everyday life. Therefore, the correct working of these systems is essential to ensure the maintenance of the possibilities brought about by technological developments. However, ensuring the correct working of such systems is not an easy task, as many people attempt to subvert systems working for their own benefit. The most common kind of subversion against computer systems are malware attacks, which can make an attacker to gain com- plete machine control. The fight against this kind of threat is based on analysis procedures of the collected malicious artifacts, allowing the incident response and the development of future countermeasures. However, attackers have specialized in circumventing analysis systems and thus keeping their operations active. For this purpose, they employ a series of techniques called anti-analysis, able to prevent the inspection of their malicious codes. Among these techniques, I highlight the analysis procedure evasion, that is, the usage of samples able to detect the presence of an analysis solution and then hide their malicious behavior. Evasive examples have become popular, and their impact on systems security is considerable, since automatic analysis now requires human supervision in order to find evasion signs, which significantly raises the cost of maintaining a protected system. The most common ways for detecting an analysis environment are: i) Injected code detec- tion, since injection is used by analysts to inspect applications on their way; ii) Virtual machine detection, since they are used in analysis environments due to scalability issues; iii) Execution side effects detection, usually caused by emulators, also used by analysts. To handle evasive malware, analysts have relied on the so-called transparent techniques, that is, those which do not require code injection nor cause execution side effects. A way to achieve transparency in an analysis process is to rely on hardware support. In this way, this work covers the application of the hardware support for the evasive threats analysis purpose. In the course of this text, I present an assessment of existing hardware support technologies, including hardware virtual machines, BIOS support, performance monitors and PCI cards. My critical evaluation of such technologies provides basis for comparing different usage cases. In addition, I pinpoint development gaps that currently exists. More than that, I fill one of these gaps by proposing to expand the usage of performance monitors for malware monitoring purposes. More specifically, I propose the usage of the BTS monitor for the purpose of developing a tracer and a debugger. The proposed framework is also able of dealing with ROP attacks, one of the most common used technique for remote vulnerability exploitation. The framework evaluation shows no side-effect is introduced, thus allowing transparent analysis. Making use of this capability, I demonstrate how protected applications can be inspected and how evasion techniques can be identifiedMestradoCiência da ComputaçãoMestre em Ciência da ComputaçãoCAPE

    Dynamic Voltage Scaling for Energy- Constrained Real-Time Systems

    Get PDF
    The problem of reducing energy consumption is dominating the design of several real-time systems. The Dynamic Voltage Scaling (DVS) technique, provided by most microprocessors, allow to balance computational speed versus energy consumption. We present some novel energy-aware scheduling algorithms that allow to expoit this technique while meeting real-time constraints. In particular, we present the GRUB-PA algorithm which, unlike most existing algorithms, allows to reduce energy consumption on real-time systems consisting of any kind of task. We also present a working implementation of the algorithm on Linux

    Accelerating Transactional Memory by Exploiting Platform Specificity

    Get PDF
    Transactional Memory (TM) is one of the most promising alternatives to lock-based concurrency, but there still remain obstacles that keep TM from being utilized in the real world. Performance, in terms of high scalability and low latency, is always one of the most important keys to general purpose usage. While most of the research in this area focuses on improving a specific single TM implementation and some default platform (a certain operating system, compiler and/or processor), little has been conducted on improving performance more generally, and across platforms.We found that by utilizing platform specificity, we could gain tremendous performance improvement and avoid unnecessary costs due to false assumptions of platform properties, on not only a single TM implementation, but many. In this dissertation, we will present our findings in four sections: 1) we discover and quantify hidden costs from inappropriate compiler instrumentations, and provide sug- gestions and solutions; 2) we boost a set of mainstream timestamp-based TM implementations with the x86-specific hardware cycle counter; 3) we explore compiler opportunities to reduce the transaction abort rate, by reordering read-modify-write operations — the whole technique can be applied to all TM implementations, and could be more effective with some help from compilers; and 4) we coordinate the state-of-the-art Intel Haswell TSX hardware TM with a software TM “Cohorts”, and develop a safe and flexible Hybrid TM, “HyCo”, to be our final performance boost in this dissertation.The impact of our research extends beyond Transactional Memory, to broad areas of concurrent programming. Some of our solutions and discussions, such as the synchronization between accesses of the hardware cycle counter and memory loads and stores, can be utilized to boost concurrent data structures and many timestamp-based systems and applications. Others, such as discussions of compiler instrumentation costs and reordering opportunities, provide additional insights to compiler designers. Our findings show that platform specificity must be taken into consideration to achieve peak performance
    • …
    corecore