163 research outputs found
Real-time operating system support for multicore applications
Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2014. Modern multiprocessor platforms have several levels of cache memory between the processor and main memory to hide the latency of the memory hierarchy. The main goal of the memory hierarchy is to improve the average execution time, at the cost of predictability. The uncontrolled use of the cache hierarchy by real-time tasks impacts the estimation of their worst-case execution times, especially when real-time tasks access shared cache levels. Such access causes contention for shared cache lines and increases application execution time. Moreover, this contention in the shared cache can cause deadline misses, which is intolerable in hard real-time systems. Shared cache partitioning is a widely used technique in multiprocessor real-time systems to isolate tasks and improve system predictability. Currently, studies that evaluate cache partitioning on multiprocessors lack two fundamental points. First, the cache partitioning mechanism is typically implemented in a simulated environment or in a general-purpose operating system. Consequently, the impact on task partitions of activities performed by the operating system kernel, such as interrupt handling and context switching, tends to be neglected. Second, the evaluation is restricted to either a global or a partitioned scheduler, and thus does not compare the performance of cache partitioning under different scheduling strategies.
Furthermore, recent works have confirmed that OS implementation aspects, such as the data structures used for scheduling and the interrupt handling mechanisms, impact the schedulability of real-time tasks as much as the theoretical aspects. However, such studies also used general-purpose operating systems with real-time extensions, which affects the observed run-time overheads and the schedulability of real-time tasks. Additionally, current multiprocessor real-time scheduling algorithms do not consider scenarios in which real-time tasks access the same cache lines, which complicates worst-case execution time estimation. This research addresses the aforementioned problems with cache partitioning strategies and multiprocessor real-time scheduling algorithms as follows. First, a real-time multiprocessor infrastructure is designed and implemented in an embedded operating system. The infrastructure consists of several real-time scheduling algorithms, such as global and partitioned EDF, and a cache partitioning mechanism using the page coloring technique. Second, a comparison is presented in terms of schedulability ratio, considering the run-time overhead of the created infrastructure and of a general-purpose operating system with real-time extensions. In some cases, global EDF considering the overhead of the embedded operating system achieves a better schedulability ratio than partitioned EDF with the overhead of the general-purpose operating system, clearly showing how different operating systems influence hard real-time schedulers on multiprocessors. Third, an evaluation of the impact of cache partitioning on several multiprocessor real-time schedulers is performed.
The results of this evaluation indicate that a "lightweight" operating system does not compromise real-time guarantees and that cache partitioning behaves differently depending on the scheduler and on the tasks' working set size. Fourth, a task partitioning algorithm is proposed that assigns tasks sharing cache partitions to the same processor. The results show that this task partitioning technique reduces contention for shared cache lines and provides real-time guarantees for hard real-time systems. Finally, a two-phase real-time scheduler for multiprocessors is proposed. The scheduler uses information collected during task execution through hardware performance counters. Based on the counter values, the scheduler detects when best-effort tasks interfere with real-time tasks in the cache, making it possible to prevent best-effort tasks from accessing the same cache lines as real-time tasks. The result of this scheduling strategy is that both hard and soft real-time deadlines are met. Abstract: Modern multicore platforms feature multiple levels of cache memory placed between the processor and main memory to hide the latency of ordinary memory systems. The primary goal of this cache hierarchy is to improve average execution time (at the cost of predictability). The uncontrolled use of the cache hierarchy by real-time tasks may impact the estimation of their worst-case execution times (WCET), especially when real-time tasks access a shared cache level, causing contention for shared cache lines and increasing the application execution time. This contention in the shared cache may lead to deadline misses, which is intolerable particularly for hard real-time (HRT) systems.
Shared cache partitioning is a well-known technique used in multicore real-time systems to isolate task workloads and to improve system predictability. Presently, the state-of-the-art studies that evaluate shared cache partitioning on multicore processors suffer from two key shortcomings. First, the cache partitioning mechanism is typically implemented either in a simulated environment or in a general-purpose OS (GPOS), and so the impact of kernel activities, such as interrupt handling and context switching, on the task partitions tends to be overlooked. Second, the evaluation is typically restricted to either a global or a partitioned scheduler, thereby failing to compare the performance of cache partitioning when tasks are scheduled by different schedulers. Furthermore, recent works have confirmed that OS implementation aspects, such as the choice of scheduling data structures and interrupt handling mechanisms, impact real-time schedulability as much as scheduling-theoretic aspects. However, these studies also used real-time patches applied to GPOSes, which affects the run-time overhead observed in these works and consequently the schedulability of real-time tasks. Additionally, current multicore scheduling algorithms do not consider scenarios where real-time tasks access the same cache lines due to true or false sharing, which also impacts the WCET. This thesis addresses the aforementioned problems with cache partitioning techniques and multicore real-time scheduling algorithms as follows. First, real-time multicore support is designed and implemented on top of an embedded operating system designed from scratch. This support consists of several multicore real-time scheduling algorithms, such as global and partitioned EDF, and a cache partitioning mechanism based on page coloring. Second, a comparison is presented in terms of schedulability ratio considering the run-time overhead of the implemented RTOS and a GPOS patched with real-time extensions.
In some cases, Global-EDF considering the overhead of the RTOS is superior to Partitioned-EDF considering the overhead of the patched GPOS, which clearly shows how different OSs impact hard real-time schedulers. Third, an evaluation of the cache partitioning impact on partitioned, clustered, and global real-time schedulers is performed. The results indicate that a lightweight RTOS does not impact real-time tasks, and shared cache partitioning has different behavior depending on the scheduler and the task's working set size. Fourth, a task partitioning algorithm that assigns tasks to cores respecting their usage of cache partitions is proposed. The results show that by simply assigning tasks that share cache partitions to the same processor, it is possible to reduce the contention for shared cache lines and to provide HRT guarantees. Finally, a two-phase multicore scheduler that provides HRT and soft real-time (SRT) guarantees is proposed. It is shown that by using information from hardware performance counters at run-time, the RTOS can detect when best-effort tasks interfere with real-time tasks in the shared cache. Then, the RTOS can prevent best-effort tasks from interfering with real-time tasks. The results also show that the assignment of exclusive partitions to HRT tasks together with the two-phase multicore scheduler provides HRT and SRT guarantees, even when best-effort tasks share partitions with real-time tasks.
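The page-coloring mechanism that this thesis builds on can be sketched in a few lines: the "color" of a physical page is determined by the cache-index bits above the page offset, and an allocator that hands each task only pages of its assigned colors confines that task to one slice of the shared cache. The cache geometry constants and the allocator below are illustrative assumptions, not the thesis implementation:

```python
# Sketch of page coloring: partitioning a physically indexed shared cache
# by restricting each task to physical pages of one "color".

PAGE_SIZE = 4096          # bytes (assumed)
CACHE_SIZE = 2 * 1024**2  # 2 MiB shared cache (assumed)
ASSOCIATIVITY = 16        # 16-way set associative (assumed)

# Number of colors = cache size per way divided by the page size:
# consecutive physical pages cycle through disjoint groups of cache sets.
NUM_COLORS = CACHE_SIZE // (ASSOCIATIVITY * PAGE_SIZE)  # 32 colors here

def page_color(phys_addr: int) -> int:
    """Color of the physical page holding phys_addr."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS

class ColorAllocator:
    """Hands out free physical pages of a requested color (illustrative)."""
    def __init__(self, num_pages: int):
        self.free = {c: [] for c in range(NUM_COLORS)}
        for page in range(num_pages):
            self.free[page % NUM_COLORS].append(page * PAGE_SIZE)

    def alloc(self, color: int) -> int:
        return self.free[color].pop()

# Two tasks given disjoint colors never compete for the same cache sets.
alloc = ColorAllocator(num_pages=1024)
task_a_page = alloc.alloc(color=0)
task_b_page = alloc.alloc(color=1)
assert page_color(task_a_page) != page_color(task_b_page)
```

Isolation is spatial: a task confined to color 0 can evict only lines in the cache sets that color 0 maps to, so its misses cannot inflate another task's WCET.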
Scheduling and locking in multiprocessor real-time operating systems
With the widespread adoption of multicore architectures, multiprocessors are now a standard deployment platform for (soft) real-time applications. This dissertation addresses two questions fundamental to the design of multicore-ready real-time operating systems: (1) Which scheduling policies offer the greatest flexibility in satisfying temporal constraints; and (2) which locking algorithms should be used to avoid unpredictable delays? With regard to Question 1, LITMUS^RT, a real-time extension of the Linux kernel, is presented and its design is discussed in detail. Notably, LITMUS^RT implements link-based scheduling, a novel approach to controlling blocking due to non-preemptive sections. Each implemented scheduler (22 configurations in total) is evaluated under consideration of overheads on a 24-core Intel Xeon platform. The experiments show that partitioned earliest-deadline first (EDF) scheduling is generally preferable in a hard real-time setting, whereas global and clustered EDF scheduling are effective in a soft real-time setting. With regard to Question 2, real-time locking protocols are required to ensure that the maximum delay due to priority inversion can be bounded a priori. Several spinlock- and semaphore-based multiprocessor real-time locking protocols for mutual exclusion (mutex), reader-writer (RW) exclusion, and k-exclusion are proposed and analyzed. A new category of RW locks suited to worst-case analysis, termed phase-fair locks, is proposed and three efficient phase-fair spinlock implementations are provided (one with few atomic operations, one with low space requirements, and one with constant RMR complexity). Maximum priority-inversion blocking is proposed as a natural complexity measure for semaphore protocols. It is shown that there are two classes of schedulability analysis, namely suspension-oblivious and suspension-aware analysis, that yield two different lower bounds on blocking.
Five asymptotically optimal locking protocols are designed and analyzed: a family of mutex, RW, and k-exclusion protocols for global, partitioned, and clustered scheduling that are asymptotically optimal in the suspension-oblivious case, and a mutex protocol for partitioned scheduling that is asymptotically optimal in the suspension-aware case. A LITMUS^RT-based empirical evaluation is presented that shows these protocols to be practical.
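The partitioned-vs-global trade-off weighed in these experiments rests on a basic schedulability fact: on one processor, EDF meets all deadlines of an implicit-deadline sporadic task set iff total utilization is at most 1, so a partitioned scheduler must first bin-pack tasks onto cores. A minimal first-fit sketch (the utilizations are invented for illustration; the dissertation's measured overheads are not modeled):

```python
def first_fit_partition(utilizations, num_cpus):
    """First-fit task partitioning for partitioned EDF.

    A CPU accepts a task if its utilization still fits under the
    uniprocessor EDF bound (sum of utilizations <= 1). Returns the
    task->cpu assignment, or None if the set cannot be packed.
    """
    load = [0.0] * num_cpus
    assignment = {}
    for tid, u in enumerate(utilizations):
        for cpu in range(num_cpus):
            if load[cpu] + u <= 1.0:
                load[cpu] += u
                assignment[tid] = cpu
                break
        else:
            return None  # no core can accommodate this task
    return assignment

# Four tasks fit on two cores; adding a fifth task makes packing fail,
# even though total utilization (2.1) barely exceeds capacity on paper.
assert first_fit_partition([0.5, 0.5, 0.4, 0.4], 2) is not None
assert first_fit_partition([0.5, 0.5, 0.4, 0.4, 0.3], 2) is None
```

The packing failure in the last line is exactly the capacity loss that motivates global and clustered EDF for soft real-time workloads, where bounded tardiness rather than bin-packing success is the requirement.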
Towards a distributed real-time system for future satellite applications
Thesis (MScEng) -- University of Stellenbosch, 2003. ENGLISH ABSTRACT: The Linux operating system and shared Ethernet are alternative technologies with the potential to
reduce both the development time and costs of satellites as well as the supporting infrastructure.
Modular satellites, ground stations and rapid prototyping testbeds also have a common
requirement for distributed real-time computation. The identified technologies were investigated
to determine whether this requirement could also be met.
Various real-time extensions and modifications are currently available for the Linux operating
system. A suitable open source real-time extension called Real-Time Application Interface
(RTAI) was selected for the implementation of an experimental distributed real-time system.
Experimental results showed that the RTAI operating system could deliver deterministic real-time
performance, but only in the absence of non-real-time load.
Shared Ethernet is currently the most popular and widely used commercial networking
technology. However, Ethernet wasn't developed to provide real-time performance. Several
methods have been proposed in the literature to modify Ethernet for real-time communications. A
token-passing protocol was found to be the most effective and least intrusive solution. The Real-Time
Token (RTToken) protocol was designed to guarantee predictable network access to
communicating real-time tasks. The protocol passes a token between nodes in a predetermined
order and nodes are assigned fixed token holding times. Experimental results proved that the
protocol offered predictable network access with bounded jitter.
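The predictability guarantee of such a token-passing scheme follows directly from the fixed rotation order: a node that has just released the token waits, in the worst case, one full rotation, i.e. the sum of every other node's fixed token holding time plus one token-pass delay per hop. A back-of-the-envelope sketch (node counts and times are invented for illustration, not taken from the thesis measurements):

```python
def worst_case_access_delay(holding_times_us, pass_delay_us, node):
    """Worst-case wait before `node` holds the token again: every other
    node uses its full fixed holding time, plus one token pass per hop
    around the ring."""
    n = len(holding_times_us)
    others = sum(t for i, t in enumerate(holding_times_us) if i != node)
    return others + n * pass_delay_us

# Four nodes with fixed holding times (microseconds), 50 us per token pass.
holding = [1000, 500, 500, 2000]
delay = worst_case_access_delay(holding, pass_delay_us=50, node=0)
assert delay == 500 + 500 + 2000 + 4 * 50  # 3200 us, a hard bound
```

Because every term in the bound is fixed at configuration time, the network access jitter is bounded regardless of traffic, which is precisely what shared Ethernet's CSMA/CD arbitration cannot offer.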
An experimental distributed real-time system was implemented, which included the extension of
the RTAI operating system with the RTToken protocol, as a loadable kernel module. Real-time
tasks communicated using connectionless Internet protocols. The Real-Time networking (RTnet)
subsystem of RTAI supported these protocols. Under collision-free conditions, consistent
transmission delays with bounded jitter were measured. The integrated RTToken protocol
provided guaranteed and bounded network access to communicating real-time tasks, with limited
overheads. Tests exhibited errors in some of the RTAI functionality. Overall the investigated
technologies showed promise in being able to meet the distributed real-time requirements of
various applications, including those found in the satellite environment. AFRIKAANSE OPSOMMING (translated from Afrikaans): The Linux operating system and shared Ethernet were identified as potential technologies for
satellite operation that can bring cost savings and faster development. Modularly designed
satellites, ground stations and development platforms share a common need for distributed
real-time processing. The different technologies were investigated to determine whether this
requirement could also be met.
Several real-time extensions and modifications are currently available for the Linux operating
system. The "Real-Time Application Interface" (RTAI) operating system was identified as a
suitable real-time extension for the implementation of an experimental distributed real-time
system. Experimental results showed that the RTAI operating system can operate
deterministically in real time, but only in the absence of a non-real-time processing load.
Shared Ethernet is a commercial networking technology that is widely available today. The
technology was, however, not designed for real-time operation. Several methods have been
proposed in the literature to modify Ethernet for real-time communication. This investigation
showed that a token-passing protocol is the most effective and least intrusive solution. The
"Real-Time Token" (RTToken) protocol was designed to ensure predictable network access for
communicating real-time tasks. The protocol passes a token between nodes in a predetermined
order, and nodes are allocated fixed token holding times. Experimental results indicated that
the protocol can ensure deterministic network access with bounded jitter.
An experimental distributed real-time system was implemented. This included the extension of
the RTAI operating system with the RTToken protocol, packaged as a loadable kernel module.
Real-time tasks can communicate using connectionless protocols supported by the "Real-Time
networking" (RTnet) subsystem of RTAI. Under ideal conditions, constant transmission delays
with bounded jitter were measured. The integration of the RTToken protocol ensured
collision-free network access for communicating tasks, at the cost of limited overhead.
Experiments revealed some faults in the RTAI functionality. In general, the proposed
technologies showed potential for various distributed real-time applications in future satellite
and other environments.
Malware analysis with hardware support
Advisors: Paulo Lício de Geus, André Ricardo Abed Grégio. Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação. Resumo: Today's world is driven by the use of computer systems, which are present in every aspect of everyday life. Their correct functioning is therefore essential to preserve the possibilities brought by technological developments. However, ensuring their correct functioning is not an easy task, since malicious individuals constantly try to subvert them for their own benefit or that of third parties. The most common type of subversion is the attack by malicious code (malware), capable of giving an attacker full control over a machine. The fight against the threat posed by malware is based on the analysis of collected artifacts, so as to enable response to incidents and the development of future countermeasures. However, attackers have specialized in circumventing analysis systems and thus keeping their operations active. For this purpose, a series of techniques called "anti-analysis" is used, capable of preventing direct inspection of the malicious code. Among these techniques, evasion of the analysis process stands out, in which samples are employed that can detect the presence of an analysis system and then hide their malicious behavior. Evasive samples have been used in attacks more and more, and their impact on system security is considerable, given that analyses previously performed automatically now require supervision by human analysts looking for signs of evasion, thus increasing the cost of keeping a system protected.
The most common ways of detecting an analysis environment are through the detection of: (i) injected code, used by the analyst to inspect the application; (ii) virtual machines, used in analysis environments for reasons of scale; (iii) execution side effects, usually caused by emulators, which are also used by analysts. To deal with evasive malware, analysts have relied on so-called transparent techniques, that is, techniques that require no code injection and cause no execution side effects. One way to obtain transparency in an analysis process is to rely on hardware support. Accordingly, this work addresses the application of hardware support to the analysis of evasive threats. Throughout this text, an assessment of existing hardware support technologies is presented, among them hardware virtual machines, BIOS support and performance monitors. The critical evaluation of these technologies offers a basis for comparison between different use cases. In addition, currently existing development gaps are enumerated. Moreover, one of these gaps is filled in this work by proposing to expand the use of performance monitors for malware monitoring purposes. More specifically, the use of the BTS monitor is proposed for building a tracer and a debugger. The framework proposed and developed in this work is also capable of handling ROP attacks, one of the techniques most used today for vulnerability exploitation. The evaluation of the solution shows that no side effects are introduced, which allows transparent analysis. Taking advantage of this characteristic, we demonstrate the analysis of protected applications and the identification of evasion techniques. Abstract: Today's world is driven by the usage of computer systems, which are present in all aspects of everyday life.
Therefore, the correct working of these systems is essential to ensure the maintenance of the possibilities brought about by technological developments. However, ensuring the correct working of such systems is not an easy task, as many people attempt to subvert systems for their own benefit. The most common kind of subversion against computer systems is the malware attack, which can allow an attacker to gain complete machine control. The fight against this kind of threat is based on analysis procedures of the collected malicious artifacts, allowing incident response and the development of future countermeasures. However, attackers have specialized in circumventing analysis systems and thus keeping their operations active. For this purpose, they employ a series of techniques called anti-analysis, able to prevent the inspection of their malicious code. Among these techniques, I highlight analysis evasion, that is, the usage of samples able to detect the presence of an analysis solution and then hide their malicious behavior. Evasive samples have become popular, and their impact on systems security is considerable, since automatic analysis now requires human supervision in order to find evasion signs, which significantly raises the cost of maintaining a protected system. The most common ways of detecting an analysis environment are: i) injected code detection, since injection is used by analysts to inspect applications; ii) virtual machine detection, since virtual machines are used in analysis environments due to scalability requirements; iii) execution side-effect detection, with side effects usually caused by emulators, also used by analysts. To handle evasive malware, analysts have relied on so-called transparent techniques, that is, those which require no code injection and cause no execution side effects. A way to achieve transparency in an analysis process is to rely on hardware support.
In this way, this work covers the application of hardware support to the analysis of evasive threats. In the course of this text, I present an assessment of existing hardware support technologies, including hardware virtual machines, BIOS support, performance monitors, and PCI cards. My critical evaluation of such technologies provides a basis for comparing different usage cases. In addition, I pinpoint development gaps that currently exist. More than that, I fill one of these gaps by proposing to expand the usage of performance monitors for malware monitoring purposes. More specifically, I propose the usage of the BTS monitor for the purpose of developing a tracer and a debugger. The proposed framework is also able to deal with ROP attacks, one of the most commonly used techniques for vulnerability exploitation. The framework evaluation shows that no side effects are introduced, thus allowing transparent analysis. Making use of this capability, I demonstrate how protected applications can be inspected and how evasion techniques can be identified. Master's degree in Computer Science (CAPES).
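The ROP-handling capability mentioned above can be illustrated in miniature: given a branch trace of the kind BTS records (source/target pairs for every taken branch), a checker can flag return instructions whose targets are not "call-preceded", i.e. do not land immediately after a call site, which is the classic symptom of a ROP chain. The trace format and addresses below are synthetic, not the actual BTS record layout:

```python
# Toy ROP heuristic over a BTS-like branch trace: every RET should land
# just after a CALL site (a "call-preceded" target). Gadget returns,
# which jump into the middle of functions, generally do not.

# Synthetic program image: addresses sitting immediately after a CALL.
call_preceded = {0x1005, 0x1042}

# BTS-like records: (branch_kind, source_address, target_address)
trace = [
    ("call", 0x1000, 0x2000),
    ("ret",  0x2010, 0x1005),  # legitimate: returns right after the call
    ("ret",  0x2020, 0x3333),  # suspicious: target is not call-preceded
]

def rop_suspects(trace, call_preceded):
    """Return-branch targets that do not follow any call instruction."""
    return [target for kind, _src, target in trace
            if kind == "ret" and target not in call_preceded]

assert rop_suspects(trace, call_preceded) == [0x3333]
```

Because the trace is produced by the hardware rather than by injected code, the check itself introduces none of the side effects that evasive samples probe for.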
Dynamic Voltage Scaling for Energy-Constrained Real-Time Systems
The problem of reducing energy consumption dominates the design of several real-time systems.
The Dynamic Voltage Scaling (DVS) technique, provided by most microprocessors, allows balancing
computational speed against energy consumption.
We present some novel energy-aware scheduling algorithms that exploit this technique while
meeting real-time constraints. In particular, we present the GRUB-PA algorithm which, unlike most
existing algorithms, can reduce energy consumption in real-time systems consisting of any kind of task.
We also present a working implementation of the algorithm on Linux.
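The core DVS observation behind such algorithms can be stated in one line: under EDF, an implicit-deadline task set with total utilization U remains schedulable at any normalized speed s >= U, so the processor may run at the slowest such speed and, to first order, dynamic power in CMOS falls roughly with the cube of frequency. A hedged sketch (the cubic power model and the discrete frequency set are textbook assumptions, not the GRUB-PA algorithm itself):

```python
def minimal_edf_speed(utilizations):
    """Slowest normalized speed at which implicit-deadline EDF
    remains feasible: the total utilization of the task set."""
    return sum(utilizations)

def pick_frequency(available, required_speed):
    """Lowest available normalized frequency no slower than required."""
    return min(f for f in available if f >= required_speed)

tasks = [0.2, 0.15, 0.1]            # WCET/period ratios at full speed
speed = minimal_edf_speed(tasks)     # ~0.45
freq = pick_frequency([0.25, 0.5, 0.75, 1.0], speed)
assert freq == 0.5

# First-order dynamic power scales ~ f^3: running at half speed spends
# about 12.5% of full-speed dynamic power while still meeting deadlines.
assert 0.5 ** 3 == 0.125
```

A reclaiming algorithm such as GRUB-PA goes further by also slowing down when tasks finish earlier than their worst case; the static bound above is only the starting point.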
Accelerating Transactional Memory by Exploiting Platform Specificity
Transactional Memory (TM) is one of the most promising alternatives to lock-based concurrency, but there still remain obstacles that keep TM from being utilized in the real world. Performance, in terms of high scalability and low latency, is always one of the most important keys to general-purpose usage. While most of the research in this area focuses on improving a specific single TM implementation on some default platform (a certain operating system, compiler and/or processor), little has been conducted on improving performance more generally, and across platforms. We found that by utilizing platform specificity, we could gain tremendous performance improvement and avoid unnecessary costs due to false assumptions of platform properties, on not only a single TM implementation, but many. In this dissertation, we will present our findings in four sections: 1) we discover and quantify hidden costs from inappropriate compiler instrumentations, and provide suggestions and solutions; 2) we boost a set of mainstream timestamp-based TM implementations with the x86-specific hardware cycle counter; 3) we explore compiler opportunities to reduce the transaction abort rate by reordering read-modify-write operations, a technique that can be applied to all TM implementations and could be more effective with some help from compilers; and 4) we coordinate the state-of-the-art Intel Haswell TSX hardware TM with a software TM, "Cohorts", and develop a safe and flexible hybrid TM, "HyCo", as our final performance boost in this dissertation. The impact of our research extends beyond Transactional Memory, to broad areas of concurrent programming. Some of our solutions and discussions, such as the synchronization between accesses of the hardware cycle counter and memory loads and stores, can be utilized to boost concurrent data structures and many timestamp-based systems and applications.
Others, such as discussions of compiler instrumentation costs and reordering opportunities, provide additional insights to compiler designers. Our findings show that platform specificity must be taken into consideration to achieve peak performance.
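The "timestamp-based TM" that the cycle-counter optimization targets follows a TL2-style discipline: every commit advances a global clock and stamps the locations it wrote, and a reader is consistent only if nothing it read was stamped after its start time. The shared software clock is exactly the hot counter that a synchronized hardware cycle counter (rdtsc on x86) can replace. A single-threaded toy sketch of the bookkeeping, with invented names:

```python
# Minimal sketch of TL2-style timestamp bookkeeping in an STM.
# Single-threaded toy for illustration; no actual concurrency here.

import itertools

global_clock = itertools.count(1)  # the shared counter rdtsc would replace
memory = {}      # addr -> value
version = {}     # addr -> timestamp of the commit that last wrote addr

def tm_write_commit(writes):
    """Commit a write set atomically at a fresh timestamp."""
    ts = next(global_clock)
    for addr, val in writes.items():
        memory[addr] = val
        version[addr] = ts
    return ts

def tm_validate(read_set, start_ts):
    """A transaction is consistent iff nothing it read was committed
    after its start timestamp."""
    return all(version.get(addr, 0) <= start_ts for addr in read_set)

tm_write_commit({"x": 1, "y": 1})
start = tm_write_commit({})        # empty commit, used here to grab a timestamp
ok = tm_validate({"x", "y"}, start)
tm_write_commit({"x": 2})          # a later writer invalidates x
assert ok and not tm_validate({"x", "y"}, start)
```

In a real multicore STM every transaction increments or reads `global_clock` atomically, which makes the counter's cache line a contention hot spot; substituting a per-core synchronized cycle counter removes that shared write, which is the performance angle of section 2 above.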