A Probabilistically Analyzable Cache to Estimate Timing Bounds
ABSTRACT - Modern computer architectures are targeted towards speeding up the average performance of the software running on them. Architectural features such as deep pipelines, branch prediction, out-of-order execution, and multi-level memory hierarchies have an adverse impact on software timing prediction. In particular, it is hard or even impossible to make an accurate estimation of the worst-case execution time (WCET) of a program or software running on a particular hardware platform.
Critical real-time embedded systems (CRTESs), e.g. computing systems in aerospace, require strict timing constraints to guarantee their proper operational behavior. WCET analysis is the central idea of real-time systems development, because real-time systems always need to meet their deadlines. In order to meet deadline requirements, the WCET of real-time system tasks must be determined, and this is only possible if the hardware architecture is time-predictable. Due to the unpredictable nature of modern computing hardware, it is not practical to use advanced computing systems in CRTESs. Real-time systems do not primarily need to meet high-performance requirements, and a processor designed to improve average-case performance may not fit the requirements of real-time systems due to predictability issues.
Current timing analysis techniques are well established, but require detailed knowledge of the internal operations and state of the system for both hardware and software. The lack of in-depth knowledge of architectural operations becomes an obstacle to adopting deterministic timing analysis (DTA) techniques for WCET measurement. Probabilistic timing analysis (PTA) has emerged as a timing-analysis technique for next-generation real-time systems. PTA techniques reduce the extent of knowledge of a software execution platform that is needed to perform accurate WCET estimations. In this thesis, we propose the development of a new probabilistically analyzable cache and apply PTA techniques for time prediction. In this work, we implemented a randomized cache for the MIPS-32 and Leon-3 processors. We designed and implemented random placement and replacement policies, and applied probabilistic timing techniques to measure the probabilistic WCET (pWCET). We also measured the level of pessimism incurred by the probabilistic techniques and compared it with a deterministic cache configuration. The WCET prediction provided by the PTA techniques is closer to the real execution time of the program. We compared the estimates with measurements taken on the processor to help the designer evaluate the level of pessimism introduced by the cache architecture for each probabilistic timing-analysis technique. This work makes a first attempt at comparing deterministic, static, and measurement-based probabilistic timing analysis for time prediction under varying cache configurations. We identify the strengths and limitations of each technique for time prediction, and provide guidelines for processor designs that minimize the pessimism associated with the WCET. Our experiments show that the cache fulfills all the requirements for PTA, and that the program's timing can be predicted with arbitrary accuracy. Such a probabilistic computer architecture carries unmatched potential and great promise for next-generation CRTESs.
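The measurement-based flavour of PTA can be illustrated as follows: run the program many times on a cache with random placement and random replacement, then fit a Gumbel (extreme-value) tail to the observed execution times and read off a pWCET at a target exceedance probability. The toy address trace, cache geometry, latencies and the method-of-moments Gumbel fit below are illustrative assumptions, not the configuration used in the thesis.

```python
import math
import random
import statistics

def run_once(trace, n_sets, n_ways, t_hit=1, t_miss=10, seed=None):
    """One end-to-end run on a cache with random placement (a fresh
    per-run hash salt) and random replacement (evict a random way)."""
    rng = random.Random(seed)
    salt = rng.getrandbits(32)
    sets = [[] for _ in range(n_sets)]
    cycles = 0
    for addr in trace:
        s = hash((addr, salt)) % n_sets             # random placement
        if addr in sets[s]:
            cycles += t_hit
        else:
            cycles += t_miss
            if len(sets[s]) >= n_ways:
                sets[s].pop(rng.randrange(n_ways))  # random replacement
            sets[s].append(addr)
    return cycles

def pwcet(samples, exceedance=1e-9):
    """pWCET estimate: fit a Gumbel distribution to the execution-time
    samples by the method of moments, then take the quantile whose
    exceedance probability is `exceedance`."""
    beta = statistics.stdev(samples) * math.sqrt(6) / math.pi
    mu = statistics.mean(samples) - 0.5772156649 * beta  # Euler's gamma
    return mu - beta * math.log(-math.log(1.0 - exceedance))

trace = [a % 64 for a in range(200)] * 5            # toy address trace
times = [run_once(trace, n_sets=16, n_ways=2, seed=i) for i in range(500)]
print("observed max:", max(times), " pWCET@1e-9:", round(pwcet(times)))
```

A real MBPTA deployment would additionally check that the samples pass independence and identical-distribution tests before trusting the fitted tail.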
Proceedings of the Work-In-Progress Session of the 13th Real-Time and Embedded Technology and Applications Symposium
The Work-In-Progress session of the 13th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'07) presents papers describing contributions both to the state of the art and the state of the practice in the broad field of real-time and embedded systems. The 17 accepted papers were selected from 19 submissions. These proceedings are also available as Washington University in St. Louis Technical Report WUCSE-2007-17, at http://www.cse.seas.wustl.edu/Research/FileDownload.asp?733. Special thanks go to the General Chairs, Steve Goddard and Steve Liu, and the Program Chairs, Scott Brandt and Frank Mueller, for their support and guidance.
Distributed Control Architecture
This document describes the development and testing of a novel Distributed Control Architecture (DCA). The DCA developed during the study is an attempt to turn the components used to construct unmanned vehicles into a network of intelligent devices, connected using standard networking protocols. The architecture exists at both a hardware and software level and provides a communication channel between control modules, actuators and sensors.
A single unified mechanism for connecting sensors and actuators to the control software will reduce the technical knowledge required by platform integrators and allow control systems to be rapidly constructed in a Plug and Play manner. DCA uses standard networking hardware to connect components, removing the need for custom communication channels between individual sensors and actuators.
The use of a common architecture for the communication between components should make it easier for software to dynamically determine the vehicle's current capabilities and increase the range of processing platforms that can be utilised. Implementations of the architecture currently exist for Microsoft Windows, Windows Mobile 5, Linux and Microchip dsPIC30 microcontrollers.
Conceptually, DCA exposes the functionality of each networked device as objects with interfaces and associated methods. Letting each object expose multiple interfaces allows for future upgrades without breaking existing code. In addition, the use of common interfaces should help facilitate component reuse and unit testing, and make it easier to write generic reusable software.
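The object-with-interfaces model described above can be sketched in a few lines. The class names and the motor-controller example are hypothetical, invented purely for illustration; DCA's actual wire protocol and API are not described in this abstract.

```python
class Interface:
    """A named set of methods that a networked device exposes."""
    def __init__(self, name, methods):
        self.name = name
        self.methods = methods                # method name -> callable

class Device:
    """A DCA-style device object exposing several interfaces at once."""
    def __init__(self):
        self.interfaces = {}
    def expose(self, iface):
        self.interfaces[iface.name] = iface
    def query(self, name):
        return self.interfaces.get(name)      # capability discovery
    def call(self, iface_name, method, *args):
        return self.interfaces[iface_name].methods[method](*args)

# A motor controller keeps its v1 interface while adding a richer v2,
# so existing client code keeps working after the upgrade.
motor = Device()
motor.expose(Interface("motor.v1", {"set_speed": lambda pct: f"speed={pct}%"}))
motor.expose(Interface("motor.v2", {"set_speed": lambda pct: f"speed={pct}%",
                                    "set_torque": lambda nm: f"torque={nm}Nm"}))
print(motor.call("motor.v2", "set_torque", 1.5))
```

Because old clients only ever query `motor.v1`, upgrading the device to expose `motor.v2` alongside it does not break them.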
A Multi-core processor for hard real-time systems
The increasing demand for new functionalities in current and future hard real-time embedded systems, like the ones deployed in automotive and avionics industries, is driving an increment in the performance required in current embedded processors. Multi-core processors represent a good design solution to cope with such higher performance requirements due to their better performance-per-watt ratio while maintaining the core design simple. Moreover, multi-cores also allow executing mixed-criticality level workloads composed of tasks with and without hard real-time requirements, maximizing the utilization of the hardware resources while guaranteeing low cost and low power consumption.
Despite those benefits, current multi-core processors are less analyzable than single-core ones due to the interferences between different tasks when accessing hardware shared resources. As a result, computing a meaningful Worst-Case Execution Time (WCET) estimate - i.e. an upper bound of the application's execution time - becomes extremely difficult, if not impossible, because the execution time of a task may change depending on the other threads running at the same time. This makes the WCET of a task dependent on the set of inter-task interferences introduced by the co-running tasks.
Providing a WCET estimate that is independent of the other tasks (the time-composability property) is a key requirement in hard real-time systems.
This thesis proposes a new multi-core processor design in which time composability is achieved, hence enabling the use of multi-cores in hard real-time systems. With our proposals, the WCET estimate of a Hard Real-time Task (HRT) is independent of the other co-running tasks. To that end, we design a multi-core processor in which the maximum delay that a request from an HRT accessing a hardware shared resource can suffer due to other tasks is bounded: our processor guarantees that a request to a shared resource cannot be delayed longer than a given Upper Bound Delay (UBD).
In addition, the UBD allows identifying the impact that different processor configurations may have on the WCET by determining the sensitivity of an HRT to different resource allocations. This thesis proposes an off-line task allocation algorithm (called IA3: Interference-Aware Allocation Algorithm) that allocates tasks in a task set based on the HRTs' sensitivity to different resource allocations. As a result, the hardware shared resources used by HRTs are minimized, allowing Non Hard Real-time Tasks (NHRTs) to use the rest of the resources. Overall, our proposals provide analyzability for the HRTs while allowing NHRTs to be executed on the same chip without any effect on the HRTs.
The first two proposals of this thesis focused on supporting the execution of multi-programmed workloads with mixed criticality levels (composed of HRTs and NHRTs).
Higher performance could be achieved by supporting multi-threaded applications. As a first step towards supporting hard real-time parallel applications, this thesis proposes a new hardware/software approach to guarantee a predictable execution of software-pipelined parallel programs.
This thesis also investigates a solution to verify the timing correctness of HRTs without requiring any modification to the core design: we design a hardware unit which is interfaced with the processor and integrated into a functional-safety-aware methodology. This unit monitors the execution time of a block of instructions and detects if it exceeds the WCET. Concretely, we show how to handle timing faults on a real industrial automotive platform.
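The bounded-delay argument behind the UBD can be made concrete with a toy model: under round-robin arbitration of one shared resource, a request can wait for at most one in-flight request from each competing core, giving UBD = (n_cores - 1) x service latency. The core count and latency below are illustrative, not taken from the thesis.

```python
def worst_case_delay(n_cores, service_time):
    """Upper Bound Delay (UBD) under round-robin arbitration: a request
    waits for at most one request from each of the other cores."""
    return (n_cores - 1) * service_time

def simulate_round_robin(request_cycles, service_time):
    """Serve one request per core on a single shared resource, granting
    pending requests in round-robin (core-index) order. Returns the
    number of cycles each request waits before being served."""
    free_at = 0
    delays = [0] * len(request_cycles)
    order = sorted(range(len(request_cycles)),
                   key=lambda c: (request_cycles[c], c))
    for c in order:
        start = max(free_at, request_cycles[c])
        delays[c] = start - request_cycles[c]
        free_at = start + service_time
    return delays

n_cores, lat = 4, 8
ubd = worst_case_delay(n_cores, lat)
delays = simulate_round_robin([0] * n_cores, lat)   # all cores collide
print("UBD:", ubd, " worst observed delay:", max(delays))
```

Because every shared-resource access of an HRT can be charged UBD cycles, its WCET bound no longer depends on what the co-running tasks do, which is precisely the time-composability property.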
Inferring Complex Activities for Context-aware Systems within Smart Environments
The rising ageing population worldwide and the prevalence of age-related conditions such as physical fragility, mental impairments and chronic diseases have significantly impacted quality of life and caused a shortage of health and care services. Over-stretched healthcare providers are driving a paradigm shift in public healthcare provisioning. Thus, Ambient Assisted Living (AAL) using Smart Home (SH) technologies has been rigorously investigated to help address the aforementioned problems.
Human Activity Recognition (HAR) is a critical component in AAL systems which enables applications such as just-in-time assistance, behaviour analysis, anomaly detection and emergency notifications. This thesis is aimed at investigating the challenges faced in accurately recognising Activities of Daily Living (ADLs) performed by single or multiple inhabitants within smart environments. Specifically, this thesis explores five complementary research challenges in HAR. The first study contributes to knowledge by developing a semantic-enabled data segmentation approach with user preferences. The second study takes the segmented set of sensor data and investigates recognising human ADLs at a multi-granular action level: coarse- and fine-grained. At the coarse-grained action level, semantic relationships between the sensor, object and ADLs are deduced, whereas at the fine-grained action level, object usage at a satisfaction threshold, with evidence fused from multimodal sensor data, is leveraged to verify the intended actions. Moreover, due to the imprecise/vague interpretations of multimodal sensors and the challenges of data fusion, fuzzy set theory and the fuzzy web ontology language (fuzzy-OWL) are leveraged. The third study focuses on incorporating the uncertainties that arise in HAR due to factors such as technological failure, object malfunction, and human error. Hence, existing uncertainty theories and approaches are analysed and, based on the findings, a probabilistic ontology (PR-OWL) based HAR approach is proposed. The fourth study extends the first three to distinguish activities conducted by more than one inhabitant in a shared smart environment with the use of discriminative sensor-based techniques and time-series pattern analysis. The final study investigates a suitable system architecture for a real-time smart environment tailored to an AAL system, and proposes a microservices architecture with off-the-shelf and bespoke sensor-based sensing methods.
The initial semantic-enabled data segmentation study achieved 100% and 97.8% accuracy in segmenting sensor events under single- and mixed-activity scenarios, respectively. However, the average classification time taken to segment each sensor event suffered, at 3971 ms and 62183 ms for the single- and mixed-activity scenarios, respectively. The second study, detecting fine-grained user actions, was evaluated with 30 and 153 fuzzy rules to detect two fine-grained movements using a pre-collected dataset from the real-time smart environment. The results of the second study indicate good average accuracies of 83.33% and 100%, but with high average durations of 24648 ms and 105318 ms, posing further challenges for the scalability of fusion-rule creation. The third study was evaluated by incorporating the PR-OWL ontology with ADL ontologies and the Semantic Sensor Network (SSN) ontology to define four types of uncertainty present in a kitchen-based activity. The fourth study illustrated a case study extending single-user AR to multi-user AR by combining discriminative sensors (RFID tags and fingerprint sensors) to identify and associate user actions with the aid of time-series analysis. The last study responds to the computation and performance requirements of the four studies by analysing and proposing a microservices-based system architecture for an AAL system. A future research direction towards adopting fog/edge computing paradigms from cloud computing is discussed, for higher availability, reduced network traffic/energy and cost, and a decentralised system.
As a result of the five studies, this thesis develops a knowledge-driven framework to estimate and recognise multi-user activities at the level of fine-grained user actions. This framework integrates three complementary ontologies to conceptualise factual, fuzzy and uncertain knowledge about the environment/ADLs, time-series analysis and the discriminative sensing environment. Moreover, a distributed software architecture, multimodal sensor-based hardware prototypes, and other supportive utility tools such as a simulator and a synthetic ADL data generator were developed to support the evaluation of the proposed approaches. The distributed system is platform-independent and is currently supported by an Android mobile application and web-browser based client interfaces for retrieving information such as live sensor events and HAR results.
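The fuzzy-rule verification of fine-grained actions described in the second study can be sketched as follows. The sensors, membership-function parameters and the single "pouring" rule are invented for illustration; the thesis's actual rule base (30 and 153 rules) is far richer.

```python
def trimf(x, a, b, c):
    """Triangular fuzzy membership function over [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# hypothetical multimodal readings for a "pouring" fine-grained action
pressure = 0.7     # normalised grip pressure on a kettle handle
tilt_deg = 55.0    # wrist tilt (degrees) from an inertial sensor

grip_firm = trimf(pressure, 0.3, 0.8, 1.0)
tilt_pour = trimf(tilt_deg, 30.0, 60.0, 90.0)

# fuzzy rule: IF grip is firm AND tilt is pouring THEN action is "pouring";
# min() is the usual t-norm for fuzzy AND, 0.5 a satisfaction threshold
confidence = min(grip_firm, tilt_pour)
action = "pouring" if confidence >= 0.5 else "unknown"
print(action, round(confidence, 2))
```

Fusing evidence through the t-norm means a weak reading from either modality lowers the rule's confidence, which is what lets multimodal evidence verify, rather than merely suggest, the intended action.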
Flexible network management in software defined wireless sensor networks for monitoring application systems
Wireless Sensor Networks (WSNs) are among the commonly applied information technologies of modern networking and computing platforms for application-specific systems. Today's network computing applications face a high demand for reliable and powerful network functionalities. Hence, efficient network performance is central to the entire ecosystem, especially where human life is a concern. However, effective management of WSNs remains a challenge due to the problems inherent to them. As a result, WSN application systems such as those in monitored environments, surveillance, aeronautics, medicine, processing and control tend to suffer in terms of capacity to support compute-intensive services due to the limitations experienced on them. A recent technology shift proposes Software Defined Networking (SDN) for improving computing networks as well as enhancing network resource management, especially for life-guarding systems. As an optimization strategy, a software-oriented approach for WSNs, known as Software Defined Wireless Sensor Networking (SDWSN), is implemented to evolve, enhance and provide computing capacity to these resource-constrained technologies.
Software development strategies are applied with a focus on ensuring efficient network management, introducing network flexibility and advancing network innovation towards the maximum operational potential of WSN application systems. The need to develop WSN application systems which are powerful and scalable has grown tremendously due to their simplicity of implementation and application. Their design serves as a potential direction for the much-anticipated and resource-abundant IoT networks. Information systems such as data analytics, shared computing resources, control systems, big data support, visualizations, system audits, artificial intelligence (AI), etc. are a necessity in the everyday lives of consumers. Such systems can greatly benefit from the SDN programmability strategy, in terms of improving how data is mined, analysed and committed to other parts of the system for greater functionality. This work proposes and implements SDN strategies for enhancing WSN application systems, especially for life-critical systems. It also highlights implementation considerations for designing powerful WSN application systems by focusing on system-critical aspects that should not be disregarded when planning to improve core network functionalities.
Due to their inherent challenges, WSN application systems lack the robustness, reliability and scalability to support high computing demands. Anticipated systems must have greater capabilities to ubiquitously support many applications with flexible resources that can be easily accessed. To achieve this, such systems must incorporate powerful strategies for efficient data aggregation, query computation, communication and information presentation. The notion of applying machine learning methods to WSN systems is fairly new, but carries the potential to enhance WSN application technologies. This technological direction seeks to bring intelligent functionalities to WSN systems, given the characteristics of wireless sensor nodes in terms of cooperative data transmission. With these technological aspects, a technical study is therefore conducted, focused on WSN application systems, on how SDN strategies coupled with machine learning methods can contribute viable solutions for monitoring application systems, to support and provide various applications and services with greater performance. To realize this, this work further proposes and implements machine learning (ML) methods coupled with SDN strategies to enhance sensor data aggregation, introduce network flexibility, and improve resource management, query processing and sensor information presentation. Hence, this work directly contributes SDWSN strategies for monitoring application systems.
Thesis (PhD) -- University of Pretoria, 2018. National Research Foundation (NRF); Telkom Centre of Excellence; Electrical, Electronic and Computer Engineering.
Energy-aware medium access control protocols for wireless sensors network applications
The main purpose of this thesis was to investigate energy-efficient Medium Access Control (MAC) protocols designed to extend the lifetime of a wireless sensor network application, such as tracking, environment monitoring, home security, or patient monitoring, e.g., foetal monitoring in the last weeks of pregnancy. From the perspective of communication protocols, energy efficiency is one of the most important issues, and can be addressed at each layer of the protocol stack; however, our research focuses only on the MAC layer. An energy-efficient MAC protocol was designed based on modifications and optimisations of the synchronized power-saving Sensor MAC (S-MAC) protocol, which has three important components: periodic listen and sleep, collision and overhearing avoidance, and message passing. The Sensor Block Acknowledgement (SBACK) MAC protocol is proposed, which combines contention-based, scheduling-based and block-acknowledgement-based schemes to achieve energy efficiency. In SBACK, the use of ACK control packets is reduced, since there is no ACK packet for every DATA packet sent; instead, one special packet called Block ACK Response is used at the end of the transmission of all data packets. This packet informs the sender of how many packets were received by the receiver; by reducing the number of ACK control packets, we reduce the power consumption of the nodes, and more useful data packets can be transmitted. A comparison study between the SBACK and S-MAC protocols is also performed.
Considering 0% packet loss, SBACK always decreases energy consumption when directly compared with S-MAC.
Three different transceivers were used; considering a packet loss of 10%, there is a decrease in energy consumption of between 0.1% and 10%, depending on the transceiver.
When there are no packet retransmissions, SBACK only achieves worse performance when the number of fragments is less than 12; beyond that, the reduction in average delay increases with the number of fragments sent.
When 10% of the packets need retransmission, worse results in terms of wasted energy occur only for the TR1000 transceiver; the other transceivers (CC2420 and AT86RF230) achieve better results.
In terms of delay, if more than 10 packets need to be retransmitted, the SBACK protocol always achieves better performance compared with the other MAC protocols that use per-packet ACKs.
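The control-packet saving behind SBACK can be quantified with a toy energy model: per-packet ACKs versus a single Block ACK Request/Response exchange at the end of a burst. The per-packet energies below are invented placeholders, not measurements from the thesis or any transceiver datasheet.

```python
def control_packets(n_data, block_ack=False):
    """ACK overhead for n_data DATA packets: one ACK per packet in the
    S-MAC-style scheme, or a single Block ACK Request + Response pair in
    the SBACK-style scheme."""
    return 2 if block_ack else n_data

def tx_energy(n_data, e_data, e_ctrl, block_ack=False):
    """Toy model: energy to transmit the DATA burst plus its control packets."""
    return n_data * e_data + control_packets(n_data, block_ack) * e_ctrl

e_data, e_ctrl = 50.0, 10.0        # hypothetical per-packet energies (uJ)
for n in (5, 20, 100):
    saving = 1 - tx_energy(n, e_data, e_ctrl, True) / tx_energy(n, e_data, e_ctrl)
    print(f"{n} packets: {saving:.1%} energy saved")
```

The saving grows with burst length because the fixed two-packet block exchange is amortised over more DATA packets, which matches SBACK performing best for larger numbers of fragments.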
Programmer-transparent efficient parallelism with skeletons
Parallel and heterogeneous systems are ubiquitous. Unfortunately, both require significant complexity at the software level, to the detriment of programmer productivity. To produce correct and efficient code, programmers not only have to manage synchronisation and communication but also be aware of low-level hardware details. It is foreseeable that the problem will become worse as systems grow increasingly parallel and heterogeneous.
Building on earlier work, this thesis further investigates the contribution which
algorithmic skeletons can make towards solving this problem. Skeletons are high-level
abstractions for typical parallel computations. They hide low-level hardware details
from programmers and, in addition, encode information about the computations that
they implement, which runtime systems and library developers can use for automatic
optimisations. We present two novel case studies in this respect.
First, we provide scheduling flexibility on heterogeneous CPU + GPU systems in a programmer-transparent way, similar to the freedom OS schedulers have on CPUs. Thanks to the high-level nature of skeletons, we automatically switch between CPU and GPU implementations of kernels and use semantic information encoded in skeletons to find execution-time points at which switches can occur. In more detail, kernel iteration spaces are processed in slices and migration is considered on a slice-by-slice basis. We show that slice-size choices that introduce negligible overheads can be learned by predictive models. We show that in a simple deployment scenario mid-kernel migration achieves speedups of up to 1.30x, and 1.08x on average. Our mechanism introduces negligible overheads of 2.34% if a kernel does not actually migrate.
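The slice-by-slice mechanism can be sketched as follows: the iteration space is cut into slices, and before each slice the runtime re-decides which device's kernel implementation to use. The device chooser and the toy kernel below are placeholders; the real system consults learned predictive models and dispatches actual CPU/GPU kernels.

```python
def run_sliced(kernel_cpu, kernel_gpu, n_items, slice_size, prefer_gpu):
    """Process a kernel's iteration space in slices; each slice boundary
    is a potential migration point between CPU and GPU implementations."""
    results, devices = [], []
    for start in range(0, n_items, slice_size):
        stop = min(start + slice_size, n_items)
        device = "gpu" if prefer_gpu() else "cpu"   # re-decide per slice
        impl = kernel_gpu if device == "gpu" else kernel_cpu
        results.extend(impl(range(start, stop)))
        devices.append(device)
    return results, devices

# toy kernel: both "implementations" square their slice of the space
square = lambda xs: [x * x for x in xs]
flip = iter([False, True, True])                    # migrate after slice 1
out, used = run_sliced(square, square, n_items=10, slice_size=4,
                       prefer_gpu=lambda: next(flip))
print(out, used)
```

Because migration is only considered at slice boundaries, the slice size trades responsiveness of the scheduler against per-slice dispatch overhead, which is why it is worth learning with a predictive model.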
Second, we propose skeletons to simplify the programming of parallel hard real-time systems. We combine information encoded in task farms with analysis of the real-time user code to automatically choose thread counts and an optimisation parameter related to farm-internal communication. Both parameters are chosen so that real-time deadlines are met with minimum resource usage. We show that our approach achieves a 1.22x speedup over unoptimised code, selects the best parameter settings in 83% of cases, and never chooses parameters that cause deadline misses.
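The parameter selection for task farms can be sketched with a crude cost model: predict the farm's makespan for a candidate thread count, charging a per-item cost for farm-internal communication, then pick the smallest thread count that meets the deadline. The model and the numbers are invented for illustration; the thesis derives its parameters from user-code analysis rather than a fixed formula.

```python
import math

def farm_makespan(n_items, t_item, n_workers, t_comm):
    """Crude task-farm cost model: work divided over the workers, plus a
    per-item cost on the farm-internal communication channels."""
    return math.ceil(n_items / n_workers) * t_item + n_items * t_comm

def min_workers(n_items, t_item, t_comm, deadline, max_workers):
    """Smallest thread count whose predicted makespan meets the deadline,
    i.e. the deadline is met with minimum resource usage."""
    for w in range(1, max_workers + 1):
        if farm_makespan(n_items, t_item, w, t_comm) <= deadline:
            return w
    return None                  # no feasible thread count on this farm

print(min_workers(n_items=100, t_item=4.0, t_comm=0.01, deadline=120.0,
                  max_workers=8))
```

Choosing the minimum feasible thread count, rather than the fastest configuration, is what lets the approach meet deadlines while leaving the remaining cores free for other work.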