A Probabilistically Analyzable Cache to Estimate Timing Bounds
ABSTRACT - Modern computer architectures are targeted towards speeding up the average performance of the software running on them. Architectural features such as deep pipelines, branch prediction, out-of-order execution, and multi-level memory hierarchies have an adverse impact on software timing prediction. In particular, it is hard or even impossible to make an accurate estimation of the worst-case execution time (WCET) of a program or software running on a particular hardware platform.
Critical real-time embedded systems (CRTESs), e.g. computing systems in aerospace, require strict timing constraints to guarantee their proper operational behavior. WCET analysis is the central idea of real-time systems development, because real-time systems always need to meet their deadlines. In order to meet deadline requirements, the WCET of real-time system tasks must be determined, and this is only possible if the hardware architecture is time-predictable. Due to the unpredictable nature of modern computing hardware, it is not practical to use advanced computing systems in CRTESs. Real-time systems do not primarily need to meet high-performance requirements, and a processor designed to improve average-case performance may not fit the requirements of real-time systems due to predictability issues.
Current timing analysis techniques are well established, but require detailed knowledge of the internal operations and state of the system for both hardware and software. The lack of in-depth knowledge of architectural operations becomes an obstacle to adopting deterministic timing analysis (DTA) techniques for WCET measurement. Probabilistic timing analysis (PTA) has emerged as a timing-analysis technique for next-generation real-time systems. PTA techniques reduce the extent of knowledge of a software execution platform that is needed to perform accurate WCET estimations. In this thesis, we propose the development of a new probabilistically analyzable cache and apply PTA techniques for time prediction. In this work, we implemented a randomized cache for the MIPS-32 and Leon-3 processors. We designed and implemented random placement and replacement policies, and applied probabilistic timing techniques to measure the probabilistic WCET (pWCET). We also measured the level of pessimism incurred by the probabilistic techniques and compared it with a deterministic cache configuration. The WCET prediction provided by the PTA techniques is closer to the real execution time of the program. We compared the estimates with measurements taken on the processor to help the designer evaluate the level of pessimism introduced by the cache architecture for each probabilistic timing-analysis technique. This work makes a first attempt at comparing deterministic, static, and measurement-based probabilistic timing analysis for time prediction under varying cache configurations. We identify the strengths and limitations of each technique for time prediction, and provide guidelines for processor designs that minimize the pessimism associated with the WCET. Our experiments show that the cache fulfills all the requirements for PTA, and that the program's timing can be predicted with arbitrary accuracy. Such a probabilistic computer architecture carries unmatched potential and great promise for next-generation CRTESs.
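The measurement-based flavour of PTA can be illustrated as follows: run the program many times on a cache with random placement and random replacement, then fit a Gumbel (extreme-value) tail to the observed execution times and read off a pWCET at a target exceedance probability. The toy address trace, cache geometry, latencies and the method-of-moments Gumbel fit below are illustrative assumptions, not the configuration used in the thesis.

```python
import math
import random
import statistics

def run_once(trace, n_sets, n_ways, t_hit=1, t_miss=10, seed=None):
    """One end-to-end run on a cache with random placement (a fresh
    per-run hash salt) and random replacement (evict a random way)."""
    rng = random.Random(seed)
    salt = rng.getrandbits(32)
    sets = [[] for _ in range(n_sets)]
    cycles = 0
    for addr in trace:
        s = hash((addr, salt)) % n_sets             # random placement
        if addr in sets[s]:
            cycles += t_hit
        else:
            cycles += t_miss
            if len(sets[s]) >= n_ways:
                sets[s].pop(rng.randrange(n_ways))  # random replacement
            sets[s].append(addr)
    return cycles

def pwcet(samples, exceedance=1e-9):
    """pWCET estimate: fit a Gumbel distribution to the execution-time
    samples by the method of moments, then take the quantile whose
    exceedance probability is `exceedance`."""
    beta = statistics.stdev(samples) * math.sqrt(6) / math.pi
    mu = statistics.mean(samples) - 0.5772156649 * beta  # Euler's gamma
    return mu - beta * math.log(-math.log(1.0 - exceedance))

trace = [a % 64 for a in range(200)] * 5            # toy address trace
times = [run_once(trace, n_sets=16, n_ways=2, seed=i) for i in range(500)]
print("observed max:", max(times), " pWCET@1e-9:", round(pwcet(times)))
```

A real MBPTA deployment would additionally check that the samples pass independence and identical-distribution tests before trusting the fitted tail.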
Proceedings of the Work-In-Progress Session of the 13th Real-Time and Embedded Technology and Applications Symposium
The Work-In-Progress session of the 13th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'07) presents papers describing contributions both to the state of the art and the state of the practice in the broad field of real-time and embedded systems. The 17 accepted papers were selected from 19 submissions. These proceedings are also available as Washington University in St. Louis Technical Report WUCSE-2007-17, at http://www.cse.seas.wustl.edu/Research/FileDownload.asp?733. Special thanks go to the General Chairs, Steve Goddard and Steve Liu, and the Program Chairs, Scott Brandt and Frank Mueller, for their support and guidance.
Distributed Control Architecture
This document describes the development and testing of a novel Distributed Control Architecture (DCA). The DCA developed during the study is an attempt to turn the components used to construct unmanned vehicles into a network of intelligent devices, connected using standard networking protocols. The architecture exists at both a hardware and software level and provides a communication channel between control modules, actuators and sensors.
A single unified mechanism for connecting sensors and actuators to the control software will reduce the technical knowledge required by platform integrators and allow control systems to be rapidly constructed in a Plug and Play manner. DCA uses standard networking hardware to connect components, removing the need for custom communication channels between individual sensors and actuators.
The use of a common architecture for the communication between components should make it easier for software to dynamically determine the vehicle's current capabilities and increase the range of processing platforms that can be utilised. Implementations of the architecture currently exist for Microsoft Windows, Windows Mobile 5, Linux and Microchip dsPIC30 microcontrollers.
Conceptually, DCA exposes the functionality of each networked device as objects with interfaces and associated methods. Letting each object expose multiple interfaces allows for future upgrades without breaking existing code. In addition, the use of common interfaces should help facilitate component reuse and unit testing, and make it easier to write generic reusable software.
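The object-with-interfaces model described above can be sketched in a few lines. The class names and the motor-controller example are hypothetical, invented purely for illustration; DCA's actual wire protocol and API are not described in this abstract.

```python
class Interface:
    """A named set of methods that a networked device exposes."""
    def __init__(self, name, methods):
        self.name = name
        self.methods = methods                # method name -> callable

class Device:
    """A DCA-style device object exposing several interfaces at once."""
    def __init__(self):
        self.interfaces = {}
    def expose(self, iface):
        self.interfaces[iface.name] = iface
    def query(self, name):
        return self.interfaces.get(name)      # capability discovery
    def call(self, iface_name, method, *args):
        return self.interfaces[iface_name].methods[method](*args)

# A motor controller keeps its v1 interface while adding a richer v2,
# so existing client code keeps working after the upgrade.
motor = Device()
motor.expose(Interface("motor.v1", {"set_speed": lambda pct: f"speed={pct}%"}))
motor.expose(Interface("motor.v2", {"set_speed": lambda pct: f"speed={pct}%",
                                    "set_torque": lambda nm: f"torque={nm}Nm"}))
print(motor.call("motor.v2", "set_torque", 1.5))
```

Because old clients only ever query `motor.v1`, upgrading the device to expose `motor.v2` alongside it does not break them.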
A Multi-core processor for hard real-time systems
The increasing demand for new functionalities in current and future hard real-time embedded systems, like the ones deployed in automotive and avionics industries, is driving an increment in the performance required in current embedded processors. Multi-core processors represent a good design solution to cope with such higher performance requirements due to their better performance-per-watt ratio while maintaining the core design simple. Moreover, multi-cores also allow executing mixed-criticality level workloads composed of tasks with and without hard real-time requirements, maximizing the utilization of the hardware resources while guaranteeing low cost and low power consumption.
Despite those benefits, current multi-core processors are less analyzable than single-core ones due to the interferences between different tasks when accessing hardware shared resources. As a result, computing a meaningful Worst-Case Execution Time (WCET) estimate - i.e. an upper bound of the application's execution time - becomes extremely difficult, if not impossible, because the execution time of a task may change depending on the other threads running at the same time. This makes the WCET of a task dependent on the set of inter-task interferences introduced by the co-running tasks.
Providing a WCET estimate that is independent of the other tasks (the time-composability property) is a key requirement in hard real-time systems.
This thesis proposes a new multi-core processor design in which time composability is achieved, hence enabling the use of multi-cores in hard real-time systems. With our proposals, the WCET estimate of a Hard Real-time Task (HRT) is independent of the other co-running tasks. To that end, we design a multi-core processor in which the maximum delay that a request from an HRT accessing a hardware shared resource can suffer due to other tasks is bounded: our processor guarantees that a request to a shared resource cannot be delayed longer than a given Upper Bound Delay (UBD).
In addition, the UBD allows identifying the impact that different processor configurations may have on the WCET by determining the sensitivity of an HRT to different resource allocations. This thesis proposes an off-line task allocation algorithm (called IA3: Interference-Aware Allocation Algorithm) that allocates tasks in a task set based on the HRTs' sensitivity to different resource allocations. As a result, the hardware shared resources used by HRTs are minimized, allowing Non Hard Real-time Tasks (NHRTs) to use the rest of the resources. Overall, our proposals provide analyzability for the HRTs while allowing NHRTs to be executed on the same chip without any effect on the HRTs.
The first two proposals of this thesis focused on supporting the execution of multi-programmed workloads with mixed criticality levels (composed of HRTs and NHRTs).
Higher performance could be achieved by supporting multi-threaded applications. As a first step towards supporting hard real-time parallel applications, this thesis proposes a new hardware/software approach to guarantee a predictable execution of software-pipelined parallel programs.
This thesis also investigates a solution to verify the timing correctness of HRTs without requiring any modification to the core design: we design a hardware unit which is interfaced with the processor and integrated into a functional-safety-aware methodology. This unit monitors the execution time of a block of instructions and detects if it exceeds the WCET. Concretely, we show how to handle timing faults on a real industrial automotive platform.
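The bounded-delay argument behind the UBD can be made concrete with a toy model: under round-robin arbitration of one shared resource, a request can wait for at most one in-flight request from each competing core, giving UBD = (n_cores - 1) x service latency. The core count and latency below are illustrative, not taken from the thesis.

```python
def worst_case_delay(n_cores, service_time):
    """Upper Bound Delay (UBD) under round-robin arbitration: a request
    waits for at most one request from each of the other cores."""
    return (n_cores - 1) * service_time

def simulate_round_robin(request_cycles, service_time):
    """Serve one request per core on a single shared resource, granting
    pending requests in round-robin (core-index) order. Returns the
    number of cycles each request waits before being served."""
    free_at = 0
    delays = [0] * len(request_cycles)
    order = sorted(range(len(request_cycles)),
                   key=lambda c: (request_cycles[c], c))
    for c in order:
        start = max(free_at, request_cycles[c])
        delays[c] = start - request_cycles[c]
        free_at = start + service_time
    return delays

n_cores, lat = 4, 8
ubd = worst_case_delay(n_cores, lat)
delays = simulate_round_robin([0] * n_cores, lat)   # all cores collide
print("UBD:", ubd, " worst observed delay:", max(delays))
```

Because every shared-resource access of an HRT can be charged UBD cycles, its WCET bound no longer depends on what the co-running tasks do, which is precisely the time-composability property.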
Inferring Complex Activities for Context-aware Systems within Smart Environments
The rising ageing population worldwide and the prevalence of age-related conditions such as physical fragility, mental impairments and chronic diseases have significantly impacted quality of life and caused a shortage of health and care services. Over-stretched healthcare providers are driving a paradigm shift in public healthcare provisioning. Thus, Ambient Assisted Living (AAL) using Smart Home (SH) technologies has been rigorously investigated to help address the aforementioned problems.
Human Activity Recognition (HAR) is a critical component in AAL systems which enables applications such as just-in-time assistance, behaviour analysis, anomaly detection and emergency notifications. This thesis is aimed at investigating the challenges faced in accurately recognising Activities of Daily Living (ADLs) performed by single or multiple inhabitants within smart environments. Specifically, this thesis explores five complementary research challenges in HAR. The first study contributes to knowledge by developing a semantic-enabled data segmentation approach with user preferences. The second study takes the segmented set of sensor data and investigates recognising human ADLs at a multi-granular action level: coarse- and fine-grained. At the coarse-grained action level, semantic relationships between the sensor, object and ADLs are deduced, whereas at the fine-grained action level, object usage at a satisfaction threshold, with evidence fused from multimodal sensor data, is leveraged to verify the intended actions. Moreover, due to the imprecise/vague interpretations of multimodal sensors and the challenges of data fusion, fuzzy set theory and the fuzzy web ontology language (fuzzy-OWL) are leveraged. The third study focuses on incorporating the uncertainties that arise in HAR due to factors such as technological failure, object malfunction, and human error. Hence, existing uncertainty theories and approaches are analysed and, based on the findings, a probabilistic ontology (PR-OWL) based HAR approach is proposed. The fourth study extends the first three to distinguish activities conducted by more than one inhabitant in a shared smart environment with the use of discriminative sensor-based techniques and time-series pattern analysis. The final study investigates a suitable system architecture for a real-time smart environment tailored to an AAL system, and proposes a microservices architecture with off-the-shelf and bespoke sensor-based sensing methods.
The initial semantic-enabled data segmentation study achieved 100% and 97.8% accuracy in segmenting sensor events under single- and mixed-activity scenarios, respectively. However, the average classification time taken to segment each sensor event suffered, at 3971 ms and 62183 ms for the single- and mixed-activity scenarios, respectively. The second study, detecting fine-grained user actions, was evaluated with 30 and 153 fuzzy rules to detect two fine-grained movements using a pre-collected dataset from the real-time smart environment. The results of the second study indicate good average accuracies of 83.33% and 100%, but with high average durations of 24648 ms and 105318 ms, posing further challenges for the scalability of fusion-rule creation. The third study was evaluated by incorporating the PR-OWL ontology with ADL ontologies and the Semantic Sensor Network (SSN) ontology to define four types of uncertainty present in a kitchen-based activity. The fourth study illustrated a case study extending single-user AR to multi-user AR by combining discriminative sensors (RFID tags and fingerprint sensors) to identify and associate user actions with the aid of time-series analysis. The last study responds to the computation and performance requirements of the four studies by analysing and proposing a microservices-based system architecture for an AAL system. A future research direction towards adopting fog/edge computing paradigms from cloud computing is discussed, for higher availability, reduced network traffic/energy and cost, and a decentralised system.
As a result of the five studies, this thesis develops a knowledge-driven framework to estimate and recognise multi-user activities at the level of fine-grained user actions. This framework integrates three complementary ontologies to conceptualise factual, fuzzy and uncertain knowledge about the environment/ADLs, time-series analysis and the discriminative sensing environment. Moreover, a distributed software architecture, multimodal sensor-based hardware prototypes, and other supportive utility tools such as a simulator and a synthetic ADL data generator were developed to support the evaluation of the proposed approaches. The distributed system is platform-independent and is currently supported by an Android mobile application and web-browser based client interfaces for retrieving information such as live sensor events and HAR results.
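The fuzzy-rule verification of fine-grained actions described in the second study can be sketched as follows. The sensors, membership-function parameters and the single "pouring" rule are invented for illustration; the thesis's actual rule base (30 and 153 rules) is far richer.

```python
def trimf(x, a, b, c):
    """Triangular fuzzy membership function over [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# hypothetical multimodal readings for a "pouring" fine-grained action
pressure = 0.7     # normalised grip pressure on a kettle handle
tilt_deg = 55.0    # wrist tilt (degrees) from an inertial sensor

grip_firm = trimf(pressure, 0.3, 0.8, 1.0)
tilt_pour = trimf(tilt_deg, 30.0, 60.0, 90.0)

# fuzzy rule: IF grip is firm AND tilt is pouring THEN action is "pouring";
# min() is the usual t-norm for fuzzy AND, 0.5 a satisfaction threshold
confidence = min(grip_firm, tilt_pour)
action = "pouring" if confidence >= 0.5 else "unknown"
print(action, round(confidence, 2))
```

Fusing evidence through the t-norm means a weak reading from either modality lowers the rule's confidence, which is what lets multimodal evidence verify, rather than merely suggest, the intended action.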
Flexible network management in software defined wireless sensor networks for monitoring application systems
Wireless Sensor Networks (WSNs) are among the commonly applied information technologies of modern networking and computing platforms for application-specific systems. Today's network computing applications face a high demand for reliable and powerful network functionalities. Hence, efficient network performance is central to the entire ecosystem, especially where human life is a concern. However, effective management of WSNs remains a challenge due to the problems inherent to them. As a result, WSN application systems such as those in monitored environments, surveillance, aeronautics, medicine, processing and control tend to suffer in terms of capacity to support compute-intensive services due to the limitations experienced on them. A recent technology shift proposes Software Defined Networking (SDN) for improving computing networks as well as enhancing network resource management, especially for life-guarding systems. As an optimization strategy, a software-oriented approach for WSNs, known as Software Defined Wireless Sensor Networking (SDWSN), is implemented to evolve, enhance and provide computing capacity to these resource-constrained technologies.
Software development strategies are applied with a focus on ensuring efficient network management, introducing network flexibility and advancing network innovation towards the maximum operational potential of WSN application systems. The need to develop WSN application systems which are powerful and scalable has grown tremendously due to their simplicity of implementation and application. Their design serves as a potential direction for the much-anticipated and resource-abundant IoT networks. Information systems such as data analytics, shared computing resources, control systems, big data support, visualizations, system audits, artificial intelligence (AI), etc. are a necessity in the everyday lives of consumers. Such systems can greatly benefit from the SDN programmability strategy, in terms of improving how data is mined, analysed and committed to other parts of the system for greater functionality. This work proposes and implements SDN strategies for enhancing WSN application systems, especially for life-critical systems. It also highlights implementation considerations for designing powerful WSN application systems by focusing on system-critical aspects that should not be disregarded when planning to improve core network functionalities.
Due to their inherent challenges, WSN application systems lack the robustness, reliability and scalability to support high computing demands. Anticipated systems must have greater capabilities to ubiquitously support many applications with flexible resources that can be easily accessed. To achieve this, such systems must incorporate powerful strategies for efficient data aggregation, query computation, communication and information presentation. The notion of applying machine learning methods to WSN systems is fairly new, but carries the potential to enhance WSN application technologies. This technological direction seeks to bring intelligent functionalities to WSN systems, given the characteristics of wireless sensor nodes in terms of cooperative data transmission. With these technological aspects, a technical study is therefore conducted, focused on WSN application systems, on how SDN strategies coupled with machine learning methods can contribute viable solutions for monitoring application systems, to support and provide various applications and services with greater performance. To realize this, this work further proposes and implements machine learning (ML) methods coupled with SDN strategies to enhance sensor data aggregation, introduce network flexibility, and improve resource management, query processing and sensor information presentation. Hence, this work directly contributes SDWSN strategies for monitoring application systems.
Thesis (PhD) -- University of Pretoria, 2018. National Research Foundation (NRF); Telkom Centre of Excellence; Electrical, Electronic and Computer Engineering.
Energy-aware medium access control protocols for wireless sensors network applications
The main purpose of this thesis was to investigate energy-efficient Medium Access Control (MAC) protocols designed to extend the lifetime of a wireless sensor network application, such as tracking, environment monitoring, home security, or patient monitoring, e.g., foetal monitoring in the last weeks of pregnancy. From the perspective of communication protocols, energy efficiency is one of the most important issues, and can be addressed at each layer of the protocol stack; however, our research focuses only on the MAC layer. An energy-efficient MAC protocol was designed based on modifications and optimisations of the synchronized power-saving Sensor MAC (S-MAC) protocol, which has three important components: periodic listen and sleep, collision and overhearing avoidance, and message passing. The Sensor Block Acknowledgement (SBACK) MAC protocol is proposed, which combines contention-based, scheduling-based and block-acknowledgement-based schemes to achieve energy efficiency. In SBACK, the use of ACK control packets is reduced, since there is no ACK packet for every DATA packet sent; instead, one special packet called Block ACK Response is used at the end of the transmission of all data packets. This packet informs the sender of how many packets were received by the receiver; by reducing the number of ACK control packets, we reduce the power consumption of the nodes, and more useful data packets can be transmitted. A comparison study between the SBACK and S-MAC protocols is also performed.
Considering 0% packet loss, SBACK always decreases energy consumption when directly compared with S-MAC.
Three different transceivers were used; considering a packet loss of 10%, there is a decrease in energy consumption of between 0.1% and 10%, depending on the transceiver.
When there are no packet retransmissions, SBACK only achieves worse performance when the number of fragments is less than 12; beyond that, the reduction in average delay increases with the number of fragments sent.
When 10% of the packets need retransmission, worse results in terms of wasted energy occur only for the TR1000 transceiver; the other transceivers (CC2420 and AT86RF230) achieve better results.
In terms of delay, if more than 10 packets need to be retransmitted, the SBACK protocol always achieves better performance compared with the other MAC protocols that use per-packet ACKs.
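The control-packet saving behind SBACK can be quantified with a toy energy model: per-packet ACKs versus a single Block ACK Request/Response exchange at the end of a burst. The per-packet energies below are invented placeholders, not measurements from the thesis or any transceiver datasheet.

```python
def control_packets(n_data, block_ack=False):
    """ACK overhead for n_data DATA packets: one ACK per packet in the
    S-MAC-style scheme, or a single Block ACK Request + Response pair in
    the SBACK-style scheme."""
    return 2 if block_ack else n_data

def tx_energy(n_data, e_data, e_ctrl, block_ack=False):
    """Toy model: energy to transmit the DATA burst plus its control packets."""
    return n_data * e_data + control_packets(n_data, block_ack) * e_ctrl

e_data, e_ctrl = 50.0, 10.0        # hypothetical per-packet energies (uJ)
for n in (5, 20, 100):
    saving = 1 - tx_energy(n, e_data, e_ctrl, True) / tx_energy(n, e_data, e_ctrl)
    print(f"{n} packets: {saving:.1%} energy saved")
```

The saving grows with burst length because the fixed two-packet block exchange is amortised over more DATA packets, which matches SBACK performing best for larger numbers of fragments.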
Programmer-transparent efficient parallelism with skeletons
Parallel and heterogeneous systems are ubiquitous. Unfortunately, both require significant complexity at the software level, to the detriment of programmer productivity. To produce correct and efficient code, programmers not only have to manage synchronisation and communication but also be aware of low-level hardware details. It is foreseeable that the problem will become worse as systems grow increasingly parallel and heterogeneous.
Building on earlier work, this thesis further investigates the contribution which
algorithmic skeletons can make towards solving this problem. Skeletons are high-level
abstractions for typical parallel computations. They hide low-level hardware details
from programmers and, in addition, encode information about the computations that
they implement, which runtime systems and library developers can use for automatic
optimisations. We present two novel case studies in this respect.
First, we provide scheduling flexibility on heterogeneous CPU + GPU systems in a programmer-transparent way, similar to the freedom OS schedulers have on CPUs. Thanks to the high-level nature of skeletons, we automatically switch between CPU and GPU implementations of kernels and use semantic information encoded in skeletons to find execution-time points at which switches can occur. In more detail, kernel iteration spaces are processed in slices and migration is considered on a slice-by-slice basis. We show that slice-size choices that introduce negligible overheads can be learned by predictive models. We show that in a simple deployment scenario mid-kernel migration achieves speedups of up to 1.30x, and 1.08x on average. Our mechanism introduces negligible overheads of 2.34% if a kernel does not actually migrate.
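The slice-by-slice mechanism can be sketched as follows: the iteration space is cut into slices, and before each slice the runtime re-decides which device's kernel implementation to use. The device chooser and the toy kernel below are placeholders; the real system consults learned predictive models and dispatches actual CPU/GPU kernels.

```python
def run_sliced(kernel_cpu, kernel_gpu, n_items, slice_size, prefer_gpu):
    """Process a kernel's iteration space in slices; each slice boundary
    is a potential migration point between CPU and GPU implementations."""
    results, devices = [], []
    for start in range(0, n_items, slice_size):
        stop = min(start + slice_size, n_items)
        device = "gpu" if prefer_gpu() else "cpu"   # re-decide per slice
        impl = kernel_gpu if device == "gpu" else kernel_cpu
        results.extend(impl(range(start, stop)))
        devices.append(device)
    return results, devices

# toy kernel: both "implementations" square their slice of the space
square = lambda xs: [x * x for x in xs]
flip = iter([False, True, True])                    # migrate after slice 1
out, used = run_sliced(square, square, n_items=10, slice_size=4,
                       prefer_gpu=lambda: next(flip))
print(out, used)
```

Because migration is only considered at slice boundaries, the slice size trades responsiveness of the scheduler against per-slice dispatch overhead, which is why it is worth learning with a predictive model.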
Second, we propose skeletons to simplify the programming of parallel hard real-time systems. We combine information encoded in task farms with analysis of the real-time user code to automatically choose thread counts and an optimisation parameter related to farm-internal communication. Both parameters are chosen so that real-time deadlines are met with minimum resource usage. We show that our approach achieves a 1.22x speedup over unoptimised code, selects the best parameter settings in 83% of cases, and never chooses parameters that cause deadline misses.
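The parameter selection for task farms can be sketched with a crude cost model: predict the farm's makespan for a candidate thread count, charging a per-item cost for farm-internal communication, then pick the smallest thread count that meets the deadline. The model and the numbers are invented for illustration; the thesis derives its parameters from user-code analysis rather than a fixed formula.

```python
import math

def farm_makespan(n_items, t_item, n_workers, t_comm):
    """Crude task-farm cost model: work divided over the workers, plus a
    per-item cost on the farm-internal communication channels."""
    return math.ceil(n_items / n_workers) * t_item + n_items * t_comm

def min_workers(n_items, t_item, t_comm, deadline, max_workers):
    """Smallest thread count whose predicted makespan meets the deadline,
    i.e. the deadline is met with minimum resource usage."""
    for w in range(1, max_workers + 1):
        if farm_makespan(n_items, t_item, w, t_comm) <= deadline:
            return w
    return None                  # no feasible thread count on this farm

print(min_workers(n_items=100, t_item=4.0, t_comm=0.01, deadline=120.0,
                  max_workers=8))
```

Choosing the minimum feasible thread count, rather than the fastest configuration, is what lets the approach meet deadlines while leaving the remaining cores free for other work.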