135 research outputs found

    A Survey of Fault-Tolerance Techniques for Embedded Systems from the Perspective of Power, Energy, and Thermal Issues

    Get PDF
    The relentless technology scaling has provided a significant increase in processor performance, but on the other hand, it has led to adverse impacts on system reliability. In particular, technology scaling increases the processor susceptibility to radiation-induced transient faults. Moreover, technology scaling with the discontinuation of Dennard scaling increases the power densities, thereby temperatures, on the chip. High temperature, in turn, accelerates transistor aging mechanisms, which may ultimately lead to permanent faults on the chip. To assure a reliable system operation, despite these potential reliability concerns, fault-tolerance techniques have emerged. Specifically, fault-tolerance techniques employ some kind of redundancies to satisfy specific reliability requirements. However, the integration of fault-tolerance techniques into real-time embedded systems complicates preserving timing constraints. As a remedy, many task mapping/scheduling policies have been proposed to consider the integration of fault-tolerance techniques and enforce both timing and reliability guarantees for real-time embedded systems. More advanced techniques aim additionally at minimizing power and energy while at the same time satisfying timing and reliability constraints. Recently, some scheduling techniques have started to tackle a new challenge, which is the temperature increase induced by employing fault-tolerance techniques. These emerging techniques aim at satisfying temperature constraints besides timing and reliability constraints. This paper provides an in-depth survey of the emerging research efforts that exploit fault-tolerance techniques while considering timing, power/energy, and temperature from the real-time embedded systems’ design perspective. In particular, the task mapping/scheduling policies for fault-tolerance real-time embedded systems are reviewed and classified according to their considered goals and constraints. Moreover, the employed fault-tolerance techniques, application models, and hardware models are considered as additional dimensions of the presented classification. Lastly, this survey gives deep insights into the main achievements and shortcomings of the existing approaches and highlights the most promising ones

    Study, analysis and new scheduling proposals in partitioned real-time systems

    Full text link
    [ES] En nuestra vida cotidiana, cada vez más ordenadores controlan nuestro entorno: teléfonos móviles, procesos industriales, asistencia a la conducción, etc. Todos estos sistemas presentan requisitos estrictos para garantizar un comportamiento adecuado. En muchos de estos sistemas, cumplir con las restricciones de tiempo es un factor tan importante como el resultado lógico de los cálculos. Desde hace aproximadamente 40 años, los sistemas en tiempo real son muy atractivos en el campo de la computación y hoy en día se aplican en áreas de gran alcance como aplicaciones industriales, aplicaciones aeroespaciales, telecomunicaciones, electrónica de consumo, etc. Algunos retos a abordar en el campo del tiempo real son el determinismo y la predecibilidad del comportamiento temporal del sistema. En este sentido, garantizar la ejecución del programa y los tiempos de respuesta del sistema son requisitos esenciales que deben cumplirse estrictamente a través de estrategias apropiadas de planificación de tareas. Además, las arquitecturas multiprocesador se están volviendo más populares debido al hecho de que las capacidades de procesamiento y los recursos computacionales de los sistemas están aumentando. Un estudio reciente estima que existe una tendencia creciente entre las arquitecturas multiprocesador a combinar diferentes niveles de criticidad en el mismo sistema. En este sentido, proporcionar aislamiento entre las aplicaciones es extremadamente necesario. La tecnología particionada es capaz de lidiar con este propósito. Además, la gestión de la energía es un problema relevante en los sistemas en tiempo real. Muchos sistemas empotrados de tiempo real, como dispositivos portátiles o robots móviles que requieren baterías, buscan encontrar técnicas que reduzcan el consumo de energía y, como consecuencia, aumenten la vida útil de sus baterías. También se obtienen claros beneficios operativos, financieros, monetarios y ambientales al minimizar el consumo de energía. Con todo ello, este trabajo aborda el problema de planificabilidad y contribuye al estudio de las nuevas técnicas de planificación en sistemas particionados de tiempo real. Estas técnicas proporcionan el tiempo mínimo para planificar de manera factible conjuntos de tareas. Además, se proponen técnicas de asignación para sistemas multiprocesador cuyo objetivo principal es reducir el consumo de energía del sistema global. Finalmente, se presentan los resultados obtenidos así como los trabajos futuros relacionados con este trabajo[CA] En la nostra vida quotidiana, cada vegada més ordenadors controlen el nostre entorn: telèfons mòbils, processos industrials, assistència a la conducció, etc. Tots aquests sistemes presenten requisits estrictes per a garantir un comportament adequat. En molts d' aquests sistemes, complir amb les restriccions de temps és un factor tan important com el resultat lògic dels càlculs. Des de fa aproximadament 40 anys, els sistemes en temps real són molt atractius en el camp de la computació i hui dia s' apliquen en àrees de gran abast com a aplicacions industrials, aplicacions aeroespacials, telecomunicacions, electrònica de consum, etc. Alguns reptes a abordar en el camp del temps real són el determinisme i la predictibilitat del comportament temporal del sistema. En aquest sentit, garantir l'execució del programa i els temps de resposta del sistema són requisits essencials que han de complir-se estrictament a través d'estratègies apropiades de planificació de tasques. A més, les arquitectures multiprocessador s'estan tornant més populars a causa del fet que les capacitats de processament i els recursos computacionals dels sistemes estan augmentant. Un estudi recent estima que existeix una tendència creixent entre les arquitectures multiprocessador a combinar diferents nivells de criticitat en el mateix sistema. En aquest sentit, proporcionar aïllament entre les aplicacions és extremadament necessari. La tecnologia particionada és capaç de bregar amb aquest propòsit. A més, la gestió de l'energia és un problema rellevant en els sistemes en temps real. Molts sistemes embebits de temps real, com a dispositius portàtils o robots mòbils que requereixen bateries, busquen trobar tècniques que reduïsquen el consum d'energia i, com a conseqüència, augmenten la vida útil de les seues bateries. També s'obtenen clars beneficis operatius, financers, monetaris i ambientals en minimitzar el consum d'energia. Amb tot això, aquest treball aborda el problema de planificabilitat i contribueix a l'estudi de les noves tècniques de planificació en sistemes particionats de temps real. Aquestes tècniques proporcionen el temps mínim per a planificar de manera factible conjunts de tasques. A més, es proposen tècniques d'assignació per a sistemes multiprocessador l'objectiu principal del qual és reduir el consum d'energia del sistema global. Finalment, es presenten els resultats obtinguts així com els treballs futurs relacionats amb aquest treball.[EN] In our everyday lives, more and more computers are controlling our environment: mobile phones, industrial processes, driving assistance, etc. All these systems present strict requirements to ensure proper behaviour. In many of these systems, the time at which the action is delivered is as important as the logical result of the computation. About 40 years ago, real-time systems began to attract attention in computing field and nowadays are applied in wide ranging areas as industrial applications, aerospace, telecommunication applications, consumer electronics, etc. Some real-time challenges that must be addressed are determinism and predictability of the temporal behaviour of the system. In this sense, to guarantee program execution and system response times are essential requirements that must be strictly met through appropriate task scheduling strategies. Furthermore, multiprocessor architectures are becoming more popular due to the fact that processing capabilities and computational resources are increasing. A recent study estimates that there is an increasing tendency among multiprocessor architectures to combine different levels of criticality in the same system. In this sense, to provide isolation between applications is extremely required. Partitioned technology is able to deal with this purpose. In addition, energy management is a relevant problem in real-time systems. Many real-time embedded systems, as wearable devices or mobile robots that require batteries, seek to find techniques that reduce the energy consumption and, as a consequence, increase the lifetime of their batteries. Also clear operational, financial, monetary and environmental gains are reached when minimizing energy consumption. Faced with all this, this work addresses the problem of schedulability and contributes to the study of new scheduling techniques in partitioned real-time systems. These techniques provide the minimum time to feasible schedule tasks sets. Moreover, allocation techniques for multicore systems whose main objective is to reduce the energy consumption of the overall system are also proposed. Finally, some of the obtained results are discussed as conclusions and future works are introduced.Guasque Ortega, A. (2019). Study, analysis and new scheduling proposals in partitioned real-time systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/135279TESI

    Energy-Efficient and Reliable Computing in Dark Silicon Era

    Get PDF
    Dark silicon denotes the phenomenon that, due to thermal and power constraints, the fraction of transistors that can operate at full frequency is decreasing in each technology generation. Moore’s law and Dennard scaling had been backed and coupled appropriately for five decades to bring commensurate exponential performance via single core and later muti-core design. However, recalculating Dennard scaling for recent small technology sizes shows that current ongoing multi-core growth is demanding exponential thermal design power to achieve linear performance increase. This process hits a power wall where raises the amount of dark or dim silicon on future multi/many-core chips more and more. Furthermore, from another perspective, by increasing the number of transistors on the area of a single chip and susceptibility to internal defects alongside aging phenomena, which also is exacerbated by high chip thermal density, monitoring and managing the chip reliability before and after its activation is becoming a necessity. The proposed approaches and experimental investigations in this thesis focus on two main tracks: 1) power awareness and 2) reliability awareness in dark silicon era, where later these two tracks will combine together. In the first track, the main goal is to increase the level of returns in terms of main important features in chip design, such as performance and throughput, while maximum power limit is honored. In fact, we show that by managing the power while having dark silicon, all the traditional benefits that could be achieved by proceeding in Moore’s law can be also achieved in the dark silicon era, however, with a lower amount. Via the track of reliability awareness in dark silicon era, we show that dark silicon can be considered as an opportunity to be exploited for different instances of benefits, namely life-time increase and online testing. We discuss how dark silicon can be exploited to guarantee the system lifetime to be above a certain target value and, furthermore, how dark silicon can be exploited to apply low cost non-intrusive online testing on the cores. After the demonstration of power and reliability awareness while having dark silicon, two approaches will be discussed as the case study where the power and reliability awareness are combined together. The first approach demonstrates how chip reliability can be used as a supplementary metric for power-reliability management. While the second approach provides a trade-off between workload performance and system reliability by simultaneously honoring the given power budget and target reliability

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    Get PDF
    The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends

    A Survey of Research into Mixed Criticality Systems

    Get PDF
    This survey covers research into mixed criticality systems that has been published since Vestal’s seminal paper in 2007, up until the end of 2016. The survey is organised along the lines of the major research areas within this topic. These include single processor analysis (including fixed priority and EDF scheduling, shared resources and static and synchronous scheduling), multiprocessor analysis, realistic models, and systems issues. The survey also explores the relationship between research into mixed criticality systems and other topics such as hard and soft time constraints, fault tolerant scheduling, hierarchical scheduling, cyber physical systems, probabilistic real-time systems, and industrial safety standards

    Zuverlässige und Energieeffiziente gemischt-kritische Echtzeit On-Chip Systeme

    Get PDF
    Multi- and many-core embedded systems are increasingly becoming the target for many applications that require high performance under varying conditions. A resulting challenge is the control, and reliable operation of such complex multiprocessing architectures under changes, e.g., high temperature and degradation. In mixed-criticality systems where many applications with varying criticalities are consolidated on the same execution platform, fundamental isolation requirements to guarantee non-interference of critical functions are crucially important. While Networks-on-Chip (NoCs) are the prevalent solution to provide scalable and efficient interconnects for the multiprocessing architectures, their associated energy consumption has immensely increased. Specifically, hard real-time NoCs must manifest limited energy consumption as thermal runaway in such a core shared resource jeopardizes the whole system guarantees. Thus, dynamic energy management of NoCs, as opposed to the related work static solutions, is highly necessary to save energy and decrease temperature, while preserving essential temporal requirements. In this thesis, we introduce a centralized management to provide energy-aware NoCs for hard real-time systems. The design relies on an energy control network, developed on top of an existing switch arbitration network to allow isolation between energy optimization and data transmission. The energy control layer includes local units called Power-Aware NoC controllers that dynamically optimize NoC energy depending on the global state and applications’ temporal requirements. Furthermore, to adapt to abnormal situations that might occur in the system due to degradation, we extend the concept of NoC energy control to include the entire system scope. That is, online resource management employing hierarchical control layers to treat system degradation (imminent core failures) is supported. The mechanism applies system reconfiguration that involves workload migration. For mixed-criticality systems, it allows flexible boundaries between safety-critical and non-critical subsystems to safely apply the reconfiguration, preserving fundamental safety requirements and temporal predictability. Simulation and formal analysis-based experiments on various realistic usecases and benchmarks are conducted showing significant improvements in NoC energy-savings and in treatment of system degradation for mixed-criticality systems improving dependability over the status quo.Eingebettete Many- und Multi-core-Systeme werden zunehmend das Ziel für Anwendungen, die hohe Anfordungen unter unterschiedlichen Bedinungen haben. Für solche hochkomplexed Multi-Prozessor-Systeme ist es eine grosse Herausforderung zuverlässigen Betrieb sicherzustellen, insbesondere wenn sich die Umgebungseinflüsse verändern. In Systeme mit gemischter Kritikalität, in denen viele Anwendungen mit unterschiedlicher Kritikalität auf derselben Ausführungsplattform bedient werden müssen, sind grundlegende Isolationsanforderungen zur Gewährleistung der Nichteinmischung kritischer Funktionen von entscheidender Bedeutung. Während On-Chip Netzwerke (NoCs) häufig als skalierbare Verbindung für die Multiprozessor-Architekturen eingesetzt werden, ist der damit verbundene Energieverbrauch immens gestiegen. Daher sind dynamische Plattformverwaltungen, im Gegensatz zu den statischen, zwingend notwendig, um ein System an die oben genannten Veränderungen anzupassen und gleichzeitig Timing zu gewährleisten. In dieser Arbeit entwickeln wir energieeffiziente NoCs für harte Echtzeitsysteme. Das Design basiert auf einem Energiekontrollnetzwerk, das auf einem bestehenden Switch-Arbitration-Netzwerk entwickelt wurde, um eine Isolierung zwischen Energieoptimierung und Datenübertragung zu ermöglichen. Die Energiesteuerungsschicht umfasst lokale Einheiten, die als Power-Aware NoC-Controllers bezeichnet werden und die die NoC-Energie in Abhängigkeit vom globalen Zustand und den zeitlichen Anforderungen der Anwendungen optimieren. Darüber hinaus wird das Konzept der NoC-Energiekontrolle zur Anpassung an Anomalien, die aufgrund von Abnutzung auftreten können, auf den gesamten Systemumfang ausgedehnt. Online- Ressourcenverwaltungen, die hierarchische Kontrollschichten zur Behandlung Abnutzung (drohender Kernausfälle) einsetzen, werden bereitgestellt. Bei Systemen mit gemischter Kritikalität erlaubt es flexible Grenzen zwischen sicherheitskritischen und unkritischen Subsystemen, um die Rekonfiguration sicher anzuwenden, wobei grundlegende Sicherheitsanforderungen erhalten bleiben und Timing Vorhersehbarkeit. Experimente werden auf der Basis von Simulationen und formalen Analysen zu verschiedenen realistischen Anwendungsfallen und Benchmarks durchgeführt, die signifikanten Verbesserungen bei On-Chip Netzwerke-Energieeinsparungen und bei der Behandlung von Abnutzung für Systeme mit gemischter Kritikalität zur Verbesserung die Systemstabilität gegenüber dem bisherigen Status quo zeigen

    Scheduling Task-parallel Applications in Dynamically Asymmetric Environments

    Full text link
    Shared resource interference is observed by applications as dynamic performance asymmetry. Prior art has developed approaches to reduce the impact of performance asymmetry mainly at the operating system and architectural levels. In this work, we study how application-level scheduling techniques can leverage moldability (i.e. flexibility to work as either single-threaded or multithreaded task) and explicit knowledge on task criticality to handle scenarios in which system performance is not only unknown but also changing over time. Our proposed task scheduler dynamically learns the performance characteristics of the underlying platform and uses this knowledge to devise better schedules aware of dynamic performance asymmetry, hence reducing the impact of interference. Our evaluation shows that both criticality-aware scheduling and parallelism tuning are effective schemes to address interference in both shared and distributed memory applicationsComment: Published in ICPP Workshops '2

    Learning-based run-time power and energy management of multi/many-core systems: current and future trends

    Get PDF
    Multi/Many-core systems are prevalent in several application domains targeting different scales of computing such as embedded and cloud computing. These systems are able to fulfil the everincreasing performance requirements by exploiting their parallel processing capabilities. However, effective power/energy management is required during system operations due to several reasons such as to increase the operational time of battery operated systems, reduce the energy cost of datacenters, and improve thermal efficiency and reliability. This article provides an extensive survey of learning-based run-time power/energy management approaches. The survey includes a taxonomy of the learning-based approaches. These approaches perform design-time and/or run-time power/energy management by employing some learning principles such as reinforcement learning. The survey also highlights the trends followed by the learning-based run-time power management approaches, their upcoming trends and open research challenges
    corecore