738 research outputs found

    New Fault Detection, Mitigation and Injection Strategies for Current and Forthcoming Challenges of HW Embedded Designs

    Full text link
    Thesis by compendium. The relevance of electronics to the safety of everyday devices has only been growing, as an ever larger share of their functionality is assigned to electronic components. This comes along with a constant need for higher performance to fulfill such functionality requirements, while keeping power consumption and cost low. In this scenario, industry is struggling to provide a technology that meets all the performance, power, and price specifications, at the cost of an increased vulnerability to several types of known faults and the appearance of new ones. To provide a solution for the new and growing faults in these systems, designers have been using traditional techniques from safety-critical applications, which generally offer suboptimal results. In fact, modern embedded architectures offer the possibility of optimizing the dependability properties by enabling the interaction of the hardware, firmware, and software levels in the process; however, this potential has not yet been fully realized. Advances at every level in that direction are much needed if flexible, robust, resilient, and cost-effective fault tolerance is to be achieved. The work presented here focuses on the hardware level, with the background consideration of a potential integration into a holistic approach. The efforts in this thesis have focused on several issues: (i) introducing the additional fault models required to adequately represent the physical effects emerging in modern manufacturing technologies, (ii) providing tools and methods to efficiently inject both the proposed models and classical ones, (iii) analyzing the optimal method for assessing the robustness of systems through extensive fault injection and subsequent correlation with higher-level layers, in an effort to cut development time and cost, (iv) providing new detection methodologies to cope with the challenges captured by the proposed fault models, (v) proposing mitigation strategies aimed at tackling such new threat scenarios, and (vi) devising an automated methodology for deploying many fault tolerance mechanisms in a systematic, robust way. The outcomes of the thesis constitute a suite of tools and methods that help the designer of critical systems develop robust, validated, on-time designs tailored to the application.
    Espinosa García, J. (2016). New Fault Detection, Mitigation and Injection Strategies for Current and Forthcoming Challenges of HW Embedded Designs [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/73146
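
    The thesis's fault models and injection tools are beyond the scope of an abstract, but the flavor of a classical injection campaign is easy to sketch. Below is a minimal Python sketch of register-level single-bit-flip injection (the classical single-event-upset model that the thesis extends with additional models); the workload, register width, and campaign parameters are illustrative assumptions, not the thesis's actual tooling.

        # Minimal sketch of a bit-flip fault-injection campaign (illustrative,
        # not the thesis's actual models or tools).
        import random

        def flip_bit(value: int, bit: int, width: int = 32) -> int:
            """Return value with one bit inverted, truncated to the register width."""
            return (value ^ (1 << bit)) & ((1 << width) - 1)

        def workload(x: int) -> int:
            """Stand-in for the design under test; the narrow 8-bit output
            masks faults injected into high-order input bits."""
            return (x * 3 + 7) & 0xFF

        def campaign(inputs, trials=1000, seed=0):
            rng = random.Random(seed)
            masked = corrupted = 0
            for _ in range(trials):
                x = rng.choice(inputs)
                golden = workload(x)                      # fault-free reference run
                faulty = workload(flip_bit(x, rng.randrange(32)))
                if faulty == golden:
                    masked += 1                           # fault had no visible effect
                else:
                    corrupted += 1                        # observable corruption
            return masked, corrupted

        print(campaign(list(range(256))))  # most high-bit flips are masked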

    Quality of Information in Mobile Crowdsensing: Survey and Research Challenges

    Full text link
    Smartphones have become the most pervasive devices in people's lives, and are clearly transforming the way we live and perceive technology. Today's smartphones benefit from almost ubiquitous Internet connectivity and come equipped with a plethora of inexpensive yet powerful embedded sensors, such as an accelerometer, gyroscope, microphone, and camera. This unique combination has enabled revolutionary applications based on the mobile crowdsensing paradigm, such as real-time road traffic monitoring, air and noise pollution monitoring, crime control, and wildlife monitoring, to name a few. Unlike prior sensing paradigms, humans are now the primary actors of the sensing process, since they are fundamental to retrieving reliable and up-to-date information about the event being monitored. As humans may behave unreliably or maliciously, assessing and guaranteeing Quality of Information (QoI) becomes more important than ever. In this paper, we provide a new framework for defining and enforcing the QoI in mobile crowdsensing, and analyze in depth the current state of the art on the topic. We also outline novel research challenges, along with possible directions of future work. To appear in ACM Transactions on Sensor Networks (TOSN).
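
    The paper's QoI framework is not reproduced here, but a minimal sketch can illustrate one common ingredient of QoI enforcement: weighting each contributor's reports by a trust score and updating trust from agreement with the weighted consensus. All names, weights, and the update rule below are illustrative assumptions, not the framework defined in the paper.

        # Toy truth-discovery loop for crowdsensed readings (illustrative only,
        # not the specific framework of the paper).
        def weighted_consensus(readings, trust):
            total = sum(trust[u] for u in readings)
            return sum(trust[u] * v for u, v in readings.items()) / total

        def update_trust(readings, trust, consensus, lr=0.2, scale=10.0):
            for u, v in readings.items():
                agreement = max(0.0, 1.0 - abs(v - consensus) / scale)
                trust[u] = (1 - lr) * trust[u] + lr * agreement  # smooth update
            return trust

        trust = {"alice": 0.5, "bob": 0.5, "mallory": 0.5}
        for readings in [{"alice": 21.0, "bob": 20.5, "mallory": 35.0}] * 5:
            c = weighted_consensus(readings, trust)
            trust = update_trust(readings, trust, c)
        print(trust)  # mallory's trust decays as her reports disagree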

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Full text link
    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things devices, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation; thus, they are often found in functional features that are rarely activated. Complete functional verification, which could eliminate design bugs, is extremely time-consuming and thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes: weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, so they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify whether a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework that detects buggy memory-ordering behaviors by decomposing the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small piece of distributed programmable logic instead of a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so as to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there is a variety of interactions among them that must be verified to catch buggy behavior. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally overlooked rare patterns of multiple faults. In this dissertation, we discuss these ideas and their trade-offs, and present future research directions.
    PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd
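
    As one concrete illustration of the decomposition principle, consider the bug-masking analysis: a program is decomposed into individual instructions, and a corrupted result is traced forward to see whether later instructions mask it. The toy instruction set, program, and corruption model below are hypothetical simplifications, not the dissertation's actual analysis.

        # Hedged sketch of instruction-level bug masking: inject a wrong result
        # into one instruction of a tiny straight-line program and check whether
        # later instructions mask it (simplified, not the dissertation's tool).
        def run(program, regs, corrupt_at=None, flip=1 << 31):
            regs = dict(regs)
            for i, (op, dst, a, b) in enumerate(program):
                if op == "add":
                    regs[dst] = (regs[a] + regs[b]) & 0xFFFFFFFF
                elif op == "and":
                    regs[dst] = regs[a] & regs[b]
                if i == corrupt_at:
                    regs[dst] ^= flip          # model a design bug corrupting dst
            return regs

        prog = [("add", "r1", "r0", "r0"),
                ("and", "r2", "r1", "r3")]     # r3 = 0xFF masks high bits of r1
        init = {"r0": 5, "r3": 0xFF}
        golden = run(prog, init)
        faulty = run(prog, init, corrupt_at=0)  # flip bit 31 of r1 after the add
        print(golden["r2"] == faulty["r2"])     # True: the AND masked the bug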

    Study of fault-tolerant software technology

    Get PDF
    Presented is an overview of the current state of the art of fault-tolerant software and an analysis of the quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from the commercial application of these techniques. The paper also addresses the implications of using fault-tolerant software in real-time aerospace applications for computer architecture and design, hardware, operating systems, and programming languages (including Ada). It concludes that fault-tolerant software has progressed beyond the pure research stage. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault tolerance.
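
    A back-of-the-envelope example conveys the kind of quantitative model such surveys analyze: the reliability of majority-voted N-version software under the (strong, and in practice often violated) assumption that versions fail independently with equal per-version reliability p. This is a generic textbook model, not a specific result from the paper.

        # Reliability of n-version majority voting with independent failures
        # (textbook model, not taken from the paper).
        from math import comb

        def majority_reliability(p: float, n: int = 3) -> float:
            """P(at least floor(n/2)+1 of n independent versions are correct)."""
            k = n // 2 + 1
            return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

        for p in (0.90, 0.99):
            print(p, majority_reliability(p))  # 0.972 and ~0.99970: voting helps,
                                               # but correlated faults erode the gain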

    Real-time Adaptive Sensor Attack Detection and Recovery in Autonomous Cyber-physical Systems

    Get PDF
    Cyber-Physical Systems (CPS) tightly couple information technology with physical processes, which raises new vulnerabilities, such as physical attacks, that are beyond conventional cyber attacks. Attackers may non-invasively compromise sensors and spoof the controller into performing unsafe actions. This issue is further emphasized by the increasing autonomy of CPS. While this fact has motivated many defense mechanisms against sensor attacks, a clear vision of the timing and usability (or the false alarm rate) of attack detection still remains elusive. Existing works tend to pursue the unachievable goal of minimizing the detection delay and the false alarm rate at the same time, even though there is a clear trade-off between the two metrics. Instead, this dissertation argues that attack detection should weight the two metrics (detection delay and false alarms) differently when the system sits in different states. For example, if the system is close to unsafe states, reducing the detection delay is preferable to lowering the false alarm rate, and vice versa. This dissertation proposes two real-time adaptive sensor attack detection frameworks. The frameworks can dynamically adapt the detection delay and false alarm rate so as to meet a detection deadline and improve usability according to the system status. We design and implement the proposed frameworks and validate them using realistic sensor data from automotive CPS to demonstrate their efficiency and efficacy. Further, this dissertation proposes Recovery-by-Learning, a data-driven attack recovery framework that restores CPS from sensor attacks. The importance of attack recovery is underlined by the need to mitigate an attack's impact on a system and restore it to continued functioning. We propose a double sliding-window-based checkpointing protocol to remove compromised data and keep trustworthy data for state estimation. Together, the proposed solutions enable a holistic attack-resilient solution for automotive cyber-physical systems.
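
    A minimal sketch can convey the adaptive idea: a CUSUM-style detector accumulates evidence from sensor prediction residuals, while its alarm threshold shrinks as the system approaches unsafe states, trading a higher false alarm rate for a shorter detection delay. The threshold schedule, drift term, and numbers below are illustrative assumptions, not the dissertation's actual frameworks.

        # Adaptive attack detection sketch: threshold depends on system state
        # (illustrative; not the dissertation's exact framework).
        def adaptive_threshold(distance_to_unsafe, tau_min=2.0, tau_max=8.0, d_max=10.0):
            """Low threshold (fast detection) near unsafe states, high threshold far away."""
            frac = min(max(distance_to_unsafe / d_max, 0.0), 1.0)
            return tau_min + frac * (tau_max - tau_min)

        def cusum_step(score, residual, drift=0.5):
            """Accumulate evidence of a sensor attack from prediction residuals."""
            return max(0.0, score + abs(residual) - drift)

        score = 0.0
        for residual, dist in [(0.4, 9.0), (1.8, 6.0), (2.2, 3.0), (2.5, 1.0)]:
            score = cusum_step(score, residual)
            if score > adaptive_threshold(dist):
                print("attack alarm at distance", dist, "score", round(score, 2))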

    On Age-of-Information Aware Resource Allocation for Industrial Control-Communication-Codesign

    Get PDF
    In industrial manufacturing, Industry 4.0 refers to the ongoing convergence of the real and virtual worlds, enabled by intelligently interconnecting industrial machines and processes through information and communications technology. Ultra-reliable low-latency communication (URLLC), as part of 5G, is widely regarded as the enabling technology for Industry 4.0, owing to its ability to fulfill the highest quality-of-service (QoS) requirements, comparable to those of industrial wireline connections. In contrast to this trend, a range of works in the research domain of networked control systems (NCS) have shown that URLLC's supreme QoS is not necessarily required to achieve high quality of control: the co-design of control and communication makes it possible to jointly optimize and balance quality-of-control parameters and network parameters by blurring the boundary between the application and network layers. However, this tight interlacing requires a fundamental (joint) redesign of both control systems and communication networks, and may therefore not lead to short-term widespread adoption. This thesis instead embraces a co-design approach that keeps both domains distinct but exploits the age of information (AoI) as a valuable interface metric between them. The thesis contributes to quantifying real-time application dependability as a consequence of exceeding a given peak-AoI threshold, with a particular focus on packet loss as the cause. Using an automated guided vehicle as the focus application, it is shown that negative temporal correlation of packet errors, which plays no role in today's systems, is highly beneficial for real-time applications. Assuming small-scale fading as the dominant cause of communication failure, sequences of communication failures are mapped to an application failure through discrete-time Markov models for single-hop (e.g., uplink or downlink only) and dual-hop (e.g., subsequent uplink and downlink) network architectures. This modeling enables the closed-form derivation of application-related dependability metrics, such as the mean time to failure. For single-hop networks, a novel AoI-aware resource allocation scheme termed state-aware resource allocation (SARA) is developed, which increases application reliability by orders of magnitude compared to static multi-connectivity while keeping resource consumption in the range of conventional single-connectivity. This dependability can also be statistically guaranteed at the system level, where multiple control agents compete for a limited number of resources, if the number of available resources per agent is increased by approximately 10 %. For the dual-hop scenario, an optimization procedure is additionally presented that minimizes a user-defined cost function penalizing low application reliability, high AoI, and high average resource consumption, thereby deriving the user-defined optimal SARA scheme. This optimization can be carried out offline, and each resulting scheme can be implemented as a look-up table in the lower medium access control layer of future industrial wireless networks.
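
    The closed-form dependability metrics have the following flavor: if packets are lost independently with probability p per slot and the application fails once the peak-AoI threshold is exceeded, i.e., after n consecutive losses, the mean time to failure follows from a small absorbing Markov chain. The sketch below uses the standard run-length formula under i.i.d. losses; this is a simplification of the thesis's correlated, multi-channel models.

        # Mean time to failure when failure = n consecutive packet losses,
        # i.i.d. loss probability p (simplified single-hop illustration).
        def mttf_consecutive_losses(p: float, n: int) -> float:
            """Expected slots until the first run of n consecutive losses."""
            return (1 - p**n) / ((1 - p) * p**n)

        # Example: 10% loss, application tolerates an AoI of up to 3 slots.
        print(mttf_consecutive_losses(p=0.1, n=3))  # ~1110 slots on average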

    Fault-tolerant software: dependability/performance trade-offs, concurrency and system support

    Get PDF
    PhD thesis. As the use of computer systems becomes more and more widespread in applications that demand high levels of dependability, these applications are themselves growing in complexity at a rapid rate, especially in areas that require concurrent and distributed computing. Such complex systems are very prone to faults and errors. No matter how rigorously fault avoidance and fault removal techniques are applied, software design faults often remain in systems when they are delivered to customers. In fact, residual software faults are becoming a significant underlying cause of system failures and lack of dependability. There is a tremendous need for systematic techniques for building dependable software, including fault tolerance techniques that allow software-based systems to operate dependably even when potential faults are present. However, although there has been a large amount of research in the area of fault-tolerant software, existing techniques are not yet sufficiently mature as a practical engineering discipline for realistic applications. In particular, they are often inadequate when applied to highly concurrent and distributed software. This thesis develops new techniques for building fault-tolerant software, addresses the problem of achieving high levels of dependability in concurrent and distributed object systems, and studies system-level support for implementing dependable software. Two schemes are developed: the t/(n-1)-VP approach is aimed at increasing software reliability while controlling additional complexity, while the SCOP approach presents an adaptive way of dynamically adjusting software reliability and efficiency. As a more general framework for constructing dependable concurrent and distributed software, the Coordinated Atomic (CA) Action scheme is examined thoroughly: key properties of CA actions are formalized, a conceptual model and mechanisms for handling application-level exceptions are devised, and object-based diversity techniques are introduced to cope with potential software faults. These three schemes are evaluated analytically and validated by controlled experiments. System-level support is also addressed with a multi-level system architecture. An architectural pattern for implementing fault-tolerant objects is documented in detail to capture existing solutions and our previous experience. An industrial safety-critical application, the Fault-Tolerant Production Cell, is used as a case study to examine most of the concepts and techniques developed in this research. ESPRIT
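
    The t/(n-1)-VP and SCOP schemes themselves are beyond the scope of an abstract, but both build on the classic design-diversity structures of fault-tolerant software. The sketch below shows the generic recovery-block pattern, in which diverse variants are tried in turn against an acceptance test; it is a textbook illustration, not an implementation of the thesis's schemes.

        # Classic recovery-block pattern (a sketch of the generic structure,
        # not of t/(n-1)-VP or SCOP specifically).
        import math

        def recovery_block(variants, acceptance_test, x):
            for variant in variants:
                try:
                    result = variant(x)
                    if acceptance_test(x, result):
                        return result              # first acceptable result wins
                except Exception:
                    pass                           # a failed variant triggers the next
            raise RuntimeError("all variants failed the acceptance test")

        # Two diverse square-root implementations plus a cheap acceptance test.
        variants = [lambda x: x ** 0.5,
                    lambda x: math.exp(0.5 * math.log(x)) if x > 0 else 0.0]
        accept = lambda x, r: abs(r * r - x) < 1e-6 * max(x, 1.0)
        print(recovery_block(variants, accept, 2.0))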

    Regional-scale controls on rockfall occurrence

    Get PDF
    Rockfalls exert a first-order control on the rate of rock wall retreat on mountain slopes and on coastal rock cliffs. Their occurrence is conditioned by a combination of intrinsic (resisting) and extrinsic (driving) processes, yet determining the exact effects of these processes on rockfall activity and the resulting cliff erosion remains difficult. Although rockfall activity has been monitored extensively in a variety of settings, high-resolution observations of rockfall occurrence at a regional scale are scarce. This is partly owing to difficulties in adequately quantifying the full range of possible rockfall volumes with sufficient accuracy and completeness, and at a scale that exceeds the influence of localised controls on rockfalls. This lack of insight restricts our ability to abstract patterns, to identify long-term changes in behaviour, and to assess how rock slopes respond to changes in both structural and environmental conditions, without resorting to a space-for-time substitution. This thesis develops a workflow, from novel data collection to analysis, which is tailored to monitoring rockfall activity and the resulting cliff retreat continuously (in space), in 3D, and over large spatial scales (>10^4 m). The approach is tested by analysing rockfall activity and the resulting erosion recorded along 20.5 km of near-vertical coastal cliffs, in what is considered the first multi-temporal detection of rockfalls at a regional scale and in full 3D. The resulting data are then used to derive a quantitative appraisal of along-coast variations in the geometric properties of exposed discontinuity surfaces, to assess the extent to which these drive patterns in the size and shape of the observed rockfalls. High-resolution field monitoring is then undertaken along a subsection of the coastline (>10^2 m), where cliff lithology and structure are approximately uniform, in order to quantify spatial variations in wave loading characteristics and to relate these to local morphological conditions, which can act as a proxy for wave loading characteristics. The resulting rockfall inventory is analysed to identify the characteristics of rock slope change that only become apparent when assessed at this scale, placing bounds on data previously collected more locally (<10^2 m). The data show that spatial consistencies in the distribution of rockfall shape and volume through time approximately follow the geological setting of the coastline, but that variations in the strength of these consistencies are likely to be conditioned by differences in local processes and morphological controls between sites. These results are used to examine the relationships between key metrics of erosion and structural and morphological controls, which ultimately permits the identification of areas where patterns of erosion are dominated by intrinsic or extrinsic processes, or a mixture of both. Uniquely, the methodologies and data presented here mark a step change in our ability to understand the competing effects of different processes in determining the magnitude and frequency of rockfall activity and the resulting cliff erosion. The findings of this research hold considerable implications for our understanding of rockfalls, and for monitoring, modelling, and managing actively failing rock slopes.
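
    The monitoring workflow itself is not detailed in the abstract, but the core of multi-temporal 3D change detection can be sketched crudely: compare two point-cloud epochs and flag points of the earlier epoch with no close neighbour in the later one. The tolerance, synthetic clouds, and nearest-neighbour test below are illustrative assumptions; published workflows typically use more robust measures (e.g., M3C2 distances) than this sketch.

        # Crude stand-in for 3D change detection between point-cloud epochs
        # (illustration only, not the thesis's workflow).
        import numpy as np
        from scipy.spatial import cKDTree

        def detached_points(epoch1: np.ndarray, epoch2: np.ndarray, tol: float = 0.1):
            """Return epoch-1 points farther than tol from any epoch-2 point."""
            dist, _ = cKDTree(epoch2).query(epoch1, k=1)
            return epoch1[dist > tol]             # candidate rockfall source areas

        rng = np.random.default_rng(0)
        cliff_before = rng.uniform(0, 10, size=(5000, 3))
        cliff_after = np.delete(cliff_before, slice(0, 50), axis=0)  # a "rockfall"
        print(len(detached_points(cliff_before, cliff_after, tol=0.05)))  # ~50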

    A dependability framework for WSN-based aquatic monitoring systems

    Get PDF
    Wireless sensor networks (WSNs) are increasingly used in several application areas, particularly to collect data and monitor physical processes. Sensor nodes used in environmental monitoring applications, such as aquatic sensor networks, are often subject to harsh environmental conditions while monitoring complex phenomena. Non-functional requirements, like reliability, security, or availability, are increasingly important and must be accounted for in application development. For that purpose, there is a large body of knowledge on dependability techniques for distributed systems, which provides a good basis for understanding how to satisfy these non-functional requirements of WSN-based monitoring applications. Given the data-centric nature of monitoring applications, it is particularly important to ensure that data is reliable or, more generically, that it has the necessary quality. The problem of ensuring the desired quality of data for dependable monitoring using WSNs is studied herein. From a dependability-oriented perspective, the possible impairments to dependability are reviewed, along with the prominent existing solutions to remove or mitigate them. Despite the variety of components that may form a WSN-based monitoring system, particular attention is given to understanding which faults can affect sensors, how they affect the quality of the information, and how this quality can be improved and quantified. Open research issues for the specific case of aquatic monitoring applications are also discussed. One of the challenges in achieving dependable system behavior is to overcome the external disturbances affecting sensor measurements and to detect failure patterns in sensor data. This is a particular problem in environmental monitoring, owing to the difficulty of distinguishing faulty behavior from the representation of a natural phenomenon. Existing solutions for failure detection assume that physical processes can be accurately modeled, or that there are large deviations that may be detected using coarse techniques, or, more commonly, that the network is a high-density sensor network with value-redundant sensors. This thesis defines a new methodology for dependable data quality in environmental monitoring systems, with the goal of detecting faulty measurements and increasing the quality of sensor data. The framework of the methodology is presented through a generically applicable design, which can be employed on any environmental sensor network dataset. The methodology is evaluated on various datasets from different WSNs, using machine learning to model each sensor's behavior and exploiting the correlated data provided by neighboring sensors. Data fusion strategies are explored in order to effectively detect potential failures of each sensor and, simultaneously, distinguish truly abnormal measurements from deviations due to natural phenomena. This is accomplished through the successful application of the methodology to detect and correct outlier, offset, and drift failures in datasets from real monitoring networks. In the future, the methodology can be applied to optimize the data quality control processes of new and already operating monitoring networks, and to assist in network maintenance operations. Laboratório Nacional de Engenharia Civil
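
    The core loop of such a methodology can be sketched in a few lines: model each sensor from its correlated neighbours and flag measurements whose residuals are anomalously large. The sketch below substitutes ordinary least squares for the thesis's learned models, and its data, threshold, and injected fault are synthetic assumptions for illustration.

        # Neighbour-based sensor fault detection sketch (OLS stands in for the
        # thesis's machine-learned models; data and threshold are synthetic).
        import numpy as np

        def fit_from_neighbours(neigh: np.ndarray, target: np.ndarray) -> np.ndarray:
            X = np.column_stack([neigh, np.ones(len(neigh))])   # add intercept
            coef, *_ = np.linalg.lstsq(X, target, rcond=None)
            return coef

        def flag_outliers(neigh, target, coef, k=4.0):
            X = np.column_stack([neigh, np.ones(len(neigh))])
            residual = target - X @ coef
            return np.abs(residual) > k * np.std(residual)      # simple threshold

        rng = np.random.default_rng(1)
        neigh = rng.normal(20, 2, size=(500, 2))                # two neighbour sensors
        target = 0.5 * neigh[:, 0] + 0.5 * neigh[:, 1] + rng.normal(0, 0.1, 500)
        target[100] += 5.0                                      # inject a faulty reading
        coef = fit_from_neighbours(neigh, target)
        print(np.flatnonzero(flag_outliers(neigh, target, coef)))  # -> [100]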