3,457 research outputs found

    An approach to rollback recovery of collaborating mobile agents

    Fault-tolerance is one of the main problems that must be resolved to improve the adoption of the agents' computing paradigm. In this paper, we analyse the execution model of agent platforms and the significance of the faults affecting their constituent components on the reliable execution of agent-based applications, in order to develop a pragmatic framework for agent systems fault-tolerance. The developed framework deploys a communication-pairs independent checkpointing strategy to offer a low-cost, application-transparent model for reliable agent-based computing that covers all possible faults that might invalidate reliable agent execution, migration and communication and maintains the exactly-once execution property.

    On Secure Workflow Decentralisation on the Internet

    Decentralised workflow management systems are a new research area, where most work to date has focused on the system's overall architecture. As little attention has been given to the security aspects in such systems, we follow a security-driven approach, and consider, from the perspective of available security building blocks, how security can be implemented and what new opportunities are presented when empowering the decentralised environment with modern distributed security protocols. Our research is motivated by a more general question of how to combine the positive enablers that email exchange enjoys, with the general benefits of workflow systems, and more specifically with the benefits that can be introduced in a decentralised environment. This aims to equip email users with a set of tools to manage the semantics of a message exchange, contents, participants and their roles in the exchange in an environment that provides inherent assurances of security and privacy. This work is based on a survey of contemporary distributed security protocols, and considers how these protocols could be used in implementing a distributed workflow management system with decentralised control. We review a set of these protocols, focusing on the required message sequences, and discuss how these security protocols provide the foundations for implementing core control-flow, data, and resource patterns in a distributed workflow environment.

    Fault-Tolerant Mobile Agent Execution


    Survey On Fault Tolerance In Grid Computing


    Fault-tolerant and transactional mobile agent execution

    Mobile agents constitute a computing paradigm of a more general nature than the widely used client/server computing paradigm. A mobile agent is essentially a computer program that acts autonomously on behalf of a user and travels through a network of heterogeneous machines. However, the greater flexibility of the mobile agent paradigm compared to the client/server computing paradigm comes at additional costs. These costs include, among others, the additional complexity of developing and managing mobile agent-based applications. This additional complexity comprises such issues as reliability. Before mobile agent technology can appear at the core of tomorrow's business applications, reliability mechanisms for mobile agents must be established. In this context, fault tolerance and transaction support are mechanisms of considerable importance. Various approaches to fault tolerance and transaction support exist. They have different strengths and weaknesses, and address different environments. Because of this variety, it is often difficult for the application programmer to choose the approach best suited to an application. This thesis introduces a classification of current approaches to fault-tolerant and transactional mobile agent execution. The classification, which focuses on algorithmic aspects, aims at structuring the field of fault-tolerant and transactional mobile agent execution and facilitates an understanding of the properties and weaknesses of particular approaches. In a distributed system, any software or hardware component may be subject to failures. A single failing component (e.g., agent or machine) may prevent the agent from proceeding with its execution. Worse yet, the current state of the agent and even its code may be lost. We say that the agent execution is blocked. For the agent owner, i.e., the person or application that has configured the agent, the agent does not return. 
To achieve fault-tolerance, the agent owner can try to detect the failure of the agent, and upon such an event launch a new agent. However, this requires the ability to correctly detect the crash of the agent, i.e., to distinguish between a failed agent and an agent that is delayed by slow processors or slow communication links. Unfortunately, this cannot be achieved in systems such as the Internet. An agent owner who tries to detect the failure of the agent thus cannot prevent the case in which the agent is mistakenly assumed to have crashed. In this case, launching a new agent leads to multiple executions of the agent, i.e., to the violation of the desired exactly-once property of agent execution. Although this may be acceptable for certain applications (e.g., applications whose operations do not have side-effects), others clearly forbid it. In this context, launching a new agent is a form of replication. In general, replication prevents blocking, but may lead to multiple executions of the agent, i.e., to a violation of the exactly-once execution property. This thesis presents an approach that ensures the exactly-once execution property using a simple principle: the mobile agent execution is modeled as a sequence of agreement problems. This model leads to an approach based on two well-known building blocks: consensus and reliable broadcast. We validate this approach with the implementation of FATOMAS, a Java-based FAult-TOlerant Mobile Agent System, and measure its overhead. Transactional mobile agents execute the mobile agent as a transaction. Assume, for instance, an agent whose task is to buy an airline ticket, book a hotel room, and rent a car at the flight destination. The agent owner naturally wants all three operations to succeed or none at all. Clearly, the rental car at the destination is of no use if no flight to the destination is available. On the other hand, the airline ticket may be useless if no rental car is available. 
The mobile agent's operations thus need to execute atomically, i.e., either all of them or none at all. Execution atomicity also needs to be ensured in the event of failures of hardware or software components. The approach presented in this thesis is non-blocking. A non-blocking transactional mobile agent execution has the important advantage that it can make progress despite failures. In a blocking transactional mobile agent execution, by contrast, progress is only possible when the failed component has recovered. Until then, the acquired locks generally cannot be freed. As no other transactional mobile agents can acquire the lock, overall system throughput is dramatically reduced. The present approach reuses the work on fault-tolerant mobile agent execution to prevent blocking. We have implemented the proposed approach and present the evaluation results.

    Using mobility and exception handling to achieve mobile agents that survive server crash failures

    Mobile agent technology, when designed and used effectively, can minimize bandwidth consumption and autonomously provide a snapshot of the current context of a distributed system. Protecting mobile agents from server crashes is a challenging issue, since developers normally have no control over remote servers. Server crash failures can leave replicas, in stable storage, unavailable for an unknown time period. Furthermore, few systems have considered the need for using a fault-tolerant protocol among a group of collaborating mobile agents. This thesis uses exception handling to protect mobile agents from server crash failures. An exception model is proposed for mobile agents and two exception handler designs are investigated. The first exists at the server that created the mobile agent and uses a timeout mechanism. The second, the mobile shadow scheme, migrates with the mobile agent and operates at the previous server visited by the mobile agent. A case study application has been developed to compare the performance of the two exception handler designs. Performance results demonstrate that, although the second design is slower, it offers the smaller trip time when handling a server crash. Furthermore, no modification of the server environment is necessary. This thesis shows that the mobile shadow exception handling scheme reduces complexity for a group of mobile agents to survive server crashes. The scheme deploys a replica that monitors the server occupied by the master, at each stage of the itinerary. The replica exists at the previous server visited in the itinerary. Consequently, each group member is a single fault-tolerant entity with respect to server crash failures. Other schemes introduce greater complexity and performance overheads since, for each stage of the itinerary, a group of replicas is sent to servers that offer an equivalent service. In addition, directions for future research are established for fault tolerance in groups of collaborating mobile agents.

    Coordination and Self-Adaptive Communication Primitives for Low-Power Wireless Networks

    The Internet of Things (IoT) is a recent trend where objects are augmented with computing and communication capabilities, often via low-power wireless radios. The Internet of Things is an enabler for a connected and more sustainable modern society: smart grids are deployed to improve energy production and consumption, wireless monitoring systems allow smart factories to detect faults early and reduce waste, while connected vehicles coordinate on the road to ensure our safety and save fuel. Many recent IoT applications have stringent requirements for their wireless communication substrate: devices must cooperate and coordinate, must perform efficiently under varying and sometimes extreme environments, while strict deadlines must be met. Current distributed coordination algorithms have high overheads and are unfit to meet the requirements of today's wireless applications, while current wireless protocols are often best-effort and lack the guarantees provided by well-studied coordination solutions. Further, many communication primitives available today lack the ability to adapt to dynamic environments, and are often tuned during their design phase to reach a target performance, rather than be continuously updated at runtime to adapt to reality. In this thesis, we study the problem of efficient and low-latency consensus in the context of low-power wireless networks, where communication is unreliable and nodes can fail, and we investigate the design of a self-adaptive wireless stack, where the communication substrate is able to adapt to changes to its environment. We propose three new communication primitives: Wireless Paxos brings fault-tolerant consensus to low-power wireless networking, STARC is a middleware for safe vehicular coordination at intersections, while Dimmer builds on reinforcement learning to provide adaptivity to low-power wireless networks. 
We evaluate each primitive in depth on testbed deployments and provide an open-source implementation to enable their use and improvement by the community.

    Robust and cheating-resilient power auctioning on Resource Constrained Smart Micro-Grids

    The principle of Continuous Double Auctioning (CDA) is known to provide an efficient way of matching supply and demand among distributed selfish participants with limited information. However, the literature indicates that the classic CDA algorithms developed for grid-like applications are centralised and insensitive to processing resource capacity, which hinders their application on resource constrained smart micro-grids (RCSMG). An RCSMG loosely describes a micro-grid with distributed generators and demand controlled by selfish participants with limited information, power storage capacity and low literacy, who communicate over an unreliable infrastructure burdened by limited bandwidth and low computational power of devices. In this thesis, we design and evaluate a CDA algorithm for power allocation in an RCSMG. Specifically, we offer the following contributions towards power auctioning on RCSMGs. First, we extend the original CDA scheme to enable decentralised auctioning. We do this by integrating a token-based, mutual-exclusion (MUTEX) distributive primitive that ensures the CDA operates at a reasonably efficient time and message complexity of O(N) and O(logN) respectively, per critical section invocation (auction market execution). Our CDA algorithm scales better and avoids the single point of failure problem associated with centralised CDAs (which could be used to adversarially provoke a break-down of the grid marketing mechanism). In addition, the decentralised approach in our algorithm can help eliminate privacy and security concerns associated with centralised CDAs. Second, to handle CDA performance issues due to malfunctioning devices on an unreliable network (such as a lossy network), we extend our proposed CDA scheme to ensure robustness to failure. Using node redundancy, we modify the MUTEX protocol supporting our CDA algorithm to handle fail-stop and some Byzantine type faults of sites. 
This yields a time complexity of O(N), where N is the number of cluster-head nodes, and a message complexity of O((logN)+W), where W is the number of check-pointing messages. These results indicate that it is possible to add fault tolerance to a decentralised CDA, which guarantees continued participation in the auction while retaining reasonable performance overheads. In addition, we propose a decentralised consumption scheduling scheme that complements the auctioning scheme in guaranteeing successful power allocation within the RCSMG. Third, since grid participants are self-interested, we must consider the issue of power theft that is provoked when participants cheat. We propose threat models centred on cheating attacks aimed at foiling the extended CDA scheme. More specifically, we focus on the Victim Strategy Downgrade; Collusion by Dynamic Strategy Change; Profiling with Market Prediction; and Strategy Manipulation cheating attacks, which are carried out by internal adversaries (auction participants). Internal adversaries are participants who want to get more benefits but have no interest in provoking a breakdown of the grid. However, their behaviour is dangerous because it could result in a breakdown of the grid. Fourth, to mitigate these cheating attacks, we propose an exception handling (EH) scheme, where sentinel agents use allocative efficiency and message overheads to detect and mitigate cheating forms. Sentinel agents are tasked to monitor trading agents to detect cheating and reprimand the misbehaving participant. Overall, message complexity expected in light demand is O(nLogN). The detection and resolution algorithm is expected to run in linear time complexity O(M). Overall, the main aim of our study is achieved by designing a resilient and cheating-free CDA algorithm that is scalable and performs well on resource constrained micro-grids. 
With the growing popularity of the CDA and its resource allocation applications, specifically to low resourced micro-grids, this thesis highlights further avenues for future research. First, we intend to extend the decentralised CDA algorithm to allow for participants' mobile phones to connect (reconnect) at different shared smart meters. Such mobility should guarantee the desired CDA properties, reliability, and adequate security. Second, we seek to develop a simulation of the decentralised CDA based on the formal proofs presented in this thesis. Such a simulation platform can be used for future studies that involve decentralised CDAs. Third, we seek to find an optimal and efficient way in which the decentralised CDA and the scheduling algorithm can be integrated and deployed in a low resourced, smart micro-grid. Such an integration is important for system developers interested in exploiting the benefits of the two schemes while maintaining system efficiency. Fourth, we aim to improve on the cheating detection and mitigation mechanism by developing an intrusion tolerance protocol. Such a scheme will allow continued auctioning in the presence of cheating attacks while incurring low performance overheads for applicability in an RCSMG.

    Integration of analysis techniques in security and fault-tolerance

    This thesis focuses on the integration of formal methodologies in security protocol analysis and fault-tolerance analysis. The research is developed in two different directions: interdisciplinary and intra-disciplinary. In the former, we look for a beneficial interaction between strategies of analysis in security protocols and fault-tolerance; in the latter, we search for connections among different approaches of analysis within the security area. In the following we summarize the main results of the research.