5 research outputs found

    A Novel Technique for Task Re-Allocation in Distributed Computing System

    A distributed computing system is a software system whose components, located on different networked computers, communicate and coordinate their actions by passing messages. A task executed on a distributed system must be reliable and feasible. Distributed systems such as grid networks, robotics, and air traffic control systems depend heavily on time. If it is not detected accurately and recovered from in time, a single error in a real-time distributed system can cause a whole-system failure. Fault tolerance is the key method used to provide continuous reliability in these systems. Distributed computing systems face several challenges, such as resource sharing, transparency, dependability, complex mappings, concurrency, and fault tolerance. In this paper, we focus on fault tolerance, since faults are responsible for the degradation of the system. A novel reliability-based technique is proposed to address the fault tolerance problem and re-allocate tasks. DOI: 10.17762/ijritcc2321-8169.15080

    A JADE Implemented Mobile Agent Based Host Platform Security

    The mobile agent paradigm relies heavily on the security of both the agent and its host platform. Both entities are prone to security threats and attacks such as masquerading, denial of service, and unauthorized access. Security breaches on the platform can result in significant losses. This paper presents a robust Series Checkpointing Algorithm (SCpA), implemented in the JADE environment, which extends our previous work with the security of mobile host platforms in mind. The algorithm is "series checkpointing" in the sense that layers are placed in series, one after the other, in the framework to provide a two-level guard system: if a malevolent agent manages to break the security at the first level and enter the platform, it may be trapped at the next level, which blocks the threat. The work also aims to evaluate the performance of the agents’ execution through graphical analysis. Our previous work proposed a platform security framework (PSF) to secure the host platform from various security threats, but the realization and implementation of the algorithm were deliberately left out; they have now been completed. Keywords: Mobile Agent, Security, Reputation Score, Threshold Value, Check-points, Algorithm

    Fault Tolerance for Stream Programs on Parallel Platforms

    A distributed system is defined as a collection of autonomous computers connected by a network, together with the distributed software that lets users see the system as a single entity capable of providing computing facilities. Distributed systems with centralised control have a distinguished control node, called the leader node. The main role of the leader node is to distribute and manage shared resources in a resource-efficient manner. A distributed system with centralised control can use stream processing networks for communication. In a stream processing system, applications typically act as continuous queries, ingesting data continuously, analyzing and correlating the data, and generating a stream of results. Fault tolerance is the ability of a system to keep processing information even if a failure or anomaly occurs in the system. Fault tolerance has become an important requirement for distributed systems, because the possibility of failure has risen with the increase in the number of nodes and in the runtime of applications. Therefore, it is important to add fault tolerance mechanisms in order to provide the internal capacity to preserve the execution of tasks despite the occurrence of faults. If the leader of a centralised control system fails, it is necessary to elect a new leader. While leader election has received a lot of attention in message-passing systems, very few solutions have been proposed for shared memory systems, which is the setting we address. In addition, rollback-recovery strategies are important fault tolerance mechanisms for distributed systems: they store information in stable storage while the system is failure-free and, when a failure affects a node, use the stored information to recover the state the node had before the failure.
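The rollback-recovery idea described above (save information to stable storage while failure-free, then use it to rebuild a node's state after a failure) can be illustrated with a minimal Python sketch. This is not the thesis's method: the names `StableStorage` and `Node`, the in-memory "stable storage", and the log-before-process rule are all assumptions chosen for illustration, and the sketch only works if message handling is deterministic.

```python
import copy


class StableStorage:
    """Hypothetical stand-in for stable storage; assumed to survive node crashes."""

    def __init__(self):
        self.checkpoint = None  # last saved copy of the node state
        self.log = []           # messages received since that checkpoint


class Node:
    """Illustrative node that keeps a running total and count of its inputs."""

    def __init__(self, storage):
        self.storage = storage
        self.state = {"total": 0, "count": 0}
        self.take_checkpoint()

    def take_checkpoint(self):
        # Save the state; messages logged before this point are no longer needed.
        self.storage.checkpoint = copy.deepcopy(self.state)
        self.storage.log.clear()

    def _apply(self, message):
        # Must be deterministic so that replaying the log reproduces the state.
        self.state["total"] += message
        self.state["count"] += 1

    def receive(self, message):
        # Assumption: the message is logged before it is processed.
        self.storage.log.append(message)
        self._apply(message)

    def crash(self):
        # Simulate a failure: the volatile state is lost.
        self.state = None

    def recover(self):
        # Restore the last checkpoint, then replay the logged messages in order.
        self.state = copy.deepcopy(self.storage.checkpoint)
        for message in self.storage.log:
            self._apply(message)
```

After `recover()`, the node holds the state it had just before `crash()`, provided every message it processed had been logged first.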
In this thesis, we focus on creating two fault tolerance mechanisms for distributed systems with centralised control that use stream processing for communication. The two mechanisms are leader election and log-based rollback-recovery, both implemented using LPEL. The proposed leader election method is based on an atomic Compare-And-Swap (CAS) instruction, which is directly available on many processors. Our leader election method works with idle nodes, meaning that only the non-busy nodes compete to become the new leader, while the busy nodes can continue with their tasks and update their leader reference later. Furthermore, this leader election method has a short completion time and low space complexity. The proposed log-based rollback-recovery method for distributed systems with stream processing networks is a novel approach that is free from the domino effect and does not generate orphan messages, thereby satisfying the always-no-orphans consistency condition. Additionally, this approach imposes lower overhead on the system than other approaches, and it scales, because it is insensitive to the number of nodes in the system.
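As a rough illustration of CAS-based election among idle nodes, the Python sketch below lets each idle node try to swap its own id into a shared leader slot; only the first swap succeeds. It is a sketch under stated assumptions, not the LPEL implementation: Python has no native CAS instruction, so `AtomicCell` emulates one with a lock, and `NO_LEADER`, `elect`, and `run_election` are hypothetical names.

```python
import threading

NO_LEADER = -1  # hypothetical sentinel: no leader is currently elected


class AtomicCell:
    """Lock-based emulation of a word updated by a hardware CAS (assumption)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        # Atomically: write `new` only if the cell still holds `expected`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def elect(leader, node_id, is_idle):
    """Idle nodes compete for leadership; busy nodes skip the race."""
    if is_idle:
        # Exactly one idle node succeeds; the rest see the slot already taken.
        leader.compare_and_swap(NO_LEADER, node_id)
    # Busy nodes would call leader.load() later to refresh their leader reference.


def run_election(node_ids, busy_ids):
    """Run one election round with one thread per node; return the winner's id."""
    leader = AtomicCell(NO_LEADER)
    threads = [
        threading.Thread(target=elect, args=(leader, n, n not in busy_ids))
        for n in node_ids
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return leader.load()
```

Calling `run_election(range(8), {0, 1, 2})` returns one of the idle ids 3 to 7; which one depends on thread scheduling. The only shared state is the single leader slot.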

    Fault Tolerance for High-Performance Applications Using Structured Parallelism Models

    In recent years, parallel computing has increasingly exploited high-level models of structured parallel programming, an example of which is algorithmic skeletons. This trend has been motivated by the properties that characterize structured parallelism models, which can be used to derive several (static and dynamic) optimizations at various implementation levels. In this thesis we study the properties of structured parallel models that are useful for providing fault tolerance support oriented towards High-Performance applications. This issue has traditionally been addressed in two ways: (i) in the context of unstructured parallelism models (e.g. MPI), whose computation model is essentially based on a distributed set of processes communicating through message-passing, with an approach based on checkpointing and rollback recovery or software replication; (ii) in the context of high-level models, based on a specific parallelism model (e.g. data-flow) and/or an implementation model (e.g. master-slave), by introducing specific techniques based on the properties of the programming and computation models themselves. In this thesis we take a step towards a more abstract viewpoint and highlight the properties of structured parallel models that are relevant for fault tolerance. We consider two classes of parallel programs (namely task parallel and data parallel) and introduce a fault tolerance support based on checkpointing and rollback recovery. The support is derived according to the high-level properties of the parallel models: we call this derivation specialization of fault tolerance techniques, highlighting the difference from classical solutions supporting structure-unaware computations. As a consequence of this specialization, the introduced fault tolerance techniques can be configured and optimized to meet specific needs at different implementation levels.
That is, the supports we present do not target a single computing platform or a specific class of platforms. Instead, the specializations are the mechanism for targeting specific issues of the exploited environment and of the implemented applications, through proper choices of the protocols and their configurations.

    ANTECEDENCE GRAPH APPROACH TO CHECKPOINTING FOR FAULT TOLERANCE IN MULTI AGENT SYSTEM
