
    A Taxonomy of Workflow Management Systems for Grid Computing

    With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and to execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art Grid workflow systems, but also identifies the areas that need further research. Comment: 29 pages, 15 figures
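
    A common way such systems represent a composed workflow is as a directed acyclic graph (DAG) of dependent tasks that is then executed in dependency order. As a minimal, hypothetical illustration (not taken from the survey, with made-up task names), the sketch below composes a small workflow as a dependency map and runs its tasks in topological order.

```python
# Minimal sketch (not from the survey): a scientific workflow modelled as a DAG
# of tasks, executed in dependency order. Task names are illustrative only.
from graphlib import TopologicalSorter  # Python 3.9+

def run(task):
    print(f"executing {task}")

# edges: task -> set of tasks it depends on
workflow = {
    "stage_data": set(),
    "preprocess": {"stage_data"},
    "simulate":   {"preprocess"},
    "analyse":    {"simulate"},
    "archive":    {"analyse"},
}

for task in TopologicalSorter(workflow).static_order():
    run(task)
```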

    Fault aware task scheduling in cloud using min-min and DBSCAN

    Cloud computing leverages computing resources by managing them globally in a more efficient manner than individual resource services. It requires resources to be delivered in a heterogeneous and highly dynamic environment. Hence, there is always a risk of resource allocation failure that can increase the delay in task execution. Such adverse impacts in the cloud environment also raise questions about quality of service (QoS). Resource management for cloud applications and services poses significant challenges, and although many researchers have proposed solutions, there is still room for improvement. Clustering the resources and mapping them to tasks can also be an option for dealing with such task failures or mismanaged resource allocation. Density-based spatial clustering of applications with noise (DBSCAN) is a stochastic approach-based algorithm capable of clustering the resources in a cloud environment. The proposed algorithm favours powerful, high-execution data centers with the lowest fault probability during resource allocation, which reduces the probability of faults and increases tolerance. The simulation is done using the CloudSim 5.0 toolkit. The results show a 25% average improvement in execution time, a 6.5% improvement in the number of tasks completed, and a 3.48% improvement in the count of failed tasks compared to ACO, PSO, BB-BC (Big Bang Big Crunch) and WOA (Whale Optimization Algorithm).
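
    As a rough, hypothetical sketch of the idea (not the paper's code), the example below clusters data centers by normalised capacity and fault probability with DBSCAN, keeps the cluster with the lowest average fault probability, and then assigns tasks with a min-min rule, where each scheduling step picks the task with the smallest minimum completion time. All numbers, the eps/min_samples settings and the completion-time model are assumptions for illustration.

```python
# Illustrative sketch only, not the paper's implementation: cluster data centers
# by capacity and fault probability with DBSCAN, then assign tasks min-min style
# within the cluster that has the lowest average fault probability.
import numpy as np
from sklearn.cluster import DBSCAN

# columns: [capacity (normalised MIPS), fault probability] -- made-up values
datacenters = np.array([
    [0.90, 0.02], [0.85, 0.03], [0.80, 0.05],   # fast, reliable
    [0.40, 0.20], [0.35, 0.25],                 # slow, failure-prone
    [0.10, 0.60],                               # outlier / noise
])

labels = DBSCAN(eps=0.15, min_samples=2).fit_predict(datacenters)

# keep the cluster with the lowest mean fault probability (ignore noise, label -1)
clusters = {l for l in labels if l != -1}
best = min(clusters, key=lambda l: datacenters[labels == l, 1].mean())
hosts = datacenters[labels == best]

# min-min: repeatedly schedule the task with the smallest minimum completion time
tasks = [300.0, 120.0, 450.0, 80.0]          # task lengths (arbitrary units)
ready = np.zeros(len(hosts))                 # time at which each host is free
while tasks:
    ct = np.array([[ready[h] + t / hosts[h, 0] for h in range(len(hosts))]
                   for t in tasks])          # completion time of each task on each host
    ti, hi = np.unravel_index(ct.argmin(), ct.shape)
    ready[hi] = ct[ti, hi]                   # commit the cheapest (task, host) pair
    tasks.pop(ti)
print("host finish times:", ready)
```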

    Enhancing reliability with Latin Square redundancy on desktop grids.

    Computational grids are some of the largest computer systems in existence today. Unfortunately they are also, in many cases, the least reliable. This research examines the use of redundancy with permutation as a method of improving reliability in computational grid applications. Three primary avenues are explored: development of a new redundancy model, the Replication and Permutation Paradigm (RPP), for computational grids; development of grid simulation software for testing RPP against other redundancy methods; and, finally, running a program on a live grid using RPP. An important part of RPP involves distributing data and tasks across the grid in Latin Square fashion. Two theorems and subsequent proofs regarding Latin Squares are developed. The theorems describe the changing position of symbols between the rows of a standard Latin Square. When a symbol is missing because a column has been removed, the theorems provide a basis for determining the next row and column where the missing symbol can be found. Interesting in their own right, the theorems also have implications for redundancy: they allow one to state the maximum makespan of the redundancy model in the face of missing computational hosts when using Latin Square redundancy. The simulator software was developed and used to compare different data and task distribution schemes on a simulated grid. The software clearly showed the advantage of running RPP, which resulted in faster completion times in the face of computational host failures. The Latin Square method also fails gracefully: jobs still complete under massive node failure, at the cost of increased makespan. Finally, an Inductive Logic Program (ILP) for pharmacophore search was executed, using the Latin Square redundancy methodology, on a Condor grid in the Dahlem Lab at the University of Louisville Speed School of Engineering. All jobs completed, even in the face of large numbers of randomly generated computational host failures.
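
    To make the Latin Square idea concrete, here is a small, hypothetical sketch (my illustration, not the thesis code): it builds a cyclic Latin Square, removes one column to stand in for a failed host, and then finds, for each row, another row and column that still hold the missing symbol, which is the recovery property the theorems formalise.

```python
# Minimal sketch: a cyclic Latin Square of order n; dropping one column (a failed
# host) never loses a symbol, because every symbol still appears once in every
# other column. Order and failed column are arbitrary choices for the demo.
def latin_square(n):
    return [[(r + c) % n for c in range(n)] for r in range(n)]

n = 5
square = latin_square(n)
failed_col = 2                       # pretend this host/column is lost

for r, row in enumerate(square):
    missing = row[failed_col]
    # locate another (row, column) still holding that symbol
    for r2, row2 in enumerate(square):
        if r2 == r:
            continue
        c2 = row2.index(missing)     # guaranteed to differ from failed_col
        print(f"symbol {missing} from row {r} recovered at row {r2}, col {c2}")
        break
```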

    A model-based approach for automatic recovery from memory leaks in enterprise applications

    Large-scale distributed computing systems such as data centers are hosted on heterogeneous, networked servers that execute in a dynamic and uncertain operating environment, caused by factors such as time-varying user workload and various failures. Therefore, achieving stringent quality-of-service goals is a challenging task, requiring a comprehensive approach to performance control, fault diagnosis, and failure recovery. This work presents a model-based approach for fault management, which integrates limited lookahead control (LLC), diagnosis, and fault-tolerance concepts that: (1) enable systems to adapt to environment variations, (2) maintain the availability and reliability of the system, and (3) facilitate system recovery from failures. This thesis focuses on memory leak errors. A characterization function is designed to detect memory leaks. An LLC scheme is then applied to enable the computing system to adapt efficiently to variations in the workload, and to enable the system to recover from memory leaks and maintain functionality.
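
    One plausible form such a characterization function could take (an assumption for illustration, not the thesis's actual function) is a trend test on recent memory samples: fit a line, flag a leak when the slope exceeds a threshold, and trigger a recovery action. The window, threshold and restart hook below are all made up.

```python
# Hedged sketch of leak detection: flag a leak when recent memory samples show a
# sustained upward trend. Threshold, sample window and recovery hook are
# illustrative assumptions, not values from the thesis.
import numpy as np

def leak_suspected(samples_mb, slope_threshold=1.0):
    """True if memory grows faster than slope_threshold MB per sample."""
    x = np.arange(len(samples_mb))
    slope, _ = np.polyfit(x, samples_mb, 1)     # least-squares linear fit
    return slope > slope_threshold

def recover(component):
    # placeholder recovery action (e.g. restart the leaking module)
    print(f"restarting {component} to reclaim leaked memory")

usage = [512, 530, 551, 570, 593, 615, 640]     # MB, sampled periodically
if leak_suspected(usage):
    recover("app-server-1")
```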

    Robust and cheating-resilient power auctioning on Resource Constrained Smart Micro-Grids

    The principle of Continuous Double Auctioning (CDA) is known to provide an efficient way of matching supply and demand among distributed selfish participants with limited information. However, the literature indicates that the classic CDA algorithms developed for grid-like applications are centralised and insensitive to processing resource capacity, which hinders their application on resource constrained smart micro-grids (RCSMG). A RCSMG loosely describes a micro-grid in which distributed generation and demand are controlled by selfish participants with limited information, limited power storage capacity and low literacy, who communicate over an unreliable infrastructure burdened by limited bandwidth and devices with low computational power. In this thesis, we design and evaluate a CDA algorithm for power allocation in a RCSMG. Specifically, we offer the following contributions towards power auctioning on RCSMGs. First, we extend the original CDA scheme to enable decentralised auctioning. We do this by integrating a token-based mutual-exclusion (MUTEX) distributed primitive, which ensures the CDA operates at a reasonably efficient time and message complexity of O(N) and O(logN) respectively, per critical section invocation (auction market execution). Our CDA algorithm scales better and avoids the single point of failure associated with centralised CDAs (which could be exploited to provoke a breakdown of the grid marketing mechanism). In addition, the decentralised approach in our algorithm can help eliminate privacy and security concerns associated with centralised CDAs. Second, to handle CDA performance issues due to malfunctioning devices on an unreliable (e.g. lossy) network, we extend our proposed CDA scheme to ensure robustness to failure. Using node redundancy, we modify the MUTEX protocol supporting our CDA algorithm to handle fail-stop and some Byzantine-type faults of sites. This yields a time complexity of O(N), where N is the number of cluster-head nodes, and a message complexity of O(logN + W), where W is the number of check-pointing messages. These results indicate that it is possible to add fault tolerance to a decentralised CDA, which guarantees continued participation in the auction while retaining reasonable performance overheads. In addition, we propose a decentralised consumption scheduling scheme that complements the auctioning scheme in guaranteeing successful power allocation within the RCSMG. Third, since grid participants are self-interested, we must consider the issue of power theft provoked when participants cheat. We propose threat models centred on cheating attacks aimed at foiling the extended CDA scheme. More specifically, we focus on the Victim Strategy Downgrade, Collusion by Dynamic Strategy Change, Profiling with Market Prediction, and Strategy Manipulation cheating attacks, which are carried out by internal adversaries (auction participants). Internal adversaries are participants who want to gain more benefit but have no interest in provoking a breakdown of the grid; their behaviour is nevertheless dangerous because it could still result in such a breakdown. Fourth, to mitigate these cheating attacks, we propose an exception handling (EH) scheme in which sentinel agents use allocative efficiency and message overheads to detect and mitigate these forms of cheating. Sentinel agents are tasked with monitoring trading agents to detect cheating and reprimand misbehaving participants.
    Overall, the expected message complexity under light demand is O(N logN), and the detection and resolution algorithm is expected to run in linear time, O(M). The main aim of our study is thus achieved by designing a resilient and cheating-free CDA algorithm that is scalable and performs well on resource constrained micro-grids. With the growing popularity of the CDA and its resource allocation applications, specifically in low-resourced micro-grids, this thesis highlights further avenues for future research. First, we intend to extend the decentralised CDA algorithm to allow participants’ mobile phones to connect (and reconnect) at different shared smart meters. Such mobility should preserve the desired CDA properties, reliability, and adequate security. Second, we seek to develop a simulation of the decentralised CDA based on the formal proofs presented in this thesis. Such a simulation platform can be used for future studies that involve decentralised CDAs. Third, we seek to find an optimal and efficient way in which the decentralised CDA and the scheduling algorithm can be integrated and deployed in a low-resourced smart micro-grid. Such an integration is important for system developers interested in exploiting the benefits of the two schemes while maintaining system efficiency. Fourth, we aim to improve the cheating detection and mitigation mechanism by developing an intrusion tolerance protocol. Such a scheme will allow continued auctioning in the presence of cheating attacks while incurring low performance overheads, for applicability in a RCSMG.
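
    For readers unfamiliar with the mechanism, the sketch below shows the core CDA matching step in miniature; it is my illustration under simple assumptions, not the decentralised, MUTEX-coordinated algorithm of the thesis. Orders are queued, and a trade clears whenever the best bid meets or exceeds the best ask, here at the midpoint price.

```python
# Illustrative continuous double auction (CDA) matching sketch. Trader names,
# prices and the midpoint pricing rule are made up for demonstration.
import heapq

bids, asks = [], []        # max-heap (negated prices) and min-heap of (price, trader)

def submit(side, price, trader):
    if side == "buy":
        heapq.heappush(bids, (-price, trader))
    else:
        heapq.heappush(asks, (price, trader))
    match()

def match():
    # clear trades while the best bid crosses the best ask
    while bids and asks and -bids[0][0] >= asks[0][0]:
        (neg_bid, buyer), (ask, seller) = heapq.heappop(bids), heapq.heappop(asks)
        clearing = (-neg_bid + ask) / 2            # midpoint pricing (one possible rule)
        print(f"{buyer} buys from {seller} at {clearing:.2f}")

submit("sell", 10.0, "prosumer-A")
submit("buy",   9.0, "household-B")   # no trade yet: bid below ask
submit("buy",  11.0, "household-C")   # crosses: trade clears at 10.50
```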

    Self-Healing Distributed Scheduling Platform

    Distributed systems require effective mechanisms to manage the reliable provisioning of computational resources from different, distributed providers. Moreover, the dynamic environment that affects the behaviour of such systems, and the complexity of these dynamics, demand autonomous capabilities to ensure the desired behaviour of distributed scheduling platforms and to achieve business and user objectives. In this paper we propose a self-adaptive distributed scheduling platform composed of multiple agents implemented as intelligent feedback control loops to support policy-based scheduling and expose self-healing capabilities. Our platform leverages distributed scheduling processes by (i) allowing each provider to maintain its own internal scheduling process, and (ii) implementing self-healing capabilities based on agent module recovery. Simulated tests are performed to determine the optimal number of agents to be used in the negotiation phase without affecting the scheduling cost function. Test results on a real-life platform are presented to evaluate recovery times and optimize platform parameters.
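
    As a hedged sketch of the self-healing idea only (the agent names, health probe and restart action are assumptions, not the platform's API), the loop below monitors agent modules on each control cycle and restarts any module that fails its health check, mirroring the agent-module-recovery behaviour described above.

```python
# Sketch of a self-healing feedback loop: monitor agent modules and recover any
# that fail. The probe here fails randomly to simulate faults; a real platform
# would ping the actual agent modules.
import random
import time

agents = {"negotiator": True, "scheduler": True, "monitor": True}  # True = healthy

def probe(name):
    """Stand-in health check with a simulated 20% failure rate."""
    return agents[name] and random.random() > 0.2

def restart(name):
    print(f"[self-healing] restarting agent module '{name}'")
    agents[name] = True

for cycle in range(3):                  # a few control-loop iterations
    for name in agents:
        if not probe(name):
            agents[name] = False        # mark as failed, then recover it
            restart(name)
    time.sleep(0.1)                     # control-loop period (shortened for demo)
```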