16,962 research outputs found

    Distributed data mining in grid computing environments

    The computing-intensive data mining of inherently Internet-wide distributed data, referred to as Distributed Data Mining (DDM), calls for the support of a powerful Grid with an effective scheduling framework. DDM often follows the computing paradigm of local processing and global synthesizing. It involves every phase of the Data Mining (DM) process, which makes the DDM workflow very complex: it can be modelled only by a Directed Acyclic Graph (DAG) with multiple data entries. Motivated by the need for a practical solution to the Grid scheduling problem for the DDM workflow, this paper proposes a novel two-phase scheduling framework, comprising External Scheduling and Internal Scheduling, on a two-level Grid architecture (InterGrid, IntraGrid). Currently a DM IntraGrid, named DMGCE (Data Mining Grid Computing Environment), has been developed with a dynamic scheduling framework for competitive DAGs in a heterogeneous computing environment. The system is implemented in an established Multi-Agent System (MAS) environment, in which existing DM algorithms are reused by encapsulating them into agents. Practical classification problems from oil well logging analysis are used to measure the system performance. The detailed experiment procedure and result analysis are also discussed in this paper.

    Managing Uncertainty: A Case for Probabilistic Grid Scheduling

    Grid technology is evolving into a global, service-orientated architecture, a universal platform for delivering future high-demand computational services. Strong adoption of the Grid and the utility computing concept is leading to an increasing number of Grid installations running a wide range of applications of different size and complexity. In this paper we address the problem of delivering deadline/economy-based scheduling in a heterogeneous application environment using statistical properties of historical job executions and their associated meta-data. This approach is motivated by a study of six months of computational load generated by Grid applications in a multi-purpose Grid cluster serving a community of twenty e-Science projects. The observed job statistics, resource utilisation and user behaviour are discussed in the context of the management approaches and models most suitable for supporting a probabilistic and autonomous scheduling architecture.
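
As a rough illustration of using historical job statistics for deadline-based decisions, the sketch below estimates the chance that a job meets a deadline as the fraction of its past runs that finished in time (an empirical distribution). The function name and the runtimes are made up for illustration; the abstract does not describe the paper's actual model.

```python
def deadline_probability(history, deadline):
    """Fraction of past runtimes that were within `deadline` (empirical CDF)."""
    if not history:
        raise ValueError("no historical runtimes available")
    return sum(1 for runtime in history if runtime <= deadline) / len(history)


# Hypothetical runtimes (seconds) of earlier executions of the same job type.
history = [110, 95, 130, 120, 105, 160, 98, 115]
p_on_time = deadline_probability(history, deadline=120)
print(f"estimated probability of meeting the deadline: {p_on_time:.2f}")
```

A scheduler could compare such an estimate against a user-chosen confidence threshold before accepting a job, although any real system would need far more history than this toy sample.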

    Task Scheduling on the Cloud with Hard Constraints

    Scheduling Bag-of-Tasks (BoT) applications on the cloud can be more challenging than on grid and cluster environments. This is because a user may have a budgetary constraint or a deadline for executing the BoT application in order to keep the overall execution costs low. This paper investigates task scheduling on the cloud given two hard constraints: a user-defined budget and a deadline. A heuristic algorithm is proposed and implemented to satisfy the hard constraints for executing the BoT application in a cost-effective manner. The proposed algorithm is evaluated using four scenarios that are based on the trade-off between performance and the cost of using different cloud resource types. The experimental evaluation confirms the feasibility of the algorithm in satisfying the constraints. The key observation is that multiple resource types can be a better alternative to using a single type of resource. Comment: Visionary Track of the IEEE 11th World Congress on Services (IEEE SERVICES 2015).

    A fast, effective local search for scheduling independent jobs in heterogeneous computing environments

    The efficient scheduling of independent computational jobs in a heterogeneous computing (HC) environment is an important problem in domains such as grid computing. Finding optimal schedules for such an environment is, in general, an NP-hard problem, so heuristic approaches must be used. Work on other NP-hard problems has shown that solutions found by heuristic algorithms can often be improved by applying local search procedures to them. This paper describes a simple but effective local search procedure for scheduling independent jobs in HC environments which, when combined with fast construction heuristics, can find shorter schedules on benchmark problems than other solution techniques found in the literature, and in significantly less time.
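
The general pattern of a fast construction heuristic followed by local search can be sketched as below. This is not the procedure from the paper: the construction step is a plain minimum-completion-time greedy rule, the local search only tries moving single jobs off the most loaded machine, and the matrix `etc` (expected time of each job on each machine, assumed positive) holds made-up values.

```python
def mct_schedule(etc):
    """Greedy construction: give each job to the machine that would finish it earliest."""
    n_machines = len(etc[0])
    loads = [0] * n_machines
    assignment = []
    for job_times in etc:
        best = min(range(n_machines), key=lambda k: loads[k] + job_times[k])
        assignment.append(best)
        loads[best] += job_times[best]
    return assignment, loads


def local_search(etc, assignment, loads):
    """Move single jobs off the most loaded machine while the target stays below its load."""
    assignment, loads = list(assignment), list(loads)
    while True:
        worst = max(range(len(loads)), key=lambda k: loads[k])
        move = None
        for job, machine in enumerate(assignment):
            if machine != worst:
                continue
            for target in range(len(loads)):
                if target != worst and loads[target] + etc[job][target] < loads[worst]:
                    move = (job, target)
                    break
            if move is not None:
                break
        if move is None:
            return assignment, loads  # no single-job move helps any more
        job, target = move
        loads[worst] -= etc[job][worst]
        loads[target] += etc[job][target]
        assignment[job] = target


# Hypothetical instance: etc[j][m] is the run time of job j on machine m.
etc = [[2, 3], [2, 3], [6, 7]]
start_assignment, start_loads = mct_schedule(etc)
final_assignment, final_loads = local_search(etc, start_assignment, start_loads)
print(max(start_loads), "->", max(final_loads))
```

On this toy instance the greedy schedule has a makespan of 8 and the single-job moves reduce it to 6. Richer neighbourhoods, such as swapping two jobs, are a natural extension.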

    A Simulated Annealing Method to Cover Dynamic Load Balancing in Grid Environment

    High-performance scheduling is critical to achieving application performance on the computational grid. New scheduling algorithms are in demand to address new concerns arising in the grid environment. One of the main phases of scheduling on a grid concerns the load balancing problem; a high-performance method for load balancing is therefore essential to obtain satisfactory high-performance scheduling. This paper presents SAGE, a new high-performance method that addresses the dynamic load balancing problem by means of a simulated annealing algorithm. Even though this problem has been addressed with several different approaches, only one of them uses a simulated annealing algorithm. Preliminary results show that SAGE not only finds a good solution to the problem (effectiveness) but also does so in a reasonable amount of time (efficiency).
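
To make the idea of simulated annealing for load balancing concrete, here is a generic sketch that is not SAGE itself. It assumes a static snapshot of task costs, treats the maximum node load as the objective, and uses a simple geometric cooling schedule; all names, parameters, and task costs are illustrative assumptions.

```python
import math
import random


def max_load(assignment, costs, n_nodes):
    """Objective: load of the busiest node for a task-to-node assignment."""
    loads = [0.0] * n_nodes
    for task, node in enumerate(assignment):
        loads[node] += costs[task]
    return max(loads)


def anneal(costs, n_nodes, t_start=10.0, t_end=0.01, cooling=0.95,
           moves_per_temp=50, seed=0):
    """Simulated annealing over single-task moves; returns the best assignment seen."""
    rng = random.Random(seed)
    current = [0] * len(costs)  # deliberately unbalanced start: everything on node 0
    current_cost = max_load(current, costs, n_nodes)
    best, best_cost = list(current), current_cost
    temperature = t_start
    while temperature > t_end:
        for _ in range(moves_per_temp):
            task = rng.randrange(len(costs))
            node = rng.randrange(n_nodes)
            old_node = current[task]
            if node == old_node:
                continue
            current[task] = node
            new_cost = max_load(current, costs, n_nodes)
            delta = new_cost - current_cost
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current_cost = new_cost
                if new_cost < best_cost:
                    best, best_cost = list(current), new_cost
            else:
                current[task] = old_node  # reject the move and restore the old state
        temperature *= cooling
    return best, best_cost


# Hypothetical task costs to be spread over three nodes.
costs = [5, 3, 8, 2, 7, 4, 6, 1]
assignment, cost = anneal(costs, n_nodes=3)
print(assignment, cost)
```

A dynamic variant would rerun or warm-start the search as tasks arrive and node loads change, which this static sketch does not attempt.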