3,888 research outputs found

    Modeling for the Optimal Co-scheduling Problem of Data Replication and Job Execution in Data Grids

    Get PDF
    There are many challenges in Data Grids, and especially the data replication and the job scheduling are significant problems. There have been many works on the data replication and the job scheduling in Data Grids separately. However, there are still only a few works on solving those two significant problems together in Data Grids. In this work, we propose the optimization model for the co-scheduling problem of the data replication and the job execution in Data Grids, which is more realistic and widely adaptable to real systems. We use 0-1 integer programming model to formulate the co-scheduling problem of the data replication and the job scheduling. Our final goal in this work is to find the optimal solution, that is, the data replication and job assignment, such that it minimizes the response time. The proposed optimization model will lead us this goal

    The Impact of Data Replicatino on Job Scheduling Performance in Hierarchical data Grid

    Full text link
    In data-intensive applications data transfer is a primary cause of job execution delay. Data access time depends on bandwidth. The major bottleneck to supporting fast data access in Grids is the high latencies of Wide Area Networks and Internet. Effective scheduling can reduce the amount of data transferred across the internet by dispatching a job to where the needed data are present. Another solution is to use a data replication mechanism. Objective of dynamic replica strategies is reducing file access time which leads to reducing job runtime. In this paper we develop a job scheduling policy and a dynamic data replication strategy, called HRS (Hierarchical Replication Strategy), to improve the data access efficiencies. We study our approach and evaluate it through simulation. The results show that our algorithm has improved 12% over the current strategies.Comment: 11 pages, 7 figure

    A bipartite graph model for placement, scheduling and replication in data grids

    Get PDF
    Ankara : The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2012.Thesis (Master's) -- Bilkent University, 2012.Includes bibliographical references leaves 63-68.Data grids provide geographically distributed resources for applications that generate and utilize large data sets. However, there are some issues that hinder to ensure fast access to data and low turnaround time for the jobs in data grids. To address these issues, several data replication and job scheduling strategies have been introduced to offer high data availability, low bandwidth consumption, and reduced turnaround time for grid systems. Multiple copies of existing data are maintained at different locations via data replication. Data replication strategies are broadly categorized as static and dynamic. In static replication strategies, replication is performed during the system design, and replica decisions are generally based on a cost model that includes data access costs, bandwidth characteristics and storage constraints of the grid system. In dynamic replication strategies, the replication operation is managed at runtime so that the system adapts to the changes in user request patterns dynamically. Job scheduling strategies fall under two main categories: online mode and batch mode. The online mode scheduler assigns tasks to sites as soon as they arrive. In the batch mode, the complete set of jobs are taken into account and scheduled at the same time by using all the grid information. In this thesis, we propose a bipartite graph model for tasks and files in the grid system, and then we partition this graph to obtain a data placement and job scheduling strategy. The obtained parts are further refined in order to be assigned to grid sites by using a KL-based heuristic that takes the bandwidth and hop information between sites into account. Replication is achieved by replicating a certain amount of most accessed files chosen prior to the partitioning process. Experimental results indicate that the increase in the partitioning quality reflects positively on the mapping quality. Morever, it is observed that the communication cost is notably decreased when the data replication is applied. Hence, our results show that by replicating a small amount of data files and placing files onto sites using bipartite graph model, we can obtain performance improvement for scheduling jobs compared to no replication.Dal, BurcuM.S

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

    Adaptive scheduling in grids

    Get PDF

    A Taxonomy of Workflow Management Systems for Grid Computing

    Full text link
    With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art in Grid workflow systems, but also identifies the areas that need further research.Comment: 29 pages, 15 figure
    corecore