Modeling for the Optimal Co-scheduling Problem of Data Replication and Job Execution in Data Grids
Data Grids pose many challenges; data replication and job scheduling are among the most significant. Many studies have addressed data replication and job scheduling in Data Grids separately, but only a few have tackled the two problems together. In this work, we propose an optimization model for the co-scheduling problem of data replication and job execution in Data Grids that is more realistic and more widely applicable to real systems. We formulate the co-scheduling problem as a 0-1 integer program. Our goal is to find the optimal solution, that is, the data replication and job assignment that minimizes the response time, and the proposed optimization model leads us to this goal.
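The abstract above does not give the exact formulation, so the following is a minimal brute-force sketch of the 0-1 co-scheduling idea it describes: binary variables x (job-to-site assignment) and y (replica placement) are enumerated jointly, a storage constraint filters infeasible placements, and the combination with minimum total response time is kept. The site names, file sizes, and cost model are illustrative assumptions, not the paper's model.

```python
# Toy 0-1 co-scheduling of replica placement and job assignment (brute force).
from itertools import product

SITES = [0, 1]                                 # grid sites
JOBS = {0: ["A"], 1: ["B"], 2: ["A", "B"]}     # job -> files it reads
FILE_SIZE = {"A": 4.0, "B": 2.0}               # GB
MASTER = {"A": 0, "B": 1}                      # site holding the original copy
BANDWIDTH = 1.0                                # GB/s between distinct sites
STORAGE = {0: 6.0, 1: 6.0}                     # replica capacity per site

def response_time(assign, replicas):
    """Total transfer time: a job reads a file for free if the master copy
    or a replica is at its site, otherwise it pays for a network transfer."""
    total = 0.0
    for job, files in JOBS.items():
        site = assign[job]
        for f in files:
            if MASTER[f] != site and (f, site) not in replicas:
                total += FILE_SIZE[f] / BANDWIDTH
    return total

def solve():
    """Enumerate all 0-1 choices for x (job assignment) and y (replica
    placement); keep the feasible combination with minimum response time."""
    slots = [(f, s) for f in FILE_SIZE for s in SITES if MASTER[f] != s]
    best = (float("inf"), None, None)
    for x in product(SITES, repeat=len(JOBS)):
        assign = dict(zip(JOBS, x))
        for y in product([0, 1], repeat=len(slots)):
            replicas = {slot for slot, on in zip(slots, y) if on}
            used = {s: sum(FILE_SIZE[f] for f, site in replicas if site == s)
                    for s in SITES}
            if any(used[s] > STORAGE[s] for s in SITES):
                continue  # replica storage constraint violated
            t = response_time(assign, replicas)
            if t < best[0]:
                best = (t, assign, replicas)
    return best

t, assign, replicas = solve()
print(t, assign, sorted(replicas))  # optimal response time is 0.0 here
```

Brute force is only viable for toy instances; the paper's 0-1 integer program would be handed to an ILP solver instead.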
The Impact of Data Replication on Job Scheduling Performance in Hierarchical Data Grids
In data-intensive applications data transfer is a primary cause of job
execution delay. Data access time depends on bandwidth. The major bottleneck to
supporting fast data access in Grids is the high latencies of Wide Area
Networks and the Internet. Effective scheduling can reduce the amount of data
transferred across the internet by dispatching a job to where the needed data
are present. Another solution is to use a data replication mechanism. The
objective of dynamic replication strategies is to reduce file access time,
which in turn reduces job runtime. In this paper we develop a job scheduling policy and a
dynamic data replication strategy, called HRS (Hierarchical Replication
Strategy), to improve the data access efficiencies. We study our approach and
evaluate it through simulation. The results show that our algorithm achieves a
12% improvement over current strategies.
Comment: 11 pages, 7 figures
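The abstract does not spell out how HRS works, so the sketch below illustrates the generic hierarchical dynamic replication idea it builds on: a site looks for the nearest replica up the site tree, and after a remote access it caches a local copy so later jobs at the same site read the file at no network cost. The site names, latencies, and filename are assumptions for illustration only.

```python
# Generic hierarchical dynamic replication sketch (not the paper's exact HRS).
class Site:
    def __init__(self, name, parent=None, latency_to_parent=0.0):
        self.name = name
        self.parent = parent
        self.latency = latency_to_parent   # cost of one hop toward the root
        self.replicas = set()              # files stored at this site

def access_cost(site, filename):
    """Walk toward the root until a replica is found; return cumulative cost."""
    cost, node = 0.0, site
    while node is not None:
        if filename in node.replicas:
            return cost
        cost += node.latency
        node = node.parent
    raise FileNotFoundError(filename)

def access_and_replicate(site, filename):
    """Dynamic replication: after a remote read, keep a local copy."""
    cost = access_cost(site, filename)
    site.replicas.add(filename)
    return cost

root = Site("root"); root.replicas.add("calib.dat")
region = Site("region", parent=root, latency_to_parent=10.0)
worker = Site("worker", parent=region, latency_to_parent=2.0)

first = access_and_replicate(worker, "calib.dat")   # remote read: 2 + 10
second = access_and_replicate(worker, "calib.dat")  # local after replication
print(first, second)  # 12.0 0.0
```

The drop from 12.0 to 0.0 on the second access is the file-access-time reduction that dynamic replication strategies target.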
A bipartite graph model for placement, scheduling and replication in data grids
Ankara: The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2012. Thesis (Master's), Bilkent University, 2012. Includes bibliographical references, leaves 63-68.
Data grids provide geographically distributed resources for applications that generate
and utilize large data sets. However, several issues hinder fast access to data
and low turnaround time for jobs in data grids. To
address these issues, several data replication and job scheduling strategies have
been introduced to offer high data availability, low bandwidth consumption, and
reduced turnaround time for grid systems. Multiple copies of existing data are
maintained at different locations via data replication. Data replication strategies
are broadly categorized as static and dynamic. In static replication strategies,
replication is performed during the system design, and replica decisions are generally
based on a cost model that includes data access costs, bandwidth characteristics
and storage constraints of the grid system. In dynamic replication strategies,
the replication operation is managed at runtime so that the system adapts to the
changes in user request patterns dynamically. Job scheduling strategies fall under
two main categories: online mode and batch mode. The online mode scheduler
assigns tasks to sites as soon as they arrive. In the batch mode, the complete set
of jobs is taken into account and scheduled at the same time by using all the
grid information.
In this thesis, we propose a bipartite graph model for tasks and files in the
grid system, and then we partition this graph to obtain a data placement and
job scheduling strategy. The obtained parts are further refined in order to be assigned
to grid sites by using a KL-based heuristic that takes the bandwidth and
hop information between sites into account. Replication is achieved by replicating
a certain number of the most frequently accessed files, chosen prior to the partitioning process.
Experimental results indicate that the increase in the partitioning quality reflects
positively on the mapping quality. Moreover, it is observed that the communication
cost is notably decreased when data replication is applied. Hence, our results
show that by replicating a small number of data files and placing files onto
sites using the bipartite graph model, we can obtain a performance improvement
for scheduling jobs compared to no replication.
Dal, Burcu. M.S.
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.
Comment: 46 pages, 16 figures, Technical Report
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art in Grid
workflow systems, but also identifies the areas that need further research.
Comment: 29 pages, 15 figures