19 research outputs found

    The Impact of Data Replication on Job Scheduling Performance in Hierarchical Data Grid

    In data-intensive applications, data transfer is a primary cause of job execution delay. Data access time depends on bandwidth. The major bottleneck to supporting fast data access in Grids is the high latency of Wide Area Networks and the Internet. Effective scheduling can reduce the amount of data transferred across the Internet by dispatching a job to where the needed data are present. Another solution is to use a data replication mechanism. The objective of dynamic replica strategies is to reduce file access time, which in turn reduces job runtime. In this paper we develop a job scheduling policy and a dynamic data replication strategy, called HRS (Hierarchical Replication Strategy), to improve data access efficiency. We study our approach and evaluate it through simulation. The results show that our algorithm improves on current strategies by 12%. Comment: 11 pages, 7 figures
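
    The idea of dispatching a job to where the needed data are present can be illustrated with a minimal sketch. This is not the HRS policy from the paper, whose details the abstract does not give. The function name `pick_site`, the input formats, and the "most local bytes" rule are all assumptions made for illustration.

```python
from typing import Dict, Set


def pick_site(required: Dict[str, int], replicas: Dict[str, Set[str]]) -> str:
    """Return the site that already holds the most bytes of the required files.

    required: file name -> size in bytes (hypothetical input format)
    replicas: site name -> set of file names stored at that site
    Raises ValueError if no sites are given.
    """
    def local_bytes(site: str) -> int:
        # Bytes of the job's input that would not need to cross the network.
        return sum(size for name, size in required.items() if name in replicas[site])

    # Iterating over sorted site names makes ties resolve to the first name.
    return max(sorted(replicas), key=local_bytes)


if __name__ == "__main__":
    job_files = {"f1": 100, "f2": 300}
    sites = {"siteA": {"f1"}, "siteB": {"f2"}, "siteC": set()}
    print(pick_site(job_files, sites))
```

    A real scheduler would also weigh queue length and link bandwidth. This sketch considers only data locality.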

    A dynamic replica creation: Which file to replicate?

    A Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed collaborations. To increase resource availability and to ease resource sharing in such an environment, there is a need for replication services. Data replication is one of the methods used to improve the performance of data access in distributed systems. In this paper, we propose a dynamic replication strategy (EXPM) that is based on the exponential growth or decay rate and the dependency level of data files. Simulation results (via OptorSim) show that EXPM outperformed LALW in the measured metrics: mean job execution time, effective network usage, and average storage usage.

    A dynamic replication strategy based on exponential growth/decay rate

    A Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed collaborations. To increase resource availability and to ease resource sharing in such an environment, there is a need for replication services. Data replication is one of the methods used to improve the performance of data access in distributed systems. In this paper, we discuss issues arising in the data replication domain and propose a dynamic replication strategy that is based on the exponential growth or decay rate. The purpose of the proposed strategy is to identify which files should be replicated. This is achieved by estimating the number of accesses to a file in the upcoming time interval. The greater the value, the more popular the file is, and therefore the more likely it is to be selected for replication.
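
    The estimation step described above can be sketched as follows. The abstract does not give the exact formula, so this sketch assumes a simple discrete model: the average per-interval growth or decay rate r is taken from the access history, and the next count is estimated as the last count times (1 + r). The function names and the sample histories are hypothetical.

```python
from typing import Dict, List


def average_rate(counts: List[int]) -> float:
    """Mean per-interval growth (positive) or decay (negative) rate."""
    rates = [
        (curr - prev) / prev
        for prev, curr in zip(counts, counts[1:])
        if prev > 0  # the rate is undefined when the previous count is zero
    ]
    return sum(rates) / len(rates) if rates else 0.0


def estimate_next_accesses(counts: List[int]) -> float:
    """Estimate accesses in the upcoming interval (assumed model, see lead-in)."""
    if not counts:
        return 0.0
    return counts[-1] * (1.0 + average_rate(counts))


def rank_for_replication(history: Dict[str, List[int]]) -> List[str]:
    """Order files from most to least popular by estimated next accesses."""
    return sorted(
        history,
        key=lambda name: estimate_next_accesses(history[name]),
        reverse=True,
    )


if __name__ == "__main__":
    history = {"a": [10, 20, 40], "b": [40, 20, 10], "c": [6, 6, 6]}
    for name in rank_for_replication(history):
        print(name, estimate_next_accesses(history[name]))
```

    Under this assumed model, a file whose accesses are doubling ranks above one whose accesses are halving, even if the latter had more accesses in the past.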

    Replica maintenance strategy for data grid

    A Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed collaborations. The performance of such a system can be increased by improving overall resource usage, which includes network and storage resources. Network resource usage is improved by making good use of network bandwidth, an important factor affecting job execution time. Storage resource usage is improved by making good use of storage space. Data replication is one of the methods used to improve the performance of data access in distributed systems, by placing multiple copies of data files at the distributed sites. Once the replicas have been distributed to various locations, they need to be monitored. As a result of dynamic changes in the data grid environment, some of the replicas need to be relocated. In this paper we propose a replica maintenance strategy, termed the Unwanted Replica Deletion Strategy (URDS), as part of a replica maintenance service. The main purpose of the proposed strategy is to locate unwanted replicas to be deleted. OptorSim is used to evaluate the performance of the proposed strategy. The simulation results show that URDS requires less execution time, consumes less network bandwidth, and makes better use of storage space than existing approaches.

    Replication in data grid: Determining important resources

    Replication is an important activity in determining the availability of resources in a data grid. Nevertheless, due to high computational and storage costs, having replicas for all existing resources may not be an efficient practice. Existing approaches to data replication have focused on using information about the resource itself or network capability to decide which resources to replicate. In this paper, we present the integration of three types of relationships for this purpose. The approach combines the viewpoints of the user, the file system, and the grid itself to identify important resources that require replication. Experimental work was done via OptorSim, and the evaluation is based on job execution time. The results suggest that the proposed strategy produces a better outcome than existing approaches.

    Replica Creation Algorithm for Data Grids

    A data grid system is a data management infrastructure that facilitates reliable access to and sharing of large amounts of data, storage resources, and data transfer services that can be scaled across distributed locations. This thesis presents a new replication algorithm that improves data access performance in data grids by distributing relevant data copies around the grid. The new Data Replica Creation Algorithm (DRCM) improves the performance of data grid systems by reducing job execution time and making the best use of data grid resources (network bandwidth and storage space). Current algorithms focus on the number of accesses when deciding which files to replicate and where to place them, which ignores the resources’ capabilities. DRCM differs by considering both user and resource perspectives, strategically placing replicas at locations that provide the lowest transfer cost. The proposed algorithm uses three strategies: Replica Creation and Deletion Strategy (RCDS), Replica Placement Strategy (RPS), and Replica Replacement Strategy (RRS). DRCM was evaluated using network simulation (OptorSim) based on selected performance metrics (mean job execution time, efficient network usage, average storage usage, and computing element usage), scenarios, and topologies. Results revealed better job execution time with lower resource consumption than existing approaches. This research contributes replication strategies embodied in one algorithm that enhances data grid performance and is capable of deciding to create or delete more than one file in a single decision. Furthermore, a criterion based on the dependency level between files was integrated with the exponential growth/decay model to give an accurate file evaluation.

    A novel dynamic replica creation mechanism for Data Grids

    A Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed collaborations. A key concept in Data Grids is replication of data, whereby multiple copies of data are stored at different geographical locations, making access to data faster and more reliable. However, replication is bounded by two factors: the size of the storage available at different sites within the Data Grid and the bandwidth between these sites. In this paper, we propose a dynamic replication mechanism, termed the Replica Number Mechanism (RNM), that determines the optimal number of replicas to be created or deleted with the aim of minimizing overall resource usage (network bandwidth and storage usage). OptorSim is used to evaluate the performance of the proposed mechanism. The simulation results show that RNM requires less execution time and consumes less network bandwidth and storage than the existing Simple Optimizer and LFU (Least Frequently Used) approaches.

    Dynamic replication algorithm in data grid: Survey

    A Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed collaborations. High-speed networks and large mainframe systems alone are not enough to provide convenient access to these data. To improve the performance of file accesses and to ease sharing among distributed collaborations, such a system needs replication services. Data replication is a common method used to improve the performance of data access in distributed systems. In this paper, we present a survey of related previous work and highlight various algorithms that have been proposed by other researchers. A dynamic replication model based on mathematical concepts is proposed. The main purpose of this model is to identify popular files using the concept of exponential decay/growth. We estimate the next number of accesses for the file.