Relationship based replication algorithm for data grid
A Data Grid is an infrastructure that manages huge numbers of data files and provides
intensive computational resources across geographically distributed systems. To increase resource availability and to ease resource sharing in such an environment,
replication services are needed. This research proposes a replication algorithm,
termed Relationship based Replication (RBR), that integrates the user, grid and
system perspectives. In particular, RBR uses information from three different
relationships to identify the file(s) that require replication: file-to-user, file-to-file and file-to-grid. This approach improves on existing algorithms that
are based on either user requests or resource capabilities alone. The RBR algorithm aims to improve Data Grid performance
by reducing job execution time, bandwidth usage and storage usage. RBR was realized using a network simulator (OptorSim), and experimental results revealed that it offers better performance than existing replication algorithms.
Dynamic replication algorithm in data grid: Survey
A Data Grid is an infrastructure that manages huge numbers of data files and provides intensive computational resources across geographically distributed collaborations. High-speed networks and large mainframe systems alone are not enough to provide convenient access to these data. To improve file access performance and to ease sharing among distributed collaborations, such a system needs replication services. Data replication is a common method used to improve the performance of data access in distributed systems. In this paper, we survey related previous work and highlight several algorithms that have been proposed by other researchers.
A dynamic replication model based on mathematical concepts is proposed. The main purpose of this model is to find the popular file using the concept of exponential decay/growth. We estimate the next number of accesses for the file
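The abstract above is cut off before it states the model's formula, so the following is only a minimal sketch of one common way to apply exponential growth/decay to access counts. It assumes the count evolves as N(t+1) = N(t) * exp(r), with the rate r estimated from the two most recent intervals. The function names and the two-interval history are hypothetical, not taken from the paper.

```python
import math
from typing import Dict, Tuple


def estimate_next_accesses(prev_count: float, curr_count: float) -> float:
    """Extrapolate the next interval's access count.

    Assumption (not from the paper): N(t+1) = N(t) * exp(r), where
    r = ln(N(t) / N(t-1)) is estimated from the last two intervals.
    """
    if prev_count <= 0 or curr_count <= 0:
        # No rate can be estimated from a zero count; assume no change.
        return float(curr_count)
    rate = math.log(curr_count / prev_count)  # r > 0: growth, r < 0: decay
    return curr_count * math.exp(rate)


def most_popular(history: Dict[str, Tuple[float, float]]) -> str:
    """Return the file with the highest estimated next access count.

    `history` maps a file name to (previous count, current count).
    """
    return max(history, key=lambda name: estimate_next_accesses(*history[name]))
```

Under these assumptions, a file whose accesses grew from 10 to 20 is ranked ahead of one that stayed flat at 30, because its extrapolated next count is higher.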
Comparison of Distributed Replicated and Striped Replicated for Replicating Files in GlusterFS on a Computer Cluster
ABSTRACT: File availability on the servers of a network is a requirement that must always be met. Users want to be able to access their files at any time, but when a failure occurs, for example a server going down, the files become inaccessible. File replication addresses this problem: when a server is down, users can still access the files they need. Files on one server are replicated to other servers, and all of these servers are joined in a cluster, referred to as a computer cluster. One file system that supports file replication in a computer cluster is GlusterFS. GlusterFS offers several file replication methods, including Distributed Replicated and Striped Replicated. In Distributed Replicated, files are distributed to the servers in a cluster. Striped Replicated adds one step: each file is split into pieces before being distributed to the servers in the cluster. This research analyzes the effectiveness and efficiency of the two methods during file replication. The results show that both methods are equally effective at replicating files, while in terms of efficiency Distributed Replicated is, overall, more efficient than Striped Replicated.
Keywords: GlusterFS, replication, Distributed Replicated, Striped Replicated, Cluster
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also help to provide an easy way for new practitioners to understand
this complex area of research. Comment: 46 pages, 16 figures, Technical Report
The Impact of Data Replication on Job Scheduling Performance in Hierarchical Data Grid
In data-intensive applications, data transfer is a primary cause of job
execution delay. Data access time depends on bandwidth. The major bottleneck to
supporting fast data access in Grids is the high latency of Wide Area
Networks and the Internet. Effective scheduling can reduce the amount of data
transferred across the Internet by dispatching a job to where the needed data
are present. Another solution is to use a data replication mechanism. The objective
of dynamic replica strategies is to reduce file access time, which in turn
reduces job runtime. In this paper we develop a job scheduling policy and a
dynamic data replication strategy, called HRS (Hierarchical Replication
Strategy), to improve data access efficiency. We study our approach and
evaluate it through simulation. The results show that our algorithm
improves on current strategies by 12%. Comment: 11 pages, 7 figures
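The scheduling idea in the abstract above, dispatching a job to where its data already resides, can be illustrated with a minimal sketch. This is not the HRS policy, whose details the abstract does not give. It is a simple greedy rule that assumes the scheduler knows which files each site holds; the function name and the site and file names are hypothetical.

```python
from typing import Dict, Iterable, Set


def pick_site(required_files: Iterable[str],
              replicas_by_site: Dict[str, Set[str]]) -> str:
    """Return the site that already holds the most of the job's files.

    Illustrative greedy rule only (not HRS): fewer missing files means
    less data transferred over the wide-area network. Ties are broken
    by site name so the result is deterministic.
    """
    required = set(required_files)
    return min(
        replicas_by_site,
        key=lambda site: (-len(required & replicas_by_site[site]), site),
    )


# Example catalogue: which files are replicated at which site.
sites = {"siteA": {"f1"}, "siteB": {"f1", "f2"}, "siteC": set()}
chosen = pick_site(["f1", "f2", "f3"], sites)  # siteB holds two of three files
```

A fuller policy would also weigh queue length and link bandwidth. The sketch isolates only the data-locality term.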