40,741 research outputs found

    Relationship based replication algorithm for data grid

    Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed systems. To increase resource availability and ease resource sharing in such an environment, replication services are needed. This research proposes a replication algorithm, termed Relationship based Replication (RBR), that integrates the user, grid and system perspectives. In particular, RBR uses information about three different relationships to identify the file(s) that require replication: file-to-user, file-to-file and file-to-grid. This approach improves on existing algorithms, which are based on either user requests or resource capabilities alone. The RBR algorithm aims to improve Data Grid performance by reducing job execution time, bandwidth and storage usage. RBR was implemented in a network simulator (OptorSim), and experimental results showed that it performs better than existing replication algorithms.

    Dynamic replication algorithm in data grid: Survey

    Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed collaborations. High-speed networks and large mainframe systems alone are not enough to provide convenient access to these data. To improve file access performance and to ease sharing among distributed collaborations, such a system needs replication services. Data replication is a common method for improving data access performance in distributed systems. In this paper, we survey related previous work and highlight various algorithms proposed by other researchers. We also propose a dynamic replication model based on mathematical concepts. The main purpose of this model is to identify the popular files using the concept of exponential decay/growth, by estimating the next number of accesses for each file.
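    The abstract does not give the model's formula, so the following is only a minimal sketch of how an exponential decay/growth estimate could work. It assumes that access counts are recorded per time interval and that the next count follows N(t+1) = N(t) * exp(r), with r taken as the average log growth rate over the history. The function names `estimate_next_accesses` and `most_popular` are hypothetical and not taken from the paper.

```python
import math


def estimate_next_accesses(counts):
    """Estimate the next interval's access count.

    Assumption (not from the paper): N(t+1) = N(t) * exp(r), where r is the
    mean log growth rate between consecutive intervals.
    """
    if len(counts) < 2 or any(c <= 0 for c in counts):
        raise ValueError("need at least two positive access counts")
    # r > 0 means exponential growth, r < 0 means exponential decay.
    rates = [math.log(b / a) for a, b in zip(counts, counts[1:])]
    r = sum(rates) / len(rates)
    return counts[-1] * math.exp(r)


def most_popular(history):
    """Return the file name with the highest estimated next access count."""
    return max(history, key=lambda name: estimate_next_accesses(history[name]))


if __name__ == "__main__":
    history = {"a.dat": [10, 20, 40], "b.dat": [50, 45, 40]}
    print(most_popular(history))
```

    Under these assumptions, a file whose accesses are growing can be ranked above a file with a larger but shrinking access count, which is the kind of decision a dynamic replication strategy needs when choosing what to replicate.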

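    Comparison of Distributed Replicated and Striped Replicated for Replicating Files in GlusterFS on a Computer Cluster (Perbandingan Distributed Replicated dengan Striped Replicated untuk Mereplika File dalam GlusterFS pada Computer Cluster)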

    ABSTRACT: The availability of files on servers within a network is a requirement that must always be met. Users want to be able to access their files at any time. When a failure occurs, for example when a server goes down, the files cannot be accessed. File replication is therefore needed to solve this problem, so that users can still access the files they need when a server is down. The files on one server are replicated to other servers. All of these servers are connected to each other in a cluster, referred to as a computer Cluster. One of the file systems that supports file replication in a computer Cluster is GlusterFS. GlusterFS has several file replication methods, including Distributed Replicated and Striped Replicated. In Distributed Replicated, files are distributed to the servers within one Cluster. Striped Replicated has one additional step: before the files are distributed to the servers in the Cluster, each file is first split into pieces. This research analyzes the effectiveness and efficiency of the two methods while file replication is in progress. The results show that both methods are equally effective at replicating files. In terms of efficiency, Distributed Replicated is, overall, more efficient than Striped Replicated. Keywords: GlusterFS, replication, Distributed Replicated, Striped Replicated, Cluster
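    To make the difference between the two layouts concrete, here is a toy placement model. It is a sketch under stated assumptions and does not reproduce GlusterFS's actual hashing or on-disk format: servers are grouped into replica sets of consecutive entries, a CRC32 of the file name picks the replica set for a whole file, and striping uses a fixed chunk size. All function names and parameters are hypothetical.

```python
import zlib


def replica_sets(bricks, replica):
    """Group consecutive bricks into replica sets of size `replica`."""
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def place_distributed(name, data, bricks, replica=2):
    """Distributed Replicated (toy): the whole file goes to one replica set."""
    sets = replica_sets(bricks, replica)
    # Assumption: a simple hash of the name selects the replica set.
    target = sets[zlib.crc32(name.encode()) % len(sets)]
    return {brick: [data] for brick in target}


def place_striped(data, bricks, replica=2, chunk=4):
    """Striped Replicated (toy): split the file first, then spread the chunks."""
    sets = replica_sets(bricks, replica)
    placement = {brick: [] for brick in bricks}
    for n, start in enumerate(range(0, len(data), chunk)):
        # Chunks rotate across replica sets; each chunk is mirrored in its set.
        for brick in sets[n % len(sets)]:
            placement[brick].append(data[start:start + chunk])
    return placement


if __name__ == "__main__":
    servers = ["s1", "s2", "s3", "s4"]
    print(place_distributed("a.txt", b"abcdefgh", servers))
    print(place_striped(b"abcdefgh", servers))
```

    In this model the extra splitting step is visible as the chunk loop in `place_striped`, which is the additional work the abstract attributes to Striped Replicated.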

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also help to provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    The Impact of Data Replication on Job Scheduling Performance in Hierarchical Data Grid

    In data-intensive applications, data transfer is a primary cause of job execution delay. Data access time depends on bandwidth. The major bottleneck to supporting fast data access in Grids is the high latency of Wide Area Networks and the Internet. Effective scheduling can reduce the amount of data transferred across the Internet by dispatching a job to where the needed data are present. Another solution is to use a data replication mechanism. The objective of dynamic replication strategies is to reduce file access time, which in turn reduces job runtime. In this paper we develop a job scheduling policy and a dynamic data replication strategy, called HRS (Hierarchical Replication Strategy), to improve data access efficiency. We study our approach and evaluate it through simulation. The results show that our algorithm achieves a 12% improvement over current strategies. Comment: 11 pages, 7 figures
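    The idea of dispatching a job to where its data already reside can be sketched as follows. This is not the HRS policy itself, whose details the abstract does not give. It is a generic data-aware heuristic that assumes each site exposes the set of files it holds and its current queue length; the function `pick_site` and its parameters are hypothetical.

```python
def pick_site(job_files, site_files, queue_len):
    """Pick the site that holds the most of the job's input files.

    Assumption (not from the paper): ties are broken by the shorter job
    queue, then by site name so that the result is deterministic.
    """
    needed = set(job_files)

    def score(site):
        local = len(needed & site_files[site])
        # More local files is better, so negate the count for min().
        return (-local, queue_len[site], site)

    return min(site_files, key=score)


if __name__ == "__main__":
    site_files = {"A": {"f1", "f2"}, "B": {"f1"}, "C": {"f1", "f2"}}
    queue_len = {"A": 5, "B": 0, "C": 2}
    print(pick_site(["f1", "f2", "f3"], site_files, queue_len))
```

    With this heuristic, only the files missing at the chosen site have to cross the wide-area network, which is the transfer cost that scheduling and replication both try to reduce.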