3 research outputs found

    Towards efficient localization of dynamic replicas for Geo-Distributed data stores

    Large-scale scientific experiments increasingly rely on geo-distributed clouds to serve relevant data to scientists worldwide with minimal latency. State-of-the-art caching systems often require the client to access the data through a caching proxy, or to contact a metadata server to locate the closest available copy of the desired data. Also, such caching systems are inconsistent with the design of distributed hash-table databases such as Dynamo, which focus on allowing clients to locate data independently. We argue there is a gap between existing state-of-the-art solutions and the needs of geographically distributed applications, which require fast access to popular objects while not degrading access latency for the rest of the data. In this paper, we introduce a probabilistic algorithm allowing the user to locate the closest copy of the data efficiently and independently with minimal overhead, allowing low-latency access to non-cached data. Also, we propose a network-efficient technique to identify the most popular data objects in the cluster and trigger their replication close to the clients. Experiments with a real-world data set show that these principles allow clients to locate the closest available copy of data with small memory footprint and low error rate, thus improving read latency for non-cached data and allowing hot data to be read locally.

    GEO-REPLICATION IN A REVIEW OF LATENCY AND COST-EFFECTIVENESS

    Replication is a data distribution technique that synchronizes databases so that data remains consistent. Replication can overcome data loss and allow quick system recovery if a problem occurs on one of the servers. One such problem is a natural disaster at the server location: without data replicated in different locations, the system may stop running and data may be lost. Geo-replication can also reduce latency because the distance between the client and the data center is much shorter. In general, geo-replication replicates data in all data centers. As a result, the cost of implementation is high because it requires a lot of resources. Because of these advantages and disadvantages, it is necessary to group geo-replication techniques to make it easier for researchers and technicians to adjust them as needed. Therefore, this paper surveys articles on geo-replication techniques that address cost-effectiveness and latency. The articles surveyed include a method for selecting replication sites, a method for reducing round-trip time, a method based on data type, and a method for selecting a leader to determine which server node to use. The survey results show that implementing geo-replication for cost-effectiveness is more suitable for systems where all users do not need to access all data. Meanwhile, low latency is more suitable for systems used by various types of users. The techniques reviewed in this paper can be used to overcome the problems of cost-effectiveness and latency in implementing geo-replication.

    An enhanced dynamic replica creation and eviction mechanism in data grid federation environment

    A Data Grid Federation system is an infrastructure that connects several grid systems and facilitates the sharing of large amounts of data, as well as storage and computing resources. Existing data replication mechanisms find file values based on the number of file accesses when deciding which file to replicate, and place new replicas on locations that provide minimum read cost. DRCEM finds file values based on logical dependencies when deciding which file to replicate, and allocates new replicas on locations that provide minimum replica placement cost. This thesis presents an enhanced data replication strategy known as the Dynamic Replica Creation and Eviction Mechanism (DRCEM), which utilizes data grid resources by allocating appropriate replica sites around the federation. The proposed mechanism uses three schemes: 1) Dynamic Replica Evaluation and Creation Scheme, 2) Replica Placement Scheme, and 3) Dynamic Replica Eviction Scheme. DRCEM was evaluated using the OptorSim network simulator based on four performance metrics: 1) Jobs Completion Times, 2) Effective Network Usage, 3) Storage Element Usage, and 4) Computing Element Usage. DRCEM outperforms the ELALW and DRCM mechanisms by 30% and 26%, respectively, in terms of Jobs Completion Times. In addition, DRCEM consumes less storage than ELALW and DRCM, by 42% and 40%, respectively. However, DRCEM shows lower performance than existing mechanisms regarding Computing Element Usage, due to the additional computation of files' logical dependencies. Results revealed better job completion times with lower resource consumption than existing approaches. This research produces three replication schemes embodied in one mechanism that enhances the performance of the Data Grid Federation environment. This has contributed to the enhancement of the existing mechanism, which is capable of deciding to either create or evict more than one file during a particular time. Furthermore, files' logical dependencies were integrated into the replica creation scheme to evaluate data files more accurately.