
    An enhanced dynamic replica creation and eviction mechanism in data grid federation environment

    Data Grid Federation is an infrastructure that connects several grid systems to facilitate the sharing of large amounts of data as well as storage and computing resources. Existing data replication mechanisms decide which file to replicate based on the number of file accesses, and place new replicas at locations that provide minimum read cost. This thesis presents an enhanced data replication strategy, the Dynamic Replica Creation and Eviction Mechanism (DRCEM), which instead derives file values from logical dependencies when deciding which file to replicate, and allocates new replicas at locations that provide minimum replica placement cost, thereby optimizing the usage of data grid resources by selecting appropriate replica sites around the federation. The proposed mechanism uses three schemes: 1) Dynamic Replica Evaluation and Creation Scheme, 2) Replica Placement Scheme, and 3) Dynamic Replica Eviction Scheme. DRCEM was evaluated using the OptorSim network simulator on four performance metrics: 1) job completion time, 2) effective network usage, 3) storage element usage, and 4) computing element usage. DRCEM outperforms the ELALW and DRCM mechanisms by 30% and 26% respectively in terms of job completion time, and consumes 42% and 40% less storage than ELALW and DRCM. However, DRCEM shows lower performance than existing mechanisms on computing element usage, due to the additional computation of files' logical dependencies. Overall, the results revealed better job completion times with lower resource consumption than existing approaches. This research produces three replication schemes embodied in one mechanism that enhances the performance of the Data Grid Federation environment. It improves on the existing mechanism by being able to create or evict more than one file at a particular time, and by integrating files' logical dependencies into the replica creation scheme to evaluate data files more accurately.
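    As a minimal sketch of the dependency-aware valuation idea, the snippet below scores a file by its own access count plus a weighted share of the access counts of logically linked files, and evicts the lowest-valued replica. The weight, data structures and function names are illustrative assumptions, not DRCEM's published design.

```python
# A sketch of dependency-aware replica valuation and eviction, assuming a
# simple additive model; the 0.5 dependency weight is an illustrative choice.

def replica_value(file_id, access_count, dependencies, dependency_weight=0.5):
    """Score a file: its own accesses plus a weighted share of the
    accesses of the files it is logically linked to."""
    own = access_count[file_id]
    linked = sum(access_count.get(d, 0) for d in dependencies.get(file_id, ()))
    return own + dependency_weight * linked

def choose_eviction_victim(replicas, access_count, dependencies):
    """Evict the replica with the lowest dependency-aware value."""
    return min(replicas, key=lambda f: replica_value(f, access_count, dependencies))

access_count = {"f1": 40, "f2": 5, "f3": 12}
dependencies = {"f2": ["f1"], "f3": []}  # f2 is logically tied to f1
print(choose_eviction_victim(["f2", "f3"], access_count, dependencies))  # f3
```

    An access-count-only policy would evict f2 first; the dependency term keeps f2 because it is tied to the heavily used f1, which is the kind of distinction the abstract attributes to DRCEM.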

    Cost and Performance-Based Resource Selection Scheme for Asynchronous Replicated System in Utility-Based Computing Environment

    This paper addresses the resource selection problem for asynchronous replicated systems in a utility-based computing environment. The problem deserves special attention because most existing replication schemes for such systems either implicitly assume synchronous replication or consider only read-only jobs. The problem is complex because two main issues must be addressed simultaneously: 1) the difficulty of predicting the performance of resources in terms of job response time, and 2) the need for an efficient mechanism to measure the trade-off between performance and the monetary cost incurred on resources, so that cost is minimised while job response time remains low. This paper therefore proposes a simple yet efficient algorithm that deals with the complexity of resource selection in utility-based computing systems. The problem is formulated as a Multi Criteria Decision Making (MCDM) problem. The advantages of the algorithm are twofold. On the one hand, it hides the complexity of the resource selection process without neglecting the components that affect job response time; the difficulty of estimating job response time is captured by representing it as QoS criteria levels at each resource. On the other hand, this representation relaxes the complexity of measuring the trade-off between performance and the monetary cost incurred on resources. Experiments show that the proposed resource selection scheme achieves good system performance at low monetary cost compared to existing algorithms.
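    The performance/cost trade-off described above can be pictured as a weighted-sum MCDM score over a normalised QoS level and a normalised cost. The criteria names, the normalisation and the weights below are illustrative assumptions, not the paper's exact formulation.

```python
# A sketch of resource selection as a weighted-sum MCDM problem, assuming a
# single aggregated QoS level per resource and a linear cost normalisation.

def select_resource(resources, perf_weight=0.7, cost_weight=0.3):
    """Pick the resource with the best trade-off between QoS level
    (higher is better) and monetary cost (lower is better)."""
    max_cost = max(r["cost"] for r in resources)
    max_qos = max(r["qos_level"] for r in resources)

    def score(r):
        qos = r["qos_level"] / max_qos      # normalise to [0, 1]
        cost = 1.0 - r["cost"] / max_cost   # invert: cheap is good
        return perf_weight * qos + cost_weight * cost

    return max(resources, key=score)

resources = [
    {"name": "siteA", "qos_level": 0.9, "cost": 8.0},
    {"name": "siteB", "qos_level": 0.6, "cost": 2.0},
]
print(select_resource(resources)["name"])  # siteA under these weights
```

    Shifting weight from perf_weight to cost_weight flips the choice toward the cheaper siteB, which is the trade-off measurement the abstract refers to.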

    Data Replication and Its Alignment with Fault Management in the Cloud Environment

    Nowadays, exponential data growth is one of the major challenges worldwide. It can cause a series of negative impacts such as network overloading, high system complexity, and inadequate data security. Cloud computing has been developed as a paradigm to alleviate massive data processing challenges through its on-demand services and distributed architecture. Data replication distributes the data access load across multiple cloud data centres by creating multiple copies of the data at those centres. A replica-enabled cloud environment not only achieves lower response times, higher data availability, and a more balanced resource load, but also protects the cloud environment against upcoming faults. A reactive fault tolerance strategy is still required to handle faults once they have occurred, so data replication strategies should be aligned with reactive fault tolerance strategies to form a complete management chain in the cloud environment. In this thesis, a data replication and fault management framework is proposed to establish decentralised, overarching management of the cloud environment. Based on this framework, three data replication strategies are proposed. First, a replica creation strategy reduces the total cost by jointly considering data dependency and access frequency in the replica creation decision. Second, a cloud-map-oriented, cost-efficiency-driven replica creation strategy achieves the optimal cost reduction per replica in the cloud environment; the local and remote data relationships are further analysed through two novel data dependency types defined by data location, Within-DataCentre Data Dependency and Between-DataCentre Data Dependency. Finally, a network-performance-based replica selection strategy avoids potential network overloading problems while increasing the number of concurrently running instances.
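    One way to picture a creation decision that jointly weighs access frequency and data dependency is the toy benefit model below. The cost coefficients, the dependency factor and the within-/between-data-centre weighting are illustrative assumptions, not the thesis's actual strategy.

```python
# A sketch of a cost-driven replica creation decision: replicate when the
# estimated saving from avoided remote reads, boosted by data dependencies,
# exceeds the one-off transfer cost. All constants are illustrative.

def replication_benefit(freq, within_dc_deps, between_dc_deps,
                        remote_access_cost=1.0, transfer_cost=20.0):
    """Estimated saving from replicating a file accessed `freq` times,
    with dependencies inside and across data centres."""
    dependency_factor = 1.0 + 0.1 * within_dc_deps + 0.3 * between_dc_deps
    return freq * remote_access_cost * dependency_factor - transfer_cost

def should_replicate(freq, within_dc_deps, between_dc_deps):
    return replication_benefit(freq, within_dc_deps, between_dc_deps) > 0

print(should_replicate(freq=30, within_dc_deps=2, between_dc_deps=1))  # True
print(should_replicate(freq=10, within_dc_deps=0, between_dc_deps=0))  # False
```

    The larger coefficient on between-data-centre dependencies reflects the abstract's distinction between local and remote data relationships, where cross-centre coupling makes remote access more expensive to leave unreplicated.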

    Binary vote assignment on grid quorum replication technique with association rule

    One of the biggest challenges that data grid users face today is improving data management. Organizations need to provide current data to users who may be geographically remote, and to handle a large volume of requests for data distributed across multiple sites in a distributed environment. Storage, availability, and consistency are therefore important issues to be addressed to allow efficient and safe data access from many different sites. One way to cope with these challenges is replication, a useful technique for distributed database systems: data can be accessed from multiple locations, which increases availability and accessibility, and when one site fails, users can still access the same data at another site. Read-One-Write-All (ROWA), Hierarchical Replication Scheme (HRS) and Branch Replication Scheme (BRS) are popular techniques for replication and data management. However, these techniques have weaknesses in terms of communication cost, i.e. the total number of replication servers needed to replicate the data, and they do not consider correlations between data during the fragmentation process. Knowledge about data correlation can be extracted from historical data using data mining techniques; without a proper strategy, replication increases job execution time. In this research, a some-data-to-some-sites scheme called Binary Vote Assignment on Grid Quorum with Association Rule (BVAGQ-AR) is proposed to manage replication of meaningfully fragmented data in a distributed database environment with low communication cost and low transaction processing time. The main feature of BVAGQ-AR is that it integrates replication with data mining, allowing meaningful knowledge to be extracted from large data sets. The BVAGQ-AR technique comprises the following steps. First, the data is mined using the Apriori algorithm for association rules to discover correlations between data. Second, the database is fragmented based on the results of this analysis, ensuring that replication can be done effectively while saving cost. The fragments resulting from this process are then allocated to their assigned sites. Finally, each site holds a database file and is ready for any transaction and replication process. Experimental results show that BVAGQ-AR preserves data consistency with the lowest communication cost and transaction processing time compared to BCSA, PRA, ROWA, HRS and BRS.
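    The first two steps can be sketched as follows: mine frequent item pairs from transaction history (a simplified stand-in for a full Apriori pass) and merge correlated items into the same fragment. The support threshold and the greedy grouping rule are illustrative assumptions.

```python
# A sketch of correlation-driven fragmentation: pairwise support mining
# (the level-2 step of Apriori) followed by merging correlated items into
# shared fragments. Thresholds and grouping are illustrative assumptions.

from itertools import combinations

def frequent_pairs(transactions, min_support=0.5):
    """Return item pairs whose co-occurrence support meets min_support."""
    counts = {}
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    n = len(transactions)
    return {p for p, c in counts.items() if c / n >= min_support}

def fragment(items, pairs):
    """Place correlated items in the same fragment (greedy merge)."""
    frags = [{i} for i in items]
    for a, b in pairs:
        fa = next(f for f in frags if a in f)
        fb = next(f for f in frags if b in f)
        if fa is not fb:
            fa |= fb
            frags.remove(fb)
    return frags

txns = [["x", "y"], ["x", "y", "z"], ["z"], ["x", "y"]]
pairs = frequent_pairs(txns)             # {("x", "y")}, support 0.75
print(fragment(["x", "y", "z"], pairs))  # [{"x", "y"}, {"z"}]
```

    Keeping x and y in one fragment means a site replicating that fragment serves both items locally, which is how correlation-aware fragmentation saves communication cost.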

    Replica placement in peer-to-peer systems

    In today’s distributed applications, replica placement is essential, since moving data into the vicinity of an application provides many benefits. The increasing data requirements of scientific applications and collaborative access to these data make data placement even more important. To date, replication has been one of the main mechanisms in distributed data management, whereby identical copies of data are generated and stored at various distributed sites to improve data access performance and data availability. Most works consider a file's popularity as one of the important parameters when designing replica placement strategies. This thesis argues, however, that the combination of file popularity and file affinity is the most important pair of parameters for decision making when improving data access performance and data availability in distributed environments. A replica placement mechanism called the Affinity Replica Placement Mechanism (ARPM) is proposed, focusing on popular files and affinity files. The idea of ARPM is to improve data availability and accessibility through a peer-to-peer (P2P) replica placement strategy. A P2P simulator, PeerSim, was used to evaluate the performance of this dynamic replica placement strategy. The simulation results demonstrated the effectiveness of ARPM, providing evidence that ARPM contributes a new dimension of replica placement strategy that incorporates the affinity and popularity of file replicas in P2P systems.
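    A minimal sketch of the popularity-plus-affinity idea is to rank files for replication by a combined score, as below. The scoring formula, the equal weights and the way affinity is quantified are illustrative assumptions; the thesis may combine the two parameters differently.

```python
# A sketch of combined popularity/affinity ranking for replica placement,
# assuming a simple weighted sum; weights of 0.5/0.5 are illustrative.

def placement_score(requests, affinity_links, w_pop=0.5, w_aff=0.5):
    """requests: how often peers asked for the file (popularity);
    affinity_links: how many other files are usually accessed with it."""
    return w_pop * requests + w_aff * affinity_links

files = {"movieA": (120, 2), "datasetB": (30, 15), "logC": (5, 0)}
ranked = sorted(files, key=lambda f: placement_score(*files[f]), reverse=True)
print(ranked)  # candidates to replicate first near the requesting peers
```

    A popularity-only ranking would look the same here, but raising w_aff promotes datasetB, whose value comes from being accessed together with other files rather than from raw request volume.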

    Data Replication Strategies with Performance Objective in Data Grid Systems: A Survey

    Replicating for performance is an important issue in large-scale data management systems, and a significant number of replication strategies have been proposed for data grid systems. Some works classify these strategies as static vs. dynamic, centralised vs. decentralised, or client- vs. server-initiated; very few works classify them based on the role they play when building a replica management system. In this paper, we propose a new classification of replication strategies based on their objective functions. Each replication strategy is also discussed in relation to the data grid topology for which it was proposed; we point out the impact of topology on replication performance, noting that most of these strategies were designed for a hierarchical grid topology. We also study the impact on the performance of these strategies of factors such as access pattern, bandwidth consumption and storage capacity.

    Relationship based replication algorithm for data grid

    Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed systems. To increase resource availability and to ease resource sharing in such an environment, replication services are needed. This research proposes a replication algorithm, termed Relationship-based Replication (RBR), that integrates the user, grid and system perspectives. In particular, the RBR uses information from three different relationships to identify the files that require replication: file-to-user, file-to-file and file-to-grid. Such an approach overcomes the limitation of existing algorithms, which rely on either user requests or resource capabilities alone. The RBR algorithm aims to improve Data Grid performance by reducing job execution time, bandwidth usage and storage usage. The RBR was realised in a network simulator (OptorSim), and experimental results revealed that it offers better performance than existing replication algorithms.
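    The three-relationship idea can be sketched as a combined score per candidate file, as below. The equal weights and the way each relationship is quantified are illustrative assumptions rather than RBR's published formula.

```python
# A sketch of relationship-based replication scoring: combine the three
# relationship strengths the abstract names before picking what to replicate.

def rbr_score(file_to_user, file_to_file, file_to_grid,
              weights=(1.0, 1.0, 1.0)):
    """file_to_user: demand from users; file_to_file: correlation with
    other files; file_to_grid: fit with grid resource capabilities."""
    w_u, w_f, w_g = weights
    return w_u * file_to_user + w_f * file_to_file + w_g * file_to_grid

candidates = {
    "f1": (0.8, 0.4, 0.6),  # heavily requested, moderately connected
    "f2": (0.2, 0.9, 0.3),  # rarely requested but strongly correlated
}
to_replicate = max(candidates, key=lambda f: rbr_score(*candidates[f]))
print(to_replicate)  # "f1"
```

    A purely request-driven algorithm would only ever see the first component; folding in the file-to-file and file-to-grid terms is what lets files like f2 compete for replication despite low direct demand.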