Search CORE

1,639 research outputs found

A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

Author: Buyya Rajkumar
Ramamohanarao Kotagiri
Venugopal Srikumar
Publication venue
Publication date: 10/06/2005
Field of study

Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

arXiv.org e-Print Archive

CiteSeerX

University of Melbourne Institutional Repository

Cost and Performance-Based Resource Selection Scheme for Asynchronous Replicated System in Utility-Based Computing Environment

Author: Abawajy Jemal H.
Wan Nik Wan Nor Shuhadah
Zhou Bing Bing
Zomaya Albert Y.
Publication venue: 'Insight Society'
Publication date: 22/04/2017
Field of study

A resource selection problem for asynchronous replicated systems in utility-based computing environment is addressed in this paper. The needs for a special attention on this problem lies on the fact that most of the existing replication scheme in this computing system whether implicitly support synchronous replication and/or only consider read-only job. The problem is undoubtedly complex to be solved as two main issues need to be concerned simultaneously, i.e. 1) the difficulty on predicting the performance of the resources in terms of job response time, and 2) an efficient mechanism must be employed in order to measure the trade-off between the performance and the monetary cost incurred on resources so that minimum cost is preserved while providing low job response time. Therefore, a simple yet efficient algorithm that deals with the complexity of resource selection problem in utility-based computing systems is proposed in this paper. The problem is formulated as a Multi Criteria Decision Making (MCDM) problem. The advantages of the algorithm are two-folds. On one fold, it hides the complexity of resource selection process without neglecting important components that affect job response time. The difficulty on estimating job response time is captured by representing them in terms of different QoS criteria levels at each resource. On the other fold, this representation further relaxed the complexity in measuring the trade-offs between the performance and the monetary cost incurred on resources. The experiments proved that our proposed resource selection scheme achieves an appealing result with good system performance and low monetary cost as compared to existing algorithms

International Journal on Advanced Science, Engineering and Information Technology

Data Replication and Its Alignment with Fault Management in the Cloud Environment

Author: Xie Fei
Publication venue: School of Computing and Information Technology
Publication date: 01/01/2021
Field of study

Nowadays, the exponential data growth becomes one of the major challenges all over the world. It may cause a series of negative impacts such as network overloading, high system complexity, and inadequate data security, etc. Cloud computing is developed to construct a novel paradigm to alleviate massive data processing challenges with its on-demand services and distributed architecture. Data replication has been proposed to strategically distribute the data access load to multiple cloud data centres by creating multiple data copies at multiple cloud data centres. A replica-applied cloud environment not only achieves a decrease in response time, an increase in data availability, and more balanced resource load but also protects the cloud environment against the upcoming faults. The reactive fault tolerance strategy is also required to handle the faults when the faults already occurred. As a result, the data replication strategies should be aligned with the reactive fault tolerance strategies to achieve a complete management chain in the cloud environment. In this thesis, a data replication and fault management framework is proposed to establish a decentralised overarching management to the cloud environment. Three data replication strategies are firstly proposed based on this framework. A replica creation strategy is proposed to reduce the total cost by jointly considering the data dependency and the access frequency in the replica creation decision making process. Besides, a cloud map oriented and cost efficiency driven replica creation strategy is proposed to achieve the optimal cost reduction per replica in the cloud environment. The local data relationship and the remote data relationship are further analysed by creating two novel data dependency types, Within-DataCentre Data Dependency and Between-DataCentre Data Dependency, according to the data location. Furthermore, a network performance based replica selection strategy is proposed to avoid potential network overloading problems and to increase the number of concurrent-running instances at the same time

Research Online

Binary vote assignment on grid quorum replication technique with association rule

Author: Ainul Azila Che Fauzi
Publication venue
Publication date: 01/08/2018
Field of study

One of the biggest challenges that data grids users have to face today relates to the improvement of the data management. Organizations need to provide current data to users who may be geographically remote and to handle a volume of requests of data distributed around multiple sites in distributed environment. Therefore, the storage, availability, and consistency are important issues to be addressed to allow efficient and safe data access from many different sites. One way to effectively cope with these challenges is to rely on the replication technique. Replication is a useful technique for distributed database systems. Through this technique, a data can be accessed from multiple locations. Thus, replication increases data availability and accessibility to users. When one site fails, user still can access the same data at another site. Techniques such as Read-One-Write-All (ROWA), Hierarchical Replication Scheme (HRS) and Branch Replication Scheme (BRS) are the popular techniques being used for replication and data management. However, these techniques have its weaknesses in terms of communication costs that is the total replication servers needed to replicate the data. Furthermore, these techniques also do not consider the correlation between data during the fragmentation process. The knowledge about data correlation can be extracted from historical data using techniques of the data mining field. Without proper strategies, replication increases job execution time. In this research, the some-data-to-some-sites scheme called Binary Vote Assignment on Grid Quorum with Association (BV AGQAR) is proposed to manage replication for meaningful fragmented data in distributed database environment with low communication cost and processing time for a transaction. The main feature of BV AGQ-AR is that the technique integrates replication and data mining technique allowing meaningful extraction of knowledge from large data sets. Performance of the BVAGQ-AR technique comprised the following steps. First step is mining the data by using Apriori algorithm from Association Rules. It is used to discover the correlation between data. For the second step, the database is fragmented based on the data mining analysis results. This technique is executed to make sure data replication can be effectively done while saving cost. Then, the databases that are resulted after the fragmentation process are allocated at their assigned sites. Finally, after allocation process, each site has a database file and ready for any transaction and replication process. Finally, the result of the experiments shows that BV AGQ-AR can preserve the data consistency with the lowest communication cost and processing time for a transaction as compared to BCSA, PRA, ROW A, HRS and BRS

UMP Institutional Repository

RPLB: A Replica Placement Algorithm in Data Grid with Load Balancing

Author: Kingsy Rajaretnam
Manimegalai Rajkumar
Ranjith Venkatesan
Publication venue
Publication date: 01/01/2014
Field of study

CiteSeerX

Dynamic replication strategies in data grid systems: A survey

Author: A Chervenak
A Chervenak
A Chervenak
A Chervenak
A Dogan
A Horri
Abdelkader Hameurlain
B Meroufel
B Segal
C Nicholson
D Chen
H Chettaoui
J Ma
JM Pérez
K Sashi
L Adamic
LM Khanli
M Arlitt
M Bsoul
M Lei
M Shorfuzzaman
M Tang
MC Lee
N Mansouri
N Mansouri
N Mansouri
N Saadat
R Mokadem
R Rahman
Riad Mokadem
RS Chang
Sebnem Bora
SM Park
T Amjad
T Loukopoulos
Tolga Ayav
U Cibej
Uras Tos
V Andronikou
X Fu
X Tang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2015
Field of study

In data grid systems, data replication aims to increase availability, fault tolerance, load balancing and scalability while reducing bandwidth consumption, and job execution time. Several classification schemes for data replication were proposed in the literature, (i) static vs. dynamic, (ii) centralized vs. decentralized, (iii) push vs. pull, and (iv) objective function based. Dynamic data replication is a form of data replication that is performed with respect to the changing conditions of the grid environment. In this paper, we present a survey of recent dynamic data replication strategies. We study and classify these strategies by taking the target data grid architecture as the sole classifier. We discuss the key points of the studied strategies and provide feature comparison of them according to important metrics. Furthermore, the impact of data grid architecture on dynamic replication performance is investigated in a simulation study. Finally, some important issues and open research problems in the area are pointed out

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Replica placement in peer-to-peer systems

Author: Wan Awang Wan Suryani
Publication venue
Publication date: 01/01/2016
Field of study

In today’s distributed applications, replica placement is essential since moving the data in the vicinity of an application will provide many benefits. The increasing requirements of data for scientific applications and collaborative access to these data make data placement even more important. Until now, replication is one of the main mechanisms used in distributed data whereby identical copies of data are generated and stored at various distributed sites to improve data access performance and data availability. Most work considers file’s popularity as one of the important parameters taken into consideration when designing replica placement strategies. However, this thesis argues that a combination of popularity and affinity files are the most important parameters which can be used in decision making whilst improving data access performance and data availability in distributed environments. A replica placement mechanism called Affinity Replica Placement Mechanism (ARPM) is proposed focusing on popular files and affinity files. The idea of ARPM is to improve data availability and accessibility in peer-to-peer (P2P) replica placement strategy. A P2P simulator, PeerSim, was used to evaluate the performance of this dynamic replica placement strategy. The simulation results demonstrated the effectiveness of ARPM hence provided a proof that ARPM has contributed towards a new dimension of replica placement strategy that incorporates the affinity and popularity of files replicas in P2P systems

Online Research @ Cardiff

Analysis of simulation environment

Author: Eymann Torsten
Giulioni Gianfranco
Reinicke Michael
Streitberger Werner
Zini Floriano
Publication venue
Publication date
Field of study

In this paper the requirements for an ALN simulation environment are analysed, as needed in the CATNETS Project. A number of grid and general purpose simulators are evaluated regarding the identified requirements for simulating economical resource allocation mechanisms in ALNs. Subsequently a suitable simulator is chosen for usage in the CATNETS project. --CATNETS simulator,requirements analysis,simulator selection

Research Papers in Economics