716 research outputs found

A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

Author: Buyya Rajkumar
Ramamohanarao Kotagiri
Venugopal Srikumar
Publication date: 10/06/2005

Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also help to provide an easy way for new practitioners to understand this complex area of research.

Comment: 46 pages, 16 figures, Technical Report

arXiv.org e-Print Archive

CiteSeerX

Next-Generation EU DataGrid Data Management Services

Author: Akos Frohner
David Cameron
Diana Bosio
Domenici A
Erwin Laure
Federico Dicarlo
Floriano Zini
Gavin Mccance
Gianluca Volpato
Heinz Stockinger
Joni Hahkala
Kurt Stockinger
Leanne Guy
Levi Lucio
Livio Salconi
Mika Sil
Niklas Karlsson
Olle Mulmo
Paul Millar
Peter Kunszt
Ruben Carvajal-Schiaffino
Sophie Lemaitre
Ville Nenonen
William Bell
Publication date: 01/01/2003

We describe the architecture and initial implementation of the next generation of Grid Data Management Middleware in the EU DataGrid (EDG) project. The new architecture stems from our experience and the user requirements gathered during the two years of running our initial set of Grid Data Management Services. All of our new services are based on the Web Service technology paradigm, very much in line with the emerging Open Grid Services Architecture (OGSA). We have modularized our components and invested a great amount of effort towards a secure, extensible and robust service, starting from the design but also using a streamlined build and testing framework. Our service components are: Replica Location Service, Replica Metadata Service, Replica Optimization Service, Replica Subscription and high-level replica management. The service security infrastructure is fully GSI-enabled, hence compatible with the existing Globus Toolkit 2-based services; moreover, it allows for fine-grained authorization mechanisms that can be adjusted depending on the service semantics.

Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 8 pages, LaTeX, the file contains all LaTeX sources - figures are in the directory "figures"

arXiv.org e-Print Archive

CiteSeerX

Replica Creation Algorithm for Data Grids

Author: Madi Mohammed Kamel
Publication date: 01/01/2012

A data grid system is a data management infrastructure that facilitates reliable access to and sharing of large amounts of data, storage resources, and data transfer services that can be scaled across distributed locations. This thesis presents a new replication algorithm that improves data access performance in data grids by distributing relevant data copies around the grid. The new Data Replica Creation Algorithm (DRCM) improves the performance of data grid systems by reducing job execution time and making the best use of data grid resources (network bandwidth and storage space). Current algorithms focus on the number of accesses when deciding which file to replicate and where to place it, which ignores the resources’ capabilities. DRCM differs by considering both user and resource perspectives, strategically placing replicas at locations that provide the lowest transfer cost. The proposed algorithm uses three strategies: Replica Creation and Deletion Strategy (RCDS), Replica Placement Strategy (RPS), and Replica Replacement Strategy (RRS). DRCM was evaluated using network simulation (OptorSim) based on selected performance metrics (mean job execution time, efficient network usage, average storage usage, and computing element usage), scenarios, and topologies. Results revealed better job execution time with lower resource consumption than existing approaches. This research contributes replication strategies embodied in one algorithm that enhances data grid performance and is capable of deciding to create or delete more than one file during the same decision. Furthermore, a dependency-level-between-files criterion was utilized and integrated with the exponential growth/decay model to give an accurate file evaluation.
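
The placement idea in the abstract above (put a replica where the transfer cost is lowest, while respecting storage space) can be illustrated with a minimal sketch. This is not the thesis's RPS; the cost formula (requests times file size divided by bandwidth), the `Site` fields, and the function names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Site:
    """A candidate storage site (hypothetical model, not from the thesis)."""
    name: str
    free_space_mb: float
    # Assumed bandwidth in MB/s from this site to each requesting site.
    bandwidth_to: Dict[str, float]


def transfer_cost(site: Site, file_size_mb: float, requests: Dict[str, int]) -> float:
    """Estimated total seconds needed to serve all requests from `site`."""
    return sum(
        count * file_size_mb / site.bandwidth_to[requester]
        for requester, count in requests.items()
    )


def choose_replica_site(
    sites: List[Site], file_size_mb: float, requests: Dict[str, int]
) -> Optional[Site]:
    """Pick the site with enough free space and the lowest transfer cost."""
    candidates = [s for s in sites if s.free_space_mb >= file_size_mb]
    if not candidates:
        return None  # no site can hold the file
    return min(candidates, key=lambda s: transfer_cost(s, file_size_mb, requests))


# Example: requester r1 asks for the file five times, r2 once.
sites = [
    Site("A", free_space_mb=500, bandwidth_to={"r1": 10.0, "r2": 100.0}),
    Site("B", free_space_mb=500, bandwidth_to={"r1": 100.0, "r2": 10.0}),
]
best = choose_replica_site(sites, file_size_mb=100, requests={"r1": 5, "r2": 1})
```

A purely access-count-based policy would ignore the bandwidth figures entirely; here the site that is well connected to the heaviest requester is preferred.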

Universiti Utara Malaysia: UUM eTheses

Fault Tolerant Resource Allocation for Query Processing in Grid Environments

Author: Cokuslu Deniz
Erciyes Kayhan
Hameurlain Abdelkader
Publication venue: 'Inderscience Publishers'
Publication date: 01/04/2015

In this paper, we propose a new algorithm for fault-tolerant resource allocation for query processing in grid environments. For this, we propose an initial resource allocation algorithm followed by a fault-tolerance protocol. The proposed fault-tolerance protocol is based on the passive replication of stateful operators in queries. We provide theoretical analyses of the proposed algorithms and support them with simulations.

A dynamic replica creation: Which file to replicate?

Author: Hassan Suhaidi
Madi Mohammed
Yusof Yuhanis
Publication date: 01/06/2011

A Data Grid is an infrastructure that manages a huge number of data files and provides intensive computational resources across geographically distributed collaborations. To increase resource availability and to ease resource sharing in such an environment, there is a need for replication services. Data replication is one of the methods used to improve the performance of data access in distributed systems. In this paper, we propose a dynamic replication strategy (EXPM) that is based on the exponential growth or decay rate and the dependency level of data files. Simulation results (via OptorSim) show that EXPM outperformed LALW in the measured metrics: mean job execution time, effective network usage and average storage usage.
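
The abstract above does not give the EXPM formula, so the following is only a rough sketch of what ranking files by an exponential growth or decay rate could look like. It assumes a per-interval rate r averaged from consecutive access counts and a projection N_next = N_last * (1 + r); the dependency-level criterion is left out, and all function names are hypothetical.

```python
from typing import Dict, List


def growth_rate(access_counts: List[int]) -> float:
    """Average relative change in accesses between consecutive intervals."""
    rates = [
        (curr - prev) / prev
        for prev, curr in zip(access_counts, access_counts[1:])
        if prev > 0  # skip intervals where the rate is undefined
    ]
    return sum(rates) / len(rates) if rates else 0.0


def predicted_accesses(access_counts: List[int]) -> float:
    """Project next-interval accesses as N_last * (1 + r)."""
    if not access_counts:
        return 0.0
    return access_counts[-1] * (1 + growth_rate(access_counts))


def pick_file_to_replicate(history: Dict[str, List[int]]) -> str:
    """Return the file with the highest projected access count."""
    return max(history, key=lambda name: predicted_accesses(history[name]))


# File "b" has more total accesses, but its popularity is decaying,
# while file "a" is growing.
history = {"a": [10, 20, 40], "b": [50, 40, 32]}
chosen = pick_file_to_replicate(history)
```

Under these assumptions, a file whose accesses are growing can outrank one with a larger but shrinking total, which is the kind of distinction a plain access-count policy would miss.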

An Approach to Ad hoc Cloud Computing

Author: Dearle Alan
Fernandes Alvaro
Kirby Graham
Macdonald Angus
Publication date: 01/01/2010

We consider how underused computing resources within an enterprise may be harnessed to improve utilization and create an elastic computing infrastructure. Most current cloud provision involves a data center model, in which clusters of machines are dedicated to running cloud infrastructure software. We propose an additional model, the ad hoc cloud, in which infrastructure software is distributed over resources harvested from machines already in existence within an enterprise. In contrast to the data center cloud model, resource levels are not established a priori, nor are resources dedicated exclusively to the cloud while in use. A participating machine is not dedicated to the cloud, but has some other primary purpose such as running interactive processes for a particular user. We outline the major implementation challenges and one approach to tackling them.

arXiv.org e-Print Archive

CiteSeerX

Economy-based data replication broker

Author: Abawajy Jemal
Buyya Rajkumar
Lin Henry
Publication venue: Institute of Electrical and Electronics Engineers CS Press
Publication date: 01/01/2006

Data replication is one of the key components in data grid architecture as it enhances data access and reliability and minimises the cost of data transmission. In this paper, we address the problem of reducing the overheads of the replication mechanisms that drive the data management components of a data grid. We propose an approach that extends the resource broker with policies that factor in user quality of service as well as service costs when replicating and transferring data. A realistic model of the data grid was created to simulate and explore the performance of the proposed policy. The policy proved an effective means of improving grid network traffic performance, as indicated by the improved speed and cost of transfers by brokers.
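
As a loose illustration of a broker policy that weighs both quality of service and cost, the sketch below picks the cheapest replica that still meets a user's deadline and budget. The abstract does not specify the paper's actual policy; the `Replica` fields, the pricing model, and `select_replica` are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """A replica location as seen by the broker (hypothetical model)."""
    site: str
    bandwidth_mb_per_s: float  # assumed sustained transfer rate
    price_per_mb: float        # assumed charge for data served from this site


def select_replica(
    replicas: List[Replica], file_size_mb: float, deadline_s: float, budget: float
) -> Optional[Replica]:
    """Return the cheapest replica meeting the deadline and budget, if any."""
    feasible = []
    for r in replicas:
        time_s = file_size_mb / r.bandwidth_mb_per_s
        cost = file_size_mb * r.price_per_mb
        if time_s <= deadline_s and cost <= budget:
            feasible.append((cost, time_s, r))
    if not feasible:
        return None  # no replica satisfies the user's QoS constraints
    # Prefer lower cost; break ties by shorter transfer time.
    return min(feasible, key=lambda t: (t[0], t[1]))[2]


replicas = [
    Replica("fast", bandwidth_mb_per_s=100.0, price_per_mb=0.05),
    Replica("slow", bandwidth_mb_per_s=10.0, price_per_mb=0.01),
]
relaxed = select_replica(replicas, file_size_mb=1000, deadline_s=200, budget=100)
urgent = select_replica(replicas, file_size_mb=1000, deadline_s=50, budget=100)
```

With a relaxed deadline the cheaper, slower site is chosen; tightening the deadline forces the broker onto the faster, more expensive one.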