Search CORE

461 research outputs found

A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

Author: Buyya Rajkumar
Ramamohanarao Kotagiri
Venugopal Srikumar
Publication venue
Publication date: 10/06/2005
Field of study

Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

arXiv.org e-Print Archive

CiteSeerX

University of Melbourne Institutional Repository

Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies

Author: Asadzadeh Parvin
Buyya Rajkumar
Kei Chun Ling
Nayar Deepa
Venugopal Srikumar
Publication venue
Publication date: 01/07/2004
Field of study

Grid is an infrastructure that involves the integrated and collaborative use of computers, networks, databases and scientific instruments owned and managed by multiple organizations. Grid applications often involve large amounts of data and/or computing resources that require secure resource sharing across organizational boundaries. This makes Grid application management and deployment a complex undertaking. Grid middlewares provide users with seamless computing ability and uniform access to resources in the heterogeneous Grid environment. Several software toolkits and systems have been developed, most of which are results of academic research projects, all over the world. This chapter will focus on four of these middlewares--UNICORE, Globus, Legion and Gridbus. It also presents our implementation of a resource broker for UNICORE as this functionality was not supported in it. A comparison of these systems on the basis of the architecture, implementation model and several other features is included.Comment: 19 pages, 10 figure

arXiv.org e-Print Archive

Enlighten

Mass Storage Management and the Grid

Author: Clark P.
Earl A.
Publication venue
Publication date: 20/12/2004
Field of study

The University of Edinburgh has a significant interest in mass storage systems as it is one of the core groups tasked with the roll out of storage software for the UK's particle physics grid, GridPP. We present the results of a development project to provide software interfaces between the SDSC Storage Resource Broker, the EU DataGrid and the Storage Resource Manager. This project was undertaken in association with the eDikt group at the National eScience Centre, the Universities of Bristol and Glasgow, Rutherford Appleton Laboratory and the San Diego Supercomputing Center.Comment: 4 pages, 3 figures, Presented at Computing for High Energy and Nuclear Physics 2004 (CHEP '04), Interlaken, Switzerland, September 200

arXiv.org e-Print Archive

CERN Document Server

Extended Resource Specification Language Reference Manual for ARC versions 0.8 and above

Author: Ould-Saada Farid
Publication venue
Publication date: 28/05/2013
Field of study

Extended Resource Specification Language Reference Manual for ARC versions 0.8 and abov

ZENODO

A Taxonomy of Workflow Management Systems for Grid Computing

Author: Buyya Rajkumar
Yu Jia
Publication venue
Publication date: 01/01/2005
Field of study

With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art in Grid workflow systems, but also identifies the areas that need further research.Comment: 29 pages, 15 figure

arXiv.org e-Print Archive

CiteSeerX

The Lattice Project: A Multi-model Grid Computing System

Author: Bazinet Adam Lee
Publication venue
Publication date: 01/01/2009
Field of study

This thesis presents The Lattice Project, a system that combines multiple models of Grid computing. Grid computing is a paradigm for leveraging multiple distributed computational resources to solve fundamental scientific problems that require large amounts of computation. The system combines the traditional Service model of Grid computing with the Desktop model of Grid computing, and is thus capable of utilizing diverse resources such as institutional desktop computers, dedicated computing clusters, and machines volunteered by the general public to advance science. The production Grid system includes a fully-featured user interface, support for a large number of popular scientific applications, a robust Grid-level scheduler, and novel enhancements such as a Grid-wide file caching scheme. A substantial amount of scientific research has already been completed using The Lattice Project

Digital Repository at the University of Maryland

Grid Enabled Geospatial Catalogue Web Service

Author: Bui Yu-Qi
Chen Ai-Jun
Di Li-Ping
Hu Chau-Min
Liu Yang
Mehrotra Piyush
Wei Ya-Xing
Publication venue
Publication date: 30/11/2004
Field of study

Geospatial Catalogue Web Service is a vital service for sharing and interoperating volumes of distributed heterogeneous geospatial resources, such as data, services, applications, and their replicas over the web. Based on the Grid technology and the Open Geospatial Consortium (0GC) s Catalogue Service - Web Information Model, this paper proposes a new information model for Geospatial Catalogue Web Service, named as GCWS which can securely provides Grid-based publishing, managing and querying geospatial data and services, and the transparent access to the replica data and related services under the Grid environment. This information model integrates the information model of the Grid Replica Location Service (RLS)/Monitoring & Discovery Service (MDS) with the information model of OGC Catalogue Service (CSW), and refers to the geospatial data metadata standards from IS0 19115, FGDC and NASA EOS Core System and service metadata standards from IS0 191 19 to extend itself for expressing geospatial resources. Using GCWS, any valid geospatial user, who belongs to an authorized Virtual Organization (VO), can securely publish and manage geospatial resources, especially query on-demand data in the virtual community and get back it through the data-related services which provide functions such as subsetting, reformatting, reprojection etc. This work facilitates the geospatial resources sharing and interoperating under the Grid environment, and implements geospatial resources Grid enabled and Grid technologies geospatial enabled. It 2!so makes researcher to focus on science, 2nd not cn issues with computing ability, data locztic~, processir,g and management. GCWS also is a key component for workflow-based virtual geospatial data producing

NASA Technical Reports Server

Grid-Brick Event Processing Framework in GEPS

Author: Almeida Nuno
Amorim Antonio
Fei Han
Pedro Luis
Trezentos Paulo
Villate Jaime E.
Publication venue
Publication date: 14/06/2003
Field of study

Experiments like ATLAS at LHC involve a scale of computing and data management that greatly exceeds the capability of existing systems, making it necessary to resort to Grid-based Parallel Event Processing Systems (GEPS). Traditional Grid systems concentrate the data in central data servers which have to be accessed by many nodes each time an analysis or processing job starts. These systems require very powerful central data servers and make little use of the distributed disk space that is available in commodity computers. The Grid-Brick system, which is described in this paper, follows a different approach. The data storage is split among all grid nodes having each one a piece of the whole information. Users submit queries and the system will distribute the tasks through all the nodes and retrieve the result, merging them together in the Job Submit Server. The main advantage of using this system is the huge scalability it provides, while its biggest disadvantage appears in the case of failure of one of the nodes. A workaround for this problem involves data replication or backup.Comment: 6 pages; document for CHEP'03 conferenc

arXiv.org e-Print Archive

CERN Document Server

Accelerating Large-scale Data Exploration through Data Diffusion

Author: Foster Ian
Raicu Ioan
Szalay Alex
Zhao Yong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008
Field of study

Data-intensive applications often require exploratory analysis of large datasets. If analysis is performed on distributed resources, data locality can be crucial to high throughput and performance. We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, thus allowing faster response to subsequent requests that refer to the same data; when demand drops, resources are released. This approach can provide the benefits of dedicated hardware without the associated high costs, depending on workload and resource characteristics. The approach is reminiscent of cooperative caching, web-caching, and peer-to-peer storage systems, but addresses different application demands. Other data-aware scheduling approaches assume dedicated resources, which can be expensive and/or inefficient if load varies significantly. To explore the feasibility of the data diffusion approach, we have extended the Falkon resource provisioning and task scheduling system to support data caching and data-aware scheduling. Performance results from both micro-benchmarks and a large scale astronomy application demonstrate that our approach improves performance relative to alternative approaches, as well as provides improved scalability as aggregated I/O bandwidth scales linearly with the number of data cache nodes.Comment: IEEE/ACM International Workshop on Data-Aware Distributed Computing 200

arXiv.org e-Print Archive

Crossref