A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also help to provide an easy way for new practitioners to understand
this complex area of research.
Comment: 46 pages, 16 figures, Technical Report
Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies
Grid is an infrastructure that involves the integrated and collaborative use
of computers, networks, databases and scientific instruments owned and managed
by multiple organizations. Grid applications often involve large amounts of
data and/or computing resources that require secure resource sharing across
organizational boundaries. This makes Grid application management and
deployment a complex undertaking. Grid middlewares provide users with seamless
computing ability and uniform access to resources in the heterogeneous Grid
environment. Several software toolkits and systems have been developed, most of
which are results of academic research projects, all over the world. This
chapter focuses on four of these middleware systems: UNICORE, Globus, Legion and
Gridbus. It also presents our implementation of a resource broker for UNICORE,
because UNICORE did not support this functionality. A comparison of these systems on
the basis of architecture, implementation model and several other features
is included.
Comment: 19 pages, 10 figures
Mass Storage Management and the Grid
The University of Edinburgh has a significant interest in mass storage
systems as it is one of the core groups tasked with the roll out of storage
software for the UK's particle physics grid, GridPP. We present the results of
a development project to provide software interfaces between the SDSC Storage
Resource Broker, the EU DataGrid and the Storage Resource Manager. This project
was undertaken in association with the eDikt group at the National eScience
Centre, the Universities of Bristol and Glasgow, Rutherford Appleton Laboratory
and the San Diego Supercomputer Center.
Comment: 4 pages, 3 figures. Presented at Computing for High Energy and
Nuclear Physics 2004 (CHEP '04), Interlaken, Switzerland, September 2004
Extended Resource Specification Language Reference Manual for ARC versions 0.8 and above
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art Grid
workflow systems, but also identifies the areas that need further research.
Comment: 29 pages, 15 figures
The Lattice Project: A Multi-model Grid Computing System
This thesis presents The Lattice Project, a system that combines multiple models of Grid computing. Grid computing is a paradigm for leveraging multiple distributed computational resources to solve fundamental scientific problems that require large amounts of computation. The system combines the traditional Service model of Grid computing with the Desktop model of Grid computing, and is thus capable of utilizing diverse resources such as institutional desktop computers, dedicated computing clusters, and machines volunteered by the general public to advance science. The production Grid system includes a fully featured user interface, support for a large number of popular scientific applications, a robust Grid-level scheduler, and novel enhancements such as a Grid-wide file caching scheme. A substantial amount of scientific research has already been completed using The Lattice Project.
Grid Enabled Geospatial Catalogue Web Service
Geospatial Catalogue Web Service is a vital service for sharing and interoperating volumes of distributed heterogeneous geospatial resources, such as data, services, applications, and their replicas over the web. Based on Grid technology and the Open Geospatial Consortium (OGC)'s Catalogue Service - Web Information Model, this paper proposes a new information model for Geospatial Catalogue Web Service, named GCWS, which can securely provide Grid-based publishing, managing and querying of geospatial data and services, and transparent access to replica data and related services in the Grid environment. This information model integrates the information model of the Grid Replica Location Service (RLS)/Monitoring & Discovery Service (MDS) with the information model of the OGC Catalogue Service (CSW), and draws on the geospatial data metadata standards from ISO 19115, FGDC and NASA EOS Core System and the service metadata standards from ISO 19119 to extend itself for expressing geospatial resources. Using GCWS, any valid geospatial user who belongs to an authorized Virtual Organization (VO) can securely publish and manage geospatial resources, and in particular query on-demand data in the virtual community and retrieve it through data-related services that provide functions such as subsetting, reformatting and reprojection. This work facilitates geospatial resource sharing and interoperation in the Grid environment, making geospatial resources Grid-enabled and Grid technologies geospatially enabled. It also allows researchers to focus on science, and not on issues with computing ability, data location, processing and management. GCWS is also a key component for workflow-based virtual geospatial data production.
Grid-Brick Event Processing Framework in GEPS
Experiments like ATLAS at LHC involve a scale of computing and data
management that greatly exceeds the capability of existing systems, making it
necessary to resort to Grid-based Parallel Event Processing Systems (GEPS).
Traditional Grid systems concentrate the data in central data servers which
have to be accessed by many nodes each time an analysis or processing job
starts. These systems require very powerful central data servers and make
little use of the distributed disk space that is available in commodity
computers. The Grid-Brick system, which is described in this paper, follows a
different approach. Data storage is split among all grid nodes, each holding
one piece of the whole data set. Users submit queries, and the system
distributes the tasks across all the nodes and retrieves the results, merging
them together in the Job Submit Server. The main advantage of using this system
is the huge scalability it provides, while its biggest disadvantage appears in
the case of failure of one of the nodes. A workaround for this problem involves
data replication or backup.
Comment: 6 pages; document for CHEP'03 conference
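The scatter-and-merge flow described in this abstract can be illustrated with a minimal sketch. It is not the Grid-Brick implementation: the function names (`split_among_nodes`, `run_query_on_node`, `submit_query`) are hypothetical, grid nodes are simulated by in-process lists, and round-robin placement is an assumption, since the abstract does not say how data is partitioned.

```python
from concurrent.futures import ThreadPoolExecutor


def split_among_nodes(events, node_count):
    """Give each simulated node one 'brick' of the data (round-robin, an assumption)."""
    bricks = [[] for _ in range(node_count)]
    for index, event in enumerate(events):
        bricks[index % node_count].append(event)
    return bricks


def run_query_on_node(brick, predicate):
    """Each node scans only the events it stores locally."""
    return [event for event in brick if predicate(event)]


def submit_query(bricks, predicate):
    """Stand-in for the Job Submit Server: scatter the query, then merge partial results."""
    with ThreadPoolExecutor(max_workers=len(bricks)) as pool:
        partials = list(pool.map(lambda brick: run_query_on_node(brick, predicate), bricks))
    merged = []
    for partial in partials:
        merged.extend(partial)
    return merged


if __name__ == "__main__":
    bricks = split_among_nodes(list(range(10)), node_count=3)
    print(sorted(submit_query(bricks, lambda event: event % 2 == 0)))
```

In this sketch, losing one brick loses its share of the events, which mirrors the failure weakness noted in the abstract; replicating each brick on a second node would be one way to model the suggested workaround.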
Accelerating Large-scale Data Exploration through Data Diffusion
Data-intensive applications often require exploratory analysis of large
datasets. If analysis is performed on distributed resources, data locality can
be crucial to high throughput and performance. We propose a "data diffusion"
approach that acquires compute and storage resources dynamically, replicates
data in response to demand, and schedules computations close to data. As demand
increases, more resources are acquired, thus allowing faster response to
subsequent requests that refer to the same data; when demand drops, resources
are released. This approach can provide the benefits of dedicated hardware
without the associated high costs, depending on workload and resource
characteristics. The approach is reminiscent of cooperative caching,
web-caching, and peer-to-peer storage systems, but addresses different
application demands. Other data-aware scheduling approaches assume dedicated
resources, which can be expensive and/or inefficient if load varies
significantly. To explore the feasibility of the data diffusion approach, we
have extended the Falkon resource provisioning and task scheduling system to
support data caching and data-aware scheduling. Performance results from both
micro-benchmarks and a large scale astronomy application demonstrate that our
approach improves performance relative to alternative approaches and
provides improved scalability, as aggregated I/O bandwidth scales linearly with
the number of data cache nodes.
Comment: IEEE/ACM International Workshop on Data-Aware Distributed Computing
200
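The data-aware scheduling idea in this abstract (run a task where its data is already cached, and replicate data on demand otherwise) can be sketched as follows. This is an illustrative toy, not Falkon's scheduler or API: the class `DataAwareScheduler`, its `dispatch` method, and the least-loaded tie-breaking rule are all assumptions, and dynamic acquisition and release of nodes is not modelled.

```python
class DataAwareScheduler:
    """Toy scheduler: prefer nodes that already cache the requested file."""

    def __init__(self, node_names):
        # Per-node cache contents and a count of tasks dispatched so far.
        self.caches = {name: set() for name in node_names}
        self.load = {name: 0 for name in node_names}

    def dispatch(self, file_name):
        """Pick a node for a task that reads file_name; return (node, cache_hit)."""
        holders = [n for n in self.caches if file_name in self.caches[n]]
        # Fall back to every node when no node holds the file yet.
        candidates = holders or list(self.caches)
        node = min(candidates, key=lambda n: self.load[n])
        cache_hit = file_name in self.caches[node]
        # On a miss the file is copied to the chosen node (replication on demand).
        self.caches[node].add(file_name)
        self.load[node] += 1
        return node, cache_hit


if __name__ == "__main__":
    scheduler = DataAwareScheduler(["node1", "node2"])
    for name in ["a.fits", "a.fits", "b.fits"]:
        print(name, scheduler.dispatch(name))
```

Repeated requests for the same file land on the node that already holds it, while a request for new data goes to the least-loaded node, which then becomes a holder for later requests.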