716 research outputs found
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
Next-Generation EU DataGrid Data Management Services
We describe the architecture and initial implementation of the
next-generation of Grid Data Management Middleware in the EU DataGrid (EDG)
project.
The new architecture stems out of our experience and the users requirements
gathered during the two years of running our initial set of Grid Data
Management Services. All of our new services are based on the Web Service
technology paradigm, very much in line with the emerging Open Grid Services
Architecture (OGSA). We have modularized our components and invested a great
amount of effort towards a secure, extensible and robust service, starting from
the design but also using a streamlined build and testing framework.
Our service components are: Replica Location Service, Replica Metadata
Service, Replica Optimization Service, Replica Subscription and high-level
replica management. The service security infrastructure is fully GSI-enabled,
hence compatible with the existing Globus Toolkit 2-based services; moreover,
it allows for fine-grained authorization mechanisms that can be adjusted
depending on the service semantics.Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics
(CHEP03), La Jolla,Ca, USA, March 2003 8 pages, LaTeX, the file contains all
LaTeX sources - figures are in the directory "figures
Replica Creation Algorithm for Data Grids
Data grid system is a data management infrastructure that facilitates reliable access and sharing of large amount of data, storage resources, and data transfer services that can be scaled across distributed locations. This thesis presents a new replication algorithm that improves data access performance in data grids by distributing relevant data copies around the grid. The new Data Replica Creation Algorithm (DRCM) improves performance of data grid systems by reducing job execution time and making the best use of data grid resources (network bandwidth and storage space). Current algorithms focus on number of accesses in deciding which file to replicate and where to place them, which ignores resources’ capabilities. DRCM differs by considering both user and resource perspectives; strategically placing replicas at locations that provide the lowest transfer cost. The proposed algorithm uses three strategies: Replica Creation and Deletion Strategy (RCDS), Replica Placement Strategy (RPS), and Replica Replacement Strategy (RRS). DRCM was evaluated using network simulation (OptorSim) based on selected performance metrics (mean job execution time, efficient network usage, average storage usage, and computing element usage), scenarios, and topologies. Results revealed better job execution time with lower resource consumption than existing approaches. This research contributes replication strategies embodied in one algorithm that enhances data grid performance, capable of making a decision on creating or deleting more than one file during same decision. Furthermore, dependency-level-between-files criterion was utilized and integrated with the exponential growth/decay model to give an accurate file evaluation
Fault Tolerant Resource Allocation for Query Processing in Grid Environments
International audienceIn this paper, we propose a new algorithm for fault-tolerant resource allocation for query processing in grid environments. For this, we propose an initial resource allocation algorithm followed by a fault-tolerance protocol. The proposed fault-tolerance protocol is based on the passive replication of stateful operators in queries. We provide theoretical analyses of the proposed algorithms and consolidate our analyses with the simulations
A dynamic replica creation: Which file to replicate?
Data Grid is an infrastructure that manages huge amount of data files and provides intensive computational resources across geographically distributed collaboration.To increase resource availability and to ease resource sharing in such environment, there is a need for replication services.Data replication is one of the methods used to improve the performance of data access in distributed systems.In this paper, we propose a dynamic replication strategy that is based on exponential growth or decay rate and dependency level of data files (EXPM).Simulation results (via Optorsim) show that EXPM outperformed LALW in
the measured metrics – mean job execution time, effective network usage and average storage usage
An Approach to Ad hoc Cloud Computing
We consider how underused computing resources within an enterprise may be
harnessed to improve utilization and create an elastic computing
infrastructure. Most current cloud provision involves a data center model, in
which clusters of machines are dedicated to running cloud infrastructure
software. We propose an additional model, the ad hoc cloud, in which
infrastructure software is distributed over resources harvested from machines
already in existence within an enterprise. In contrast to the data center cloud
model, resource levels are not established a priori, nor are resources
dedicated exclusively to the cloud while in use. A participating machine is not
dedicated to the cloud, but has some other primary purpose such as running
interactive processes for a particular user. We outline the major
implementation challenges and one approach to tackling them
Economy-based data replication broker
Data replication is one of the key components in data grid architecture as it enhances data access and reliability and minimises the cost of data transmission. In this paper, we address the problem of reducing the overheads of the replication mechanisms that drive the data management components of a data grid. We propose an approach that extends the resource broker with policies that factor in user quality of service as well as service costs when replicating and transferring data. A realistic model of the data grid was created to simulate and explore the performance of the proposed policy. The policy displayed an effective means of improving the performance of the grid network traffic and is indicated by the improvement of speed and cost of transfers by brokers.<br /
- …