910 research outputs found
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
MAPREDUCE CHALLENGES ON PERVASIVE GRIDS
International audienceThis study presents the advances on designing and implementing scalable techniques to support the development and execution of MapReduce application in pervasive distributed computing infrastructures, in the context of the PER-MARE project. A pervasive framework for MapReduce applications is very useful in practice, especially in those scientific, enterprises and educational centers which have many unused or underused computing resources, which can be fully exploited to solve relevant problems that demand large computing power, such as scientific computing applications, big data processing, etc. In this study, we pro-pose the study of multiple techniques to support volatility and heterogeneity on MapReduce, by applying two complementary approaches: Improving the Apache Hadoop middleware by including context-awareness and fault-tolerance features; and providing an alternative pervasive grid implementation, fully adapted to dynamic environments. The main design and implementation decisions for both alternatives are described and validated through experiments, demonstrating that our approaches provide high reliability when executing on pervasive environments. The analysis of the experiments also leads to several insights on the requirements and constraints from dynamic and volatile systems, reinforcing the importance of context-aware information and advanced fault-tolerance features to provide efficient and reliable MapReduce services on pervasive grids
Proof-of-Concept Application - Annual Report Year 1
In this document the Cat-COVITE Application for use in the CATNETS Project is introduced and motivated. Furthermore an introduction to the catallactic middleware and Web Services Agreement (WS-Agreement) concepts is given as a basis for the future work. Requirements for the application of Cat-COVITE with in catallactic systems are analysed. Finally the integration of the Cat-COVITE application and the catallactic middleware is described. --Grid Computing
Recommended from our members
Grid-based semantic integration of heterogeneous data resources: Implementation on a HealthGrid
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University.The semantic integration of geographically distributed and heterogeneous data
resources still remains a key challenge in Grid infrastructures. Today's
mainstream Grid technologies hold the promise to meet this challenge in a
systematic manner, making data applications more scalable and manageable. The
thesis conducts a thorough investigation of the problem, the state of the art, and
the related technologies, and proposes an Architecture for Semantic Integration of
Data Sources (ASIDS) addressing the semantic heterogeneity issue. It defines a
simple mechanism for the interoperability of heterogeneous data sources in order
to extract or discover information regardless of their different semantics. The
constituent technologies of this architecture include Globus Toolkit (GT4) and
OGSA-DAI (Open Grid Service Architecture Data Integration and Access)
alongside other web services technologies such as XML (Extensive Markup
Language). To show this, the ASIDS architecture was implemented and tested in a
realistic setting by building an exemplar application prototype on a HealthGrid
(pilot implementation).
The study followed an empirical research methodology and was informed by
extensive literature surveys and a critical analysis of the relevant technologies and
their synergies. The two literature reviews, together with the analysis of the
technology background, have provided a good overview of the current Grid and
HealthGrid landscape, produced some valuable taxonomies, explored new paths
by integrating technologies, and more importantly illuminated the problem and
guided the research process towards a promising solution. Yet the primary
contribution of this research is an approach that uses contemporary Grid
technologies for integrating heterogeneous data resources that have semantically
different. data fields (attributes). It has been practically demonstrated (using a
prototype HealthGrid) that discovery in semantically integrated distributed data
sources can be feasible by using mainstream Grid technologies, which have been
shown to have some Significant advantages over non-Grid based approaches
A customizable multi-agent system for distributed data mining
We present a general Multi-Agent System framework for
distributed data mining based on a Peer-to-Peer model. Agent
protocols are implemented through message-based asynchronous
communication. The framework adopts a dynamic load balancing
policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability performances
Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies
Grid is an infrastructure that involves the integrated and collaborative use
of computers, networks, databases and scientific instruments owned and managed
by multiple organizations. Grid applications often involve large amounts of
data and/or computing resources that require secure resource sharing across
organizational boundaries. This makes Grid application management and
deployment a complex undertaking. Grid middlewares provide users with seamless
computing ability and uniform access to resources in the heterogeneous Grid
environment. Several software toolkits and systems have been developed, most of
which are results of academic research projects, all over the world. This
chapter will focus on four of these middlewares--UNICORE, Globus, Legion and
Gridbus. It also presents our implementation of a resource broker for UNICORE
as this functionality was not supported in it. A comparison of these systems on
the basis of the architecture, implementation model and several other features
is included.Comment: 19 pages, 10 figure
- …