A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also help to provide an easy way for new practitioners to understand
this complex area of research. Comment: 46 pages, 16 figures, Technical Report
Data as a Service (DaaS) for sharing and processing of large data collections in the cloud
Data as a Service (DaaS) is among the latest kinds of services being investigated in the Cloud computing community. The main aim of DaaS is to overcome limitations of state-of-the-art approaches in data technologies, according to which data is stored in and accessed from repositories whose location is known and is relevant for sharing and processing. Besides limiting data sharing, current approaches also fail to fully decouple software services from data and thus limit interoperability. In this paper we propose a DaaS approach for intelligent sharing and processing of large data collections, with the aim of abstracting the data location (by making it relevant to the needs of sharing and accessing) and fully decoupling the data from its processing. The aim of our approach is to build a Cloud computing platform offering DaaS to support large communities of users that need to share, access, and process data in order to build knowledge from it collectively. We exemplify the approach with large data collections from the health and biology domains. Peer Reviewed. Postprint (author's final draft)
Grid-based Search Technique for Massive Academic Publications
The number of academic publications published in recent years has grown
rapidly. Accessing and searching massive collections of academic publications
that are distributed over several locations requires a large amount of
computing resources to increase system performance. Therefore, many
grid-based search techniques have been proposed to provide flexible methods for
searching extensive distributed data. This paper proposes a search technique
that is capable of searching extensive publication collections by utilizing
grid computing technology. The search technique is implemented as
interconnected grid services that offer a mechanism to access different data
locations. The experimental results show that the grid-based search technique
enhances search performance. Comment: 4 pages, 5 figures, conference. The 2014
Third ICT International Student Project Conference (ICT-ISPC2014)
Simulation of Diagonal Data Replication in Mesh
In a large dynamic network, data can be copied anywhere to make it fault tolerant and easily accessible, but there must be an efficient protocol to manage the replicas and ensure that the data is consistent and highly available at a low communication cost. In this paper, we introduce a new replica control protocol for large dynamic networks, named Diagonal Replication in Mesh (DRM), which uses quorum and voting techniques to improve availability and communication cost, because quorum techniques reduce the number of copies involved in reading or writing data. DRM replicates data in a large dynamic network by arranging the sites in a logical mesh structure, and provides access to consistent data by ensuring that quorums have a nonempty intersection. To evaluate our protocol, we developed a simulation model in Java. Our results show that DRM improves response time compared to the Three Dimensional Grid Structure (TDGS) protocol.
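Quorum-based replica control generally relies on any two conflicting quorums sharing at least one replica, so that a read always sees the latest write. The sketch below illustrates that intersection property only; it is not the DRM protocol itself. It assumes, hypothetically, that replicas sit on the main diagonal of an n x n logical mesh and that a quorum is a simple majority of those diagonal sites. The abstract does not state DRM's actual quorum sizes, and all function names are illustrative.

```python
from itertools import combinations


def diagonal_sites(n):
    """Coordinates on the main diagonal of an n x n logical mesh (assumed replica sites)."""
    return [(i, i) for i in range(n)]


def majority(k):
    """Smallest quorum size for which any two quorums drawn from k sites must overlap."""
    return k // 2 + 1


def quorums(sites, size):
    """Enumerate every possible quorum of the given size."""
    return [frozenset(q) for q in combinations(sites, size)]


def all_intersect(quorums_a, quorums_b):
    """True if every quorum in quorums_a shares at least one site with every quorum in quorums_b."""
    return all(a & b for a in quorums_a for b in quorums_b)


sites = diagonal_sites(5)
size = majority(len(sites))                # 3 of the 5 diagonal sites
safe = all_intersect(quorums(sites, size), quorums(sites, size))
unsafe = all_intersect(quorums(sites, 2), quorums(sites, 2))
```

With five diagonal sites, majority quorums of size three always overlap, whereas quorums of size two can be disjoint and would therefore allow a read to miss the most recent write.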
A dynamic replica creation: Which file to replicate?
A Data Grid is an infrastructure that manages huge numbers of data files and provides intensive computational resources across geographically distributed collaborations. To increase resource availability and to ease resource sharing in such an environment, there is a need for replication services. Data replication is one of the methods used to improve the performance of data access in distributed systems. In this paper, we propose a dynamic replication strategy (EXPM) that is based on the exponential growth or decay rate and the dependency level of data files. Simulation results (via OptorSim) show that EXPM outperformed LALW in the measured metrics: mean job execution time, effective network usage and average storage usage.
A dynamic replication strategy based on exponential growth/decay rate
A Data Grid is an infrastructure that manages huge numbers of data files and provides intensive computational resources across geographically distributed collaborations. To increase resource availability and to ease resource sharing in such an environment, there is a need for replication services. Data replication is one of the methods used to improve the performance of data access in distributed systems. In this paper, we discuss issues arising in the data replication domain and propose a dynamic replication strategy that is based on the exponential growth or decay rate. The purpose of the proposed strategy is to identify which files should be replicated. This is achieved by estimating the number of accesses to a file in the upcoming time interval. The greater the value, the more popular the file is, and the more likely it is to be selected for replication.
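The idea of estimating a file's accesses in the upcoming interval from a growth or decay rate can be sketched as follows. This is a minimal illustration, not the authors' formulation: it assumes the rate is the mean relative change between consecutive intervals and that the next count is the last count scaled by (1 + rate). The function names and the sample access log are hypothetical.

```python
def average_rate(history):
    """Mean relative change between consecutive intervals (assumed rate estimate).

    Intervals with a zero count are skipped to avoid division by zero.
    """
    rates = [(b - a) / a for a, b in zip(history, history[1:]) if a > 0]
    return sum(rates) / len(rates) if rates else 0.0


def predict_next(history):
    """Estimated access count for the upcoming interval: last count * (1 + rate)."""
    if not history:
        return 0.0
    return history[-1] * (1 + average_rate(history))


def most_popular(access_log):
    """Name of the file with the highest predicted access count."""
    return max(access_log, key=lambda name: predict_next(access_log[name]))


# Hypothetical per-interval access counts for two files.
access_log = {
    "file_a": [10, 20, 40],   # growing: rate +1.0 per interval
    "file_b": [40, 20, 10],   # decaying: rate -0.5 per interval
}
candidate = most_popular(access_log)
```

Here `file_a` is predicted to receive 80 accesses and `file_b` only 5, so `file_a` would be the replication candidate even though `file_b` has more accesses in total.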
Replica maintenance strategy for data grid
A Data Grid is an infrastructure that manages huge numbers of data files and provides intensive computational resources across geographically distributed collaborations. The performance of such a system can be increased by improving overall resource usage, which includes network and storage resources. Network resource usage is improved by good utilization of network bandwidth, which is an important factor affecting job execution time. Storage resource usage, meanwhile, is improved by good utilization of storage space. Data replication is one of the methods used to improve the performance of data access in distributed systems, by placing multiple copies of data files at the distributed sites. Once the replicas have been distributed to various locations, they need to be monitored. As a result of dynamic changes in the data grid environment, some of the replicas need to be relocated. In this paper we propose a replica maintenance strategy, termed the Unwanted Replica Deletion Strategy (URDS), as part of a replica maintenance service. The main purpose of the proposed strategy is to locate unwanted replicas to be deleted. OptorSim is used to evaluate the performance of the proposed strategy. The simulation results show that URDS requires less execution time, consumes less network bandwidth and makes better use of storage space than existing approaches.
Characteristics of Grids vs. peer-to-peer systems and their possible combination
The High-Performance and High-Availability Distributed Computing project under development at LISiDi covers several lines of work, including security, mutual exclusion problems in distributed systems, distributed shared memory, mobility and grids. Within the topic of grids, a line of study and development is opening that seeks to reconcile the characteristics of peer-to-peer systems with those of grids and to take advantage of the best of both worlds. Track: Concurrent, Parallel and Distributed Processing. Red de Universidades con Carreras en Informática (RedUNCI)