50 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also help to provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    Data as a Service (DaaS) for sharing and processing of large data collections in the cloud

    Data as a Service (DaaS) is among the latest kinds of services being investigated in the Cloud computing community. The main aim of DaaS is to overcome limitations of state-of-the-art approaches in data technologies, according to which data is stored and accessed from repositories whose location is known and is relevant for sharing and processing. Besides limiting data sharing, current approaches also fail to fully decouple software services from data and thus limit interoperability. In this paper we propose a DaaS approach for intelligent sharing and processing of large data collections, with the aim of abstracting the data location (by making it relevant to the needs of sharing and accessing) and fully decoupling the data from its processing. The aim of our approach is to build a Cloud computing platform offering DaaS to support large communities of users that need to share, access, and process data for collectively building knowledge from data. We exemplify the approach with large data collections from the health and biology domains. Peer Reviewed. Postprint (author's final draft)

    Grid-based Search Technique for Massive Academic Publications

    The number of academic publications published in recent years has grown rapidly. Accessing and searching massive collections of academic publications that are distributed over several locations requires a large amount of computing resources to increase system performance. Therefore, many grid-based search techniques have been proposed to provide flexible methods for searching extensive distributed data. This paper proposes a search technique that is capable of searching extensive publication collections by utilizing grid computing technology. The search technique is implemented as interconnected grid services that offer a mechanism to access different data locations. The experimental results show that the grid-based search technique enhances search performance. Comment: 4 pages, 5 figures, conference. The 2014 Third ICT International Student Project Conference (ICT-ISPC2014)

    Simulation of Diagonal Data Replication in Mesh

    In a large dynamic network, data can be copied anywhere to make it fault tolerant and easily accessible, but an efficient protocol is needed to manage the replicas and keep the data consistent and highly available at a low communication cost. In this paper, we introduce a new data replica control protocol for large dynamic networks, named Diagonal Replication in Mesh (DRM), which uses quorum and voting techniques to improve availability and communication cost, because quorum techniques reduce the number of copies involved in reading or writing data. DRM replicates data by arranging the network in a logical mesh structure and provides access to consistent data by ensuring that quorums have a nonempty intersection. To evaluate the protocol, we developed a simulation model in Java. Our results show that DRM improves response time compared to the Three Dimensional Grid Structure (TDGS) protocol.
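The consistency condition this abstract relies on, that read and write quorums overlap, can be illustrated with a generic voting sketch. This is not the DRM protocol itself: the abstract does not give its quorum sizes, so the one-vote-per-replica rule, the choice to place replicas on the diagonal of an n x n mesh, and the names `diagonal_sites`, `quorums`, and `quorums_intersect` are all assumptions made for illustration.

```python
from itertools import combinations


def diagonal_sites(n):
    """Hypothetical placement: one replica on each diagonal cell of an n x n mesh."""
    return [(i, i) for i in range(n)]


def quorums(sites, size):
    """All subsets of `sites` with exactly `size` members (one vote per site)."""
    return [set(c) for c in combinations(sites, size)]


def quorums_intersect(sites, read_size, write_size):
    """True if every read quorum shares at least one site with every write quorum."""
    return all(
        rq & wq
        for rq in quorums(sites, read_size)
        for wq in quorums(sites, write_size)
    )


sites = diagonal_sites(5)
# With 5 votes, read size 2 and write size 4 always overlap (2 + 4 > 5).
overlap_ok = quorums_intersect(sites, 2, 4)
# Read size 2 and write size 3 can be disjoint (2 + 3 = 5), so stale reads are possible.
overlap_bad = quorums_intersect(sites, 2, 3)
```

Under this simple voting model, a read is guaranteed to see the latest write whenever the read and write quorum sizes sum to more than the number of replicas, which is why fewer copies than the full replica set need to be contacted.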

    A dynamic replica creation: Which file to replicate?

    A Data Grid is an infrastructure that manages a huge number of data files and provides intensive computational resources across geographically distributed collaborations. To increase resource availability and to ease resource sharing in such an environment, replication services are needed. Data replication is one of the methods used to improve the performance of data access in distributed systems. In this paper, we propose a dynamic replication strategy (EXPM) that is based on the exponential growth or decay rate and the dependency level of data files. Simulation results (via OptorSim) show that EXPM outperforms LALW in the measured metrics: mean job execution time, effective network usage and average storage usage.

    A dynamic replication strategy based on exponential growth/decay rate

    A Data Grid is an infrastructure that manages a huge number of data files and provides intensive computational resources across geographically distributed collaborations. To increase resource availability and to ease resource sharing in such an environment, replication services are needed. Data replication is one of the methods used to improve the performance of data access in distributed systems. In this paper, we discuss issues arising in the data replication domain and propose a dynamic replication strategy that is based on the exponential growth or decay rate. The purpose of the proposed strategy is to identify which files should be replicated. This is achieved by estimating the number of accesses of a file in the upcoming time interval. The greater the value, the more popular the file is, and the more likely it is to be selected for replication.
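The selection idea in this abstract can be sketched as follows. The abstract does not give the estimation formula, so this is a minimal sketch under an assumption: the growth or decay rate is taken as the average relative change between consecutive intervals, and the next interval's access count is the latest count scaled by (1 + rate). The function names and the sample histories are hypothetical.

```python
def estimate_next_accesses(history):
    """Estimate accesses in the next interval from per-interval counts (oldest first).

    Assumed model: next = last * (1 + r), where r is the mean relative
    change between consecutive intervals (positive = growth, negative = decay).
    """
    if not history:
        return 0.0
    # Skip intervals with zero accesses to avoid division by zero.
    rates = [(b - a) / a for a, b in zip(history, history[1:]) if a > 0]
    if not rates:
        return float(history[-1])
    rate = sum(rates) / len(rates)
    return history[-1] * (1 + rate)


def select_file_to_replicate(access_histories):
    """Return the file whose estimated next-interval access count is largest."""
    return max(
        access_histories,
        key=lambda name: estimate_next_accesses(access_histories[name]),
    )


# Hypothetical access histories: file_a is growing, file_b is decaying.
histories = {"file_a": [10, 20, 40], "file_b": [100, 50, 25]}
chosen = select_file_to_replicate(histories)
```

Here `file_a` doubles each interval while `file_b` halves, so the sketch picks `file_a` even though `file_b` has more accesses in total, which is the point of ranking by projected rather than historical popularity.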

    Replica maintenance strategy for data grid

    A Data Grid is an infrastructure that manages a huge number of data files and provides intensive computational resources across geographically distributed collaborations. The performance of such a system can be increased by improving overall resource usage, which includes network and storage resources. Network resource usage is improved by making good use of network bandwidth, which is an important factor affecting job execution time. Storage resource usage, meanwhile, is improved by making good use of storage space. Data replication is one of the methods used to improve the performance of data access in distributed systems, by placing multiple copies of data files at the distributed sites. Once the replicas have been distributed to various locations, they need to be monitored. As a result of dynamic changes in the data grid environment, some of the replicas need to be relocated. In this paper we propose a replica maintenance strategy, termed Unwanted Replica Deletion Strategy (URDS), as part of a replica maintenance service. The main purpose of the proposed strategy is to locate unwanted replicas to be deleted. OptorSim is used to evaluate the performance of the proposed strategy. The simulation results show that URDS requires less execution time, consumes less network bandwidth, and makes better use of storage space than existing approaches.

    Characteristics of Grids vs. Peer-to-Peer Systems and Their Possible Combination

    The High-Performance and High-Availability Distributed Computing project, under development at LISiDi, covers several lines of work, such as security, mutual exclusion problems in distributed systems, distributed shared memory, mobility, and grids. Within the topic of grids, a line of study and development opens up that seeks to reconcile the characteristics of peer-to-peer systems with those of grids and to take advantage of the best of both worlds. Track: Concurrent, Parallel and Distributed Processing. Red de Universidades con Carreras en Informática (RedUNCI)