    How replicated data management in the cloud can benefit from a data grid protocol - the Re:GRIDiT Approach

    Cloud computing has recently received considerable attention both in industry and academia. Due to the great success of the first generation of Cloud-based services, providers have to deal with larger and larger volumes of data. Quality of service agreements with customers require data to be replicated across data centers in order to guarantee a high degree of availability. In this context, Cloud Data Management has to address several challenges, especially when replicated data are concurrently updated at different sites or when the system workload and the resources requested by clients change dynamically. Largely independently of recent developments in Cloud Data Management, Data Grids have undergone a transition from pure file management with read-only access to more powerful systems. In our recent work, we have developed the Re:GRIDiT protocol for managing data in the Grid, which provides concurrent access to replicated data at different sites without any global component and supports the dynamic deployment of replicas. Since it is independent of the underlying Grid middleware, it can be seamlessly transferred to other environments like the Cloud. In this paper, we compare Data Management in the Grid and the Cloud, briefly introduce the Re:GRIDiT protocol and show its applicability for Cloud Data Management.

    Dynamic data replication in the grid with freshness and correctness guarantees

    This thesis explores architectural issues and performance aspects of data Grid infrastructures. The objective is to develop a scalable infrastructure that is capable of dynamically managing replicated data in the Grid while at the same time providing freshness and correctness guarantees. We propose a decentralized middleware which can be deployed on top of any Grid (or any distributed, heterogeneous) infrastructure. The difficulty is to ensure that such an infrastructure can offer scalability, performance and correctness. The overall goal of this thesis is to present a replication mechanism that combines scalability, global correctness and quality of service guarantees in a dynamic way. In the beginning we introduce important aspects of Grid environments and several scenarios from newly emerging eScience applications. These use case scenarios urgently require new integrated approaches to dynamic replication in a data Grid. Our main contribution is the Re:GRIDiT protocols that dynamically manage replicas in the Grid, while at the same time providing freshness and correctness guarantees. The Re:GRIDiT family consists of three different protocols which target the three main problematic aspects identified in current data Grid infrastructures. Inspired by the requirements deduced from these scenarios, we first concentrate our efforts on the more complex and general case of distributed update transactions on replicated data. We devise a protocol for the correct synchronization of concurrent updates to different updateable replicas in order to ensure their subsequent propagation to read-only replicas in a completely distributed way. Re:SYNCiT hides the presence of replicas from the applications, takes into account the special characteristics of data in the Grid such as version support and the distinction between mutable and immutable objects, and provides provably correct transactional execution guarantees without any global component.
The next step is the Re:LOADiT approach to dynamic distributed replica management in data Grid systems. We propose efficient algorithms for selecting optimal locations for placing the replicas so that the load among these replicas is balanced. Given the data usage from each user site and the maximum load of each replica, our algorithm efficiently manages the number of replicas required, reducing or increasing their number. Up to this point, our approach dictates how update sites behave, and from a user's point of view the clients will always access the most up-to-date data. We further refine this approach and introduce the Re:FRESHiT protocol, which makes it possible to trade freshness for performance effectively and addresses freshness and versioning issues, needed in many Grid application domains, without losing consistency. Queries with different freshness levels are routed by taking advantage of the tree structure in which the replicas are organized. Finally, we are also interested in the performance characteristics of the presented algorithms. We have implemented the Re:GRIDiT protocols using state-of-the-art Web service technologies, which allows an easy and seamless deployment in any Grid environment. The evaluation has been conducted on up to 48 update sites and 48 read-only sites. We have used simulated workloads that mimic the behavior expected from our use case applications. Our evaluations have shown that the proposed Re:GRIDiT protocols are efficient, as replicas are created and/or deleted on demand and with a reasonable amount of resources. Dynamic changes in the tree structure allow flexible and efficient query routing along the tree structure. Clever routing strategies ensure an increased performance for queries with different freshness levels. Re:GRIDiT ensures replica consistency and is capable of providing different degrees of consistency and update frequencies.
In summary, this thesis presents new approaches for the correct synchronization of updates in a dynamic manner, replication management, and freshness guarantees in a data Grid. These approaches are founded on a formal theoretical background and implemented in a full-fledged prototype in a realistic Grid environment. They have been shown to be scalable by means of an extensive analytical and experimental evaluation.
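
The abstract above mentions adjusting the number of replicas from the data usage of each user site and the maximum load of each replica. The following is only a minimal sketch of that idea under simple assumptions, not the Re:LOADiT algorithm itself: load is a single number per site, every replica has the same capacity, and replica placement is ignored. The names `replicas_needed` and `replica_delta` are hypothetical.

```python
import math


def replicas_needed(site_loads, max_load_per_replica):
    """Smallest replica count whose combined capacity covers the total load.

    site_loads: mapping of user site name -> observed load (assumed unit:
    requests per second). max_load_per_replica: assumed uniform capacity.
    """
    if max_load_per_replica <= 0:
        raise ValueError("max_load_per_replica must be positive")
    total_load = sum(site_loads.values())
    # Always keep at least one replica, even when there is no load.
    return max(1, math.ceil(total_load / max_load_per_replica))


def replica_delta(current_replicas, site_loads, max_load_per_replica):
    """Positive result: replicas to create; negative: replicas to remove."""
    target = replicas_needed(site_loads, max_load_per_replica)
    return target - current_replicas


if __name__ == "__main__":
    loads = {"site_a": 30, "site_b": 50, "site_c": 40}
    print(replicas_needed(loads, 50))
    print(replica_delta(2, loads, 50))
```

A real placement strategy would also have to decide where each replica goes so that load is balanced among them, which this sketch leaves out.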

    Replicated Data Management in the Grid: The Re:GRIDiT Approach

    Grid environments more and more target novel domains such as eScience, eHealth or digital libraries that feature a variety of data-intensive applications. Consequently, issues related to data management in Grids are becoming increasingly important. In terms of data management, the Grid allows keeping a large number of replicas of data objects, possibly with different versions or levels of freshness, to allow for a high degree of availability, reliability and performance so as to best meet the needs of users and applications. At the same time, the seamless integration of replication management into the Grid, while taking into account its special characteristics, needs to be done without any central component for managing data or metadata. In this paper, we report on the ongoing Re:GRIDiT project which aims at addressing all the above requirements. Re:GRIDiT distinguishes between potentially many updateable and read-only replicas which can be distributed across a Grid environment. First, Re:GRIDiT provides new protocols for the correct synchronization of concurrent updates to different updateable replicas and their subsequent propagation in a completely distributed way. Second, Re:GRIDiT takes into account the semantics of the data which is managed in the Grid: mutable data can be subject to updates; immutable data, in turn, cannot be changed once created, but may be subject to version control. Third, Re:GRIDiT will be dynamic in a way that according to the current load, new replicas (updateable or read-only) can be created or removed on demand. Fourth, Re:GRIDiT will provide read-only transactions the full flexibility to specify the freshness (for mutable data) or version number (for immutable data) – which is particularly useful in order to trade accuracy for performance in the access to data in the Grid.

    Re:GRIDiT – Coordinating Distributed Update Transactions on Replicated Data in the Grid

    The recent proliferation of Grid environments for eScience applications led to common computing infrastructures with nearly unlimited storage capabilities. In terms of data management, the Grid allows keeping a large number of replicas of data objects to allow for a high degree of availability, reliability and performance. Due to the particular characteristics of the Grid, especially due to the absence of a global coordinator, dealing with many updateable replicas per data object urgently requires new protocols for the synchronization of updates and their subsequent propagation. Currently there is no protocol which can be seamlessly applied to a data Grid environment without impacting correctness and/or overall performance. In this paper we address the problem of replication in the Data Grid in the presence of updates. We have designed the Re:GRIDiT protocol that focuses on the correct synchronization of updates to several replicas in the Grid in a completely distributed way, extending well-established database replication techniques. Globally correct execution is provided by communication between transactions and sites. Re:GRIDiT takes into account the special characteristics of eScience applications such as the distinction between mutable objects, that can be updated by users and immutable objects. Finally, we provide a detailed evaluation of the performance of the Re:GRIDiT protocol when being applied at Grid scale.