
    The Replica Consistency Problem in Data Grids

    Fast and reliable data access is a crucial aspect of distributed computing and is often achieved using data replication techniques. In Grid architectures, data are replicated across many nodes of the Grid, and users usually access the "best" replica in terms of availability and network latency. When replicas are modifiable, a change made to one replica breaks consistency with the other replicas, which at that point become stale. Replica synchronisation protocols exist and are applied in several distributed architectures, for example distributed databases. Grid middleware solutions provide well-established support for replicating data. Nevertheless, replicas are still considered read-only, and no support is provided for updating a replica while maintaining consistency with the other replicas. In this thesis, carried out in collaboration with the Italian National Institute of Nuclear Physics (INFN) and the European Organisation for Nuclear Research (CERN), we study the replica consistency problem in Grid computing and propose a service, called CONStanza, that can synchronise both files and heterogeneous (different-vendor) databases in a Grid environment. We analyse and implement a specific use case that arises in high-energy physics, where conditions databases are replicated using databases of different makes. We provide detailed performance results and show how CONStanza can be used together with Oracle Streams to provide multi-tier replication of conditions databases using Oracle and MySQL databases.
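
    The abstract does not give implementation details; as a rough illustration of the single-master, lazy-propagation style of synchronisation it describes, consider the following minimal sketch. All names (Replica, Master, the log format) are hypothetical and not part of CONStanza:

        import time
        from dataclasses import dataclass, field

        @dataclass
        class Replica:
            """A replica that applies updates from a master's log (hypothetical)."""
            name: str
            backend: str                  # e.g. "oracle" or "mysql"
            applied_seq: int = 0          # last update sequence number applied
            data: dict = field(default_factory=dict)

        @dataclass
        class Master:
            """Single writable master keeping an ordered update log."""
            log: list = field(default_factory=list)

            def write(self, key, value):
                self.log.append((len(self.log) + 1, key, value, time.time()))

            def sync(self, replica: Replica):
                """Lazily push every update the replica has not yet seen."""
                for seq, key, value, _ts in self.log[replica.applied_seq:]:
                    replica.data[key] = value   # a real system would translate
                    replica.applied_seq = seq   # this into vendor-specific SQL

        master = Master()
        oracle = Replica("tier0", "oracle")
        mysql = Replica("tier1", "mysql")
        master.write("run_42/calibration", 3.14)
        master.sync(oracle)
        master.sync(mysql)
        assert oracle.data == mysql.data        # replicas converge after sync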

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and methodology, which helps evaluate their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    Fault Tolerant Resource Allocation for Query Processing in Grid Environments

    In this paper, we propose a new algorithm for fault-tolerant resource allocation for query processing in grid environments. We propose an initial resource allocation algorithm followed by a fault-tolerance protocol. The proposed fault-tolerance protocol is based on the passive replication of stateful operators in queries. We provide theoretical analyses of the proposed algorithms and consolidate our analyses with simulations.
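
    The abstract does not spell out the protocol's machinery; the sketch below is only a toy picture of passive (primary-backup) replication of a stateful operator. The names (StatefulOperator, PassiveReplica) and the checkpoint interval are assumptions of this sketch, not details from the paper:

        import copy

        class StatefulOperator:
            """A query operator with internal state, e.g. a running aggregate."""
            def __init__(self):
                self.count, self.total = 0, 0.0

            def process(self, value: float) -> float:
                self.count += 1
                self.total += value
                return self.total / self.count   # running average

        class PassiveReplica:
            """Backup that only receives checkpoints; it does no processing
            until the primary is declared failed (hypothetical protocol)."""
            def __init__(self):
                self.state = None

            def checkpoint(self, op: StatefulOperator):
                self.state = copy.deepcopy(op)

            def take_over(self) -> StatefulOperator:
                return self.state                # resume from last checkpoint

        primary, backup = StatefulOperator(), PassiveReplica()
        for i, v in enumerate([2.0, 4.0, 6.0, 8.0]):
            primary.process(v)
            if i % 2 == 1:                       # checkpoint every 2 tuples
                backup.checkpoint(primary)
        # if the primary fails here, the backup resumes from the last
        # checkpoint, losing at most the tuples processed since then
        restored = backup.take_over()
        print(restored.count, restored.total)    # 4 20.0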

    Handling Confidential Data on the Untrusted Cloud: An Agent-based Approach

    Cloud computing allows shared computer and storage facilities to be used by a multitude of clients. While cloud management is centralized, the information resides in the cloud and information sharing can be implemented via off-the-shelf techniques for multiuser databases. Users, however, are wary of not having full control over their sensitive data. Untrusted database-as-a-server techniques are neither readily extendable to the cloud environment nor easily understandable by non-technical users. To solve this problem, we present an approach where agents share reserved data in a secure manner through simple grant-and-revoke permissions on shared data. Comment: 7 pages, 9 figures, Cloud Computing 201
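
    As a rough picture of the grant-and-revoke idea the abstract mentions, here is a minimal sketch; the class SharedDataAgent and its methods are invented for illustration and do not come from the paper:

        class SharedDataAgent:
            """Hypothetical agent mediating access to a user's reserved data:
            the owner grants or revokes read permission per (user, item)."""
            def __init__(self, owner: str):
                self.owner = owner
                self.items: dict[str, str] = {}
                self.grants: set[tuple[str, str]] = set()   # (user, item)

            def put(self, item: str, payload: str):
                self.items[item] = payload    # the real system would store
                                              # this securely on the cloud
            def grant(self, user: str, item: str):
                self.grants.add((user, item))

            def revoke(self, user: str, item: str):
                self.grants.discard((user, item))

            def read(self, user: str, item: str) -> str:
                if user != self.owner and (user, item) not in self.grants:
                    raise PermissionError(f"{user} may not read {item}")
                return self.items[item]

        agent = SharedDataAgent("alice")
        agent.put("report.txt", "confidential numbers")
        agent.grant("bob", "report.txt")
        print(agent.read("bob", "report.txt"))   # allowed
        agent.revoke("bob", "report.txt")
        # agent.read("bob", "report.txt") would now raise PermissionError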

    Two Major Issues in Data Grid Replication Process

    This paper discusses the Data Grid as one popular application of grid and cloud computing. In a Data Grid, data are replicated among different nodes of the grid to increase availability and efficiency. There are two main issues regarding this replication process: how to perform the replication (i.e. where and when to replicate), and how to keep all replicas consistent across the heterogeneous database systems found in grids. This paper explains these two major issues as well as some proposed methods and solutions, in order to explore the next problems and challenges in this area. Identifying them is useful for improving the performance of the replication process, and hence for offering users a better experience of Data Grid implementations.
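
    The paper surveys these issues rather than prescribing a method. As one toy illustration of the "where and when" decision, the sketch below replicates a file to a node once that node has read it remotely often enough; the threshold and all names are assumptions of this sketch, not the paper's:

        from collections import Counter

        REPLICATION_THRESHOLD = 3   # assumed: replicate after 3 remote reads

        class GridNode:
            def __init__(self, name: str):
                self.name = name
                self.replicas: set[str] = set()

        def access(reader: GridNode, filename: str, counts: Counter) -> str:
            """Serve a read; replicate the file to the reader once it has
            requested it often enough (the 'when' and 'where' decisions)."""
            if filename in reader.replicas:
                return "local read"
            counts[(reader.name, filename)] += 1
            if counts[(reader.name, filename)] >= REPLICATION_THRESHOLD:
                reader.replicas.add(filename)    # place a new replica here
                return "remote read + replicate"
            return "remote read"

        edge = GridNode("infn")
        counts = Counter()
        for _ in range(4):
            print(access(edge, "dataset.root", counts))
        # remote read, remote read, remote read + replicate, local read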

    AliEnFS - a Linux File System for the AliEn Grid Services

    Among the services offered by the AliEn (ALICE Environment, http://alien.cern.ch) Grid framework there is a virtual file catalogue that allows transparent access to distributed data-sets using various file transfer protocols. alienfs (AliEn File System) integrates the AliEn file catalogue as a new file system type into the Linux kernel using LUFS, a hybrid user-space file system framework (Open Source, http://lufs.sourceforge.net). LUFS uses a special kernel interface level called VFS (Virtual File System Switch) to communicate via a generalised file system interface with the AliEn file system daemon. The AliEn framework is used for authentication, catalogue browsing, file registration and read/write transfer operations. A C++ API implements the generic file system operations. The goal of AliEnFS is to allow users easy interactive access to a worldwide distributed virtual file system using familiar shell commands (e.g. cp, ls, rm, ...). The paper discusses general aspects of Grid File Systems, the AliEn implementation, and present and future developments for the AliEn Grid File System. Comment: 9 pages, 12 figures
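
    To picture the layering the abstract describes (shell-style commands on top of a remote catalogue), here is a schematic user-space sketch. The names VirtualCatalogue and GridShell are invented for illustration; the real alienfs does this inside the Linux kernel via LUFS:

        class VirtualCatalogue:
            """Hypothetical stand-in for a remote Grid file catalogue that a
            userspace filesystem layer (like LUFS) would call out to."""
            def __init__(self):
                self.entries = {"/alice/run1.root": b"event data"}

            def listdir(self, path: str):
                prefix = path.rstrip("/") + "/"
                return [p for p in self.entries if p.startswith(prefix)]

            def read(self, path: str) -> bytes:
                return self.entries[path]

            def unlink(self, path: str):
                del self.entries[path]

        class GridShell:
            """Maps familiar shell-style commands onto catalogue calls,
            mirroring the 'ls/cp/rm' experience the paper aims for."""
            def __init__(self, catalogue: VirtualCatalogue):
                self.cat = catalogue

            def ls(self, path):
                return self.cat.listdir(path)

            def cp_to_local(self, src, dst):
                with open(dst, "wb") as f:       # a real transfer would go
                    f.write(self.cat.read(src))  # via a Grid file protocol

            def rm(self, path):
                self.cat.unlink(path)

        sh = GridShell(VirtualCatalogue())
        print(sh.ls("/alice"))                   # ['/alice/run1.root']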

    Contributions to Data Replication in Large-Scale Distributed Systems

    Data replication is a key mechanism for building a reliable and efficient data management system. Indeed, by keeping several replicas of each piece of data, it is possible to improve durability. Furthermore, well-placed copies reduce data access time. However, having multiple copies of a single piece of data creates consistency problems when the data is updated. Over the last years, I have made contributions related to these three aspects: data durability, data access performance and data consistency. RelaxDHT and SPLAD enhance data durability by placing data copies smartly. Caju, AREN and POPS reduce access time by improving data locality and by taking popularity into account. To enhance data lookup performance, DONUT creates efficient shortcuts that take data distribution into account. Finally, in the replicated database context, Gargamel parallelizes independent transactions only, improving database performance and avoiding aborted transactions. My research has been carried out in collaboration with eight PhD students, four of whom have defended. In my future work, I plan to extend these contributions by (i) designing a storage system tailored for MMOGs, which are very demanding, and (ii) designing a data management system that can redistribute data automatically in order to scale the number of servers up and down according to the changing workload, leading to greener data management.
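
    One of the listed systems, Gargamel, parallelizes only independent transactions. A minimal way to picture that decision (my own toy model, not Gargamel's actual classifier) is a read/write-set disjointness test:

        from dataclasses import dataclass

        @dataclass(frozen=True)
        class Txn:
            name: str
            reads: frozenset
            writes: frozenset

        def independent(a: Txn, b: Txn) -> bool:
            """Two transactions may run in parallel if neither writes what
            the other reads or writes (no read-write or write-write conflict)."""
            return (a.writes.isdisjoint(b.writes)
                    and a.writes.isdisjoint(b.reads)
                    and b.writes.isdisjoint(a.reads))

        t1 = Txn("t1", frozenset({"x"}), frozenset({"y"}))
        t2 = Txn("t2", frozenset({"z"}), frozenset({"w"}))
        t3 = Txn("t3", frozenset({"y"}), frozenset({"x"}))
        print(independent(t1, t2))   # True  -> schedule in parallel
        print(independent(t1, t3))   # False -> serialize to avoid a conflict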

    Enabling object storage via shims for grid middleware

    The Object Store model has quickly become the basis of most commercially successful mass storage infrastructure, backing so-called "Cloud" storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems. Many of the assumptions in Object Store design are similar, but not identical, to concepts in the design of Grid Storage Elements, although the requirement for "POSIX-like" filesystem structures on top of SEs makes the disjunction seem larger. As modern Object Stores provide many features that most Grid SEs do not (block-level striping, parallel access, automatic file repair, etc.), it is of interest to see how easily we can provide interfaces to typical Object Stores via plugins and shims for Grid tools, and how well experiments can adapt their data models to them. We present an evaluation of, and first-deployment experiences with, interfaces such as Xrootd-Ceph for direct object-store access, as part of an initiative within GridPP[1] hosted at RAL. Additionally, we discuss the tradeoffs and experience of developing plugins for the currently popular Ceph parallel distributed filesystem for the GFAL2 access layer, at Glasgow.
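
    A schematic illustration of the shim idea follows: a thin layer encodes POSIX-style paths into flat object keys. This is a toy model with invented names (ObjectStore, PosixShim); the actual work uses Xrootd and GFAL2 plugins against Ceph:

        class ObjectStore:
            """Toy flat key/value object store standing in for Ceph or S3."""
            def __init__(self):
                self.objects: dict[str, bytes] = {}

            def put(self, key: str, data: bytes):
                self.objects[key] = data

            def get(self, key: str) -> bytes:
                return self.objects[key]

        class PosixShim:
            """Presents a 'POSIX-like' path hierarchy over the flat namespace
            by encoding the path into the object key, as Grid SEs expect."""
            def __init__(self, store: ObjectStore, prefix: str = "se"):
                self.store, self.prefix = store, prefix

            def _key(self, path: str) -> str:
                return f"{self.prefix}:{path.lstrip('/')}"

            def write_file(self, path: str, data: bytes):
                self.store.put(self._key(path), data)

            def read_file(self, path: str) -> bytes:
                return self.store.get(self._key(path))

        shim = PosixShim(ObjectStore())
        shim.write_file("/atlas/data/file1.root", b"payload")
        print(shim.read_file("/atlas/data/file1.root"))   # b'payload'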