
    Towards efficient localization of dynamic replicas for geo-distributed data stores

    Large-scale scientific experiments increasingly rely on geo-distributed clouds to serve relevant data to scientists worldwide with minimal latency. State-of-the-art caching systems often require the client to access the data through a caching proxy, or to contact a metadata server to locate the closest available copy of the desired data. Such caching systems are also inconsistent with the design of distributed hash-table databases such as Dynamo, which focus on allowing clients to locate data independently. We argue there is a gap between existing state-of-the-art solutions and the needs of geographically distributed applications, which require fast access to popular objects without degrading access latency for the rest of the data. In this paper, we introduce a probabilistic algorithm allowing the user to locate the closest copy of the data efficiently and independently with minimal overhead, allowing low-latency access to non-cached data. We also propose a network-efficient technique to identify the most popular data objects in the cluster and trigger their replication close to the clients. Experiments with a real-world data set show that these principles allow clients to locate the closest available copy of data with a small memory footprint and low error rate, thus improving read latency for non-cached data and allowing hot data to be read locally.
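    As a hedged illustration of the approach this abstract describes, the sketch below pairs consistent hashing with a compact per-site Bloom filter of hot keys: the client checks nearby sites' filters locally to guess where a dynamic replica lives, and falls back to the canonical node on a miss. The data structures and names are assumptions made for illustration, not the paper's actual algorithm.

```python
# Hypothetical sketch: probabilistic client-side replica location.
# A filter hit means the key is probably replicated at a nearby site;
# a miss falls back to the canonical node from consistent hashing.
import hashlib

def _positions(key, k, m):
    # Derive k filter positions from one strong hash of the key.
    digest = hashlib.sha256(key.encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % m
            for i in range(k)]

class BloomFilter:
    def __init__(self, m=8192, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def add(self, key):
        for pos in _positions(key, self.k, self.m):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in _positions(key, self.k, self.m))

def locate(key, sites_by_distance, canonical_node):
    # Check the closest sites first; a false positive costs only one
    # misdirected probe, while a miss costs nothing beyond a local check.
    for site, hot_keys in sites_by_distance:
        if key in hot_keys:
            return site       # probably holds a dynamic replica
    return canonical_node     # guaranteed location on the hash ring

hot = BloomFilter()
hot.add("obj:123")
print(locate("obj:123", [("paris", hot)], "oregon"))  # -> paris
print(locate("obj:999", [("paris", hot)], "oregon"))  # -> oregon (very likely)
```

    The small memory footprint and low error rate reported in the abstract are the kind of trade-off such a filter makes: a few kilobytes per site against a tunable false-positive probability.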

    Týr: High-Performance Massive Transactional Storage

    As the computational power used by large-scale applications increases, the amount of data they need to manipulate tends to increase as well. A wide range of such applications requires robust and flexible storage support for atomic, durable and concurrent transactions. Historically, databases have provided the de facto solution to transactional data management, but they force applications to give up control over data layout and access mechanisms while remaining unable to meet the scale requirements of Big Data. More recently, key-value stores have been introduced to address these issues. However, they provide no transaction support, or only restricted support, compelling users to carefully coordinate access to data in order to avoid race conditions, partial writes, overwrites, and other hard problems that cause erratic behaviour. We argue there is a gap between existing storage solutions and application requirements that limits the design of transaction-oriented data-intensive applications. In this paper we introduce Týr, a massively parallel distributed transactional blob storage system. A key feature of Týr is its novel multi-versioning management, designed to keep the metadata overhead as low as possible while still allowing fast queries and updates and preserving transaction semantics. Its shared-nothing architecture ensures minimal contention and provides low latency for large numbers of concurrent requests. Týr is the first blob storage system to provide sequential consistency and high throughput while offering transaction support. Experiments with a real-life application from the CERN LHC show Týr throughput outperforming state-of-the-art solutions by more than 100%.
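    To make the multi-versioning idea concrete, here is a minimal, hypothetical sketch of an MVCC-style blob read path: each blob keeps a chain of immutable versions, and a transaction reads the newest version committed at or before its snapshot timestamp, so readers never block writers. This illustrates the general technique only, not Týr's actual implementation.

```python
# Hypothetical MVCC-style versioned blob (illustration, not Týr's code).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BlobVersion:
    commit_ts: int    # timestamp of the committing transaction
    data: bytes       # immutable payload for this version

@dataclass
class Blob:
    versions: List[BlobVersion] = field(default_factory=list)  # newest last

    def write(self, commit_ts: int, data: bytes) -> None:
        # Writers append a new immutable version; readers never block.
        self.versions.append(BlobVersion(commit_ts, data))

    def read(self, snapshot_ts: int) -> Optional[bytes]:
        # Return the newest version visible at the given snapshot.
        for version in reversed(self.versions):
            if version.commit_ts <= snapshot_ts:
                return version.data
        return None   # the blob did not exist at that snapshot

blob = Blob()
blob.write(commit_ts=10, data=b"v1")
blob.write(commit_ts=20, data=b"v2")
assert blob.read(snapshot_ts=15) == b"v1"  # sees only committed state
assert blob.read(snapshot_ts=25) == b"v2"
```

    Keeping versions as an append-only chain is one way to obtain the low metadata overhead the abstract claims: the only bookkeeping added per write is a timestamp.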

    Tyr: Blob Storage Meets Built-In Transactions

    Concurrent Big Data applications often require high-performance storage as well as ACID (Atomicity, Consistency, Isolation, Durability) transaction support. Although blobs (binary large objects) are an increasingly popular model for addressing the storage needs of such applications, state-of-the-art blob storage systems typically offer no transaction semantics. This forces users to coordinate access to data carefully in order to avoid race conditions, inconsistent writes, overwrites and other problems that cause erratic behavior. We argue there is a gap between existing storage solutions and application requirements, which limits the design of transaction-oriented applications. We introduce Tyr, the first blob storage system to provide built-in, multiblob transactions, while retaining sequential consistency and high throughput under heavy access concurrency. Tyr offers fine-grained random write access to data and in-place atomic operations. Large-scale experiments on Microsoft Azure with a production application from CERN LHC show Tyr throughput outperforming state-of-the-art solutions by more than 75%.
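    The following sketch suggests what built-in multi-blob transactions could look like from the client side: fine-grained writes to several blobs are buffered and applied atomically at commit. BlobStore and Transaction are hypothetical names for illustration; this is not Tyr's published interface.

```python
# Hypothetical client-side view of a multi-blob transaction.
class BlobStore:
    def __init__(self):
        self.blobs = {}   # blob name -> bytearray

    def transaction(self):
        return Transaction(self)

class Transaction:
    """Buffers fine-grained writes and applies them atomically."""
    def __init__(self, store):
        self.store, self.writes = store, []

    def write(self, blob, offset, data):
        self.writes.append((blob, offset, data))   # nothing visible yet

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: apply every buffered write. A real system would run
            # a commit protocol across the nodes holding each blob.
            for blob, offset, data in self.writes:
                buf = self.store.blobs.setdefault(blob, bytearray())
                if len(buf) < offset + len(data):
                    buf.extend(b"\x00" * (offset + len(data) - len(buf)))
                buf[offset:offset + len(data)] = data
        return False   # on exception, the buffer is simply discarded

store = BlobStore()
with store.transaction() as tx:   # both writes become visible together
    tx.write("events", 0, b"header")
    tx.write("index", 0, b"entry-0")
```

    Without such support, the two writes above would have to be coordinated by the application itself, which is precisely the race-prone pattern the abstract describes.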

    Lentiviral gene transfer of RPE65 rescues survival and function of cones in a mouse model of Leber congenital amaurosis.

    BACKGROUND: RPE65 is specifically expressed in the retinal pigment epithelium and is essential for the recycling of 11-cis-retinal, the chromophore of rod and cone opsins. In humans, mutations in RPE65 lead to Leber congenital amaurosis or early-onset retinal dystrophy, a severe form of retinitis pigmentosa. The proof of feasibility of gene therapy for RPE65 deficiency has already been established in a dog model of Leber congenital amaurosis, but rescue of the cone function, although crucial for human high-acuity vision, has never been strictly proven. In Rpe65 knockout mice, photoreceptors show a drastically reduced light sensitivity and are subject to degeneration, the cone photoreceptors being lost at early stages of the disease. In the present study, we address the question of whether application of a lentiviral vector expressing the Rpe65 mouse cDNA prevents cone degeneration and restores cone function in Rpe65 knockout mice. METHODS AND FINDINGS: Subretinal injection of the vector in Rpe65-deficient mice led to sustained expression of Rpe65 in the retinal pigment epithelium. Electroretinogram recordings showed that Rpe65 gene transfer restored retinal function to a near-normal pattern. We performed histological analyses using cone-specific markers and demonstrated that Rpe65 gene transfer completely prevented cone degeneration until at least four months, an age at which almost all cones have degenerated in the untreated Rpe65-deficient mouse. We established an algorithm that allows prediction of the cone-rescue area as a function of transgene expression, which should be a useful tool for future clinical trials. Finally, in mice deficient for both RPE65 and rod transducin, Rpe65 gene transfer restored cone function when applied at an early stage of the disease. CONCLUSIONS: By demonstrating that lentivirus-mediated Rpe65 gene transfer protects and restores the function of cones in the Rpe65(-/-) mouse, this study reinforces the therapeutic value of gene therapy for RPE65 deficiencies, suggests a cone-preserving treatment for the retina, and evaluates a potentially effective viral vector for this purpose

    TýrFS: Increasing Small Files Access Performance with Dynamic Metadata Replication

    Small files are known to pose major performance challenges for file systems. Yet, such workloads are increasingly common in a number of Big Data analytics workflows and large-scale HPC simulations. These challenges are mainly caused by the common architecture of most state-of-the-art file systems, which need one or multiple metadata requests before being able to read from a file. As the size of each file decreases, the relative overhead of this metadata management grows. In this paper we propose a set of techniques leveraging consistent hashing and dynamic metadata replication to significantly reduce this metadata overhead. We implement these techniques inside a new file system named TýrFS, built as a thin layer above the Týr object store. We show that TýrFS increases small file access performance by up to one order of magnitude compared to other state-of-the-art file systems, while causing only a minimal impact on file write throughput.
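    A minimal sketch of the consistent-hashing idea underlying this design (simplified, with hypothetical names): the node responsible for a file's metadata is computed from the path alone, so a client can contact it directly instead of first querying a dedicated metadata server.

```python
# Hypothetical consistent-hash ring for metadata placement.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual positions for load balance.
        self._ring = sorted(
            (self._h(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _h(s):
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, path):
        # The first virtual node clockwise from the path's hash owns it.
        i = bisect.bisect(self._keys, self._h(path)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
# Deterministic: any client computes the same owner with no lookup.
print(ring.node_for("/data/run-42/part-00001"))
```

    Dynamic metadata replication would then add copies of hot entries at further ring positions; the sketch covers only the lookup step that removes the metadata-server round-trip.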

    Keeping up with storage: Decentralized, write-enabled dynamic geo-replication

    Large-scale applications are increasingly geo-distributed. Maintaining the highest possible data locality is crucial to ensure high performance of such applications. Dynamic replication addresses this problem by dynamically creating replicas of frequently accessed data close to the clients. This data is often stored in decentralized storage systems such as Dynamo or Voldemort, which offer support for mutable data. However, existing approaches to dynamic replication for such mutable data remain centralized, and thus incompatible with these systems. In this paper we introduce a write-enabled dynamic replication scheme that leverages the decentralized architecture of such storage systems. We propose an algorithm enabling clients to tentatively locate the closest data replica without any prior request to a metadata node. Large-scale experiments on various workloads show a read latency decrease of up to 42% compared to other state-of-the-art, caching-based solutions.
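    One concrete way to let clients tentatively locate replicas without a metadata request is rendezvous (highest-random-weight) hashing, sketched below with hypothetical names: replica sites are derived deterministically from the key, so every client computes the same candidate set locally and reads from the nearest one. The paper's actual algorithm may differ.

```python
# Hypothetical rendezvous-hashing replica placement and selection.
import hashlib

def candidate_sites(key, sites, n_replicas):
    # Rank all sites by a per-key score; the top n_replicas would hold
    # a replica of `key`. Every client computes the same ranking.
    def score(site):
        h = hashlib.sha256(f"{key}@{site}".encode()).digest()
        return int.from_bytes(h[:8], "big")
    return sorted(sites, key=score, reverse=True)[:n_replicas]

def closest_replica(key, sites, n_replicas, latency_ms):
    # Read from the lowest-latency candidate; writes go to all
    # candidates so the data stays mutable everywhere it is replicated.
    return min(candidate_sites(key, sites, n_replicas),
               key=lambda site: latency_ms[site])

sites = ["paris", "oregon", "tokyo"]
latency_ms = {"paris": 12.0, "oregon": 140.0, "tokyo": 230.0}
print(closest_replica("user:42", sites, 2, latency_ms))
```

    Because placement here is a pure function of the key, any client can guess replica locations without coordination; handling replicas that appear and disappear dynamically is what makes such a guess tentative.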

    Could Blobs Fuel Storage-Based Convergence Between HPC and Big Data?

    The ever-growing data sets processed on HPC platforms raise major challenges for the underlying storage layer. A promising alternative to POSIX-IO-compliant file systems is simpler blob (binary large object) or object storage, which offers lower overhead and better performance at the cost of dropping largely unused features such as file hierarchies or permissions. Similarly, blobs are increasingly considered as replacements for distributed file systems in big data analytics, or as a base for storage abstractions such as key-value stores or time-series databases. This growing interest in object storage on both HPC and big data platforms raises the question: are blobs the right level of abstraction to enable storage-based convergence between HPC and Big Data? In this paper we take a first step towards answering this question by analyzing the applicability of blobs on both platforms.
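    As a rough illustration of the overhead argument, the sketch below shows a minimal flat blob interface next to the sequence of calls a POSIX-style read typically implies. The interface is hypothetical and deliberately simplified.

```python
# Hypothetical minimal blob interface: one flat namespace, one call per
# operation, no hierarchy or permission metadata to maintain.
from typing import Optional

class BlobStore:
    def __init__(self):
        self._objects = {}   # key -> bytes

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str, offset: int = 0,
            size: Optional[int] = None) -> bytes:
        data = self._objects[key]
        end = len(data) if size is None else offset + size
        return data[offset:end]

    def delete(self, key: str) -> None:
        del self._objects[key]

# A POSIX read of the same data typically costs several metadata
# operations around it (path resolution, permission checks, open/close):
#   fd = open("/sim/run42/output.dat", O_RDONLY)  # lookup + permissions
#   pread(fd, buf, size, offset)
#   close(fd)
store = BlobStore()
store.put("sim/run42/output.dat", b"\x00" * 1024)
chunk = store.get("sim/run42/output.dat", offset=0, size=512)
```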
