10 research outputs found

    A survey and classification of storage deduplication systems

    The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. It has therefore been applied to different storage types, including archives and backups, primary storage, solid-state disks, and even random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, causing the relevance of new research and development to be underestimated. The first contribution of this paper is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope. This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found most useful for the challenges of each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed. This work is funded by the European Regional Development Fund (ERDF) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the Fundação para a Ciência e a Tecnologia (FCT; Portuguese Foundation for Science and Technology) within project RED FCOMP-01-0124-FEDER-010156, and by an FCT PhD scholarship, SFRH-BD-71372-2010.
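    The design decisions the survey classifies are easiest to see in a concrete form. The sketch below (illustrative Python; all names are our own, not the paper's) implements one simple point in the design space — fixed-size chunk granularity, a full in-memory fingerprint index, inline timing, and exact-duplicate elimination:

```python
import hashlib

def dedup_store(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks, fingerprint each with
    SHA-256, and keep only one stored copy of each distinct chunk."""
    index = {}   # fingerprint -> stored chunk (the dedup index)
    recipe = []  # ordered fingerprints needed to rebuild the data
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        index.setdefault(fp, chunk)  # store the chunk only if unseen
        recipe.append(fp)
    return index, recipe

def restore(index, recipe):
    """Reassemble the original data from the index and the recipe."""
    return b"".join(index[fp] for fp in recipe)

data = b"A" * 8192 + b"B" * 4096 + b"A" * 8192  # repetitive content
index, recipe = dedup_store(data)
assert restore(index, recipe) == data
print(len(recipe), len(index))  # 5 logical chunks, 2 stored
```

    Varying any one axis — content-defined instead of fixed chunking (granularity), a partial or disk-resident index (indexing), or deferring elimination to a background pass (timing) — yields the different systems the classification covers.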

    Evaluating the usefulness of content addressable storage for high-performance data intensive applications

    Content Addressable Storage (CAS) is a data representation technique that operates by partitioning a given data-set into non-intersecting units called chunks and then employing techniques to efficiently recognize chunks occurring multiple times. This allows CAS to eliminate duplicate instances of such chunks, resulting in reduced storage space compared to conventional representations of data. CAS is an attractive technique for reducing the storage and network bandwidth needs of performance-sensitive, data-intensive applications in a variety of domains. These include enterprise applications, Web-based e-commerce or entertainment services, and highly parallel scientific/engineering applications and simulations, to name a few. In this paper, we conduct an empirical evaluation of the benefits offered by CAS to a variety of real-world data-intensive applications. The savings offered by CAS depend crucially on (i) the nature of the data-set itself and (ii) the chunk size that CAS employs. We investigate the impact of both these factors on disk space savings, savings in network bandwidth, and error resilience of data. We find that a chunk size of 1 KB can provide up to 84% savings in disk space and even higher savings in network bandwidth, while trading off error resilience and incurring 14% CAS-related overheads. Drawing upon lessons learned from our study, we provide insights on (i) the choice of chunk size for effective space savings and (ii) the use of selective data replication to counter the loss of error resilience caused by CAS.
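    The interplay between chunk size, space savings, and per-chunk overhead can be sketched as follows (illustrative Python; the synthetic dataset and the 32-byte-per-chunk fingerprint cost are our own assumptions, not the paper's workloads):

```python
import hashlib

def cas_size(data: bytes, chunk_size: int, hash_len: int = 32) -> int:
    """Deduplicated size: one stored copy of each distinct chunk,
    plus a hash_len-byte fingerprint of metadata per logical chunk."""
    unique = {}  # fingerprint -> size of that distinct chunk
    n_chunks = 0
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        unique[hashlib.sha256(chunk).digest()] = len(chunk)
        n_chunks += 1
    return sum(unique.values()) + n_chunks * hash_len

# Synthetic dataset: a 1 KB unit repeated 64 times plus 4 KB of
# unique bytes, so smaller chunks expose more of the repetition.
unit = bytes(range(256)) * 4               # 1 KB
data = unit * 64 + b"\xff" * 4096          # 64 KB repeated + 4 KB unique

for cs in (1024, 4096, 16384):
    deduped = cas_size(data, cs)
    print(cs, len(data), deduped, f"{1 - deduped / len(data):.0%}")
```

    On this toy input the 1 KB chunks yield the largest savings at the cost of the most metadata entries, mirroring the chunk-size trade-off the paper evaluates (where error resilience, not modeled here, also degrades as more data depends on each shared chunk).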

    Design Tradeoffs in Applying Content Addressable Storage

    This paper analyzes the usage data from a live deployment of an enterprise client management system based on virtual machine (VM) technology. Over a period of seven months, twenty-three volunteers used VM-based computing environments hosted by the system and created over 800 checkpoints of VM state, where each checkpoint included the virtual memory and disk states. Using this data, we study the design tradeoffs in applying content addressable storage (CAS) to such VM-based systems. In particular, we explore the impact on storage requirements and network load of different privacy properties and data granularities in the design of the underlying CAS system. The study clearly demonstrates that relaxing privacy can reduce the resource requirements of the system, and identifies designs that provide reasonable compromises between privacy and resource demands.
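    The privacy/resource trade-off the study measures can be illustrated in miniature: under strict privacy each user deduplicates only against their own chunks, while a relaxed design shares one fingerprint index across all users (illustrative Python; the users and data below are hypothetical, not the deployment's):

```python
import hashlib

def chunks(data: bytes, size: int = 1024):
    """Fixed-size chunking, as in a simple CAS layer."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def stored_bytes(users, shared: bool, size: int = 1024) -> int:
    """Total bytes kept when each user's chunks are deduplicated
    against a private index (strict privacy) or one shared index."""
    if shared:
        indexes = [set()] * len(users)       # one index for everyone
    else:
        indexes = [set() for _ in users]     # one index per user
    total = 0
    for index, data in zip(indexes, users):
        for c in chunks(data, size):
            fp = hashlib.sha256(c).digest()
            if fp not in index:
                index.add(fp)
                total += len(c)
    return total

# Two users whose VM checkpoints share a common base image.
base = bytes(1024) * 16                      # 16 KB identical base state
u1 = base + b"a" * 2048                      # user-specific tail
u2 = base + b"b" * 2048
print(stored_bytes([u1, u2], shared=False))  # private indexes: base stored twice
print(stored_bytes([u1, u2], shared=True))   # shared index: base stored once
```

    Sharing the index stores the common base image once instead of per user, which is the resource saving the paper attributes to relaxed privacy; the cost, not modeled here, is that chunk fingerprints reveal when two users hold identical data.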


    To Carry or To Find? Footloose on the Internet with a zero-pound laptop

    Internet Suspend/Resume (ISR) is a new model of personal computing that cuts the tight binding between personal computing state and personal computing hardware. ISR is implemented by layering a virtual machine (VM) on distributed storage. The VM encapsulates execution and user customization state; distributed storage transports that state across space and time. In this paper, we explore the implications of ISR for an infrastructure-based approach to mobile computing. We report on our experience with three versions of ISR.

    COVID-19 infected ST-Elevation myocardial infarction in India (COSTA INDIA)

    Objective: To determine differences in the presentation, management, and outcomes of COVID-19-infected STEMI patients compared with age- and sex-matched non-infected STEMI patients treated during the same period. Methods: This was a retrospective multicentre observational registry collecting data on COVID-19-positive STEMI patients from selected tertiary-care hospitals across India. For every COVID-19-positive STEMI patient, two age- and sex-matched COVID-19-negative STEMI patients were enrolled as controls. The primary endpoint was a composite of in-hospital mortality, re-infarction, heart failure, and stroke. Results: 410 COVID-19-positive STEMI cases were compared with 799 COVID-19-negative STEMI cases. The composite of death/reinfarction/stroke/heart failure was significantly higher among COVID-19-positive STEMI patients than among COVID-19-negative cases (27.1% vs 20.7%, p = 0.01), though the mortality rate did not differ significantly (8.0% vs 5.8%, p = 0.13). A significantly lower proportion of COVID-19-positive STEMI patients received reperfusion treatment and primary PCI (60.7% vs 71.1%, p < 0.001, and 15.4% vs 23.4%, p = 0.001, respectively). The rate of systematic early PCI (pharmaco-invasive treatment) was also significantly lower in the COVID-19-positive group than in the COVID-19-negative group. There was no difference in the prevalence of high thrombus burden (14.5% vs 12.0%, p = 0.55, among COVID-19-positive and -negative patients, respectively). Conclusions: In this large registry of STEMI patients, we did not find a significant excess in in-hospital mortality among COVID-19 co-infected patients compared with non-infected patients, despite lower rates of primary PCI and reperfusion treatment, though the composite of in-hospital mortality, re-infarction, stroke, and heart failure was higher.