32 research outputs found

    Scalable Reed-Solomon-based Reliable Local Storage for HPC Applications on IaaS Clouds

    With increasing interest among mainstream users in running HPC applications, Infrastructure-as-a-Service (IaaS) cloud computing platforms represent a viable alternative to the acquisition and maintenance of expensive hardware, which is often beyond the financial means of such users. One of the critical needs of HPC applications is efficient, scalable and persistent storage. Unfortunately, the storage options offered by cloud providers are not standardized and typically use a different access model. In this context, the local disks on the compute nodes can be used to save large data sets such as the data generated by Checkpoint-Restart (CR). This local storage offers high throughput and scalability, but it needs to be combined with persistency techniques such as block replication or erasure codes. One of the main challenges that such techniques face is to minimize the performance overhead and I/O resource utilization (i.e., storage space and bandwidth) while still guaranteeing high reliability of the saved data. This paper introduces a novel persistency technique that leverages Reed-Solomon (RS) encoding to save data in a reliable fashion. Compared to traditional approaches that rely on block replication, we demonstrate about 50% higher throughput while reducing network bandwidth and storage utilization by a factor of 2 for the same targeted reliability level. This is shown both by modeling and by real-life experimentation on hundreds of nodes.
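
The factor-of-2 saving mentioned in the abstract above can be illustrated with simple arithmetic. The sketch below compares the storage overhead of block replication with that of a Reed-Solomon layout that tolerates the same number of lost nodes. The parameters (2 tolerated failures, 4 data blocks plus 2 parity blocks) and the function names are assumptions chosen for illustration; the abstract does not state the configuration the authors used.

```python
from fractions import Fraction


def replication_overhead(failures_tolerated: int) -> Fraction:
    """Bytes stored per byte of data when keeping full copies.

    Surviving f lost nodes requires f + 1 copies in total.
    """
    return Fraction(failures_tolerated + 1)


def rs_overhead(k: int, m: int) -> Fraction:
    """Bytes stored per byte of data for an RS(k, m) layout.

    k data blocks plus m parity blocks survive any m lost blocks.
    """
    return Fraction(k + m, k)


# Hypothetical configuration: tolerate 2 failures in both schemes.
failures = 2
rep = replication_overhead(failures)   # 3 copies in total
rs = rs_overhead(k=4, m=failures)      # 6 blocks stored for 4 blocks of data
ratio = rep / rs

print(f"replication: {float(rep):.2f}x, RS(4,2): {float(rs):.2f}x, ratio: {float(ratio):.2f}")
```

With these assumed parameters, replication stores three times the data while the Reed-Solomon layout stores one and a half times, which is one way a factor of 2 can arise. Other choices of k and m give different ratios.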

    Erasure Code Based Cloud Storage System

    Cloud Computing is a technology that provides on-demand services and resources, such as storage space, networks, and programming language execution environments, over the Internet on a pay-per-use model. Cloud computing is a globalized concept, and there are no borders within the Cloud. Because of the attractive features of Cloud computing, many organizations use Cloud storage for their critical information. Data can be stored remotely in the Cloud by the user and accessed with thin clients as and when required. One of the major issues in the Cloud today is data security. Storing data in the Cloud can be risky because the data resides on Cloud service providers’ servers, which means less control over the stored data. A major concern is how to obtain all the benefits of the Cloud while maintaining security controls over the data. In this paper, a reliable storage system is proposed that is robust to errors or erasures in the data to be stored. The proposed system provides reliable storage while maintaining the integrity of the data. The files are split into parts to provide an extra layer of security.

    BlobCR: Virtual Disk Based Checkpoint-Restart for HPC Applications on IaaS Clouds

    Infrastructure-as-a-Service (IaaS) cloud computing is gaining significant interest in industry and academia as an alternative platform for running HPC applications. Given the need to provide fault tolerance, support for suspend-resume and offline migration, an efficient Checkpoint-Restart mechanism becomes paramount in this context. We propose BlobCR, a dedicated checkpoint repository that is able to take live incremental snapshots of the whole disk attached to the virtual machine (VM) instances. BlobCR aims to minimize the performance overhead of checkpointing by persisting VM disk snapshots asynchronously in the background using a low-overhead technique we call selective copy-on-write. It includes support for both application-level and process-level checkpointing, as well as support for rolling back file system changes. Experiments at large scale demonstrate the benefits of our proposal both in synthetic settings and for a real-life HPC application.

    Cost Effective Information Dispersal and Retrieval Framework for Cloud Storage

    Cloud data storage applications widely demand data security at minimum cost. The security concerns that a Cloud data service must address include Data Access Controllability, Data Confidentiality, and Data Integrity. In this paper, we propose a cost-effective Information Dispersal and Retrieval framework for Cloud storage. Our proposed framework differs from existing approaches based on replication. In our approach, multiple datacenters are treated as virtual independent disks that store redundant data encoded with erasure codes, so the framework makes it possible to retrieve a user's file even when a certain number of Cloud services fail. Besides the security-related benefits of our approach, the application shows the user the cost-availability pattern of datacenters and allows cost-effective storage on the Cloud within the user's budget limit.
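
The idea of treating datacenters as independent disks holding erasure-coded shards can be sketched as follows. This is a minimal illustration that uses a single XOR parity shard, which tolerates the loss of exactly one shard. It is a deliberately simplified stand-in, not the erasure code used in the paper, and the function names `encode` and `recover` are hypothetical.

```python
from functools import reduce


def split(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-sized shards, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int) -> list[bytes]:
    """Return k data shards followed by one parity shard."""
    shards = split(data, k)
    return shards + [xor_blocks(shards)]


def recover(shards: list, original_len: int) -> bytes:
    """Rebuild the data when at most one shard is missing (None)."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost shard")
    shards = list(shards)
    if missing:
        # XOR of all surviving shards equals the lost shard.
        survivors = [shard for shard in shards if shard is not None]
        shards[missing[0]] = xor_blocks(survivors)
    # Drop the parity shard and the zero padding.
    return b"".join(shards[:-1])[:original_len]


# Each shard would be stored at a different (hypothetical) datacenter.
data = b"checkpoint data"
stored = encode(data, k=3)
stored[1] = None  # simulate one datacenter becoming unavailable
restored = recover(stored, len(data))
```

Codes such as Reed-Solomon generalize this scheme to several parity shards, so that more than one simultaneous failure can be tolerated.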

    AI-Ckpt: Leveraging Memory Access Patterns for Adaptive Asynchronous Incremental Checkpointing

    With the increasing scale and complexity of supercomputing and cloud computing architectures, faults are becoming a frequent occurrence, which makes reliability a difficult challenge. Although for some applications it is enough to restart failed tasks, there is a large class of applications where tasks run for a long time or are tightly coupled, making a restart from scratch infeasible. Checkpoint-Restart (CR), the main method to survive failures for such applications, faces additional challenges in this context: not only does it need to minimize the performance overhead that checkpointing imposes on the application, but it also needs to operate with scarce resources. Given the iterative nature of the targeted applications, we assume that first-time writes to memory during asynchronous checkpointing generate the same kind of interference as they did in past iterations. Based on this assumption, we propose a novel asynchronous checkpointing approach that leverages both current and past access pattern trends in order to optimize the order in which memory pages are flushed to stable storage. Large-scale experiments show up to 60% improvement compared to state-of-the-art checkpointing approaches, achieved with an extra memory requirement of less than 5% of the total application memory.

    SInCom 2015

    2nd Baden-Württemberg Center of Applied Research Symposium on Information and Communication Systems, SInCom 2015, 13. November 2015 in Konstan