5 research outputs found
A Recoverable Distributed Shared Memory Integrating Coherence and Recoverability
... this paper, we address this problem and propose a checkpointing mechanism relying on a recoverable distributed shared memory (DSM) in order to tolerate single-node failures. Although most recoverable DSMs require specific hardware to store recovery data, our scheme uses standard memories to store both current and recovery data. Moreover, the management of recovery data is merged with the management of current data by extending the DSM's coherence protocol. This approach takes advantage of the data replication provided by a DSM to limit the number of pages transferred during checkpointing. The paper also presents an implementation and a preliminary performance evaluation of our recoverable DSM on a 56-node Intel Paragon.
A recoverable distributed shared memory integrating coherence and recoverability
Programme 1 - Parallel architectures, databases, networks and distributed systems. Project SOLIDOR. SIGLE. Available at INIST (FR), Document Supply Service, under shelf-number: 22588, issue: a.1995 n.897 / INIST-CNRS - Institut de l'Information Scientifique et Technique. FR. Franc
A Recoverable Distributed Shared Memory Integrating Coherence and Recoverability
Large-scale distributed systems are very attractive for executing parallel applications that require very large amounts of computing power. However, their high probability of site failure is unacceptable, especially for long-running applications. In this paper, we address this problem and propose a checkpointing mechanism relying on a recoverable distributed shared memory (DSM). Although most recoverable DSMs require specific hardware to store recovery data, our scheme uses standard memories to store both current and recovery data. Moreover, the management of recovery data is merged with the management of current data by extending the DSM's coherence protocol. This approach limits hardware development and takes advantage of the data replication provided by a DSM to limit the number of pages transferred during checkpointing. The paper also presents an implementation and a preliminary performance evaluation of our recoverable DSM on an Intel Paragon with 56 nodes. In particula..