Capability-based resiliency control of distributed database systems

Abstract

It is highly desirable for a distributed database system to achieve logically continuous operation even if some sites or message links fail. In this paper, we describe a scheme that can automatically reconfigure a fully replicated distributed database system upon subsystem failures. The scheme can tolerate total failures of some sites; that is, some sites may lose their data completely, including backup data. To handle this problem, we divide the execution of the system into generations. Each generation starts and ends with a consistent database state. Serializability of the execution of each generation is enforced by concurrency control based on conditional capabilities. Conditional capabilities are valid only if they are endorsed by a majority of sites in the system. Thus, any majority set of sites can invalidate lost or stray capabilities of old generations and create new capabilities for a new generation. Therefore, as long as there exists a majority of working sites that belong to the same partition, the system can continue its operation. The scheme allows a system to be reconfigured even if some transactions are being processed and even if other system reconfiguration attempts are underway.

Key Words and Phrases: distributed database system, resiliency control, consistent database state, replicated data, conditional capabilities
