Oregon State University. Department of Computer Science
Abstract
It is highly desirable for a distributed database system to achieve logically continuous operation even if some sites or message links fail. In this paper, we describe a scheme that can automatically reconfigure a fully-replicated distributed database system upon subsystem failures. The scheme can tolerate total failures of some sites, that is, failures in which a site loses its data completely, including backup data.
In order to handle this problem, we divide the execution of the system into generations. Each generation starts with a consistent database state and ends with a consistent database state. Serializability of the execution of each generation is enforced by concurrency control based on conditional capabilities. Conditional capabilities are valid only if they are endorsed by a majority of sites in the system. Thus, any majority set of sites can invalidate lost or stray capabilities of old generations and can create new capabilities for a new generation. Therefore, as long as there exists a majority of working sites that belong to the same partition, the system can continue its operation. The scheme allows a system to be reconfigured even if some transactions are being processed and even if other system reconfiguration attempts are underway.
Key Words and Phrases: distributed database system, resiliency control, consistent database state, replicated data, conditional capabilities
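The majority-endorsement rule described in the abstract can be illustrated with a minimal sketch. This is not the paper's protocol: the names `Capability`, `Site`, `is_valid`, and `start_new_generation` are hypothetical, and the sketch assumes that a site endorses a capability only when the capability's generation matches the site's current generation. It shows only why a majority partition can invalidate old-generation capabilities while a minority partition cannot reconfigure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical conditional capability tagged with its generation."""
    generation: int
    item: str


@dataclass
class Site:
    """Hypothetical site; `up` is False after a (possibly total) failure."""
    name: str
    up: bool = True
    current_generation: int = 0

    def endorses(self, cap: Capability) -> bool:
        # Assumption: a working site endorses only capabilities of the
        # generation it currently belongs to.
        return self.up and cap.generation == self.current_generation


def has_majority(count: int, total_sites: int) -> bool:
    """True if `count` is a strict majority of all sites in the system."""
    return 2 * count > total_sites


def is_valid(cap: Capability, reachable: list, total_sites: int) -> bool:
    """A capability is valid only while a majority of all sites endorse it."""
    endorsements = sum(1 for site in reachable if site.endorses(cap))
    return has_majority(endorsements, total_sites)


def start_new_generation(reachable: list, total_sites: int):
    """Move the working sites of a partition to a new generation.

    Returns the new generation number, or None if the partition does not
    contain a majority of working sites and therefore must not reconfigure.
    """
    working = [site for site in reachable if site.up]
    if not has_majority(len(working), total_sites):
        return None
    new_generation = max(site.current_generation for site in working) + 1
    for site in working:
        site.current_generation = new_generation
    return new_generation
```

Under these assumptions, once three of five sites move to a new generation, a capability of the old generation can no longer gather a majority of endorsements anywhere in the system, even if the two remaining sites later recover with stale state.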