82 research outputs found
Consensus on Transaction Commit
The distributed transaction commit problem requires reaching agreement on
whether a transaction is committed or aborted. The classic Two-Phase Commit
protocol blocks if the coordinator fails. Fault-tolerant consensus algorithms
also reach agreement, but do not block whenever any majority of the processes
are working. Running a Paxos consensus algorithm on the commit/abort decision
of each participant yields a transaction commit protocol that uses 2F +1
coordinators and makes progress if at least F +1 of them are working. In the
fault-free case, this algorithm requires one extra message delay but has the
same stable-storage write delay as Two-Phase Commit. The classic Two-Phase
Commit algorithm is obtained as the special F = 0 case of the general Paxos
Commit algorithm.Comment: Original at
http://research.microsoft.com/research/pubs/view.aspx?tr_id=70
Atomic commitment in transactional DHTs
We investigate the problem of atomic commit in transactional database systems
built on top of Distributed Hash Tables. DHTs provide a decentralized way to
store and look up data. To solve the atomic commit problem we propose to
use an adaption of Paxos commit as a non-blocking algorithm. We exploit the
symmetric replication technique existing in the DKS DHT to determine which
nodes are necessary to execute the commit algorithm. By doing so we achieve a
lower number of communication rounds and a reduction of meta-data in contrast
to traditional Three-Phase-Commit protocols. We also show how the proposed
solution can cope with dynamism due to churn in DHTs. Our solution works
correctly relying only on an inaccurate failure detection of node failure which is
necessary for systems running over the Internet
CRDTs: Consistency without concurrency control
A CRDT is a data type whose operations commute when they are concurrent.
Replicas of a CRDT eventually converge without any complex concurrency control.
As an existence proof, we exhibit a non-trivial CRDT: a shared edit buffer
called Treedoc. We outline the design, implementation and performance of
Treedoc. We discuss how the CRDT concept can be generalised, and its
limitations
Open Transactions on Shared Memory
Transactional memory has arisen as a good way for solving many of the issues
of lock-based programming. However, most implementations admit isolated
transactions only, which are not adequate when we have to coordinate
communicating processes. To this end, in this paper we present OCTM, an
Haskell-like language with open transactions over shared transactional memory:
processes can join transactions at runtime just by accessing to shared
variables. Thus a transaction can co-operate with the environment through
shared variables, but if it is rolled-back, also all its effects on the
environment are retracted. For proving the expressive power of TCCS we give an
implementation of TCCS, a CCS-like calculus with open transactions
Designing a commutative replicated data type
Commuting operations greatly simplify consistency in distributed systems.
This paper focuses on designing for commutativity, a topic neglected
previously. We show that the replicas of \emph{any} data type for which
concurrent operations commute converges to a correct value, under some simple
and standard assumptions. We also show that such a data type supports
transactions with very low cost. We identify a number of approaches and
techniques to ensure commutativity. We re-use some existing ideas
(non-destructive updates coupled with invariant identification), but propose a
much more efficient implementation. Furthermore, we propose a new technique,
background consensus. We illustrate these ideas with a shared edit buffer data
type
Исключение блокирования общего ресурса в распределённых системах
The article provides asynchronous algorithm for distributed transactions, which excludes share blocking during transaction execution and commit. This approach significantly increases the throughput of distributed systems.В статье рассматривается алгоритм реализации распределенных транзакций, исключающий блокирование общего ресурса при выполнении и фиксации транзакции. Такой подход существенно улучшает пропускную способность распределенных систем
MDCC: Multi-Data Center Consistency
Replicating data across multiple data centers not only allows moving the data
closer to the user and, thus, reduces latency for applications, but also
increases the availability in the event of a data center failure. Therefore, it
is not surprising that companies like Google, Yahoo, and Netflix already
replicate user data across geographically different regions.
However, replication across data centers is expensive. Inter-data center
network delays are in the hundreds of milliseconds and vary significantly.
Synchronous wide-area replication is therefore considered to be unfeasible with
strong consistency and current solutions either settle for asynchronous
replication which implies the risk of losing data in the event of failures,
restrict consistency to small partitions, or give up consistency entirely. With
MDCC (Multi-Data Center Consistency), we describe the first optimistic commit
protocol, that does not require a master or partitioning, and is strongly
consistent at a cost similar to eventually consistent protocols. MDCC can
commit transactions in a single round-trip across data centers in the normal
operational case. We further propose a new programming model which empowers the
application developer to handle longer and unpredictable latencies caused by
inter-data center communication. Our evaluation using the TPC-W benchmark with
MDCC deployed across 5 geographically diverse data centers shows that MDCC is
able to achieve throughput and latency similar to eventually consistent quorum
protocols and that MDCC is able to sustain a data center outage without a
significant impact on response times while guaranteeing strong consistency
Chainspace: A Sharded Smart Contracts Platform
Chainspace is a decentralized infrastructure, known as a distributed ledger,
that supports user defined smart contracts and executes user-supplied
transactions on their objects. The correct execution of smart contract
transactions is verifiable by all. The system is scalable, by sharding state
and the execution of transactions, and using S-BAC, a distributed commit
protocol, to guarantee consistency. Chainspace is secure against subsets of
nodes trying to compromise its integrity or availability properties through
Byzantine Fault Tolerance (BFT), and extremely high-auditability,
non-repudiation and `blockchain' techniques. Even when BFT fails, auditing
mechanisms are in place to trace malicious participants. We present the design,
rationale, and details of Chainspace; we argue through evaluating an
implementation of the system about its scaling and other features; we
illustrate a number of privacy-friendly smart contracts for smart metering,
polling and banking and measure their performance
- …