Archiving the Relaxed Consistency Web
The historical, cultural, and intellectual importance of archiving the web
has been widely recognized. Today, all countries with high Internet penetration
rates have established high-profile archiving initiatives to crawl and archive
the fast-disappearing web content for long-term use. As web technologies
evolve, established web archiving techniques face challenges. This paper
focuses on the potential impact of the relaxed consistency web design on
crawler-driven web archiving. Relaxed-consistency websites may disseminate,
albeit ephemerally, inaccurate and even contradictory information. If captured
and preserved in the web archives as historical records, such information will
degrade the overall archival quality. To assess the extent of such quality
degradation, we build a simplified feed-following application and simulate its
operation with synthetic workloads. The results indicate that a non-trivial
portion of a relaxed consistency web archive may contain observable
inconsistency, and the inconsistency window may extend significantly longer
than that observed at the data store. We discuss the nature of such quality
degradation and propose a few possible remedies. Comment: 10 pages, 6 figures, CIKM 201
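To make the archiving concern concrete, the following is a minimal sketch of how a crawler can capture a contradictory page from a relaxed-consistency store. It is not the authors' simulator. The model is hypothetical: a post and a reply to it propagate to the replica serving the crawler with independent random lags, and a capture counts as inconsistent when it shows the reply without the post. The function `simulate` and its parameters (`max_lag`, `crawl_delay`) are illustrative assumptions.

```python
import random


def simulate(num_pairs=1000, max_lag=5.0, crawl_delay=1.0, seed=42):
    """Estimate the fraction of crawled feed pages that show a contradiction.

    Hypothetical model: for each pair, a post is written at t=0.0 and a reply
    to it at t=0.1. Each update becomes visible at the replica serving the
    crawler after an independent lag drawn uniformly from [0, max_lag]. The
    crawler fetches the page crawl_delay time units after the reply is written.
    A capture is inconsistent if it shows the reply but not the post.
    """
    rng = random.Random(seed)
    inconsistent = 0
    for _ in range(num_pairs):
        post_written, reply_written = 0.0, 0.1
        post_visible = post_written + rng.uniform(0.0, max_lag)
        reply_visible = reply_written + rng.uniform(0.0, max_lag)
        crawl_time = reply_written + crawl_delay
        shows_post = post_visible <= crawl_time
        shows_reply = reply_visible <= crawl_time
        if shows_reply and not shows_post:
            inconsistent += 1
    return inconsistent / num_pairs


print(simulate())                 # some captures are contradictory
print(simulate(max_lag=0.0))      # instantaneous propagation: no contradictions
```

Once such a capture is stored, the contradiction persists in the archive even after the replicas converge. This is one way to read the abstract's observation that the archived inconsistency window can outlast the one at the data store.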
Harmony: Towards automated self-adaptive consistency in cloud storage
In just a few years, cloud computing has become a very popular paradigm and a business success story, with storage being one of its key features. To achieve high data availability, cloud storage services rely on replication. In this context, one major challenge is data consistency. In contrast to traditional approaches that are mostly based on strong consistency, many cloud storage services opt for weaker consistency models in order to achieve better availability and performance. This comes at the cost of a high probability of reading stale data, as the replicas involved in a read may not always hold the most recent write. In this paper, we propose a novel approach, named Harmony, which adaptively tunes the consistency level at run time according to the application requirements. The key idea behind Harmony is an intelligent estimation model of stale reads, which allows it to elastically scale up or down the number of replicas involved in read operations so as to maintain a low (possibly zero) tolerable fraction of stale reads. As a result, Harmony can meet the desired consistency of applications while achieving good performance. We have implemented Harmony and performed extensive evaluations with the Cassandra cloud storage on the Grid'5000 testbed and on Amazon EC2. The results show that Harmony can achieve good performance without exceeding the tolerated number of stale reads. For instance, in contrast to the static eventual consistency used in Cassandra, Harmony reduces the stale data being read by almost 80% while adding only minimal latency. Meanwhile, compared to the strong consistency model in Cassandra, it improves the throughput of the system by 45% while maintaining the desired consistency requirements of the applications.
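The idea of choosing how many replicas a read contacts from an estimate of stale reads can be illustrated with a deliberately simple model. This is not Harmony's estimator, which the abstract does not specify. The sketch assumes that at read time the latest write has reached `k` of `n` replicas and that the read contacts `r` replicas chosen uniformly at random. It then picks the smallest `r` whose stale-read probability stays within the tolerated fraction. All function names are hypothetical.

```python
from math import comb


def stale_read_probability(n, r, k):
    """Probability that a read contacting r of n replicas, chosen uniformly at
    random, reaches none of the k replicas that already hold the latest write."""
    if not (0 <= k <= n and 1 <= r <= n):
        raise ValueError("require 0 <= k <= n and 1 <= r <= n")
    # The read is stale only if all r contacted replicas are among the n - k
    # replicas that have not yet applied the write. comb() returns 0 when
    # r > n - k, i.e. when every read of size r must hit an up-to-date replica.
    return comb(n - k, r) / comb(n, r)


def choose_read_replicas(n, k, tolerated):
    """Smallest number of read replicas whose estimated stale-read probability
    does not exceed the tolerated fraction (falls back to reading all n)."""
    for r in range(1, n + 1):
        if stale_read_probability(n, r, k) <= tolerated:
            return r
    return n


# Replication factor 5, latest write currently applied at 2 replicas.
for tolerated in (0.5, 0.2, 0.0):
    print(tolerated, choose_read_replicas(5, 2, tolerated))  # -> 2, 3, 4 replicas
```

Lowering the tolerated fraction pushes the read size up toward strong consistency, while a looser tolerance lets reads contact fewer replicas. This is the trade-off the abstract describes Harmony tuning at run time.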
WiSer: A Highly Available HTAP DBMS for IoT Applications
In a classic transactional distributed database management system (DBMS),
write transactions invariably synchronize with a coordinator before final
commitment. While enforcing serializability, this model has long been
criticized for not satisfying applications' availability requirements. In the
era of the Internet of Things (IoT), this problem has become more
severe, as an increasing number of applications call for the capability of
hybrid transactional and analytical processing (HTAP), where aggregation
constraints need to be enforced as part of transactions. Current systems work
around this by creating escrows, allowing occasional overshoots of constraints,
which are handled via compensating application logic.
The WiSer DBMS targets consistency with availability, by splitting the
database commit into two steps. First, a PROMISE step, which corresponds to what
humans are used to thinking of as commitment and runs without talking to a
coordinator. Second, a SERIALIZE step, which fixes transactions' positions in the
serializable order via a consensus procedure. We achieve this split via a
novel data representation that embeds read-sets into transaction deltas, and
serialization sequence numbers into table rows. WiSer does no sharding (all
nodes can run transactions that modify the entire database), and yet enforces
aggregation constraints. Both read-write conflicts and aggregation constraint
violations are resolved lazily in the serialized data. WiSer also represents node
joins and departures as database tables, thus simplifying correctness and
failure handling. We present the design of WiSer as well as experiments
suggesting this approach has promise.
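The two-step commit can be pictured with a small single-process sketch. It is not WiSer's implementation. Following the abstract, each transaction carries its read set alongside its deltas, and each row stores the serial number of its last writer. The consensus procedure is replaced by a plain list that stands in for the agreed order. The non-negativity rule used as the aggregation constraint, and the names `Txn`, `promise`, and `serialize`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    txn_id: str
    read_set: dict  # row key -> serial number of the row version that was read
    deltas: dict    # row key -> signed increment the transaction applies


def promise(txn, local_log):
    """PROMISE: record the transaction locally; no coordinator is contacted."""
    local_log.append(txn)


def serialize(agreed_order, table, min_value=0):
    """SERIALIZE: apply transactions in the agreed order, resolving conflicts lazily.

    table maps row key -> (value, serial), where serial is the sequence number
    of the transaction that last wrote the row. Returns (committed, aborted) ids.
    """
    committed, aborted = [], []
    for seq, txn in enumerate(agreed_order, start=1):
        # Read-write conflict: a row the transaction read was overwritten
        # by a transaction serialized earlier.
        stale = any(table.get(key, (0, 0))[1] != serial
                    for key, serial in txn.read_set.items())
        # Aggregation constraint (assumed here): no row may drop below min_value.
        new_values = {key: table.get(key, (0, 0))[0] + delta
                      for key, delta in txn.deltas.items()}
        violates = any(value < min_value for value in new_values.values())
        if stale or violates:
            aborted.append(txn.txn_id)
            continue
        for key, value in new_values.items():
            table[key] = (value, seq)  # embed the serial number in the row
        committed.append(txn.txn_id)
    return committed, aborted


table = {"stock": (5, 0)}
log = []
promise(Txn("t1", {}, {"stock": -3}), log)            # blind decrement
promise(Txn("t2", {}, {"stock": -3}), log)            # would overshoot the constraint
promise(Txn("t3", {"stock": 0}, {"stock": 1}), log)   # read an outdated version
print(serialize(log, table))  # (['t1'], ['t2', 't3'])
```

Because every promised transaction is checked only when its serial position is known, the overshoot by `t2` is caught during serialization rather than prevented up front by an escrow.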
MDCC: Multi-Data Center Consistency
Replicating data across multiple data centers not only allows moving the data
closer to the user and, thus, reduces latency for applications, but also
increases the availability in the event of a data center failure. Therefore, it
is not surprising that companies like Google, Yahoo, and Netflix already
replicate user data across geographically different regions.
However, replication across data centers is expensive. Inter-data center
network delays are in the hundreds of milliseconds and vary significantly.
Synchronous wide-area replication with strong consistency is therefore considered
infeasible, and current solutions either settle for asynchronous
replication which implies the risk of losing data in the event of failures,
restrict consistency to small partitions, or give up consistency entirely. With
MDCC (Multi-Data Center Consistency), we describe the first optimistic commit
protocol that does not require a master or partitioning and is strongly
consistent at a cost similar to eventually consistent protocols. MDCC can
commit transactions in a single round-trip across data centers in the normal
operational case. We further propose a new programming model which empowers the
application developer to handle longer and unpredictable latencies caused by
inter-data center communication. Our evaluation using the TPC-W benchmark with
MDCC deployed across 5 geographically diverse data centers shows that MDCC is
able to achieve throughput and latency similar to eventually consistent quorum
protocols and that MDCC is able to sustain a data center outage without a
significant impact on response times while guaranteeing strong consistency.
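The single-round-trip idea can be approximated by a simplified quorum-based optimistic commit. This is not MDCC's protocol: Paxos-based recovery, conflict resolution, and the new programming model are all omitted. In the sketch, each data center accepts a transaction's options when the versions it read are current and no other option is pending on those records. The commit decision is made once a quorum of acceptances arrives. The quorum size is left as a parameter, and the values used below are illustrative rather than the ones MDCC uses. `DataCenter` and `commit` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DataCenter:
    versions: dict = field(default_factory=dict)  # record -> committed version
    pending: dict = field(default_factory=dict)   # record -> txn holding an option
    up: bool = True

    def accept(self, txn_id, options):
        """Accept all options (record -> version read) only if none conflicts."""
        if not self.up:
            return False
        for rec, read_version in options.items():
            if self.versions.get(rec, 0) != read_version or rec in self.pending:
                return False
        for rec in options:
            self.pending[rec] = txn_id
        return True

    def learn(self, txn_id, options, committed):
        # Outcome message: apply or discard this transaction's options
        # (values are omitted; only version numbers are tracked).
        for rec in options:
            if self.pending.get(rec) == txn_id:
                del self.pending[rec]
                if committed:
                    self.versions[rec] = self.versions.get(rec, 0) + 1


def commit(txn_id, options, dcs, quorum):
    # One round trip: send the options to every data center and count acceptances.
    acks = sum(dc.accept(txn_id, options) for dc in dcs)
    committed = acks >= quorum
    # Notify reachable data centers of the outcome. A real protocol must also
    # bring lagging or recovering replicas up to date; that is omitted here.
    for dc in dcs:
        if dc.up:
            dc.learn(txn_id, options, committed)
    return committed


dcs = [DataCenter() for _ in range(5)]
print(commit("t1", {"x": 0}, dcs, quorum=4))  # True: all five accept
dcs[0].up = False                              # simulate a data center outage
print(commit("t2", {"x": 1}, dcs, quorum=4))  # True: four acceptances suffice
print(commit("t3", {"x": 1}, dcs, quorum=4))  # False: read an outdated version
```

The sketch shows why an outage of one data center need not block commits when the quorum is smaller than the number of data centers.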
Space Complexity of Fault-Tolerant Register Emulations
Driven by the rising popularity of cloud storage, the costs associated with
implementing reliable storage services from a collection of fault-prone servers
have recently become an actively studied question. The well-known ABD result
shows that an f-tolerant register can be emulated using a collection of 2f + 1
fault-prone servers each storing a single read-modify-write object type, which
is known to be optimal. In this paper we generalize this bound: we investigate
the inherent space complexity of emulating reliable multi-writer registers as a
function of the type of the base objects exposed by the underlying servers, the
number of writers to the emulated register, the number of available servers,
and the failure threshold. We establish a sharp separation between registers
and both max-registers (the base object types assumed by ABD) and CAS in terms
of the resources (i.e., the number of base objects of the respective types)
required to support the emulation; we show that no such separation exists
between max-registers and CAS. Our main technical contribution consists of lower and
upper bounds on the resources required in case the underlying base objects are
fault-prone read/write registers. We show that the number of required registers
is directly proportional to the number of writers and inversely proportional to
the number of servers. Comment: Conference version appears in Proceedings of PODC '1
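For reference, the classic ABD construction mentioned above can be sketched in a single process. It emulates an f-tolerant multi-writer register on 2f + 1 servers, using (counter, writer id) timestamps and majority quorums of size f + 1. Each server's `store` keeps only the highest-timestamped pair, so it behaves like a max-register, matching the abstract's remark about the base objects ABD assumes. Crashes are simulated with a flag, and random majorities stand in for whichever servers happen to respond. Class and method names are illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Server:
    ts: tuple = (0, 0)  # (counter, writer_id), compared lexicographically
    value: object = None
    crashed: bool = False

    def store(self, ts, value):
        # Keep only the pair with the highest timestamp, like a max-register.
        if ts > self.ts:
            self.ts, self.value = ts, value


class ABDRegister:
    """Multi-writer ABD-style register emulated on 2f + 1 servers."""

    def __init__(self, f, seed=0):
        self.servers = [Server() for _ in range(2 * f + 1)]
        self.quorum = f + 1  # any two sets of f + 1 servers out of 2f + 1 intersect
        self.rng = random.Random(seed)

    def _quorum(self):
        # Contact an arbitrary majority of the servers that are still up.
        live = [s for s in self.servers if not s.crashed]
        if len(live) < self.quorum:
            raise RuntimeError("more than f servers have crashed")
        return self.rng.sample(live, self.quorum)

    def write(self, writer_id, value):
        # Phase 1: learn the highest counter stored at some majority.
        counter = max(s.ts for s in self._quorum())[0]
        # Phase 2: store the value with a strictly larger timestamp.
        ts = (counter + 1, writer_id)
        for s in self._quorum():
            s.store(ts, value)

    def read(self):
        # Phase 1: pick the newest (ts, value) pair seen by some majority.
        newest = max(self._quorum(), key=lambda s: s.ts)
        ts, value = newest.ts, newest.value
        # Phase 2: write it back so that no later read returns an older value.
        for s in self._quorum():
            s.store(ts, value)
        return value


reg = ABDRegister(f=1)
reg.write(1, "a")
reg.write(2, "b")
reg.servers[0].crashed = True  # one crash is tolerated with 2f + 1 = 3 servers
print(reg.read())  # b
```

Every write and read completes as long as at most f servers crash. Correctness rests on quorum intersection: any majority contacted by a read overlaps the majority that stored the latest write.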