Estimating data divergence in cloud computing storage systems
Dissertation submitted for the degree of Master in Informatics Engineering (Engenharia Informática)
Many internet services are provided through cloud computing infrastructures that are composed of multiple data centers. To provide high availability and low latency, data is replicated on machines in different data centers, which introduces the complexity of guaranteeing that clients view data consistently. Data stores often opt for a relaxed approach to replication, guaranteeing only eventual consistency, since it improves the latency of operations. However, this may lead to replicas having different values for the same data.
One solution to control the divergence of data in eventually consistent systems is
the usage of metrics that measure how stale data is for a replica. In the past, several
algorithms have been proposed to estimate the value of these metrics in a deterministic
way. An alternative solution is to rely on probabilistic metrics that estimate divergence with a certain degree of certainty. This relaxes the need to contact all replicas while still providing a relatively accurate measurement.
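The probabilistic idea in the paragraph above can be illustrated with a minimal sketch. It is not the thesis's algorithm: the metric (fraction of replicas behind the latest version), the function names, and the use of a Hoeffding bound are all assumptions chosen for illustration. The sketch samples a subset of replicas instead of contacting all of them and reports an error margin that holds with a chosen confidence.

```python
import math
import random


def estimate_stale_fraction(replica_versions, latest_version, sample_size, rng=random):
    """Estimate the fraction of stale replicas from a uniform random sample.

    replica_versions: list with the version each replica holds (hypothetical input).
    """
    sample = rng.sample(replica_versions, sample_size)
    stale = sum(1 for version in sample if version < latest_version)
    return stale / sample_size


def hoeffding_margin(sample_size, confidence):
    """Error margin eps such that |estimate - truth| < eps with the given confidence.

    Derived from Hoeffding's inequality: P(|p_hat - p| >= eps) <= 2 * exp(-2 * n * eps^2).
    """
    delta = 1.0 - confidence
    return math.sqrt(math.log(2.0 / delta) / (2.0 * sample_size))


# Illustrative data: 1000 replicas, 300 of them one version behind.
versions = [10] * 700 + [9] * 300
estimate = estimate_stale_fraction(versions, 10, 200, random.Random(42))
margin = hoeffding_margin(200, 0.95)
print(f"stale fraction ~ {estimate:.3f} +/- {margin:.3f} (95% confidence)")
```

Contacting 200 of 1000 replicas gives a margin of roughly 0.096 at 95% confidence under these assumptions. A larger sample tightens the margin at the cost of more messages.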
In this work we designed and implemented a solution to estimate the divergence of data in eventually consistent data stores that scales to many replicas by allowing client-side caching. Measuring the divergence when there is a large number of clients calls for the development of new algorithms that provide probabilistic guarantees. Additionally, unlike previous work, we intend to focus on measuring the divergence relative to a state that can lead to the violation of application invariants.
Partially funded by project PTDC/EIA EIA/108963/2008 and by an ERC Starting Grant, Agreement Number 30773
Quantifying Eventual Consistency with PBS
Data replication results in a fundamental trade-off between operation latency and consistency. At the weak end of the spectrum of possible consistency models is eventual consistency, which provides no limit to the staleness of data returned. However, anecdotally, eventual consistency is often “good enough” for practitioners given its latency and availability benefits. In this work, we explain this phenomenon and demonstrate that, despite their weak guarantees, eventually consistent systems regularly return consistent data while providing lower latency than their strongly consistent counterparts. To quantify the behavior of eventually consistent stores, we introduce Probabilistically Bounded Staleness (PBS), a consistency model that provides expected bounds on data staleness with respect to both versions and wall clock time. We derive a closed-form solution for version-based staleness and model real-time staleness for a large class of quorum replicated, Dynamo-style stores. Using PBS, we measure the trade-off between latency and consistency for partial, non-overlapping quorum systems under Internet production workloads. We quantitatively demonstrate how and why eventually consistent systems frequently return consistent data within tens of milliseconds while offering large latency benefits.
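The abstract does not state the closed-form expression for version-based staleness, so the following is only a sketch under stated assumptions: N replicas, and read and write quorums of sizes R and W drawn uniformly at random and independently for each operation. Under those assumptions, a read misses one particular write with probability C(N-W, R) / C(N, R), and it returns a value within the k most recent versions unless it misses all k of the latest writes. The function names are hypothetical.

```python
from math import comb


def miss_probability(n, r, w):
    """Probability that a random read quorum of size r shares no replica
    with a random write quorum of size w, out of n replicas."""
    if r + w > n:
        return 0.0  # quorums are forced to overlap
    return comb(n - w, r) / comb(n, r)


def k_staleness_consistency(n, r, w, k):
    """Probability that a read returns one of the k most recent versions,
    assuming independent, uniformly random quorums (an assumption of this sketch)."""
    return 1.0 - miss_probability(n, r, w) ** k


# Example: N=3, R=W=1 (a common low-latency configuration).
for k in (1, 2, 3):
    print(k, round(k_staleness_consistency(3, 1, 1, k), 4))
```

With N=3 and R=W=1, a single read misses a given write with probability 2/3, so tolerating a few versions of staleness raises the probability of an acceptable answer quickly.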
SUPPORTING MULTIPLE ISOLATION LEVELS IN REPLICATED ENVIRONMENTS
Database replication provides reliability and scalability, although achieving it transparently is not an easy task. A replicated database is transparent if it can replace a traditional centralized database without requiring the rest of the system's components to be adapted. Transparency in replicated databases can be achieved as long as (a) replication management remains completely hidden from those components and (b) the same functionality is offered as in a traditional database.
To improve overall system performance, current centralized database managers allow transactions to run concurrently under different isolation levels. For example, the TPC-C benchmark specification allows some transactions to run at weak isolation levels. However, this support is not yet available in replication protocols. In this thesis we show how these protocols can be extended to allow transactions to run under different isolation levels.
Bernabe Gisbert, JM. (2014). SUPPORTING MULTIPLE ISOLATION LEVELS IN REPLICATED ENVIRONMENTS [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/36535
Partial replication in distributed software transactional memory
Dissertation submitted for the degree of Master in Informatics Engineering (Engenharia Informática)
Distributed software transactional memory (DSTM) is emerging as an interesting alternative for distributed concurrency control. Usually, DSTM systems resort to data distribution and full replication techniques in order to provide scalability and fault tolerance.
Nevertheless, distribution does not provide support for fault tolerance, and full replication limits the system’s total storage capacity. In this context, partial data replication arises as an intermediate solution that combines the best of the previous two while trying to mitigate their disadvantages. This strategy has been explored by the distributed databases research field, but has been little addressed in the context of transactional memory and, to the best of our knowledge, it has never before been incorporated into a DSTM system for a general-purpose programming language. Thus, we defend the claim that it is possible to combine both full and partial data replication in such systems.
Accordingly, we developed a prototype of a DSTM system combining full and partial data replication for Java programs. We built on an existing DSTM framework and extended it with support for partial data replication. With the proposed framework, we implemented a partially replicated DSTM.
We evaluated the proposed system using known benchmarks, and the evaluation showcases the existence of scenarios where partial data replication can be advantageous, e.g., in scenarios with a small number of transactions modifying fully replicated data.
The results of this thesis show that we were able to sustain our claim by implementing
a prototype that effectively combines full and partial data replication in a DSTM system.
The modularity of the presented framework allows the easy implementation of its various
components, and it provides a non-intrusive interface to applications.
Fundação para a Ciência e Tecnologia - (FCT/MCTES) in the scope of the research project PTDC/EIA-EIA/113613/2009 (Synergy-VM
- …