Auditable and performant Byzantine consensus for permissioned ledgers
Permissioned ledgers allow users to execute transactions against a data store and retain proof of their execution in a replicated ledger. Each replica verifies the transactions' execution and ensures that, in perpetuity, a committed transaction cannot be removed from the ledger. Unfortunately, today's permissioned ledgers do not guarantee this: the ledger can be re-written if enough replicas collude. In addition, the transaction throughput of permissioned ledgers is low, which hampers real-world deployments, because they do not take advantage of multi-core CPUs and hardware accelerators.
This thesis explores how permissioned ledgers and their consensus protocols can be made auditable in perpetuity, even when all replicas collude and re-write the ledger. It also addresses how Byzantine consensus protocols can be adapted to increase the execution throughput of complex transactions. This thesis makes the following contributions:
1. Always-auditable Byzantine consensus protocols. We present a permissioned ledger system that can assign blame to individual replicas regardless of how many of them misbehave. This is achieved by signing and storing consensus protocol messages in the ledger and providing clients with signed, universally verifiable receipts.
2. Performant transaction execution with hardware accelerators. Next, we describe a cloud-based ML inference service that provides strong integrity guarantees while staying compatible with current inference APIs. We modify the Byzantine consensus protocol to execute machine learning (ML) inference on GPUs, optimizing the throughput and latency of inference computation.
3. Parallel transaction execution on multi-core CPUs. Finally, we introduce a permissioned ledger that executes transactions in parallel on multi-core CPUs. We split transaction execution between the primary and the backup replicas: the primary executes transactions on multiple CPU cores and records a dependency graph of the transactions, which the backup replicas use to execute the same transactions in parallel.
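The dependency-graph scheme in contribution 3 can be illustrated with a small sketch. This is not the thesis's actual implementation; it assumes each transaction is summarized by its read and write key sets, builds conflict edges to earlier transactions, and groups conflict-free transactions into levels that a backup replica could execute in parallel.

```python
def build_dependency_graph(transactions):
    """Map each transaction index to the earlier transactions it conflicts with.

    transactions: list of (read_set, write_set) pairs in primary commit order.
    A transaction depends on every earlier one with an overlapping key
    (write-read, read-write, or write-write conflict).
    """
    deps = {}
    for i, (reads_i, writes_i) in enumerate(transactions):
        deps[i] = set()
        for j in range(i):
            reads_j, writes_j = transactions[j]
            if writes_j & (reads_i | writes_i) or writes_i & reads_j:
                deps[i].add(j)
    return deps

def schedule_levels(deps):
    """Group transactions into levels; each level can run in parallel."""
    levels, done = [], set()
    while len(done) < len(deps):
        # Ready = all not-yet-run transactions whose dependencies completed.
        ready = [t for t in deps if t not in done and deps[t] <= done]
        levels.append(ready)
        done.update(ready)
    return levels
```

For example, with transactions T0 (reads a, writes b), T1 (reads b, writes c), and T2 (reads x, writes y), T0 and T2 form the first parallel level and T1 runs afterwards, since it reads T0's write.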
Priority-Driven Differentiated Performance for NoSQL Database-As-a-Service
Designing data stores for native cloud computing services brings a number of challenges, especially if the cloud provider wants to offer database services capable of controlling the response time for specific customers. These requests may come from heterogeneous data-driven applications with conflicting responsiveness requirements. For instance, a batch-processing workload does not require the same level of responsiveness as a time-sensitive one, such as online video gaming, virtual reality, or cloud-based machine learning, yet their coexistence may interfere with the responsiveness of the time-sensitive workload. This paper presents a modification to the popular MongoDB NoSQL database that enables differentiated per-user/request performance on a priority basis by leveraging the CPU scheduling and synchronization mechanisms available within the operating system. This is achieved with minimally invasive changes to the source code and without affecting the performance and behavior of the database when the new feature is not in use. The proposed extension has been integrated with the access-control model of MongoDB for secure and controlled access to the new capability. Extensive experimentation with realistic workloads demonstrates how the proposed solution reduces the response times for high-priority users/requests, relative to lower-priority ones, in scenarios with mixed-priority clients accessing the data store.
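The core idea of per-request priorities can be sketched with a simple priority-ordered dispatch queue. This is only an illustration of the scheduling principle, not the paper's MongoDB-internal mechanism, which works at the level of OS CPU scheduling and synchronization primitives.

```python
import heapq
import itertools

class PriorityRequestQueue:
    """Serve high-priority client requests before low-priority ones.

    Lower numbers mean higher priority; a monotonically increasing
    sequence number preserves FIFO order within a priority class.
    """
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def submit(self, priority, request):
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def next_request(self):
        """Pop the highest-priority (then oldest) pending request."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A worker thread that pops from such a queue serves a time-sensitive client's requests ahead of batch requests that arrived earlier.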
Efficient Black-box Checking of Snapshot Isolation in Databases
Snapshot isolation (SI) is a prevalent weak isolation level that avoids the performance penalty imposed by serializability while still preventing various undesired data anomalies. Nevertheless, SI anomalies have recently been found in production cloud databases that claim to provide the SI guarantee. Given the complex and often unavailable internals of such databases, a black-box SI checker is highly desirable.
In this paper we present PolySI, a novel black-box checker that efficiently checks SI and provides understandable counterexamples upon detecting violations. PolySI builds on a novel characterization of SI using generalized polygraphs (GPs), for which we establish soundness and completeness. PolySI employs an SMT solver and accelerates SMT solving by utilizing the compact constraint encoding of GPs and domain-specific optimizations for pruning constraints. As demonstrated by our extensive assessment, PolySI successfully reproduces all 2477 known SI anomalies, detects novel SI violations in three production cloud databases, identifies their causes, outperforms the state-of-the-art black-box checkers under a wide range of workloads, and scales to large workloads. (20 pages, 15 figures; accepted by PVLDB.)
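PolySI's actual machinery (generalized polygraphs plus an SMT solver) is involved, but the final step of any such checker is simple to state: once dependency edges between transactions are resolved, an isolation violation shows up as a cycle in the dependency graph. The following is a generic cycle check over an already-resolved graph, given purely as a simplified illustration of that last step; it is not PolySI's encoding.

```python
def has_cycle(graph):
    """Detect a cycle in a transaction dependency graph.

    graph: dict mapping each transaction id to the set of transaction
    ids it depends on (e.g. write-read, write-write, read-write edges).
    A cycle means the history violates the checked isolation guarantee.
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on stack / finished
    color = {t: WHITE for t in graph}

    def visit(t):
        color[t] = GRAY
        for u in graph[t]:
            if color[u] == GRAY:            # back edge: cycle found
                return True
            if color[u] == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in graph)
```

The hard part that PolySI addresses is upstream of this check: many dependency edges are *unknown* in a black-box setting, so the checker must search over possible edge choices, which is where the compact GP constraint encoding and SMT solving come in.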
Modern data analytics in the cloud era
Cloud computing has been the groundbreaking technology of the last decade. The ease of use of the managed environment, in combination with a nearly infinite amount of resources and a pay-per-use price model, enables fast and cost-efficient project realization for a broad range of users. Cloud computing also changes the way software is designed, deployed, and used. This thesis focuses on database systems deployed in the cloud environment. We identify three major interaction points of the database engine with the environment that show changed requirements compared to traditional on-premise data warehouse solutions. First, software is deployed on elastic resources. Consequently, systems should support elasticity in order to match workload requirements and be cost-effective. We present an elastic scaling mechanism for distributed database engines, combined with a partition manager that provides load balancing while minimizing partition reassignments in the case of elastic scaling. Furthermore, we introduce a buffer pre-heating strategy that mitigates the cold start after scaling and yields an immediate performance benefit from the scaled resources. Second, cloud-based systems are accessible and available from nearly everywhere. Consequently, data is frequently ingested from numerous endpoints, which differs from the bulk loads or ETL pipelines of a traditional data warehouse solution. Many users do not define database constraints, either to avoid transaction aborts due to conflicts or to speed up data ingestion. To mitigate this issue, we introduce the concept of PatchIndexes, which allow the definition of approximate constraints. PatchIndexes maintain exceptions to constraints, make them usable in query optimization and execution, and offer efficient update support. The concept can be applied to arbitrary constraints, and we provide examples of approximate uniqueness and approximate sorting constraints. Moreover, we show how PatchIndexes can be exploited to define advanced constraints such as an approximate multi-key partitioning, which offers robust query performance over workloads with different partition key requirements. Third, data-centric workloads have changed over the last decade. Besides traditional SQL workloads for business intelligence, data science workloads are of significant importance nowadays. In these cases the database system often acts only as a data provider, while the computational effort takes place in dedicated data science or machine learning (ML) environments. As this workflow has several drawbacks, we pursue the goal of pushing advanced analytics towards the database engine and introduce the Grizzly framework as a DataFrame-to-SQL transpiler. Based on this, we identify user-defined functions (UDFs) and ML inference as important tasks that would benefit from deeper engine integration, and we investigate approaches to push these operations towards the database engine.
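The PatchIndex idea of an approximate constraint with an explicit exception set can be sketched in a few lines. This is a toy model of the concept for an approximate uniqueness constraint only, not the thesis's actual index structure: instead of rejecting a violating insert, the index records the offending row as a "patch", so the optimizer may treat the column as unique for all rows outside the (typically small) exception set.

```python
class PatchIndex:
    """Approximate uniqueness constraint with an exception ("patch") set."""

    def __init__(self):
        self._seen = {}         # value -> row id of first occurrence
        self._exceptions = set()

    def insert(self, row_id, value):
        """Accept every insert; record duplicates as exceptions."""
        if value in self._seen:
            self._exceptions.add(row_id)
        else:
            self._seen[value] = row_id

    def exceptions(self):
        """Row ids that violate uniqueness; queries patch these separately."""
        return set(self._exceptions)

    def mostly_unique(self, threshold=0.01):
        """True if the exception fraction is below the given threshold."""
        total = len(self._seen) + len(self._exceptions)
        return total > 0 and len(self._exceptions) / total <= threshold
```

A query plan could then use a fast unique-column strategy for the bulk of the table and handle the exception rows with a small patch step, which is the spirit of making the constraint "usable in query optimization and execution".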
Towards Scalable Real-time Analytics: An Architecture for Scale-out of OLxP Workloads
We present an overview of our work on the SAP HANA Scale-out Extension, a novel distributed database architecture designed to support large-scale analytics over real-time data. This platform permits high-performance OLAP with massive scale-out capabilities while concurrently allowing OLTP workloads. This dual capability enables analytics over real-time changing data and allows fine-grained, user-specified service level agreements (SLAs) on data freshness. We advocate the decoupling of core database components such as query processing, concurrency control, and persistence, a design choice made possible by advances in high-throughput, low-latency networks and storage devices. We provide full ACID guarantees and build on a logical timestamp mechanism to provide MVCC-based snapshot isolation without requiring synchronous updates of replicas. Instead, we use asynchronous update propagation, guaranteeing consistency with timestamp validation. We provide a view into the design and development of a large-scale data management platform for real-time analytics, driven by the needs of modern enterprise customers.
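The logical-timestamp MVCC visibility rule mentioned above can be made concrete with a small sketch. This is a generic illustration of timestamp-based snapshot visibility, not SAP HANA's implementation: a reader holding a snapshot timestamp sees, for each record, the newest version whose commit timestamp does not exceed the snapshot.

```python
def visible_version(versions, snapshot_ts):
    """Return the newest committed version visible at snapshot_ts.

    versions: list of (commit_ts, value) pairs for one record, in any
    order. Under timestamp-based snapshot isolation, a reader with
    logical timestamp snapshot_ts sees the latest version whose
    commit timestamp is <= snapshot_ts.
    """
    best = None
    for commit_ts, value in versions:
        if commit_ts <= snapshot_ts and (best is None or commit_ts > best[0]):
            best = (commit_ts, value)
    return best[1] if best else None
```

Because visibility depends only on timestamps, replicas updated asynchronously can still serve consistent snapshots: a replica simply refuses to serve a snapshot timestamp it has not yet caught up to, which matches the asynchronous propagation with timestamp validation described in the abstract.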
Dynamic Partial Order Reduction for Checking Correctness Against Transaction Isolation Levels
Modern applications, such as social networking systems and e-commerce platforms, are centered around large-scale databases for storing and retrieving data. Accesses to the database are typically enclosed in transactions that allow computations on shared data to be isolated from other concurrent computations and resilient to failures. Modern databases trade isolation for performance: the weaker the isolation level, the more behaviors a database is allowed to exhibit, and it is up to the developer to ensure that their application can tolerate those behaviors.
In this work, we propose stateless model checking algorithms, based on dynamic partial order reduction, for checking the correctness of such applications. These algorithms work for a number of widely used weak isolation levels, including Read Committed, Causal Consistency, Snapshot Isolation, and Serializability. We show that they are sound, complete, and optimal, and run with polynomial memory consumption in all cases. We report on an implementation of these algorithms in the context of Java Pathfinder, applied to a number of challenging applications drawn from the literature on distributed systems and databases. (Submission to PLDI 202)
Eventual Durability of ACID Transactions in Database Systems
Modern database systems that support ACID transactions, and applications built around these databases, may choose to sacrifice transaction durability for performance when they deem it necessary. While this approach may yield good performance, it has three major downsides. Firstly, users are often not told when, or whether, the issued transactions become durable. Secondly, users cannot know if durable and non-durable transactions see each other's effects. Finally, this approach pushes durability handling outside the scope of the transactional model, making it difficult for applications to reason about correctness and data consistency.
To address these issues, we present the idea of "Eventual Durability" (ED), a principled way for applications to manage transaction durability trade-offs. The ED model extends the traditional transaction model by decoupling a transaction's commit point from its durability point, thereby allowing applications to control which transactions are acknowledged at their commit point and which at their durability point. Furthermore, we redefine serialisability and recoverability under ED so that applications can ascertain whether fast transactions became durable and how they might have interacted with safe ones. With ED, users and applications know what to expect to lose when there is a failure, bringing durability management back inside the transaction model.
We implement the ED model in PostgreSQL and evaluate it to understand the model's effect on transaction latency, abort rates, and throughput. We show that ED Postgres achieves significant latency improvements while ensuring the guarantees provided by the model. Since a transaction's resources are released earlier in ED Postgres, we expected to see lower abort rates and higher throughput; indeed, we observed that ED Postgres provides an average 91.25%-93% reduction in abort rates under a contentious workload and an average 75% increase in throughput compared to baseline Postgres. We also ran the TPC-C benchmark against ED Postgres and discuss the findings. Lastly, we discuss how ED Postgres can be used in realistic settings to obtain latency benefits, throughput improvements, reduced abort rates, and fresher reads.
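The commit/durability decoupling at the heart of ED can be sketched with a minimal in-memory log. This is an assumption-laden toy, not ED Postgres: a "fast" transaction is acknowledged as soon as commit() returns, while a background flush (standing in for the WAL writer) later makes a prefix of the commit order durable, producing the second acknowledgment point that "safe" transactions wait for.

```python
import threading

class EDTransactionLog:
    """Toy log that decouples the commit point from the durability point."""

    def __init__(self):
        self._lock = threading.Lock()
        self._committed = []     # transactions in in-memory commit order
        self._durable_upto = 0   # length of the durable prefix

    def commit(self, txn):
        """Commit point: append in memory and return a commit sequence number.

        A fast transaction is acknowledged to the client here.
        """
        with self._lock:
            self._committed.append(txn)
            return len(self._committed)

    def flush(self):
        """Simulate the background flush making all current commits durable."""
        with self._lock:
            self._durable_upto = len(self._committed)

    def is_durable(self, seq):
        """Durability point: a safe transaction is acknowledged once this holds."""
        with self._lock:
            return seq <= self._durable_upto
```

Because durability advances along a prefix of the commit order, an application can reason about exactly which suffix of acknowledged-but-not-durable transactions it stands to lose on a crash, which is the "know what to expect to lose" property the abstract describes.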
GeoGauss: Strongly Consistent and Light-Coordinated OLTP for Geo-Replicated SQL Database
Multinational enterprises conduct global business that demands geo-distributed transactional databases. Existing state-of-the-art databases adopt a sharded master-follower replication architecture. However, the single-master serving mode incurs massive cross-region writes from clients, and the sharded architecture requires multiple round-trip acknowledgments (e.g., 2PC) to ensure atomicity for cross-shard transactions. These limitations drive us to seek another design choice. In this paper, we propose GeoGauss, a strongly consistent OLTP database with a full-replica multi-master architecture. To efficiently merge the updates from different master nodes, we propose a multi-master OCC that unifies data replication and concurrent transaction processing. By leveraging an epoch-based delta state merge rule and optimistic asynchronous execution, GeoGauss ensures strong consistency with a lightly coordinated protocol and allows more concurrency under weak isolation, which is sufficient for our needs. Our geo-distributed experiments show that GeoGauss achieves 7.06x higher throughput and 17.41x lower latency than the state-of-the-art geo-distributed database CockroachDB on the TPC-C benchmark.
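An epoch-based delta state merge can be illustrated with a deterministic merge function. This sketch is not GeoGauss's actual rule; it assumes each master ships, per epoch, a delta mapping keys to (commit timestamp, node id, value), and every replica applies the same tie-broken last-writer-wins rule, so all full replicas converge on the same state for the epoch without per-transaction coordination.

```python
def merge_epoch_deltas(deltas):
    """Deterministically merge one epoch's deltas from several masters.

    deltas: list of dicts mapping key -> (commit_ts, node_id, value).
    The highest (commit_ts, node_id) pair wins per key; node_id breaks
    timestamp ties so every replica picks the same winner.
    """
    merged = {}
    for delta in deltas:
        for key, (ts, node, value) in delta.items():
            if key not in merged or (ts, node) > merged[key][:2]:
                merged[key] = (ts, node, value)
    return {k: v[2] for k, v in merged.items()}
```

Because the merge is a deterministic function of the set of deltas, masters can exchange deltas asynchronously within an epoch and still agree on the epoch's outcome, which is the intuition behind "light-coordinated" strong consistency.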
Coordination between multiple microservices: a systematic mapping study
The popularity of microservice architecture has risen recently due to its multiple advantages, partly related to the increased independence of services. One of the features that improves independence is decentralized data management, which stipulates that each service should manage its own data with its preferred data management technology. However, decentralized data management brings problems, especially with data consistency when data owned by separate microservices must be modified in coordination. To alleviate this, a shared database between services could be used, as it removes the need for coordination altogether; then again, a single shared database could defeat some of the benefits of microservice architecture by increasing coupling between services. Therefore, it is important to consider other ways to manage the coordination while maintaining the independence of the services.
We conducted a systematic mapping study to find suitable design patterns for managing the coordination between multiple microservices. Firstly, design patterns that are widely discussed and adopted were identified. After this, these patterns were presented using a template that included the advantages and disadvantages of each pattern.
The results of the systematic mapping study show that even though traditional systems pursue strict consistency with ACID guarantees, eventual-consistency patterns, such as the saga pattern, are more popular in the microservice environment. This is due to drawbacks of distributed transaction protocols, including limited concurrency and reduced availability, which make developers choose loosened consistency as a trade-off for higher availability and increased performance. The prevalence of the saga pattern can be seen in the selected works, as multiple articles propose methods to manage different parts of the pattern, and the implementation details in the selected works mainly relate to the saga pattern.
Even though the saga pattern is currently the most prevalent option, there is still interest in highly consistent coordination methods in the research community. Multiple solutions have been proposed, which either introduce new consistency protocols with strict consistency guarantees or remove the need for coordination entirely. However, there are as yet no novel solutions that could reliably manage the requirements of microservice architecture in an industry setting. Therefore, further research is still required to refine the already proposed solutions or to envision new ones.