
    Auditable and performant Byzantine consensus for permissioned ledgers

    Permissioned ledgers allow users to execute transactions against a data store and retain proof of their execution in a replicated ledger. Each replica verifies the transactions' execution and ensures that a committed transaction can never be removed from the ledger. Unfortunately, today's permissioned ledgers do not guarantee this: the ledger can be re-written if an arbitrary number of replicas collude. In addition, the transaction throughput of permissioned ledgers is low, because they do not take advantage of multi-core CPUs and hardware accelerators, which hampers real-world deployments. This thesis explores how permissioned ledgers and their consensus protocols can be made auditable in perpetuity, even when all replicas collude and re-write the ledger. It also addresses how Byzantine consensus protocols can be changed to increase the execution throughput of complex transactions. This thesis makes the following contributions:
    1. Always-auditable Byzantine consensus protocols. We present a permissioned ledger system that can assign blame to individual replicas regardless of how many of them misbehave. This is achieved by signing and storing consensus protocol messages in the ledger and providing clients with signed, universally verifiable receipts.
    2. Performant transaction execution with hardware accelerators. Next, we describe a cloud-based ML inference service that provides strong integrity guarantees while staying compatible with current inference APIs. We change the Byzantine consensus protocol to execute machine learning (ML) inference computation on GPUs, optimizing the throughput and latency of ML inference.
    3. Parallel transaction execution on multi-core CPUs. Finally, we introduce a permissioned ledger that executes transactions in parallel on multi-core CPUs. We separate the execution of transactions between the primary and backup replicas: the primary replica executes transactions on multiple CPU cores and creates a dependency graph of the transactions, which the backup replicas use to execute the same transactions in parallel.
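    The primary-builds-a-graph, backups-replay-in-parallel idea from the third contribution can be sketched as follows. This is an illustrative sketch, not the thesis's implementation: transactions are modeled as (id, read set, write set) triples in the primary's execution order, an edge is added between two transactions whenever their read/write sets conflict, and backups then execute each "level" of independent transactions concurrently.

```python
from collections import defaultdict

def build_dependency_graph(txns):
    """txns: list of (txn_id, read_set, write_set) in the primary's
    execution order. Add an edge earlier -> later whenever the earlier
    transaction writes a key the later one reads or writes, or reads a
    key the later one writes (the classic conflict conditions)."""
    edges = defaultdict(set)
    for i, (tid_i, r_i, w_i) in enumerate(txns):
        for tid_j, r_j, w_j in txns[i + 1:]:
            if w_i & (r_j | w_j) or r_i & w_j:
                edges[tid_i].add(tid_j)
    return edges

def parallel_schedule(txns, edges):
    """Group transactions into 'levels': transactions within one level
    have no edges among them, so a backup replica can run them on
    separate CPU cores concurrently."""
    indeg = {tid: 0 for tid, _, _ in txns}
    for src in list(edges):
        for dst in edges.get(src, ()):
            indeg[dst] += 1
    levels, ready = [], [t for t in indeg if indeg[t] == 0]
    while ready:
        levels.append(ready)
        nxt = []
        for t in ready:
            for d in edges.get(t, ()):
                indeg[d] -= 1
                if indeg[d] == 0:
                    nxt.append(d)
        ready = nxt
    return levels
```

Independent transactions land in the same level, while conflicting ones are serialized across levels, matching the order the primary observed.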

    Priority-Driven Differentiated Performance for NoSQL Database-As-a-Service

    Designing data stores for native Cloud Computing services brings a number of challenges, especially if the Cloud Provider wants to offer database services capable of controlling the response time for specific customers. Such requests may come from heterogeneous data-driven applications with conflicting responsiveness requirements. For instance, a batch processing workload does not require the same level of responsiveness as a time-sensitive one, and their coexistence may interfere with the responsiveness of the time-sensitive workload, such as online video gaming, virtual reality, and cloud-based machine learning. This paper presents a modification to the popular MongoDB NoSQL database to enable differentiated per-user/request performance on a priority basis by leveraging CPU scheduling and synchronization mechanisms available within the Operating System. This is achieved with minimally invasive changes to the source code and without affecting the performance and behavior of the database when the new feature is not in use. The proposed extension has been integrated with the access-control model of MongoDB for secure and controlled access to the new capability. Extensive experimentation with realistic workloads demonstrates that the proposed solution reduces the response times for high-priority users/requests, with respect to lower-priority ones, in scenarios with mixed-priority clients accessing the data store.
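    The priority-based differentiation described above can be illustrated with a minimal request dispatcher. This is a hedged sketch of the general idea, not MongoDB's actual implementation (which modifies the server's internal scheduling and locking): requests carry a per-user priority, the highest priority is always served first, and a sequence counter preserves FIFO order among equal priorities.

```python
import heapq
import itertools

class PriorityDispatcher:
    """Illustrative priority queue for database requests: lower number =
    higher priority (matching Unix nice semantics, an assumption of this
    sketch). A monotonically increasing sequence number breaks ties so
    that equal-priority requests are served in arrival order."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def submit(self, request, priority):
        # Store (priority, arrival_seq, request); heapq pops the
        # smallest tuple, i.e. the highest-priority, oldest request.
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def next_request(self):
        if not self._heap:
            return None
        _, _, request = heapq.heappop(self._heap)
        return request
```

A time-sensitive client submitted with priority 0 is served before any batch request at priority 10, regardless of arrival order.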

    Efficient Black-box Checking of Snapshot Isolation in Databases

    Snapshot isolation (SI) is a prevalent weak isolation level that avoids the performance penalty imposed by serializability while preventing various undesired data anomalies. Nevertheless, SI anomalies have recently been found in production cloud databases that claim to provide the SI guarantee. Given the complex and often unavailable internals of such databases, a black-box SI checker is highly desirable. In this paper we present PolySI, a novel black-box checker that efficiently checks SI and provides understandable counterexamples upon detecting violations. PolySI builds on a novel characterization of SI using generalized polygraphs (GPs), for which we establish soundness and completeness. PolySI employs an SMT solver and accelerates SMT solving by utilizing the compact constraint encoding of GPs and domain-specific optimizations for pruning constraints. As demonstrated by our extensive assessment, PolySI successfully reproduces all of 2,477 known SI anomalies, detects novel SI violations in three production cloud databases, identifies their causes, outperforms the state-of-the-art black-box checkers under a wide range of workloads, and can scale up to large-sized workloads. Comment: 20 pages, 15 figures, accepted by PVLD
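    The final rejection step of a checker like this can be sketched as a cycle test over a transaction dependency graph. This is a heavily simplified illustration, not PolySI's algorithm: PolySI works on generalized polygraphs whose undetermined edges are resolved by an SMT solver, whereas the sketch below assumes the dependency edges are already fully resolved and only checks that the resulting graph is acyclic.

```python
def has_cycle(edges):
    """DFS three-color cycle detection over a dependency graph given as
    {txn: set(successor_txns)}. A cycle in the resolved dependency
    graph corresponds to a history that violates the isolation
    guarantee being checked."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def dfs(u):
        color[u] = GRAY
        for v in edges.get(u, ()):
            c = color.get(v, WHITE)
            # A gray successor means we closed a back edge: a cycle.
            if c == GRAY or (c == WHITE and dfs(v)):
                return True
        color[u] = BLACK
        return False

    nodes = set(edges) | {v for vs in edges.values() for v in vs}
    return any(color.get(u, WHITE) == WHITE and dfs(u) for u in nodes)
```

In a real checker the hard part is precisely what this sketch omits: deciding which edges exist when the database's internal commit order is unobservable.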

    Modern data analytics in the cloud era

    Cloud computing has been the groundbreaking technology of the last decade. The ease of use of the managed environment, in combination with a nearly infinite amount of resources and a pay-per-use price model, enables fast and cost-efficient project realization for a broad range of users. Cloud computing also changes the way software is designed, deployed and used. This thesis focuses on database systems deployed in the cloud environment.
We identify three major interaction points of the database engine with the environment that show changed requirements compared to traditional on-premise data warehouse solutions. First, software is deployed on elastic resources. Consequently, systems should support elasticity in order to match workload requirements and be cost-effective. We present an elastic scaling mechanism for distributed database engines, combined with a partition manager that provides load balancing while minimizing partition reassignments in the case of elastic scaling. Furthermore, we introduce a buffer pre-heating strategy that mitigates the cold start after scaling and yields an immediate performance benefit from the newly scaled resources. Second, cloud-based systems are accessible and available from nearly everywhere. Consequently, data is frequently ingested from numerous endpoints, which differs from bulk loads or ETL pipelines in a traditional data warehouse solution. Many users do not define database constraints in order to avoid transaction aborts due to conflicts or to speed up data ingestion. To mitigate this issue we introduce the concept of PatchIndexes, which allow the definition of approximate constraints. PatchIndexes maintain exceptions to constraints, make them usable in query optimization and execution, and offer efficient update support. The concept can be applied to arbitrary constraints, and we provide examples of approximate uniqueness and approximate sorting constraints. Moreover, we show how PatchIndexes can be exploited to define advanced constraints like an approximate multi-key partitioning, which offers robust query performance over workloads with different partition key requirements. Third, data-centric workloads changed over the last decade. Besides traditional SQL workloads for business intelligence, data science workloads are of significant importance nowadays.
For these cases the database system often acts only as a data provider, while the computational effort takes place in dedicated data science or machine learning (ML) environments. As this workflow has several drawbacks, we pursue the goal of pushing advanced analytics towards the database engine and introduce the Grizzly framework as a DataFrame-to-SQL transpiler. Based on this, we identify user-defined functions (UDFs) and ML inference as important tasks that would benefit from a deeper engine integration, and we investigate and evaluate approaches for in-database execution of Python UDFs and in-database ML inference.
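    The PatchIndex idea of maintaining exceptions to an approximate constraint can be sketched for the uniqueness case. This is an illustrative sketch under assumed names, not the thesis's data structure: the index records which rows violate uniqueness, so a query can take a fast path over the constraint-satisfying rows and a separate patch path over the exceptions.

```python
class PatchIndex:
    """Sketch of an approximate-uniqueness PatchIndex: most of a column
    is unique, and the few duplicate rows are tracked in an exception
    set. `rows` is a list of (row_id, key) pairs; the first occurrence
    of a key satisfies the constraint, later occurrences are exceptions."""
    def __init__(self, rows):
        seen = set()
        self.exceptions = set()
        for row_id, key in rows:
            if key in seen:
                self.exceptions.add(row_id)  # violates uniqueness
            else:
                seen.add(key)

    def split(self, rows):
        """Split a scan into the constraint-satisfying part (which the
        optimizer may treat as truly unique) and the exception part."""
        clean = [(r, k) for r, k in rows if r not in self.exceptions]
        patch = [(r, k) for r, k in rows if r in self.exceptions]
        return clean, patch
```

An update that introduces a new duplicate only has to grow the exception set rather than abort the transaction, which matches the stated goal of avoiding constraint-induced aborts during ingestion.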

    Towards Scalable Real-time Analytics: An Architecture for Scale-out of OLxP Workloads

    We present an overview of our work on the SAP HANA Scale-out Extension, a novel distributed database architecture designed to support large-scale analytics over real-time data. This platform permits high-performance OLAP with massive scale-out capabilities, while concurrently allowing OLTP workloads. This dual capability enables analytics over real-time changing data and allows fine-grained user-specified service level agreements (SLAs) on data freshness. We advocate the decoupling of core database components such as query processing, concurrency control, and persistence, a design choice made possible by advances in high-throughput low-latency networks and storage devices. We provide full ACID guarantees and build on a logical timestamp mechanism to provide MVCC-based snapshot isolation, while not requiring synchronous updates of replicas. Instead, we use asynchronous update propagation, guaranteeing consistency with timestamp validation. We provide a view into the design and development of a large-scale data management platform for real-time analytics, driven by the needs of modern enterprise customers.
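    The logical-timestamp visibility rule behind MVCC snapshot isolation can be sketched in a few lines. This is a minimal illustration of the general mechanism, not SAP HANA's implementation: each write appends a version tagged with its commit timestamp, and a reader with snapshot timestamp `ts` sees the newest version committed at or before `ts`.

```python
class MVStore:
    """Minimal multi-version store: key -> list of (commit_ts, value)
    versions in ascending commit-timestamp order. Readers never block
    writers; they simply pick the right version for their snapshot."""
    def __init__(self):
        self.chains = {}

    def write(self, key, value, commit_ts):
        # Assumes commits arrive in timestamp order for simplicity.
        self.chains.setdefault(key, []).append((commit_ts, value))

    def read(self, key, snapshot_ts):
        # Scan the version chain newest-first; the first version with
        # commit_ts <= snapshot_ts is the one visible to this snapshot.
        for cts, val in reversed(self.chains.get(key, [])):
            if cts <= snapshot_ts:
                return val
        return None  # key did not exist at snapshot time
```

Because visibility depends only on timestamps, the same rule works on asynchronously propagated replicas: a replica can serve a snapshot as soon as it has received all versions up to that timestamp.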

    Dynamic Partial Order Reduction for Checking Correctness Against Transaction Isolation Levels

    Modern applications, such as social networking systems and e-commerce platforms, are centered around using large-scale databases for storing and retrieving data. Accesses to the database are typically enclosed in transactions that allow computations on shared data to be isolated from other concurrent computations and resilient to failures. Modern databases trade isolation for performance: the weaker the isolation level, the more behaviors a database is allowed to exhibit, and it is up to the developer to ensure that their application can tolerate those behaviors. In this work, we propose stateless model checking algorithms, relying on dynamic partial order reduction, for studying the correctness of such applications. These algorithms work for a number of widely used weak isolation levels, including Read Committed, Causal Consistency, Snapshot Isolation, and Serializability. We show that they are complete, sound and optimal, and run with polynomial memory consumption in all cases. We report on an implementation of these algorithms in the context of Java Pathfinder, applied to a number of challenging applications drawn from the literature of distributed systems and databases. Comment: Submission to PLDI 202

    Eventual Durability of ACID Transactions in Database Systems

    Modern database systems that support ACID transactions, and applications built around these databases, may choose to sacrifice transaction durability for performance when they deem it necessary. While this approach may yield good performance, it has three major downsides. Firstly, users are often not told when, or whether, their issued transactions become durable. Secondly, users cannot know if durable and non-durable transactions see each other's effects. Finally, this approach pushes durability handling outside the scope of the transactional model, making it difficult for applications to reason about correctness and data consistency. To address these issues, we present the idea of "Eventual Durability" (ED) to provide a principled way for applications to manage transaction durability trade-offs. The ED model extends the traditional transaction model by decoupling a transaction's commit point from its durability point, allowing applications to control which transactions should be acknowledged at the commit point and which ones at their durability point. Furthermore, we redefine serialisability and recoverability under ED to allow applications to ascertain whether fast transactions became durable and how they might have interacted with safe ones. With ED, users and applications know what to expect to lose when there is a failure, thus bringing durability management back inside the transaction model. We implement the ED model in PostgreSQL and evaluate it to understand the model's effect on transaction latency, abort rates and throughput. We show that ED Postgres achieves significant latency improvements while ensuring the guarantees provided by the model. Since a transaction's resources are released earlier in ED Postgres, we expected to see lower abort rates and higher throughput.
Indeed, we observed that ED Postgres provides a 91.25%–93% average reduction in abort rates under a contentious workload and an average 75% increase in throughput compared to baseline Postgres. We also ran the TPC-C benchmark against ED Postgres and discuss the findings. Lastly, we discuss how ED Postgres can be used in realistic settings to obtain latency benefits, throughput improvements, reduced abort rates, and fresher reads.
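    The commit-point/durability-point split can be sketched with a tiny write-ahead-log model. The class and method names below are illustrative assumptions, not the paper's API: a "fast" transaction is acknowledged as soon as its record enters the in-memory log, while a "safe" transaction is acknowledged only once the log has been flushed; a flush also makes all earlier fast transactions durable.

```python
class EDLog:
    """Sketch of eventual durability: the commit point is when a record
    enters the in-memory log; the durability point is when the log is
    flushed. `durable_upto` tracks the highest flushed log sequence
    number (LSN)."""
    def __init__(self):
        self.mem_log = []
        self.durable_upto = 0

    def commit(self, txn, mode="fast"):
        self.mem_log.append(txn)
        lsn = len(self.mem_log)
        if mode == "safe":
            self.flush()   # safe txns are acked at the durability point
        return lsn         # fast txns are acked right here, pre-flush

    def flush(self):
        # Stand-in for fsync of the write-ahead log.
        self.durable_upto = len(self.mem_log)

    def is_durable(self, lsn):
        return lsn <= self.durable_upto
```

This mirrors why ED reduces latency: fast transactions release their locks and return to the client without waiting on storage, and the application can still query `is_durable` to learn when the durability point is reached.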

    GeoGauss: Strongly Consistent and Light-Coordinated OLTP for Geo-Replicated SQL Database

    Multinational enterprises conduct global business that creates demand for geo-distributed transactional databases. Existing state-of-the-art databases adopt a sharded master-follower replication architecture. However, the single-master serving mode incurs massive cross-region writes from clients, and the sharded architecture requires multiple round-trip acknowledgments (e.g., 2PC) to ensure atomicity for cross-shard transactions. These limitations drive us to seek yet another design choice. In this paper, we propose GeoGauss, a strongly consistent OLTP database with a full-replica multi-master architecture. To efficiently merge the updates from different master nodes, we propose a multi-master OCC that unifies data replication and concurrent transaction processing. By leveraging an epoch-based delta state merge rule and optimistic asynchronous execution, GeoGauss ensures strong consistency with a lightly coordinated protocol and allows more concurrency under weak isolation, which is sufficient to meet our needs. Our geo-distributed experimental results show that GeoGauss achieves 7.06X higher throughput and 17.41X lower latency than the state-of-the-art geo-distributed database CockroachDB on the TPC-C benchmark.
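    The epoch-based delta-state merge can be illustrated with a deterministic last-writer-wins rule. This is a hedged sketch of the general idea, not GeoGauss's protocol: each master emits a per-epoch delta mapping keys to (timestamp, value) pairs, and every replica merges the deltas with a total order on (timestamp, value) so the merged state is the same regardless of delta arrival order.

```python
def merge_epoch(deltas):
    """Merge per-epoch deltas from multiple masters. Each delta is
    {key: (timestamp, value)}. Comparing the full (timestamp, value)
    tuple gives a deterministic tiebreak, so the merge is commutative:
    all replicas converge to the same state for the epoch."""
    state = {}
    for delta in deltas:
        for key, (ts, val) in delta.items():
            if key not in state or (ts, val) > state[key]:
                state[key] = (ts, val)
    return state
```

Because the merge is order-independent, masters can exchange deltas asynchronously and still agree on the epoch's final state, which is what allows coordination to stay light.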

    Coordination between multiple microservices: a systematic mapping study

    The popularity of microservice architecture has risen recently due to its multiple advantages, partly related to the increased independence of services. One of the features that improves independence is decentralized data management, which prescribes that each service should manage its own data using the most suitable data management technology. However, decentralized data management brings problems, especially with data consistency, when data owned by separate microservices must be modified in coordination. To alleviate this, a shared database between services could be used, as it removes the need for coordination altogether; then again, a single shared database could defeat some of the benefits of microservice architecture by increasing coupling between services. Therefore, it is important to consider other ways to manage the coordination while maintaining the independence of the services.
We conducted a systematic mapping study to identify suitable design patterns for managing the coordination between multiple microservices. Firstly, design patterns that seemed widely discussed and adopted were identified. After this, these patterns were presented using a template that included the advantages and disadvantages of each pattern. The results of the systematic mapping study show that even though traditional systems pursue strict consistency with ACID guarantees, eventual-consistency patterns, such as the saga pattern, seem to be more popular in the microservice environment. This is due to drawbacks of distributed transaction protocols, including limited concurrency and reduced availability, which make developers choose loosened consistency as a trade-off for higher availability and increased performance. The prevalence of the saga pattern is visible in the selected works, as multiple articles propose methods to manage different parts of the pattern, and implementation details in the selected works were mainly related to it. Even though the saga pattern is currently the most prevalent option, there is still interest in strongly consistent coordination methods in the research community. Multiple solutions have been proposed, which either introduce new consistency protocols with strict consistency guarantees or entirely new approaches that remove the need for coordination completely. However, there are no novel solutions yet that could reliably meet the requirements of microservice architecture in an industry setting. Therefore, further research is still required to refine already proposed solutions or to envision new solutions for this problem.
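    The saga pattern discussed above can be sketched as a minimal orchestration-style implementation: each local transaction runs in order, and on failure the compensations of the already-completed steps run in reverse order. This is a generic sketch of the pattern, not any surveyed system's code; intermediate states remain visible to other services until the saga finishes, which is exactly the eventual-consistency trade-off the study describes.

```python
class Saga:
    """Orchestration-style saga: a list of (action, compensation)
    pairs. Actions are local transactions of individual services; a
    compensation semantically undoes its action."""
    def __init__(self):
        self.steps = []

    def add_step(self, action, compensation):
        self.steps.append((action, compensation))

    def execute(self):
        done = []  # compensations of completed steps, in order
        try:
            for action, compensation in self.steps:
                action()
                done.append(compensation)
            return True
        except Exception:
            # Roll back by compensating completed steps in reverse.
            for compensation in reversed(done):
                compensation()
            return False
```

In a real deployment the orchestrator must also persist its progress so compensations survive crashes, and compensations must be idempotent since they may be retried; both concerns are omitted from this sketch.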