83 research outputs found

    Resilient and Scalable Forwarding for Software-Defined Networks with P4-Programmable Switches

    Get PDF
    Traditional networking devices support only fixed features and limited configurability. Network softwarization leverages programmable software and hardware platforms to remove those limitations. In this context the concept of programmable data planes allows directly to program the packet processing pipeline of networking devices and create custom control plane algorithms. This flexibility enables the design of novel networking mechanisms where the status quo struggles to meet high demands of next-generation networks like 5G, Internet of Things, cloud computing, and industry 4.0. P4 is the most popular technology to implement programmable data planes. However, programmable data planes, and in particular, the P4 technology, emerged only recently. Thus, P4 support for some well-established networking concepts is still lacking and several issues remain unsolved due to the different characteristics of programmable data planes in comparison to traditional networking. The research of this thesis focuses on two open issues of programmable data planes. First, it develops resilient and efficient forwarding mechanisms for the P4 data plane as there are no satisfying state of the art best practices yet. Second, it enables BIER in high-performance P4 data planes. BIER is a novel, scalable, and efficient transport mechanism for IP multicast traffic which has only very limited support of high-performance forwarding platforms yet. The main results of this thesis are published as 8 peer-reviewed and one post-publication peer-reviewed publication. The results cover the development of suitable resilience mechanisms for P4 data planes, the development and implementation of resilient BIER forwarding in P4, and the extensive evaluations of all developed and implemented mechanisms. Furthermore, the results contain a comprehensive P4 literature study. Two more peer-reviewed papers contain additional content that is not directly related to the main results. They implement congestion avoidance mechanisms in P4 and develop a scheduling concept to find cost-optimized load schedules based on day-ahead forecasts

    Towards Transaction as a Service

    Full text link
    This paper argues for decoupling transaction processing from existing two-layer cloud-native databases and making transaction processing as an independent service. By building a transaction as a service (TaaS) layer, the transaction processing can be independently scaled for high resource utilization and can be independently upgraded for development agility. Accordingly, we architect an execution-transaction-storage three-layer cloud-native database. By connecting to TaaS, 1) the AP engines can be empowered with ACID TP capability, 2) multiple standalone TP engine instances can be incorporated to support multi-master distributed TP for horizontal scalability, 3) multiple execution engines with different data models can be integrated to support multi-model transactions, and 4) high performance TP is achieved through extensive TaaS optimizations and consistent evolution. Cloud-native databases deserve better architecture: we believe that TaaS provides a path forward to better cloud-native databases

    Transaction Scheduling: From Conflicts to Runtime Conflicts

    Get PDF

    Merging Queries in OLTP Workloads

    Get PDF
    OLTP applications are usually executed by a high number of clients in parallel and are typically faced with high throughput demand as well as a constraint latency requirement for individual statements. In enterprise scenarios, they often face the challenge to deal with overload spikes resulting from events such as Cyber Monday or Black Friday. The traditional solution to prevent running out of resources and thus coping with such spikes is to use a significant over-provisioning of the underlying infrastructure. In this thesis, we analyze real enterprise OLTP workloads with respect to statement types, complexity, and hot-spot statements. Interestingly, our findings reveal that workloads are often read-heavy and comprise similar query patterns, which provides a potential to share work of statements belonging to different transactions. In the past, resource sharing has been extensively studied for OLAP workloads. Naturally, the question arises, why studies mainly focus on OLAP and not on OLTP workloads? At first sight, OLTP queries often consist of simple calculations, such as index look-ups with little sharing potential. In consequence, such queries – due to their short execution time – may not have enough potential for the additional overhead. In addition, OLTP workloads do not only execute read operations but also updates. Therefore, sharing work needs to obey transactional semantics, such as the given isolation level and read-your-own-writes. This thesis presents THE LEVIATHAN, a novel batching scheme for OLTP workloads, an approach for merging read statements within interactively submitted multi-statement transactions consisting of reads and updates. Our main idea is to merge the execution of statements by merging their plans, thus being able to merge the execution of not only complex, but also simple calculations, such as the aforementioned index look-up. We identify mergeable statements by pattern matching of prepared statement plans, which comes with low overhead. For obeying the isolation level properties and providing read-your-own-writes, we first define a formal framework for merging transactions running under a given isolation level and provide insights into a prototypical implementation of merging within a commercial database system. Our experimental evaluation shows that, depending on the isolation level, the load in the system, and the read-share of the workload, an improvement of the transaction throughput by up to a factor of 2.5x is possible without compromising the transactional semantics. Another interesting effect we show is that with our strategy, we can increase the throughput of a real enterprise workload by 20%.:1 INTRODUCTION 1.1 Summary of Contributions 1.2 Outline 2 WORKLOAD ANALYSIS 2.1 Analyzing OLTP Benchmarks 2.1.1 YCSB 2.1.2 TATP 2.1.3 TPC Benchmark Scenarios 2.1.4 Summary 2.2 Analyzing OLTP Workloads from Open Source Projects 2.2.1 Characteristics of Workloads 2.2.2 Summary 2.3 Analyzing Enterprise OLTP Workloads 2.3.1 Overview of Reports about OLTP Workload Characteristics 2.3.2 Analysis of SAP Hybris Workload 2.3.3 Summary 2.4 Conclusion 3 RELATED WORK ON QUERY MERGING 3.1 Merging the Execution of Operators 3.2 Merging the Execution of Subplans 3.3 Merging the Results of Subplans 3.4 Merging the Execution of Full Plans 3.5 Miscellaneous Works on Merging 3.6 Discussion 4 MERGING STATEMENTS IN MULTI STATEMENT TRANSACTIONS 4.1 Overview of Our Approach 4.1.1 Examples 4.1.2 Why Naïve Merging Fails 4.2 THE LEVIATHAN Approach 4.3 Formalizing THE LEVIATHAN Approach 4.3.1 Transaction Theory 4.3.2 Merging Under MVCC 4.4 Merging Reads Under Different Isolation Levels 4.4.1 Read Uncommitted 4.4.2 Read Committed 4.4.3 Repeatable Read 4.4.4 Snapshot Isolation 4.4.5 Serializable 4.4.6 Discussion 4.5 Merging Writes Under Different Isolation Levels 4.5.1 Read Uncommitted 4.5.2 Read Committed 4.5.3 Snapshot Isolation 4.5.4 Serializable 4.5.5 Handling Dependencies 4.5.6 Discussion 5 SYSTEM MODEL 5.1 Definition of the Term “Overload” 5.2 Basic Queuing Model 5.2.1 Option (1): Replacement with a Merger Thread 5.2.2 Option (2): Adding Merger Thread 5.2.3 Using Multiple Merger Threads 5.2.4 Evaluation 5.3 Extended Queue Model 5.3.1 Option (1): Replacement with a Merger Thread 5.3.2 Option (2): Adding Merger Thread 5.3.3 Evaluation 6 IMPLEMENTATION 6.1 Background: SAP HANA 6.2 System Design 6.2.1 Read Committed 6.2.2 Snapshot Isolation 6.3 Merger Component 6.3.1 Overview 6.3.2 Dequeuing 6.3.3 Merging 6.3.4 Sending 6.3.5 Updating MTx State 6.4 Challenges in the Implementation of Merging Writes 6.4.1 SQL String Implementation 6.4.2 Update Count 6.4.3 Error Propagation 6.4.4 Abort and Rollback 7 EVALUATION 7.1 Benchmark Settings 7.2 System Settings 7.2.1 Experiment I: End-to-end Response Time Within a SAP Hybris System 7.2.2 Experiment II: Dequeuing Strategy 7.2.3 Experiment III: Merging Improvement on Different Statement, Transaction and Workload Types 7.2.4 Experiment IV: End-to-End Latency in YCSB 7.2.5 Experiment V: Breakdown of Execution in YCSB 7.2.6 Discussion of System Settings 7.3 Merging in Interactive Transactions 7.3.1 Experiment VI: Merging TATP in Read Uncommitted 7.3.2 Experiment VII: Merging TATP in Read Committed 7.3.3 Experiment VIII: Merging TATP in Snapshot Isolation 7.4 Merging Queries in Stored Procedures Experiment IX: Merging TATP Stored Procedures in Read Committed 7.5 Merging SAP Hybris 7.5.1 Experiment X: CPU-time Breakdown on HANA Components 7.5.2 Experiment XI: Merging Media Query in SAP Hybris 7.5.3 Discussion of our Results in Comparison with Related Work 8 CONCLUSION 8.1 Summary 8.2 Future Research Directions REFERENCES A UML CLASS DIAGRAM

    Eventual Durability of ACID Transactions in Database Systems

    Get PDF
    Modern database systems that support ACID transactions, and applications built around these databases, may choose to sacrifice transaction durability for performance when they deem it necessary. While this approach may yield good performance, it has three major downsides. Firstly, users are often not provided information about when and if the issued transactions become durable. Secondly, users cannot know if durable and non-durable transactions see each other’s effects. Finally, this approach pushes durability handling outside the scope of the transactional model, making it difficult for applications to reason about correctness and data consistency. To address these issues, we present the idea of “Eventual Durability” (ED) to provide a principled way for applications to manage transaction durability trade-offs. The ED model extends the traditional transaction model by decoupling a transaction’s commit point from its durability point – therefore, allowing applications to control which transactions should be acknowledged at commit point and which ones at their durability point. Furthermore, we redefine serialisability and recoverability under ED to allow applications to ascertain if fast transactions became durable and how they might have interacted with safe ones. With ED, users and applications can know what to expect to lose when there is a failure – thus, bringing back managing durability inside the transaction model. We implement the ED model in PostgreSQL and evaluate it to understand the model’s effect on transaction latency, abort rates and throughput. We show that ED Postgres achieves significant latency improvements even while ensuring the guarantees provided by the model. Since a transaction’s resources are released earlier in ED Postgres, we expected to see lower abort rates and higher throughput. Consequently, we observed that ED Postgres provides an average of 91.25% – 93% reduction in abort rates under a contentious workload and an average of 75% increase in throughput compared to baseline Postgres. We also run the TPC-C benchmark against ED Postgres and discuss the findings. Lastly, we discuss how ED Postgres can be used in realistic settings to obtain latency benefits, throughput improvements, reduced abort rates, and fresher reads

    A Survey on Transactional Stream Processing

    Full text link
    Transactional stream processing (TSP) strives to create a cohesive model that merges the advantages of both transactional and stream-oriented guarantees. Over the past decade, numerous endeavors have contributed to the evolution of TSP solutions, uncovering similarities and distinctions among them. Despite these advances, a universally accepted standard approach for integrating transactional functionality with stream processing remains to be established. Existing TSP solutions predominantly concentrate on specific application characteristics and involve complex design trade-offs. This survey intends to introduce TSP and present our perspective on its future progression. Our primary goals are twofold: to provide insights into the diverse TSP requirements and methodologies, and to inspire the design and development of groundbreaking TSP systems

    Online Schema Evolution is (Almost) Free for Snapshot Databases

    Full text link
    Modern database applications often change their schemas to keep up with the changing requirements. However, support for online and transactional schema evolution remains challenging in existing database systems. Specifically, prior work often takes ad hoc approaches to schema evolution with 'patches' applied to existing systems, leading to many corner cases and often incomplete functionality. Applications therefore often have to carefully schedule downtimes for schema changes, sacrificing availability. This paper presents Tesseract, a new approach to online and transactional schema evolution without the aforementioned drawbacks. We design Tesseract based on a key observation: in widely used multi-versioned database systems, schema evolution can be modeled as data modification operations that change the entire table, i.e., data-definition-as-modification (DDaM). This allows us to support schema almost 'for free' by leveraging the concurrency control protocol. By simple tweaks to existing snapshot isolation protocols, on a 40-core server we show that under a variety of workloads, Tesseract is able to provide online, transactional schema evolution without service downtime, and retain high application performance when schema evolution is in progress.Comment: To appear at Proceedings of the 2023 International Conference on Very Large Data Bases (VLDB 2023

    Adaptive Data Storage and Placement in Distributed Database Systems

    Get PDF
    Distributed database systems are widely used to provide scalable storage, update and query facilities for application data. Distributed databases primarily use data replication and data partitioning to spread load across nodes or sites. The presence of hotspots in workloads, however, can result in imbalanced load on the distributed system resulting in performance degradation. Moreover, updates to partitioned and replicated data can require expensive distributed coordination to ensure that they are applied atomically and consistently. Additionally, data storage formats, such as row and columnar layouts, can significantly impact latencies of mixed transactional and analytical workloads. Consequently, how and where data is stored among the sites in a distributed database can significantly affect system performance, particularly if the workload is not known ahead of time. To address these concerns, this thesis proposes adaptive data placement and storage techniques for distributed database systems. This thesis demonstrates that the performance of distributed database systems can be improved by automatically adapting how and where data is stored by leveraging online workload information. A two-tiered architecture for adaptive distributed database systems is proposed that includes an adaptation advisor that decides at which site(s) and how transactions execute. The adaptation advisor makes these decisions based on submitted transactions. This design is used in three adaptive distributed database systems presented in this thesis: (i) DynaMast that efficiently transfers data mastership to guarantee single-site transactions while maintaining well-understood and established transactional semantics, (ii) MorphoSys that selectively and adaptively replicates, partitions and remasters data based on a learned cost model to improve transaction processing, and (iii) Proteus that uses learned workload models to predictively and adaptively change storage layouts to support both high transactional throughput and low latency analytical queries. Collectively, this thesis is a concrete step towards autonomous database systems that allow users to specify only the data to store and the queries to execute, leaving the system to judiciously choose the storage and execution mechanisms to deliver high performance

    Cloud-edge hybrid applications

    Get PDF
    Many modern applications are designed to provide interactions among users, including multi- user games, social networks and collaborative tools. Users expect application response time to be in the order of milliseconds, to foster interaction and interactivity. The design of these applications typically adopts a client-server model, where all interac- tions are mediated by a centralized component. This approach introduces availability and fault- tolerance issues, which can be mitigated by replicating the server component, and even relying on geo-replicated solutions in cloud computing infrastructures. Even in this case, the client-server communication model leads to unnecessary latency penalties for geographically close clients and high operational costs for the application provider. This dissertation proposes a cloud-edge hybrid model with secure and ecient propagation and consistency mechanisms. This model combines client-side replication and client-to-client propagation for providing low latency and minimizing the dependency on the server infras- tructure, fostering availability and fault tolerance. To realize this model, this works makes the following key contributions. First, the cloud-edge hybrid model is materialized by a system design where clients maintain replicas of the data and synchronize in a peer-to-peer fashion, and servers are used to assist clients’ operation. We study how to bring most of the application logic to the client-side, us- ing the centralized service primarily for durability, access control, discovery, and overcoming internetwork limitations. Second, we dene protocols for weakly consistent data replication, including a novel CRDT model (∆-CRDTs). We provide a study on partial replication, exploring the challenges and fundamental limitations in providing causal consistency, and the diculty in supporting client- side replicas due to their ephemeral nature. Third, we study how client misbehaviour can impact the guarantees of causal consistency. We propose new secure weak consistency models for insecure settings, and algorithms to enforce such consistency models. The experimental evaluation of our contributions have shown their specic benets and limitations compared with the state-of-the-art. In general, the cloud-edge hybrid model leads to faster application response times, lower client-to-client latency, higher system scalability as fewer clients need to connect to servers at the same time, the possibility to work oine or disconnected from the server, and reduced server bandwidth usage. In summary, we propose a hybrid of cloud-and-edge which provides lower user-to-user la- tency, availability under server disconnections, and improved server scalability – while being ecient, reliable, and secure.Muitas aplicações modernas são criadas para fornecer interações entre utilizadores, incluindo jogos multiutilizador, redes sociais e ferramentas colaborativas. Os utilizadores esperam que o tempo de resposta nas aplicações seja da ordem de milissegundos, promovendo a interação e interatividade. A arquitetura dessas aplicações normalmente adota um modelo cliente-servidor, onde todas as interações são mediadas por um componente centralizado. Essa abordagem apresenta problemas de disponibilidade e tolerância a falhas, que podem ser mitigadas com replicação no componente do servidor, até com a utilização de soluções replicadas geogracamente em infraestruturas de computação na nuvem. Mesmo neste caso, o modelo de comunicação cliente-servidor leva a penalidades de latência desnecessárias para clientes geogracamente próximos e altos custos operacionais para o provedor das aplicações. Esta dissertação propõe um modelo híbrido cloud-edge com mecanismos seguros e ecientes de propagação e consistência. Esse modelo combina replicação do lado do cliente e propagação de cliente para cliente para fornecer baixa latência e minimizar a dependência na infraestrutura do servidor, promovendo a disponibilidade e tolerância a falhas. Para realizar este modelo, este trabalho faz as seguintes contribuições principais. Primeiro, o modelo híbrido cloud-edge é materializado por uma arquitetura do sistema em que os clientes mantêm réplicas dos dados e sincronizam de maneira ponto a ponto e onde os servidores são usados para auxiliar na operação dos clientes. Estudamos como trazer a maior parte da lógica das aplicações para o lado do cliente, usando o serviço centralizado principalmente para durabilidade, controlo de acesso, descoberta e superação das limitações inter-rede. Em segundo lugar, denimos protocolos para replicação de dados fracamente consistentes, incluindo um novo modelo de CRDTs (∆-CRDTs). Fornecemos um estudo sobre replicação parcial, explorando os desaos e limitações fundamentais em fornecer consistência causal e a diculdade em suportar réplicas do lado do cliente devido à sua natureza efémera. Terceiro, estudamos como o mau comportamento da parte do cliente pode afetar as garantias da consistência causal. Propomos novos modelos seguros de consistência fraca para congurações inseguras e algoritmos para impor tais modelos de consistência. A avaliação experimental das nossas contribuições mostrou os benefícios e limitações em comparação com o estado da arte. Em geral, o modelo híbrido cloud-edge leva a tempos de resposta nas aplicações mais rápidos, a uma menor latência de cliente para cliente e à possibilidade de trabalhar oine ou desconectado do servidor. Adicionalmente, obtemos uma maior escalabilidade do sistema, visto que menos clientes precisam de estar conectados aos servidores ao mesmo tempo e devido à redução na utilização da largura de banda no servidor. Em resumo, propomos um modelo híbrido entre a orla (edge) e a nuvem (cloud) que fornece menor latência entre utilizadores, disponibilidade durante desconexões do servidor e uma melhor escalabilidade do servidor – ao mesmo tempo que é eciente, conável e seguro

    When Private Blockchain Meets Deterministic Database

    Full text link
    Private blockchain as a replicated transactional system shares many commonalities with distributed database. However, the intimacy between private blockchain and deterministic database has never been studied. In essence, private blockchain and deterministic database both ensure replica consistency by determinism. In this paper, we present a comprehensive analysis to uncover the connections between private blockchain and deterministic database. While private blockchains have started to pursue deterministic transaction executions recently, deterministic databases have already studied deterministic concurrency control protocols for almost a decade. This motivates us to propose Harmony, a novel deterministic concurrency control protocol designed for blockchain use. We use Harmony to build a new relational blockchain, namely HarmonyBC, which features low abort rates, hotspot resiliency, and inter-block parallelism, all of which are especially important to disk-oriented blockchain. Empirical results on Smallbank, YCSB, and TPC-C show that HarmonyBC offers 2.0x to 3.5x throughput better than the state-of-the-art private blockchains
    corecore