    High-Performance In-Memory OLTP via Coroutine-to-Transaction

    Data stalls are a major overhead in main-memory database engines due to the use of pointer-rich data structures. Lightweight coroutines ease the implementation of software prefetching, which hides data stalls by overlapping computation with asynchronous prefetches. Prior solutions, however, mainly focused on (1) individual components and operations and (2) intra-transaction batching, which requires interface changes that break backward compatibility. It was not clear how these techniques apply to a full database engine, or how much end-to-end benefit they bring under various workloads. This thesis presents CoroBase, a main-memory database engine that tackles these challenges with a new coroutine-to-transaction paradigm. Coroutine-to-transaction models transactions as coroutines and thus enables inter-transaction batching, avoiding application changes while retaining the benefits of prefetching. We show that on a 48-core server, CoroBase performs close to 2× better for read-intensive workloads and remains competitive for workloads that inherently do not benefit from software prefetching.
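
    A minimal sketch can make the coroutine-to-transaction idea concrete. The C++20 snippet below is illustrative only, not CoroBase's actual implementation; every name here (Task, PrefetchAwaiter, run_batch) is invented for the example. Each transaction runs as a coroutine that issues a software prefetch and suspends, and a round-robin scheduler interleaves a batch of such coroutines so that one transaction's cache miss overlaps with another's computation.

    #include <coroutine>
    #include <deque>
    #include <vector>

    // Handle type for a transaction coroutine; it suspends at start and at
    // end so the scheduler controls when it runs and when it is destroyed.
    struct Task {
        struct promise_type {
            Task get_return_object() {
                return Task{std::coroutine_handle<promise_type>::from_promise(*this)};
            }
            std::suspend_always initial_suspend() noexcept { return {}; }
            std::suspend_always final_suspend() noexcept { return {}; }
            void return_void() {}
            void unhandled_exception() {}
        };
        std::coroutine_handle<promise_type> h;
    };

    // co_await'ing this issues a non-blocking prefetch for `addr` and then
    // suspends, so another transaction can run while the line is fetched.
    struct PrefetchAwaiter {
        const void* addr;
        bool await_ready() const noexcept { return false; }
        void await_suspend(std::coroutine_handle<>) const noexcept {
            __builtin_prefetch(addr);   // GCC/Clang builtin
        }
        void await_resume() const noexcept {}
    };

    // Inter-transaction batching: resume each transaction until it either
    // suspends on a prefetch (requeue it) or completes (destroy it).
    void run_batch(std::vector<Task>& txns) {
        std::deque<std::coroutine_handle<>> ready;
        for (auto& t : txns) ready.push_back(t.h);
        while (!ready.empty()) {
            auto h = ready.front();
            ready.pop_front();
            h.resume();                          // runs until next co_await or completion
            if (!h.done()) ready.push_back(h);
            else h.destroy();
        }
    }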

    Bridging the Latency Gap between NVM and DRAM for Latency-bound Operations

    Non-Volatile Memory (NVM) technologies exhibit roughly 4× the read access latency of conventional DRAM. When the working set does not fit in the processor cache, this latency gap between DRAM and NVM leads to more than a 2× runtime increase for queries dominated by latency-bound operations such as index joins and tuple reconstruction. We explain how to easily hide NVM latency by interleaving the execution of parallel work in index joins and tuple reconstruction using coroutines. Our evaluation shows that interleaving, applied to the non-trivial implementations of these two operations in a production-grade codebase, accelerates end-to-end query runtimes on NVM and DRAM by up to 1.7× and 2.6×, respectively, thereby reducing the performance difference between DRAM and NVM by more than 60%.
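
    The interleaving pattern described here can be sketched on top of the Task and PrefetchAwaiter helpers shown under the CoroBase abstract above; Node and lookup are illustrative stand-ins, not the production codebase's types. Before each pointer hop, the coroutine prefetches the next node and suspends, so a driver can resume another suspended lookup while this one's cache line travels from NVM.

    // One index probe of a binary search tree, written as a coroutine.
    struct Node {
        long key, value;
        Node* left;
        Node* right;
    };

    Task lookup(const Node* n, long key, long& out) {
        while (n != nullptr) {
            co_await PrefetchAwaiter{n};          // hide this miss behind other lookups
            if (key == n->key) { out = n->value; co_return; }
            n = (key < n->key) ? n->left : n->right;
        }
        out = -1;                                 // sentinel: key not found
    }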

    Rinnakkaisen C++-ohjelmoinnin modernit työkalut (Modern Tools for Parallel C++ Programming)

    As multicore processors become commonplace, programming language support is needed to get the full benefit out of them. Parallel programming has traditionally been implemented through operating system calls, but developments in programming languages have provided standardized ways to write concurrently executing programs independent of the operating system. C++ is among these languages: it has offered a concurrency support library since standard version 11. The standard is under continuous development, however, and has seen three updates since version 11. The goal of this thesis is to examine how the C++ standard has evolved from the perspective of concurrent programming. The work was carried out as a literature review. The thesis first presents how the concurrency support library has developed, with particular attention to additions and changes in its sublibraries. The most interesting addition turns out to be coroutines: functions whose execution can be suspended and, when needed, resumed later. The latter part of the thesis examines how coroutines differ from threads in terms of both program execution and program writing, presenting studies that compare execution speed and programming productivity. The comparative studies show that coroutines offer up to a tenfold performance increase when context-switch speed is compared. Coroutines have also been tested as a way to replace locks on database servers, where roughly a fourfold performance increase is observed. Finally, coroutines were compared in data prefetching across different data structures, where their largest performance benefit was about nine percent. From the standpoint of writing program code, coroutines offer an intuitive way to perform synchronization. The language provides reserved keywords with which the execution of a function can be temporarily suspended; it is left to the programmer to define the point at which execution returns to that function. This differs from threads in that a thread opens a new flow of execution, and the main thread may need to stop and wait for it to complete. Coroutines thus require no such waiting, because execution is simply handed over to another function.
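
    As a small illustration of the reserved keywords mentioned above, the C++20 sketch below hand-rolls a generator (std::generator only arrived in C++23). Execution pauses at each co_yield and resumes only when the caller asks for the next value; no thread is created and nothing blocks.

    #include <coroutine>
    #include <cstdio>

    struct Generator {
        struct promise_type {
            int current;
            Generator get_return_object() {
                return Generator{std::coroutine_handle<promise_type>::from_promise(*this)};
            }
            std::suspend_always initial_suspend() noexcept { return {}; }
            std::suspend_always final_suspend() noexcept { return {}; }
            std::suspend_always yield_value(int v) noexcept { current = v; return {}; }
            void return_void() {}
            void unhandled_exception() {}
        };
        std::coroutine_handle<promise_type> h;
        ~Generator() { if (h) h.destroy(); }
        bool next() { h.resume(); return !h.done(); }   // resume until the next co_yield
        int value() const { return h.promise().current; }
    };

    // The coroutine suspends at every co_yield; the caller decides when it resumes.
    Generator counter(int limit) {
        for (int i = 0; i < limit; ++i)
            co_yield i;
    }

    int main() {
        auto g = counter(3);
        while (g.next())
            std::printf("%d\n", g.value());   // prints 0, 1, 2
    }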

    High-performance software packet processing

    In today’s Internet, it is highly desirable to have fast and scalable software packet processing solutions for network applications that run on commodity hardware. The advent of cloud computing drives the continued rapid growth of Internet traffic. Moreover, the development of emerging networking techniques, such as Network Function Virtualization, significantly shapes the need for implementing network functions in software. Finally, with the advancement of modern platforms as well as software frameworks for packet processing, network applications have the potential to process 100+ Gbps of network traffic on a single commodity server. Representative frameworks include the Click modular router, the RouteBricks scalable routing architecture, and BUFFALO, the software-based Ethernet switch. Beneath this general-purpose routing and switching functionality lies a broad set of network applications, many of which are handled with custom methods to provide cost-effectiveness and flexibility. This thesis considers two long-standing networking applications, IP lookup and distributed denial-of-service (DDoS) mitigation, and proposes efficient software-based methods from this perspective. We first introduce several optimization techniques that accelerate network applications by taking advantage of modern CPU features. We then explore the IP lookup problem: finding the longest matching prefix of an IP address in a set of prefixes. An ideal IP lookup algorithm should achieve constant lookup time with small on-chip memory usage, but no prior algorithm achieves both at once. We propose SAIL, a splitting approach to IP lookup, and a suite of algorithms based on the SAIL framework. Extensive experiments show that our SAIL algorithms are much faster than well-known IP lookup algorithms. Next, we turn to DDoS, an attempt to disrupt a victim's legitimate traffic by flooding it with Internet traffic from many different sources. Our solution is Gatekeeper, the first open-source and deployable DDoS mitigation system. We present a series of optimization techniques, including the use of modern platforms, group prefetching, coroutines, and hashing, to accelerate Gatekeeper. Experimental results show that these techniques significantly improve its performance over baseline solutions.
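
    Of the optimization techniques named above, group prefetching is easy to sketch. The C++ snippet below is illustrative, not Gatekeeper's actual flow table: a simplified single-slot hash table with collision handling omitted. Stage 1 issues prefetches for a whole group of probes and stage 2 touches the entries afterwards, so the group's cache misses overlap instead of serializing.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Entry { uint32_t key; uint32_t value; bool used; };

    constexpr std::size_t GROUP = 8;   // probes kept in flight per group

    void lookup_group(const std::vector<Entry>& table,
                      const uint32_t* keys, uint32_t* out, std::size_t n) {
        std::size_t slot[GROUP];
        for (std::size_t g = 0; g < n; g += GROUP) {
            std::size_t m = (n - g < GROUP) ? n - g : GROUP;
            for (std::size_t i = 0; i < m; ++i) {        // stage 1: issue prefetches
                slot[i] = keys[g + i] % table.size();
                __builtin_prefetch(&table[slot[i]]);     // GCC/Clang builtin
            }
            for (std::size_t i = 0; i < m; ++i) {        // stage 2: probe entries
                const Entry& e = table[slot[i]];
                out[g + i] = (e.used && e.key == keys[g + i]) ? e.value : 0;
            }
        }
    }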

    Micro-architectural analysis of in-memory OLTP: Revisited

    Micro-architectural behavior of traditional disk-based online transaction processing (OLTP) systems has been investigated extensively over the past couple of decades. Results show that traditional OLTP systems mostly under-utilize the available micro-architectural resources. In-memory OLTP systems, on the other hand, process all the data in main memory and can therefore omit the buffer pool. Furthermore, they usually adopt more lightweight concurrency control mechanisms, cache-conscious data structures, and cleaner codebases, since they are usually designed from scratch. Hence, we expect significant differences in micro-architectural behavior when running OLTP on platforms optimized for in-memory processing as opposed to disk-based database systems. In particular, we expect that in-memory systems exploit micro-architectural features such as instruction and data caches significantly better than disk-based systems. This paper sheds light on the micro-architectural behavior of in-memory database systems by analyzing it and contrasting it with the behavior of disk-based systems when running OLTP workloads. The results show that, despite all the design changes, in-memory OLTP exhibits very similar micro-architectural behavior to disk-based OLTP: more than half of the execution time goes to memory stalls, where instruction cache misses or long-latency data misses from the last-level cache (LLC) are the dominant factors in the overall execution time. Even though ground-up designed in-memory systems can eliminate the instruction cache misses, the reduction in instruction stalls amplifies the impact of LLC data misses. As a result, only 30% of the CPU cycles are used to retire instructions, and 70% of the CPU cycles are wasted on stalls, for both traditional disk-based and new-generation in-memory OLTP.
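
    The kind of counter data behind such a breakdown can be gathered with Linux perf events; the sketch below is a minimal illustration under that assumption, far cruder than the paper's analysis. Note that stalled-cycle counters are not supported on every CPU, and reading them may require adjusting perf_event_paranoid.

    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <cstdint>
    #include <cstdio>

    // Open one hardware counter for this process on any CPU, initially disabled.
    static int open_counter(uint64_t config) {
        perf_event_attr attr{};
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = config;
        attr.disabled = 1;
        attr.exclude_kernel = 1;
        return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    }

    int main() {
        int cycles = open_counter(PERF_COUNT_HW_CPU_CYCLES);
        int instrs = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
        int stalls = open_counter(PERF_COUNT_HW_STALLED_CYCLES_BACKEND);

        ioctl(cycles, PERF_EVENT_IOC_ENABLE, 0);
        ioctl(instrs, PERF_EVENT_IOC_ENABLE, 0);
        ioctl(stalls, PERF_EVENT_IOC_ENABLE, 0);

        // ... run the OLTP workload under test here ...

        long long c = 0, i = 0, s = 0;
        read(cycles, &c, sizeof c);
        read(instrs, &i, sizeof i);
        read(stalls, &s, sizeof s);
        std::printf("IPC %.2f, back-end stall share %.0f%%\n",
                    (double)i / c, 100.0 * s / c);
    }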