
    KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures

    Growing main-memory capacities and an increasing number of hardware threads in modern server systems have led to fundamental changes in database architectures. Most importantly, query processing is nowadays performed on data that is often completely stored in main memory. Despite high main-memory scan performance, index structures are still important components, but they have to be designed from scratch to cope with the specific characteristics of main memory and to exploit the high degree of parallelism. Current research has mainly focused on adapting block-optimized B+-Trees, but these data structures were designed for secondary storage and involve comprehensive structural maintenance for updates. In this paper, we present the KISS-Tree, a latch-free in-memory index that is optimized for a minimum number of memory accesses and a high number of concurrent updates. More specifically, we aim for the same performance as modern hash-based algorithms while keeping the order-preserving nature of trees. We achieve this by using a prefix tree that incorporates virtual memory management functionality and compression schemes. In our experiments, we evaluate the KISS-Tree on different workloads and hardware platforms and compare the results to existing in-memory indexes. The KISS-Tree offers the highest reported read performance on current architectures, a balanced read/write performance, and a low memory footprint.
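
    The core access path can be pictured as a fixed-depth prefix tree whose levels are indexed by fragments of the key. Below is a minimal Java sketch of that idea, assuming an illustrative 16/10/6-bit split of a 32-bit key and plain CAS publication of new nodes; the paper's actual design additionally relies on virtual memory management and compression schemes, which the sketch does not reproduce.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Supplier;

// Sketch of a fixed-depth prefix tree over 32-bit keys: every lookup
// costs the same small number of dependent array loads, and nodes are
// installed latch-free with CAS. The 16/10/6-bit split is an assumption.
final class PrefixTreeSketch {
    private final AtomicReferenceArray<AtomicReferenceArray<AtomicReferenceArray<Object>>> root =
            new AtomicReferenceArray<>(1 << 16);

    public Object get(int key) {
        var l1 = root.get(key >>> 16);                 // top 16 bits
        if (l1 == null) return null;
        var l2 = l1.get((key >>> 6) & 0x3FF);          // middle 10 bits
        return l2 == null ? null : l2.get(key & 0x3F); // low 6 bits: three dependent loads total
    }

    public void put(int key, Object value) {
        var l1 = ensure(root, key >>> 16,
                () -> new AtomicReferenceArray<AtomicReferenceArray<Object>>(1 << 10));
        var l2 = ensure(l1, (key >>> 6) & 0x3FF,
                () -> new AtomicReferenceArray<Object>(1 << 6));
        l2.set(key & 0x3F, value);                     // last-writer-wins in this sketch
    }

    // Install a child node with a single CAS; on a lost race, adopt the winner's node.
    private static <T> T ensure(AtomicReferenceArray<T> a, int i, Supplier<T> make) {
        T cur = a.get(i);
        if (cur != null) return cur;
        T fresh = make.get();
        return a.compareAndSet(i, null, fresh) ? fresh : a.get(i);
    }
}
```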

    Cache-Aware Lock-Free Concurrent Hash Tries

    This report describes an implementation of a non-blocking, concurrent, shared-memory hash trie based on single-word compare-and-swap instructions. Insert, lookup, and remove operations that modify different parts of the hash trie can run independently of each other and do not contend. Remove operations ensure that unneeded memory is freed and that the trie is kept compact. Pseudocode for these operations is presented and a proof of correctness is given: we show that the implementation is linearizable and lock-free. Finally, benchmarks compare concurrent hash-trie operations against the corresponding operations on other concurrent data structures, demonstrating their performance and scalability.
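
    The underlying update pattern can be illustrated with a minimal Java sketch: read the current node, build an updated copy, and publish it with a single-word compare-and-swap, retrying on contention. This shows only a single leaf list, not the full hash trie with its indirection nodes and compaction.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of single-word-CAS publication over an immutable list,
// the building block the hash trie generalizes. Not the full Ctrie.
final class CasListSketch<K, V> {
    private record Node<K, V>(K key, V value, Node<K, V> next) {}

    private final AtomicReference<Node<K, V>> head = new AtomicReference<>();

    public void insert(K key, V value) {
        while (true) {                                   // lock-free retry loop
            Node<K, V> cur = head.get();
            Node<K, V> updated = new Node<>(key, value, without(cur, key));
            if (head.compareAndSet(cur, updated)) return;
            // CAS failed: another thread changed the list; loop and retry.
        }
    }

    public V lookup(K key) {
        for (Node<K, V> n = head.get(); n != null; n = n.next)
            if (n.key.equals(key)) return n.value;
        return null;
    }

    // Copy the list without the given key (persistent-structure style).
    private static <K, V> Node<K, V> without(Node<K, V> n, K key) {
        if (n == null) return null;
        Node<K, V> rest = without(n.next, key);
        return n.key.equals(key) ? rest : new Node<>(n.key, n.value, rest);
    }
}
```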

    Achieving Efficient Work-Stealing for Data-Parallel Collections

    In modern programming, high-level data structures are an important foundation for most applications. With the rise of the multi-core era, there is a growing trend of supporting data-parallel collection operations in general-purpose programming languages and platforms. To facilitate object-oriented reuse, these operations are highly parametric, which incurs abstraction performance penalties. Furthermore, data-parallel operations must scale when used on problems with irregular workloads. Work-stealing is a proven load-balancing technique for irregular workloads, but general-purpose work-stealing also suffers from abstraction penalties. In this paper, we present a generic design of a data-parallel collections framework based on work-stealing for shared-memory architectures. We show how abstraction penalties can be overcome through callsite specialization of data-parallel operation instances. Moreover, we show how to make work-stealing fine-grained and efficient when specialized for particular data structures. We experimentally validate the performance of different data structures and data-parallel operations, achieving up to 60x better performance with abstraction penalties eliminated and 3x higher speedups by specializing work-stealing compared to existing approaches.
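
    The fine-grained stealing idea can be sketched for the simplest collection, an index range: the owner claims small batches from its range while thieves take half of the remainder with a single CAS. The batch size and the packing of the range into one atomic word are illustrative assumptions, not the paper's exact scheduler.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of range stealing for data-parallel loops: one atomic word
// holds (next << 32) | limit, so owner progress and theft are both
// single-CAS operations on the same state.
final class RangeStealerSketch {
    private final AtomicLong state;

    RangeStealerSketch(int from, int to) {
        state = new AtomicLong(pack(from, to));
    }

    /** Owner: claim the next batch of at most `batch` indices; -1 if the range is empty. */
    long claim(int batch) {
        while (true) {
            long s = state.get();
            int next = next(s), limit = limit(s);
            if (next >= limit) return -1;
            int end = Math.min(next + batch, limit);
            if (state.compareAndSet(s, pack(end, limit)))
                return pack(next, end);       // caller processes [next, end)
        }
    }

    /** Thief: steal the upper half of the victim's remaining range; -1 if too little is left. */
    long stealHalf() {
        while (true) {
            long s = state.get();
            int next = next(s), limit = limit(s);
            if (limit - next < 2) return -1;
            int mid = next + (limit - next) / 2;
            if (state.compareAndSet(s, pack(next, mid)))
                return pack(mid, limit);      // thief now owns [mid, limit)
        }
    }

    private static long pack(int a, int b) { return ((long) a << 32) | (b & 0xFFFFFFFFL); }
    private static int next(long s)  { return (int) (s >>> 32); }
    private static int limit(long s) { return (int) s; }
}
```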

    Scaling In-Memory Databases on Multicores

    Current computer systems have evolved from featuring only a single processing unit and limited RAM, on the order of kilobytes or a few megabytes, to include several multicore processors, offering on the order of several tens of concurrent execution contexts, with main memory on the order of several tens to hundreds of gigabytes. This makes it possible to keep all the data of many applications in main memory, leading to the development of in-memory databases. Compared to disk-backed databases, in-memory databases (IMDBs) are expected to provide better performance by incurring less I/O overhead. In this dissertation, we present a scalability study of two general-purpose IMDBs on multicore systems. The results show that current general-purpose IMDBs do not scale on multicores, due to contention among threads running concurrent transactions. In this work, we explore different directions to overcome the scalability issues of IMDBs on multicores, while enforcing strong isolation semantics. First, we present MacroDB, a solution that requires no modification to either the database system or the applications. MacroDB replicates the database among several engines, using a master-slave replication scheme, where update transactions execute on the master while read-only transactions execute on the slaves. This reduces contention, allowing MacroDB to offer scalable performance under read-only workloads, while update-intensive workloads suffer a performance loss compared to the standalone engine. Second, we delve into the database engine and identify the concurrency-control mechanism used by the storage sub-component as a scalability bottleneck. We then propose a new locking scheme that allows such mechanisms to be removed from the storage sub-component. This modification improves performance under all workloads compared to the standalone engine, although scalability remains limited to read-only workloads. Next, we address the scalability limitations for update-intensive workloads and propose reducing the locking granularity from the table level to the attribute level. This further improves performance for intensive and moderate update workloads, at a slight cost for read-only workloads; scalability is limited to read-intensive and read-only workloads. Finally, we investigate the impact applications have on the performance of database systems by studying how the order of operations inside transactions influences database performance. We then propose a Read-before-Write (RbW) interaction pattern, under which transactions perform all read operations before executing write operations. The RbW pattern allowed TPC-C to achieve scalable performance on our modified engine for all workloads. Additionally, it allowed our modified engine to scale on multicores almost up to the total number of cores, while enforcing strong isolation.
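
    The RbW pattern itself is a purely structural transformation of transaction code: issue every read before the first write, so write locks are acquired late and held briefly. The following JDBC sketch of a TPC-C-style stock update illustrates the shape; the table and column names follow the TPC-C schema, but the engine modifications from the dissertation are not reproduced here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch of the Read-before-Write (RbW) transaction shape: phase 1
// performs all reads, phase 2 performs all writes, then commit.
final class RbwSketch {
    void newOrder(Connection con, int itemId, int qty) throws Exception {
        con.setAutoCommit(false);
        try {
            // Phase 1: all reads up front (no write locks held yet).
            int stock;
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT s_quantity FROM stock WHERE s_i_id = ?")) {
                ps.setInt(1, itemId);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    stock = rs.getInt(1);
                }
            }
            // Phase 2: writes only after the last read.
            try (PreparedStatement ps = con.prepareStatement(
                    "UPDATE stock SET s_quantity = ? WHERE s_i_id = ?")) {
                ps.setInt(1, stock - qty);
                ps.setInt(2, itemId);
                ps.executeUpdate();
            }
            con.commit();
        } catch (Exception e) {
            con.rollback();
            throw e;
        }
    }
}
```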

    A Framework for Verifying Scalability and Performance of Cloud Based Web Applications

    This Master's thesis investigates how to deploy the MediaWiki web application, the software that runs Wikipedia, across multiple servers in the cloud so that the configuration is as cost-efficient as possible while all visitors can still reach the service within a reasonable time. Amazon charges for cloud instances by the hour, rounding partially used hours up to full hours. The thesis provides tools for measuring the performance and capacity of servers in the cloud and for scaling the web application. In the Amazon EC2 cloud, users can build virtual machine images of operating systems that are deployed as stand-alone servers in a XEN virtualization environment. Such an image was provisioned with the environment needed for this work, to collect data about server utilization and to provide a platform that allows servers to be added and removed dynamically. The thesis examines the features of Amazon EC2, including Auto Scale, which helps scale cloud applications horizontally. The Amazon cloud is used to set up MediaWiki and to run large-scale experiments; many optimizations and configuration changes are required to increase the throughput of the service. The framework created in this work measures server utilization, collecting data about CPU, memory, and network usage, which helps locate bottlenecks that can significantly slow the system down. Various tests were run to determine the best possible layout and configuration. The resulting configuration was then validated by two large-scale, day-long experiments generating 22 million requests, showing how the framework scales the service up as the request rate rises and removes servers as it falls. One experiment used an optimal heuristic to determine the number of servers to run in the cloud; the other used the Amazon Auto Scale service, which relied on the servers' average CPU utilization to decide whether servers should be added or removed. These experiments clearly show that running a dynamic number of servers, driven by the request rate, saves money when operating the service.
    Network usage and bandwidth speeds have increased massively, and the vast majority of people use the Internet on a daily basis. This has increased CPU utilization on servers, meaning that heavily visited sites use hundreds of computers to accommodate rising traffic to their services. Planning hardware orders to upgrade old servers or add new ones is not straightforward and has to be carefully considered, since it requires predicting future traffic: buying too many servers means lost revenue, while buying too few means losing clients. To overcome this problem, it is wise to consider moving services into the virtual cloud and making server provisioning automatic. Amazon, one of the popular cloud service providers, offers large amounts of computing power for running servers in a virtual environment with a single click, and provides services to provision as many servers as needed, adding or removing them depending on how loaded they are. This eliminates the problems associated with ordering new hardware. Adding new servers is an automatic process that follows demand, such as adding servers for peak hours and removing unnecessary ones at night or when traffic is low; the customer pays only for the resources used in the cloud. This thesis focuses on setting up a cloud testbed that runs a web application scaled horizontally (by replicating already running servers) and uses a benchmark tool to stress the web application, simulating a large number of concurrent requests with proper load-balancing mechanisms. This gives a clear picture of how servers in the cloud are scaled while the whole process remains transparent to the end user, who sees the web application as a single server. In conclusion, the framework is helpful in analyzing the performance of cloud-based applications in several of our research activities.
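
    The average-CPU policy used in the Auto Scale experiment reduces to a simple watermark rule, sketched below. The thresholds and the minimum server count are illustrative assumptions, not the thesis's tuned values, and the actual EC2 API calls for provisioning are left out.

```java
// Sketch of a watermark-based scaling decision over per-server CPU
// utilization (values in [0, 1]): add a server when the mean crosses
// a high watermark, retire one when it falls below a low watermark.
final class CpuScalingSketch {
    static final double HIGH = 0.70, LOW = 0.30;  // assumed watermarks
    static final int MIN_SERVERS = 1;             // assumed floor

    /** Returns +1 to add a server, -1 to remove one, 0 to hold steady. */
    static int decide(double[] cpuPerServer) {
        double mean = 0;
        for (double c : cpuPerServer) mean += c;
        mean /= cpuPerServer.length;
        if (mean > HIGH) return +1;
        if (mean < LOW && cpuPerServer.length > MIN_SERVERS) return -1;
        return 0;
    }
}
```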