755 research outputs found

    Automatic contention detection and amelioration for data-intensive operations


    Cache craftiness for fast multicore key-value storage

    We present Masstree, a fast key-value database designed for SMP machines. Masstree keeps all data in memory. Its main data structure is a trie-like concatenation of B+-trees, each of which handles a fixed-length slice of a variable-length key. This structure effectively handles arbitrary-length, possibly binary keys, including keys with long shared prefixes. B+-tree fanout was chosen to minimize total DRAM delay when descending the tree and prefetching each tree node. Lookups use optimistic concurrency control, a read-copy-update-like technique, and do not write shared data structures; updates lock only affected nodes. Logging and checkpointing provide consistency and durability. Though some of these ideas appear elsewhere, Masstree is the first to combine them. We discuss design variants and their consequences. On a 16-core machine, with logging enabled and queries arriving over a network, Masstree executes more than six million simple queries per second. This performance is comparable to that of memcached, a non-persistent hash table server, and higher (often much higher) than that of VoltDB, MongoDB, and Redis.
    Funding: National Science Foundation (U.S.) (Award 0834415); National Science Foundation (U.S.) (Award 0915164); Quanta Computer (Firm)
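    A minimal sketch of the key-slicing idea, not Masstree's actual implementation: each layer of the trie indexes one fixed-length 8-byte slice of the key, and keys longer than one slice descend into a child layer. A std::map stands in for each cache-optimized B+-tree, and concurrency control, logging, prefetching, and slice-length bookkeeping are all omitted; every name below is illustrative.

```cpp
// Trie-of-trees sketch: one "Layer" per 8-byte key slice.
// Simplification: keys whose final slice differs only by trailing
// zero-byte padding would collide here; the real design tracks slice length.
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <map>
#include <memory>
#include <optional>
#include <string>

struct Layer;

struct Entry {
    std::optional<std::string> value;   // set if a key ends exactly at this slice
    std::unique_ptr<Layer> next;        // set if longer keys share this slice as a prefix
};

struct Layer {
    std::map<uint64_t, Entry> tree;     // placeholder for a cache-optimized B+-tree
};

// Pack up to 8 bytes of the key, starting at offset, into a big-endian integer
// so integer comparison preserves the lexicographic order of the key bytes.
static uint64_t slice_at(const std::string& key, size_t offset) {
    unsigned char buf[8] = {0};
    size_t n = std::min<size_t>(8, key.size() - offset);
    std::memcpy(buf, key.data() + offset, n);
    uint64_t s = 0;
    for (unsigned char c : buf) s = (s << 8) | c;
    return s;
}

void put(Layer& root, const std::string& key, const std::string& value) {
    Layer* layer = &root;
    for (size_t off = 0; ; off += 8) {
        Entry& e = layer->tree[slice_at(key, off)];
        if (off + 8 >= key.size()) { e.value = value; return; }   // key ends in this layer
        if (!e.next) e.next = std::make_unique<Layer>();          // descend for longer keys
        layer = e.next.get();
    }
}

std::optional<std::string> get(Layer& root, const std::string& key) {
    Layer* layer = &root;
    for (size_t off = 0; ; off += 8) {
        auto it = layer->tree.find(slice_at(key, off));
        if (it == layer->tree.end()) return std::nullopt;
        if (off + 8 >= key.size()) return it->second.value;
        if (!it->second.next) return std::nullopt;
        layer = it->second.next.get();
    }
}

int main() {
    Layer root;
    put(root, "customer/0001/name", "alice");
    put(root, "customer/0002/name", "bob");
    std::cout << get(root, "customer/0001/name").value_or("miss") << "\n";  // prints "alice"
}
```

    Keys with a long shared prefix (here "customer/000") traverse the same upper layers and only diverge in the layer where their slices differ, which is the property the abstract highlights.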

    Exploiting Data Skew for Improved Query Performance

    Analytic queries enable sophisticated large-scale data analysis within many commercial, scientific and medical domains today. Data skew is a ubiquitous feature of these real-world domains. In a retail database, some products are typically much more popular than others. In a text database, word frequencies follow a Zipf distribution with a small number of very common words, and a long tail of infrequent words. In a geographic database, some regions have much higher populations (and data measurements) than others. Current systems do not make the most of caches for exploiting skew. In particular, a whole cache line may remain cache resident even though only a small part of the cache line corresponds to a popular data item. In this paper, we propose a novel index structure for repositioning data items to concentrate popular items into the same cache lines. The net result is better spatial locality, and better utilization of limited cache resources. We develop a theoretical model for analyzing the cache behavior, and implement database operators that are efficient in the presence of skew. Our experiments on real and synthetic data show that exploiting skew can significantly improve in-memory query performance. In some cases, our techniques can speed up queries by over an order of magnitude.
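    A minimal sketch of the underlying idea, assuming a flat array of fixed-size records rather than the paper's index structure: records are reordered by estimated access frequency so the hottest items end up packed into the same leading cache lines, improving spatial locality. The names (repack_by_frequency, kCacheLine) and the Zipf-like toy frequencies are illustrative, not from the paper.

```cpp
// Pack popular items into the same cache lines by sorting on access frequency.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

constexpr std::size_t kCacheLine = 64;            // bytes per cache line (typical x86)

struct Item { uint32_t key; uint32_t payload; };  // 8 bytes, so 8 items per line

// Build a popularity-ordered copy of `items` plus a key -> new-slot map, so the
// hottest keys are densely packed in the first few cache lines.
std::vector<Item> repack_by_frequency(const std::vector<Item>& items,
                                      const std::vector<uint64_t>& freq,
                                      std::vector<uint32_t>& slot_of_key) {
    std::vector<uint32_t> order(items.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](uint32_t a, uint32_t b) { return freq[a] > freq[b]; });

    std::vector<Item> packed(items.size());
    slot_of_key.assign(items.size(), 0);
    for (std::size_t new_slot = 0; new_slot < order.size(); ++new_slot) {
        packed[new_slot] = items[order[new_slot]];
        slot_of_key[items[order[new_slot]].key] = static_cast<uint32_t>(new_slot);
    }
    return packed;
}

int main() {
    // Zipf-like toy data: key i is accessed roughly n/(i+1) times.
    const std::size_t n = 1024;
    std::vector<Item> items(n);
    std::vector<uint64_t> freq(n);
    for (uint32_t i = 0; i < n; ++i) { items[i] = {i, i * 10}; freq[i] = n / (i + 1); }

    std::vector<uint32_t> slot_of_key;
    auto packed = repack_by_frequency(items, freq, slot_of_key);

    // The ~8 hottest items now share one cache line instead of 8 different lines.
    std::cout << "items per cache line: " << kCacheLine / sizeof(Item) << "\n";
    std::cout << "slot of key 0 (hottest): " << slot_of_key[0] << "\n";  // prints 0
}
```

    Under a skewed workload, most probes now touch the few lines holding the hot region, which is the spatial-locality effect the abstract describes; the indirection table (slot_of_key) is the price paid for the repositioning.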

    Scaling In-Memory databases on multicores

    Current computer systems have evolved from featuring only a single processing unit and limited RAM, on the order of kilobytes or a few megabytes, to featuring several multicore processors, offering on the order of several tens of concurrent execution contexts, and main memory on the order of several tens to hundreds of gigabytes. This makes it possible to keep all the data of many applications in main memory, leading to the development of in-memory databases. Compared to disk-backed databases, in-memory databases (IMDBs) are expected to provide better performance by incurring less I/O overhead. In this dissertation, we present a scalability study of two general-purpose IMDBs on multicore systems. The results show that current general-purpose IMDBs do not scale on multicores, due to contention among threads running concurrent transactions. In this work, we explore different directions to overcome the scalability issues of IMDBs on multicores, while enforcing strong isolation semantics. First, we present a solution that requires no modification to either the database systems or the applications, called MacroDB. MacroDB replicates the database among several engines, using a master-slave replication scheme, where update transactions execute on the master, while read-only transactions execute on the slaves. This reduces contention, allowing MacroDB to offer scalable performance under read-only workloads, while update-intensive workloads suffer from performance loss when compared to the standalone engine. Second, we delve into the database engine and identify the concurrency control mechanism used by the storage sub-component as a scalability bottleneck. We then propose a new locking scheme that allows the removal of such mechanisms from the storage sub-component. This modification offers performance improvements under all workloads, when compared to the standalone engine, while scalability is limited to read-only workloads. Next, we address the scalability limitations for update-intensive workloads, and propose reducing the locking granularity from the table level to the attribute level. This further improves performance for intensive and moderate update workloads, at a slight cost for read-only workloads. Scalability is limited to read-intensive and read-only workloads. Finally, we investigate the impact applications have on the performance of database systems, by studying how the order of operations inside transactions influences database performance. We then propose a Read before Write (RbW) interaction pattern, under which transactions perform all read operations before executing write operations. The RbW pattern allowed TPC-C to achieve scalable performance on our modified engine for all workloads. Additionally, the RbW pattern allowed our modified engine to achieve scalable performance on multicores, almost up to the total number of cores, while enforcing strong isolation.
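    A toy illustration of the Read before Write (RbW) interaction pattern described above: the sketch contrasts an interleaved new-order-style transaction with one that issues all reads before any write. The single-table "engine" guarded by one shared_mutex is hypothetical and only shows why RbW shortens the exclusive section; it is not the dissertation's modified engine.

```cpp
// Interleaved vs. Read-before-Write (RbW) transaction shapes over a toy table.
#include <iostream>
#include <shared_mutex>
#include <unordered_map>
#include <utility>
#include <vector>

struct Table {
    std::unordered_map<int, int> stock;   // item id -> quantity
    mutable std::shared_mutex m;
};

// Interleaved: read item, write item, read next item, ... A write may follow
// any read, so the whole transaction holds the exclusive lock.
void new_order_interleaved(Table& t, const std::vector<int>& items) {
    std::unique_lock lock(t.m);                 // exclusive for the entire transaction
    for (int id : items) {
        int qty = t.stock[id];                  // read
        t.stock[id] = qty > 0 ? qty - 1 : qty;  // write immediately after the read
    }
}

// RbW: one shared-lock read phase, then one short exclusive write phase.
void new_order_rbw(Table& t, const std::vector<int>& items) {
    std::vector<std::pair<int, int>> updates;
    {
        std::shared_lock lock(t.m);             // readers can run concurrently
        for (int id : items) {
            auto it = t.stock.find(id);
            int qty = (it != t.stock.end()) ? it->second : 0;
            updates.emplace_back(id, qty > 0 ? qty - 1 : qty);
        }
    }
    std::unique_lock lock(t.m);                 // exclusive only for the writes
    for (auto [id, qty] : updates) t.stock[id] = qty;
}

int main() {
    Table t;
    for (int i = 0; i < 10; ++i) t.stock[i] = 5;
    new_order_interleaved(t, {1, 2, 3});
    new_order_rbw(t, {1, 2, 3});
    std::cout << "stock[1] = " << t.stock[1] << "\n";   // 3 after both orders
}
```

    In the dissertation the benefit comes from exploiting this ordering inside the modified engine; a real system would still need validation or ordered lock acquisition between the two phases to preserve strong isolation, which this sketch omits.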

    Aeronautical Engineering: A continuing bibliography with indexes, supplement 99

    This bibliography lists 292 reports, articles, and other documents introduced into the NASA scientific and technical information system in July 1978.

    Elusive catch: domestic challenges encountered by the Philippines in Ratifying the Cape Town Agreement of 2012


    Implicit Gender Bias in the Legal Profession: An Empirical Study


    Parallelization Strategies for Modern Computing Platforms: Application to Illustrative Image Processing and Computer Vision Applications

    RÉSUMÉ: The spectacular evolution of hardware and software technologies has enabled the emergence of new, highly efficient parallel platforms. These platforms have marked the beginning of a new era of computing, and they are expected to remain in the field for a long time. They are already present in high-performance computing (HPC) as well as in embedded systems. Recently, both domains have adopted heterogeneous computing to reach high performance: several types of processors are used together, the most popular being central processing units (CPUs) and graphics processing units (GPUs). Programming efficiently for these new parallel platforms currently brings not only opportunities but also significant challenges for designers. Consequently, industry needs the support of the research community to ensure the success of this paradigm shift toward parallel computing. Three main challenges, present for massively parallel (many-core) GPUs as well as for multi-core CPUs, are: (1) selecting the best parallel platform for a given application, (2) selecting the best parallelization strategy, and (3) careful performance tuning to better exploit existing platforms. In this context, the overall objective of our research project is to define new solutions that help program complex applications efficiently on modern parallel platforms. The main research contributions are: 1. An evaluation of the acceleration efficiency of several parallel platforms for compute-intensive applications. 2. A quantitative analysis of parallelization and implementation strategies on platforms based on multi-core CPUs and on platforms based on many-core GPUs. 3. The definition and implementation of a performance tuning approach for parallel platforms. The proposed contributions were validated using illustrative real applications and a varied set of modern parallel platforms.
    ABSTRACT: With the improvement of both hardware and software technologies, parallel platforms have started a new computing era, and they are here to stay. Parallel platforms may be found in high-performance computers (HPC) or embedded computers. Recently, both HPC and embedded computers have been moving toward heterogeneous computing platforms, employing both central processing units (CPUs) and graphics processing units (GPUs) to achieve the highest performance. Programming efficiently for parallel platforms brings new opportunities but also several challenges. Therefore, industry needs help from the research community to succeed in its recent dramatic shift to parallel computing. Parallel programming presents several major challenges, and these challenges are equally present whether one programs a many-core GPU or a multi-core CPU. Three of the main challenges are: (1) finding the best platform providing the required acceleration, (2) selecting the best parallelization strategy, and (3) performance tuning to efficiently leverage the parallel platforms. In this context, the overall objective of our research is to propose new solutions that help designers efficiently program complex applications on modern parallel architectures. The contributions of this thesis are: 1. The evaluation of the efficiency of several target parallel platforms in speeding up compute-intensive applications. 2. A quantitative analysis of parallelization and implementation strategies on multi-core CPUs and many-core GPUs. 3. The definition and implementation of a new performance tuning framework for heterogeneous parallel platforms. The contributions were validated using real compute-intensive applications and modern parallel platforms based on multi-core CPUs and many-core GPUs.
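    A small illustration of one of the parallelization strategies discussed, data-parallel decomposition on a multi-core CPU: the sketch splits an image-processing kernel (per-pixel brightness scaling) into independent row bands, one per thread, with the thread count exposed as the kind of knob a performance-tuning pass would sweep. The kernel and all names are illustrative, not taken from the thesis.

```cpp
// Row-block data parallelism for a per-pixel image kernel using std::thread.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

// Scale rows [row_begin, row_end) of a grayscale image stored row-major.
void scale_rows(std::vector<uint8_t>& img, int width,
                int row_begin, int row_end, float gain) {
    for (int r = row_begin; r < row_end; ++r)
        for (int c = 0; c < width; ++c) {
            float v = img[static_cast<size_t>(r) * width + c] * gain;
            img[static_cast<size_t>(r) * width + c] =
                static_cast<uint8_t>(v > 255.f ? 255.f : v);
        }
}

// Each thread owns a contiguous band of rows, so there are no shared writes
// and no synchronization inside the kernel.
void scale_parallel(std::vector<uint8_t>& img, int width, int height,
                    float gain, unsigned num_threads) {
    std::vector<std::thread> workers;
    int rows_per_thread = (height + static_cast<int>(num_threads) - 1)
                          / static_cast<int>(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        int begin = static_cast<int>(t) * rows_per_thread;
        int end = std::min(height, begin + rows_per_thread);
        if (begin >= end) break;
        workers.emplace_back(scale_rows, std::ref(img), width, begin, end, gain);
    }
    for (auto& w : workers) w.join();
}

int main() {
    const int width = 1920, height = 1080;
    std::vector<uint8_t> img(static_cast<size_t>(width) * height, 100);

    // A tuning pass would time several thread counts (and band sizes) and keep the best.
    unsigned hw = std::thread::hardware_concurrency();
    scale_parallel(img, width, height, 1.5f, hw ? hw : 4);

    std::cout << "pixel(0,0) = " << static_cast<int>(img[0]) << "\n";  // prints 150
}
```

    The same decomposition maps naturally to a many-core GPU (one thread per pixel instead of per row band), which is the kind of platform and strategy trade-off the thesis quantifies.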
