2 research outputs found

    Exploiting Data Skew for Improved Query Performance

    Analytic queries enable sophisticated large-scale data analysis within many commercial, scientific and medical domains today. Data skew is a ubiquitous feature of these real-world domains. In a retail database, some products are typically much more popular than others. In a text database, word frequencies follow a Zipf distribution with a small number of very common words, and a long tail of infrequent words. In a geographic database, some regions have much higher populations (and data measurements) than others. Current systems do not make the most of caches for exploiting skew. In particular, a whole cache line may remain cache resident even though only a small part of the cache line corresponds to a popular data item. In this paper, we propose a novel index structure for repositioning data items to concentrate popular items into the same cache lines. The net result is better spatial locality, and better utilization of limited cache resources. We develop a theoretical model for analyzing the cache behavior, and implement database operators that are efficient in the presence of skew. Our experiments on real and synthetic data show that exploiting skew can significantly improve in-memory query performance. In some cases, our techniques can speed up queries by over an order of magnitude.
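
The cache-line effect described in this abstract can be illustrated with a toy model. The sketch below is not the paper's index structure; it simply sorts items by access frequency and counts how many cache lines hold the hot items before and after. The constant `ITEMS_PER_LINE`, the helper names, and the 80% coverage threshold are all assumptions made for illustration.

```python
from collections import Counter

ITEMS_PER_LINE = 8  # assumption: 8 fixed-size items fit in one cache line


def hot_set(accesses, coverage=0.8):
    """Return the most frequent items that together cover `coverage` of accesses."""
    counts = Counter(accesses)
    target = coverage * len(accesses)
    covered, hot = 0, []
    for item, n in counts.most_common():
        if covered >= target:
            break
        hot.append(item)
        covered += n
    return hot


def lines_touched(layout, hot_items, items_per_line=ITEMS_PER_LINE):
    """Count the distinct cache lines that hold at least one hot item."""
    position = {item: i for i, item in enumerate(layout)}
    return len({position[item] // items_per_line for item in hot_items})


def reposition(layout, accesses):
    """Reorder items so the most frequently accessed ones come first.

    A plain frequency sort, used here as a stand-in for the paper's
    (unspecified in the abstract) repositioning scheme.
    """
    counts = Counter(accesses)
    return sorted(layout, key=lambda item: -counts[item])


# Hypothetical workload: 64 items, with four popular items spread
# across four different cache lines in the original layout.
layout = list(range(64))
accesses = [0] * 40 + [8] * 30 + [16] * 20 + [24] * 10

hot = hot_set(accesses)
before = lines_touched(layout, hot)
after = lines_touched(reposition(layout, accesses), hot)
print(f"hot items: {hot}, cache lines before: {before}, after: {after}")
```

In this toy workload the hot items initially occupy separate cache lines, each mostly filled with cold data; after repositioning they share a single line, which is the spatial-locality gain the abstract refers to.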

    Programming Patterns for Architecture-Level Software Optimizations on Frequent Pattern Mining

    One very important application in the data mining domain is frequent pattern mining. Various authors have worked on improving the efficiency of this computation, mostly focusing on algorithm-level improvement. More recent work has explored architecture-specific optimizations of this computation. Our goal in this paper is to provide a systematic approach to architecture-level software optimizations by identifying applicable tuning patterns. We show the generality and effectiveness of these patterns by tuning several frequent pattern mining algorithms and showing significant performance improvements.