60 research outputs found

    Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays

    Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, for some constant N. Without precomputation, this requires O(n) time. We develop a sequence of algorithms of increasing sophistication that use precomputation and additional buffer space to speed up queries. The simpler algorithms partition the I/O array into consecutive ranges called bins, and they are applicable not only to local-moment queries but also to algebraic queries (MAX, AVERAGE, SUM, etc.). With N buffers of size sqrt(n), the time complexity drops to O(sqrt(n)). A more sophisticated approach uses hierarchical buffering and achieves logarithmic time complexity, O(b log_b n), when using N hierarchical buffers of size n/b. Using Overlapped Bin Buffering, we show that only a single buffer is needed, as with wavelet-based algorithms, but using much less storage. Applications exist in multidimensional and statistical databases over massive data sets, interactive image processing, and visualization.
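    The simple bin-buffering scheme is easy to illustrate for an algebraic query such as SUM. The following Python sketch is my own illustration, not the paper's code (the class and method names are invented): it buffers one sum per bin of size roughly sqrt(n), so a range query scans at most two partial bins plus O(sqrt(n)) buffered bin sums.

```python
import math

class BinBufferedSum:
    """Simple bin buffering for range-SUM queries: the array is split into
    consecutive bins of size ~sqrt(n) and one sum is buffered per bin, so a
    query scans at most two partial bins plus O(sqrt(n)) buffered bin sums."""

    def __init__(self, data):
        self.data = list(data)
        n = len(self.data)
        self.bin_size = max(1, math.isqrt(n))
        self.bin_sums = [
            sum(self.data[i:i + self.bin_size])
            for i in range(0, n, self.bin_size)
        ]

    def range_sum(self, lo, hi):
        """Sum of data[lo:hi], touching O(sqrt(n)) values."""
        total, i = 0, lo
        while i < hi:
            if i % self.bin_size == 0 and i + self.bin_size <= hi:
                # A whole bin lies inside the range: use the buffered sum.
                total += self.bin_sums[i // self.bin_size]
                i += self.bin_size
            else:
                total += self.data[i]
                i += 1
        return total
```

    For example, BinBufferedSum(list(range(1000))).range_sum(10, 600) reads on the order of sqrt(1000) values instead of 590. The hierarchical variant described in the abstract applies the same idea recursively over the buffer itself to reach O(b log_b n).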

    The Case for Learned Index Structures

    Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. More importantly, though, we believe that the idea of replacing core components of a data management system through learned models has far-reaching implications for future systems designs and that this work just provides a glimpse of what might be possible.
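    The "index is a model" premise can be made concrete with a deliberately small sketch: fit a model from key to position over a sorted array and use the worst-case training error to bound a local search around the prediction. This is only an illustration of the idea under the assumption of a single linear model over non-empty numeric keys; the paper's learned indexes use staged (recursive) models, and the class name below is invented.

```python
import bisect

class LinearLearnedIndex:
    """Toy 'learned index': a least-squares line maps key -> position in a
    sorted array; the worst prediction error seen while fitting bounds the
    search window used at lookup time. Assumes a non-empty list of numeric keys."""

    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys) or 1.0
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error defines the local search window.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        pos = int(round(self.slope * key + self.intercept))
        return min(len(self.keys) - 1, max(0, pos))

    def lookup(self, key):
        """Return the position of key (or None), searching only the window
        [predicted - max_err, predicted + max_err]."""
        pred = self._predict(key)
        lo = max(0, pred - self.max_err)
        hi = min(len(self.keys), pred + self.max_err + 1)
        pos = bisect.bisect_left(self.keys, key, lo, hi)
        if pos < len(self.keys) and self.keys[pos] == key:
            return pos
        return None
```

    Usage: idx = LinearLearnedIndex(sorted(keys)); idx.lookup(k) returns the position of k or None. The better the model fits the key distribution, the smaller max_err and the cheaper each lookup.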

    The importance of sibling clustering for efficient bulkload of XML document trees

    In this paper, we discuss the requirements for an XDS bulkload component and examine existing algorithms for tree partitioning and their applicability to the bulkload operation. We derive a new tree-partitioning algorithm for use in the bulkload operation and present the design of the bulkload component for the XDS Natix. Finally, we evaluate the performance of the bulkload component and compare our results with previous work.

    Two birds, one stone


    Function materialization in object bases: design, realization, and evaluation


    Unnesting SQL queries in the presence of disjunction

    Optimizing nested queries is an intricate problem. It becomes even harder if, in a nested query, the linking predicate or the correlation predicate occurs disjunctively. We present the first unnesting strategy that can effectively deal with such queries. The starting point of our approach is to translate SQL into the relational algebra extended by bypass operators. We then present, for the first time, unnesting equivalences that are valid for algebraic expressions containing bypass operators. Applying these to the translated queries yields our effective unnesting strategy for nested SQL queries with disjunction. With an extensive experimental study (including three commercial DBMSs), we demonstrate the possible performance gains of our approach.
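    A small Python sketch can make the evaluation-strategy difference concrete. It contrasts naive correlated evaluation of a disjunctively linked subquery with an unnested form that computes the inner result once; the tables, columns, and query are invented toy examples, and the paper's bypass-operator algebra is more general than this special case.

```python
# Invented toy data standing in for an outer table (orders) and an inner
# table (items) referenced by a correlated, disjunctively linked subquery.
orders = [{"id": 1, "total": 500}, {"id": 2, "total": 50}, {"id": 3, "total": 120}]
items = [{"order_id": 2, "price": 400}, {"order_id": 3, "price": 10}]

def correlated_eval(orders, items):
    """Naive evaluation of roughly:
       SELECT o.id FROM orders o
       WHERE o.total > 100
          OR EXISTS (SELECT * FROM items i
                     WHERE i.order_id = o.id AND i.price > 300)
    The correlated subquery is re-scanned for every outer row that the
    cheap disjunct does not already satisfy."""
    result = []
    for o in orders:
        if o["total"] > 100 or any(
            i["order_id"] == o["id"] and i["price"] > 300 for i in items
        ):
            result.append(o["id"])
    return result

def unnested_eval(orders, items):
    """Unnested form: the inner predicate is evaluated once as a semi-join
    (a set of qualifying order ids); the disjunction then becomes two
    independent branches whose results are merged."""
    hits = {i["order_id"] for i in items if i["price"] > 300}
    return [o["id"] for o in orders if o["total"] > 100 or o["id"] in hits]
```

    Both functions return the same ids, but the unnested form scans the inner table once instead of once per undecided outer row.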

    Optimizing Disjunctive Queries with Expensive Predicates

    In this work, we propose and assess a technique called bypass processing for optimizing the evaluation of disjunctive queries with expensive predicates. The technique is particularly useful for optimizing selection predicates that contain terms whose evaluation costs vary tremendously; e.g., the evaluation of a nested subquery or the invocation of a user-defined function in an object-oriented or extended relational model may be orders of magnitude more expensive than an attribute access (and comparison). The idea of bypass processing consists of avoiding the evaluation of such expensive terms whenever the outcome of the entire selection predicate can already be induced by testing other, less expensive terms. In order to validate the viability of bypass evaluation, we extend a previously developed optimizer architecture and incorporate three alternative optimization algorithms for generating bypass processing plans.
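    The core of bypass evaluation can be sketched in a few lines: rows whose outcome is already decided by a cheap term bypass the expensive term entirely. This is only an illustration of the idea for a two-term disjunction with an invented function name; real bypass plans route tuples through larger operator graphs chosen by the optimizer.

```python
def bypass_select(rows, cheap_pred, expensive_pred):
    """Bypass evaluation of the disjunction cheap_pred(row) OR expensive_pred(row):
    rows already accepted by the cheap term bypass the expensive term, so the
    costly predicate (e.g. a nested subquery or a user-defined function) is
    invoked only for rows whose outcome is still undecided."""
    accepted, expensive_calls = [], 0
    for row in rows:
        if cheap_pred(row):
            accepted.append(row)          # cheap term decides: expensive term bypassed
        else:
            expensive_calls += 1
            if expensive_pred(row):       # evaluated only when it can change the outcome
                accepted.append(row)
    return accepted, expensive_calls
```

    For example, with cheap_pred = lambda r: r["total"] > 100 and an expensive_pred that runs a correlated subquery or user-defined function, only the rows the cheap term rejects ever pay the expensive predicate's cost.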