60 research outputs found

    Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays

    Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, for some constant N. Without precomputation, this requires O(n) time. We develop a sequence of algorithms of increasing sophistication that use precomputation and additional buffer space to speed up queries. The simpler algorithms partition the I/O array into consecutive ranges called bins, and they are applicable not only to local-moment queries but also to algebraic queries (MAX, AVERAGE, SUM, etc.). With N buffers of size sqrt(n), the time complexity drops to O(sqrt(n)). A more sophisticated approach uses hierarchical buffering and achieves logarithmic time complexity, O(b log_b n), when using N hierarchical buffers of size n/b. Using Overlapped Bin Buffering, we show that only a single buffer is needed, as with wavelet-based algorithms, but using much less storage. Applications exist in multidimensional and statistical databases over massive data sets, interactive image processing, and visualization.
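    The simple bin-buffering scheme is easy to illustrate for an algebraic query such as SUM. The following Python sketch is my own illustration, not the paper's code (the class and method names are invented): it buffers one sum per bin of size roughly sqrt(n), so a range query scans at most two partial bins plus O(sqrt(n)) buffered bin sums.

```python
import math

class BinBufferedSum:
    """Simple bin buffering for range-SUM queries: the array is split into
    consecutive bins of size ~sqrt(n) and one sum is buffered per bin, so a
    query scans at most two partial bins plus O(sqrt(n)) buffered bin sums."""

    def __init__(self, data):
        self.data = list(data)
        n = len(self.data)
        self.bin_size = max(1, math.isqrt(n))
        self.bin_sums = [
            sum(self.data[i:i + self.bin_size])
            for i in range(0, n, self.bin_size)
        ]

    def range_sum(self, lo, hi):
        """Sum of data[lo:hi], touching O(sqrt(n)) values."""
        total, i = 0, lo
        while i < hi:
            if i % self.bin_size == 0 and i + self.bin_size <= hi:
                # A whole bin lies inside the range: use the buffered sum.
                total += self.bin_sums[i // self.bin_size]
                i += self.bin_size
            else:
                total += self.data[i]
                i += 1
        return total
```

    For example, BinBufferedSum(list(range(1000))).range_sum(10, 600) reads on the order of sqrt(1000) values instead of 590. The hierarchical variant described in the abstract applies the same idea recursively over the buffer itself to reach O(b log_b n).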

    The Case for Learned Index Structures

    Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. More importantly, though, we believe that the idea of replacing core components of a data management system through learned models has far-reaching implications for future systems designs and that this work just provides a glimpse of what might be possible.
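    The "index is a model" premise can be made concrete with a deliberately small sketch: fit a model from key to position over a sorted array and use the worst-case training error to bound a local search around the prediction. This is only an illustration of the idea under the assumption of a single linear model over non-empty numeric keys; the paper's learned indexes use staged (recursive) models, and the class name below is invented.

```python
import bisect

class LinearLearnedIndex:
    """Toy 'learned index': a least-squares line maps key -> position in a
    sorted array; the worst prediction error seen while fitting bounds the
    search window used at lookup time. Assumes a non-empty list of numeric keys."""

    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys) or 1.0
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error defines the local search window.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        pos = int(round(self.slope * key + self.intercept))
        return min(len(self.keys) - 1, max(0, pos))

    def lookup(self, key):
        """Return the position of key (or None), searching only the window
        [predicted - max_err, predicted + max_err]."""
        pred = self._predict(key)
        lo = max(0, pred - self.max_err)
        hi = min(len(self.keys), pred + self.max_err + 1)
        pos = bisect.bisect_left(self.keys, key, lo, hi)
        if pos < len(self.keys) and self.keys[pos] == key:
            return pos
        return None
```

    Usage: idx = LinearLearnedIndex(sorted(keys)); idx.lookup(k) returns the position of k or None. The better the model fits the key distribution, the smaller max_err and the cheaper each lookup.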

    The importance of sibling clustering for efficient bulkload of XML document trees

    In this paper, we discuss the requirements for an XDS bulkload component and examine existing algorithms for tree partitioning and their applicability to the bulkload operation. We derive a new tree-partitioning algorithm for use in the bulkload operation and present the design of the bulkload component for the XDS Natix. Finally, we evaluate the performance of the bulkload component and compare our results with previous work.

    Two birds, one stone


    Function materialization in object bases: design, realization, and evaluation


    Unnesting SQL queries in the presence of disjunction

    Optimizing nested queries is an intricate problem. It becomes even harder if, in a nested query, the linking predicate or the correlation predicate occurs disjunctively. We present the first unnesting strategy that can effectively deal with such queries. The starting point of our approach is to translate SQL into the relational algebra extended by bypass operators. We then present, for the first time, unnesting equivalences that are valid for algebraic expressions containing bypass operators. Applying these to the translated queries yields our effective unnesting strategy for nested SQL queries with disjunction. With an extensive experimental study (including three commercial DBMSs), we demonstrate the possible performance gains of our approach.
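    A small Python sketch can make the evaluation-strategy difference concrete. It contrasts naive correlated evaluation of a disjunctively linked subquery with an unnested form that computes the inner result once; the tables, columns, and query are invented toy examples, and the paper's bypass-operator algebra is more general than this special case.

```python
# Invented toy data standing in for an outer table (orders) and an inner
# table (items) referenced by a correlated, disjunctively linked subquery.
orders = [{"id": 1, "total": 500}, {"id": 2, "total": 50}, {"id": 3, "total": 120}]
items = [{"order_id": 2, "price": 400}, {"order_id": 3, "price": 10}]

def correlated_eval(orders, items):
    """Naive evaluation of roughly:
       SELECT o.id FROM orders o
       WHERE o.total > 100
          OR EXISTS (SELECT * FROM items i
                     WHERE i.order_id = o.id AND i.price > 300)
    The correlated subquery is re-scanned for every outer row that the
    cheap disjunct does not already satisfy."""
    result = []
    for o in orders:
        if o["total"] > 100 or any(
            i["order_id"] == o["id"] and i["price"] > 300 for i in items
        ):
            result.append(o["id"])
    return result

def unnested_eval(orders, items):
    """Unnested form: the inner predicate is evaluated once as a semi-join
    (a set of qualifying order ids); the disjunction then becomes two
    independent branches whose results are merged."""
    hits = {i["order_id"] for i in items if i["price"] > 300}
    return [o["id"] for o in orders if o["total"] > 100 or o["id"] in hits]
```

    Both functions return the same ids, but the unnested form scans the inner table once instead of once per undecided outer row.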

    Optimizing Disjunctive Queries with Expensive Predicates

    In this work, we propose and assess a technique called bypass processing for optimizing the evaluation of disjunctive queries with expensive predicates. The technique is particularly useful for optimizing selection predicates that contain terms whose evaluation costs vary tremendously; e.g., the evaluation of a nested subquery or the invocation of a user-defined function in an object-oriented or extended relational model may be orders of magnitude more expensive than an attribute access (and comparison). The idea of bypass processing consists of avoiding the evaluation of such expensive terms whenever the outcome of the entire selection predicate can already be induced by testing other, less expensive terms. In order to validate the viability of bypass evaluation, we extend a previously developed optimizer architecture and incorporate three alternative optimization algorithms for generating bypass processing plans.
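    The core of bypass evaluation can be sketched in a few lines: rows whose outcome is already decided by a cheap term bypass the expensive term entirely. This is only an illustration of the idea for a two-term disjunction with an invented function name; real bypass plans route tuples through larger operator graphs chosen by the optimizer.

```python
def bypass_select(rows, cheap_pred, expensive_pred):
    """Bypass evaluation of the disjunction cheap_pred(row) OR expensive_pred(row):
    rows already accepted by the cheap term bypass the expensive term, so the
    costly predicate (e.g. a nested subquery or a user-defined function) is
    invoked only for rows whose outcome is still undecided."""
    accepted, expensive_calls = [], 0
    for row in rows:
        if cheap_pred(row):
            accepted.append(row)          # cheap term decides: expensive term bypassed
        else:
            expensive_calls += 1
            if expensive_pred(row):       # evaluated only when it can change the outcome
                accepted.append(row)
    return accepted, expensive_calls
```

    For example, with cheap_pred = lambda r: r["total"] > 100 and an expensive_pred that runs a correlated subquery or user-defined function, only the rows the cheap term rejects ever pay the expensive predicate's cost.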