52 research outputs found

    Approximate answering of aggregate queries in relational databases

    Ph.D. Edward Omiecinsk

    Robust Estimation With Sampling and Approximate Pre-Aggregation

    The majority of data reduction techniques for approximate query processing (such as wavelets, histograms, and kernels) are not usually applicable to categorical data. There has been something of a disconnect between research in this area and the reality of database data: much recent research has focused on approximate query processing over ordered or numerical attributes, but arguably the majority of database attributes are categorical: country, state, job_title, color, sex, department, and so on. This paper considers the problem of approximating aggregate functions over categorical data, or mixed categorical/numerical data. We propose a method based upon random sampling, called Approximate Pre-Aggregation (APA). The biggest drawback of sampling for aggregate function estimation is its sensitivity to attribute value skew, and APA uses several techniques to overcome this sensitivity. The increase in accuracy using APA compared to “plain vanilla” sampling is dramatic. For SUM and AVG queries, the relative error for random sampling alone is more than 700% greater than for sampling with APA. Even if stratified sampling techniques are used, the error is still between 28% and 175% greater than for APA.
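    The sensitivity of plain random sampling to attribute value skew can be illustrated with a minimal sketch. This is not the APA method from the abstract; it only shows the baseline scale-up estimator for SUM, and the names `estimate_sum` and `relative_error`, as well as the toy column values, are assumptions made for illustration.

```python
def estimate_sum(sample, population_size):
    """Estimate SUM over a column by scaling the sample total.

    Assumes the sample was drawn uniformly at random from the column.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    return population_size * sum(sample) / len(sample)


def relative_error(estimate, truth):
    """Absolute error expressed as a fraction of the true value."""
    return abs(estimate - truth) / abs(truth)


# Hypothetical skewed column: 99 small values and one very large one.
population = [1.0] * 99 + [1000.0]
true_sum = sum(population)

# Two possible 10-row samples: one misses the outlier, one includes it.
sample_without_outlier = population[:10]
sample_with_outlier = population[90:]

low = estimate_sum(sample_without_outlier, len(population))
high = estimate_sum(sample_with_outlier, len(population))

print(low, relative_error(low, true_sum))
print(high, relative_error(high, true_sum))
```

    In this toy column the estimate is far too low when the sample misses the single large value and far too high when it includes it, which is the kind of skew sensitivity the abstract describes.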

    Closest-point-of-approach join for moving object histories

    In applications that produce a large amount of data describing the paths of moving objects, there is a need to ask questions about the interaction of objects over a long recorded history. In this paper, we consider the problem of computing joins over massive moving object histories. The particular join that we study is the “Closest-Point-Of-Approach” join, which asks: given a massive moving object history, which objects approached within a distance ‘d’ of one another? We carefully consider several relatively obvious strategies for computing the answer to such a join, and then propose a novel, adaptive join algorithm that naturally alters the way it computes the join in response to the characteristics of the underlying data.
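    The predicate behind such a join can be sketched for a single pair of objects. The sketch below is not the adaptive join algorithm from the abstract; it assumes two objects moving in a straight line at constant velocity in 2D over one time segment, and the function name `closest_point_of_approach` and its parameters are hypothetical.

```python
import math


def closest_point_of_approach(p, vp, q, vq, duration):
    """Return (time, distance) of closest approach within [0, duration].

    p, q   : (x, y) positions of the two objects at time 0
    vp, vq : (vx, vy) constant velocities (an assumption of this sketch)
    """
    # Relative position and relative velocity of the first object w.r.t. the second.
    dx, dy = p[0] - q[0], p[1] - q[1]
    dvx, dvy = vp[0] - vq[0], vp[1] - vq[1]

    speed_sq = dvx * dvx + dvy * dvy
    if speed_sq == 0.0:
        # Same velocity: the separation never changes.
        t = 0.0
    else:
        # Minimise |d + dv * t|^2, then clamp to the segment's time range.
        t = -(dx * dvx + dy * dvy) / speed_sq
        t = max(0.0, min(duration, t))

    return t, math.hypot(dx + dvx * t, dy + dvy * t)


# Two objects heading toward each other on parallel tracks 5 units apart.
t, dist = closest_point_of_approach(
    (0.0, 0.0), (1.0, 0.0), (10.0, 5.0), (-1.0, 0.0), duration=10.0
)
d = 6.0  # hypothetical join threshold
print(t, dist, dist <= d)
```

    A naive join would evaluate this check for every pair of overlapping trajectory segments and keep the pairs whose distance is at most `d`; the cost of doing so over a massive history is what motivates more careful strategies.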