Efficient Computation of Group Skyline Queries on MapReduce
The skyline query is an important topic in database research and has been applied in diverse applications, including multi-criteria decision support systems. A skyline query eliminates unnecessary tuples and returns only the results of interest to the user. Traditional skyline queries pick out outstanding tuples based on one-to-one record comparisons. Some modern applications, however, require superior combinations of records rather than single records. For example, a fantasy basketball team is composed of 5 players, a fantasy baseball team of 9 players, and a hackathon team of several programmers. Group skyline considers all the groups comprising several records and finds the non-dominated ones. Because of its high complexity, few studies have addressed it, and none has been presented for either distributed or parallel computing. This paper is the first study that solves the group skyline problem in the distributed MapReduce framework. We propose the MRGS algorithm to generate all the combinations, compute the winners at each local node, and find the answer globally. We further propose the MRIGS algorithm to relieve the bottleneck that unbalanced computing load across nodes causes in MRGS. Finally, we propose the MRIGS-P algorithm to prune impossible combinations and produce an indexed and balanced MapReduce computation. Extensive experiments with NBA datasets show that MRIGS-P is 6 times faster than MRGS.
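The abstract does not define group dominance or the partitioning used by MRGS, so the following is only a minimal sequential sketch under stated assumptions: a group's attributes are the SUM of its members' attributes, larger values are better, and combinations are spread round-robin over `num_nodes` simulated nodes. The names `group_skyline`, `skyline`, and `num_nodes` are hypothetical. It mirrors the local-then-global structure described above, not MRGS, MRIGS, or MRIGS-P themselves.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Group = Tuple[Tuple[int, ...], Vector]  # (member record indices, aggregated attributes)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a >= b on every attribute and a > b on at least one (larger is better, assumed)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(groups: List[Group]) -> List[Group]:
    """Naive O(m^2) skyline over aggregated group vectors."""
    return [g for g in groups if not any(dominates(o[1], g[1]) for o in groups)]


def group_skyline(records: List[Vector], k: int, num_nodes: int = 3) -> List[Tuple[int, ...]]:
    # "Map" phase: enumerate every k-combination and assign it to a node round-robin
    # (hypothetical partitioning; the paper's scheme is not given in the abstract).
    partitions: Dict[int, List[Group]] = {n: [] for n in range(num_nodes)}
    for j, idx in enumerate(combinations(range(len(records)), k)):
        agg = tuple(sum(col) for col in zip(*(records[i] for i in idx)))  # SUM aggregate (assumed)
        partitions[j % num_nodes].append((idx, agg))
    # Local phase: each node keeps only the groups that are non-dominated within its partition.
    local_winners = [g for part in partitions.values() for g in skyline(part)]
    # Global phase: every globally non-dominated group survives its local skyline, and
    # dominance is transitive, so the skyline of the local winners is the final answer.
    return sorted(idx for idx, _ in skyline(local_winners))


if __name__ == "__main__":
    players = [(30, 5), (20, 10), (10, 2), (25, 8)]  # toy (points, rebounds) per player
    print(group_skyline(players, 2))
```

For the toy data, teams of two give `[(0, 1), (0, 3), (1, 3)]`; player 2 is dominated by everyone, so no pair containing it survives. Enumeration alone produces C(n, k) groups, which is why pruning impossible combinations, as MRIGS-P does, matters.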
A Survey of Techniques for Answering Top-K Queries
Top-k queries retrieve the top-k records from a given set of records according to the value of a function F on their attributes. Many techniques have been proposed in the database literature for answering top-k queries. They fall mainly into three categories: sorted-list based, layer based, and view based. In the first category, records are sorted along each dimension, and each record is then assigned a rank using a parallel scanning method; the Threshold Algorithm (TA) and Fagin's Algorithm (FA) are examples of this category. In the second category, all the records are organized into layers, as in the Onion technique and the robust indexing technique. The third category includes methods such as PREFER and LPTA (Linear Programming Adaptation of Threshold Algorithm), where processing is based on materialized views.
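The survey names the Threshold Algorithm without describing it. Below is a minimal in-memory sketch of the usual formulation. It assumes a monotone aggregate F (here `sum`), one list per attribute sorted in descending order for sorted access, and a dict lookup standing in for random access. The function name `threshold_algorithm` and its parameters are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple


def threshold_algorithm(
    table: Dict[str, Sequence[float]],
    k: int,
    agg: Callable[[Sequence[float]], float] = sum,
) -> List[Tuple[str, float]]:
    """Return the k objects with the highest agg(attributes), best first.

    `table` maps object id -> attribute vector and plays the role of random access;
    `agg` must be monotone for the stopping rule to be correct.
    """
    if not table or k <= 0:
        return []
    m = len(next(iter(table.values())))
    # One list per attribute, sorted descending: the sources of sorted access.
    lists = [sorted(table, key=lambda oid: table[oid][i], reverse=True) for i in range(m)]
    top: List[Tuple[float, str]] = []  # min-heap of (score, id), holds at most k entries
    seen = set()
    for depth in range(len(table)):
        last_seen = []
        for i in range(m):
            oid = lists[i][depth]  # sorted access on list i, in parallel across lists
            last_seen.append(table[oid][i])
            if oid not in seen:
                seen.add(oid)
                score = agg(table[oid])  # random access to all attributes of oid
                if len(top) < k:
                    heapq.heappush(top, (score, oid))
                elif score > top[0][0]:
                    heapq.heapreplace(top, (score, oid))
        # No unseen object can score above F(values at the current scan depth).
        threshold = agg(last_seen)
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(((oid, s) for s, oid in top), key=lambda p: -p[1])


if __name__ == "__main__":
    data = {"a": (9, 1), "b": (8, 8), "c": (3, 9), "d": (2, 2)}
    print(threshold_algorithm(data, 2))
```

On the toy data the scan stops at depth 2 of 4 with `[("b", 16), ("c", 12)]`. The k-th best score (12) already meets the threshold 3 + 2 = 5, so object `d` is never scored by random access beyond its first sorted access.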
Effective Space Usage Estimation for Sliding-Window Skybands
A skyline query computes all the “best” elements, namely those
not dominated by any other element, and is therefore very important
for decision-making applications. It has recently been generalized
to the skyband query: a k-skyband query returns the elements
that are dominated by at most k other elements.
To incorporate the skyband operator into a stream engine
for monitoring skybands over sliding windows, estimating the space
usage of the skyband operator becomes a critical issue for
the query optimizer. In this paper, we first introduce the
skyband sketch as the cost model. Based on this cost model,
we propose an approach for estimating the space usage of the
skyband operator over sliding windows of data streams under
the assumptions of statistical independence across dimensions,
no duplicate values within each dimension, and totally ordered
dimension domains. Experiments verify that
our approaches estimate the space usage effectively over
arbitrarily distributed data. To the best of our knowledge,
this is the first work that addresses this issue and
proposes effective approaches to solve it.
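The abstract does not spell out which elements a sliding-window skyband operator must store, so the following is a rough sketch under an assumed retention rule, not the paper's skyband sketch or estimator. In a count-based window, an element dominated by more than k *newer* elements can be discarded. Those newer dominators stay in the window at least as long as it does, so the element can never re-enter the k-skyband. The class `SlidingSkyband`, its parameters, and the larger-is-better dominance convention are hypothetical. `space_usage()` is the quantity an estimator of this kind would predict.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a >= b on every dimension and a > b on at least one (larger is better, assumed)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


class SlidingSkyband:
    """Count-based sliding window of `window` elements answering k-skyband queries.

    Retains only elements dominated by at most k newer elements (assumed retention rule).
    """

    def __init__(self, window: int, k: int) -> None:
        self.window, self.k = window, k
        self.t = 0  # arrival counter
        # (arrival time, point, number of newer elements that dominate it), oldest first
        self.kept: Deque[Tuple[int, Point, int]] = deque()

    def insert(self, p: Point) -> None:
        # Expire elements that have slid out of the count-based window.
        while self.kept and self.kept[0][0] <= self.t - self.window:
            self.kept.popleft()
        # p is newer than every retained element: count it as a newer dominator where
        # applicable, and drop elements that now have more than k newer dominators.
        survivors: Deque[Tuple[int, Point, int]] = deque()
        for ts, q, cnt in self.kept:
            if dominates(p, q):
                cnt += 1
            if cnt <= self.k:
                survivors.append((ts, q, cnt))
        survivors.append((self.t, p, 0))
        self.kept = survivors
        self.t += 1

    def skyband(self) -> List[Point]:
        """Window elements dominated by at most k other window elements."""
        pts = [q for _, q, _ in self.kept]
        return [q for q in pts if sum(dominates(o, q) for o in pts) <= self.k]

    def space_usage(self) -> int:
        """Number of elements the operator currently stores."""
        return len(self.kept)


if __name__ == "__main__":
    sb = SlidingSkyband(window=3, k=0)
    for p in [(1, 1), (2, 2), (3, 0)]:
        sb.insert(p)
    print(sb.skyband(), sb.space_usage())
```

Counting dominators only among retained elements is sufficient. If a discarded element dominates q, take the newest such one. Its k + 1 newer dominators are all retained and, by transitivity, dominate q as well. With `window=3, k=0`, the stream `(1, 1), (2, 2), (3, 0)` leaves two stored elements, `(2, 2)` and `(3, 0)`, since `(1, 1)` is dropped as soon as `(2, 2)` arrives.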