266 research outputs found

    A systematic literature review of skyline query processing over data stream

    Skyline query processing over data streams has recently attracted considerable attention, especially from the database community, owing to its unique challenges. Skyline queries aim to prune the search space of a potentially large multi-dimensional set of objects by keeping only those objects that are not dominated by any other object. Although many skyline query processing techniques have been proposed, there is no Systematic Literature Review (SLR) of current research on skyline query processing over data streams. This paper therefore provides a comparative study of the state-of-the-art approaches published between 2000 and 2022, with the main aim of helping readers understand the key issues to consider when processing skyline queries over streaming data. Seven digital databases were reviewed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) procedures. After applying the inclusion and exclusion criteria, 23 primary papers were examined further. The results show that the identified skyline approaches are driven by the need to expedite skyline query processing, mainly because data streams are time-varying (time-sensitive), continuous, real-time, volatile, and unrepeatable. Although these approaches are tailor-made for data streams and share a common aim, their solutions vary according to the aspects considered, which include the type of skyline query, type of streaming data, type of sliding window, query processing technique, indexing technique, and the data stream environment employed. The paper develops a comprehensive taxonomy along with the key aspects of each reported approach, and highlights several open issues and challenges as recommendations for future research.
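
To make the dominance idea concrete, here is a minimal sketch of a skyline maintained over a count-based sliding window. It is not taken from any of the reviewed approaches: the class name `SlidingWindowSkyline`, the "smaller is better" convention, and the recompute-on-every-arrival strategy are all assumptions chosen for brevity. The surveyed techniques exist precisely to avoid this kind of full recomputation.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


class SlidingWindowSkyline:
    """Hypothetical helper: skyline of the most recent `window_size` points."""

    def __init__(self, window_size: int) -> None:
        # deque(maxlen=...) silently evicts the oldest point on overflow.
        self.window: Deque[Point] = deque(maxlen=window_size)

    def insert(self, point: Point) -> List[Point]:
        self.window.append(point)
        # Recompute from scratch; real stream algorithms update incrementally.
        return skyline(list(self.window))


stream = [(1, 5), (2, 2), (3, 3), (4, 1), (5, 5)]
sw = SlidingWindowSkyline(window_size=3)
results = [sw.insert(p) for p in stream]
# After the last arrival the window holds (3, 3), (4, 1), (5, 5).
# (3, 3) is back in the skyline because its dominator (2, 2) has expired.
print(results[-1])  # [(3, 3), (4, 1)]
```

The last step illustrates why expiry matters in streams: a point that was dominated earlier can become a skyline point once its dominators leave the window.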

    K-Dominance in Multidimensional Data: Theory and Applications

    We study the problem of k-dominance in a set of d-dimensional vectors, prove bounds on the number of maxima (skyline vectors) under both worst-case and average-case models, perform an experimental evaluation using synthetic and real-world data, and explore an application of the k-dominant skyline for extracting a small set of top-ranked vectors in high dimensions, where the full skyline can be unmanageably large.
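
As an illustration of k-dominance, the sketch below uses the usual definition: p k-dominates q if there are k dimensions on which p is no worse than q, with p strictly better on at least one of them. The function names, the "smaller is better" convention, and the sample vectors are assumptions for illustration; this is a naive quadratic filter, not the method analysed in the paper.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def k_dominates(p: Sequence[float], q: Sequence[float], k: int) -> bool:
    """True if p is no worse than q on at least k dimensions and strictly
    better on at least one (a strict dimension is also a no-worse dimension,
    so it can always be included in the chosen k).

    Assumption: smaller values are better.
    """
    no_worse = sum(1 for x, y in zip(p, q) if x <= y)
    strictly_better = any(x < y for x, y in zip(p, q))
    return no_worse >= k and strictly_better


def k_dominant_skyline(vectors: Sequence[Vector], k: int) -> List[Vector]:
    """Vectors that are not k-dominated by any other vector (naive O(n^2))."""
    return [
        p for p in vectors
        if not any(k_dominates(q, p, k) for q in vectors if q is not p)
    ]


vectors = [(1, 1, 9), (2, 2, 1), (3, 3, 3)]
print(k_dominant_skyline(vectors, 3))  # [(1, 1, 9), (2, 2, 1)]  (ordinary skyline, k = d)
print(k_dominant_skyline(vectors, 2))  # [(1, 1, 9)]
```

With k equal to the dimensionality d the result is the ordinary skyline; lowering k relaxes the dominance test and shrinks the output. Because k-dominance for k < d is not transitive and can be cyclic, the k-dominant skyline may even be empty.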

    CONTINUOUS MULTIQUERIES K-DOMINANT SKYLINE ON ROAD NETWORK

    The increasing use of mobile devices has made spatial data an important consideration. Users often want the best objects from a collection, and the skyline query is one way to find them: it returns all objects that are not dominated by any other object across all attributes. When the data has many attributes, however, the query returns so many objects that the result is of little use. k-dominant skyline queries reduce the size of the output. The remaining challenges are applying skyline queries to spatial data and handling the many different user preferences for what counts as the best object. This study proposes IKSR, a k-dominant skyline query algorithm that works in a road network environment and can process many queries over the same subspace in a single pass. The algorithm combines queries that operate on the same subspace and set of objects but use different k values, computing from the smallest k to the largest. The optimization is that some data for larger k is precomputed while calculating the result for the smallest k, so the Voronoi cell computation is not repeated. IKSR is tested against a naïve algorithm without precomputation and runs two to three times faster.
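
The idea of sharing work across queries that differ only in k can be sketched without any road-network machinery. The code below is not IKSR: it ignores Voronoi cells and spatial distance entirely, and the function names and "smaller is better" convention are assumptions. It only shows one way a single pass over the objects can serve several k values: for each object, record the largest number of dimensions on which some strictly-better rival is no worse, then answer each k by a threshold test.

```python
from typing import Dict, List, Sequence, Tuple

Obj = Tuple[float, ...]


def max_dominated_dims(objects: Sequence[Obj]) -> List[int]:
    """For each object p, the largest count of dimensions on which a rival q
    is no worse than p, over rivals that are strictly better somewhere.
    Returns 0 for an object with no such rival.
    """
    scores = []
    for i, p in enumerate(objects):
        best = 0
        for j, q in enumerate(objects):
            if i == j or not any(x < y for x, y in zip(q, p)):
                continue
            best = max(best, sum(1 for x, y in zip(q, p) if x <= y))
        scores.append(best)
    return scores


def multi_k_dominant_skylines(
    objects: Sequence[Obj], ks: Sequence[int]
) -> Dict[int, List[Obj]]:
    """Answer several k-dominant skyline queries from one shared pass.

    p is k-dominated exactly when its score is >= k, so each k only needs
    a threshold comparison against the precomputed scores.
    """
    scores = max_dominated_dims(objects)  # computed once, reused for every k
    return {
        k: [p for p, s in zip(objects, scores) if s < k]
        for k in sorted(ks)  # smallest k first, mirroring the order in the abstract
    }


objects = [(1, 1, 9), (2, 2, 1), (3, 3, 3)]
print(multi_k_dominant_skylines(objects, [3, 2]))
# {2: [(1, 1, 9)], 3: [(1, 1, 9), (2, 2, 1)]}
```

The pairwise scan is done once regardless of how many k values are requested, which is the same kind of saving the abstract attributes to precomputation, though the actual algorithm and its cost model are described only in the paper.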

    Design and Implementation of Hardware Accelerators Using Multi-Level Parallelization and Application-Oriented Data Layout

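    Degree type: course-based doctorate (課程博士). Examination committee members: (chair) 稲葉 雅幸, Professor, University of Tokyo; 須田 礼仁, Professor, University of Tokyo; 五十嵐 健夫, Professor, University of Tokyo; 山西 健司, Professor, University of Tokyo; 稲葉 真理, Associate Professor, University of Tokyo; 中山 英樹, Lecturer, University of Tokyo. University of Tokyo (東京大学)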

    Effective Space Usage Estimation for Sliding-Window Skybands

    A skyline query computes all the “best” elements, those not dominated by any other element, and is therefore very important for decision-making applications. It has recently been generalized to the skyband query: a k-skyband query returns the elements dominated by no more than k other elements. To incorporate the skyband operator into a stream engine for monitoring skybands over sliding windows, estimating the space usage of the skyband operator becomes a critical issue for the query optimizer. In this paper, we first introduce the skyband sketch as the cost model. Based on this cost model, we propose an approach for estimating the space usage of the skyband operator over sliding windows of data streams, under the assumptions of statistical independence across dimensions, no duplicate values in any dimension, and totally ordered dimension domains. Experiments verify that our approaches can estimate the space usage effectively over arbitrarily distributed data. To the best of our knowledge, this is the first work that attempts to address this issue and proposes effective approaches to solve it.
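
The skyband definition in this abstract can be written down directly. The sketch below follows the abstract's wording (an element qualifies if it is dominated by no more than k other elements), so k = 0 yields the ordinary skyline. The function names and the "smaller is better" convention are assumptions, and nothing here reflects the paper's skyband sketch or its space estimator.

```python
from typing import List, Sequence, Tuple

Element = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Assumption: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def dominator_count(p: Element, elements: Sequence[Element]) -> int:
    """Number of elements that dominate p."""
    return sum(1 for q in elements if dominates(q, p))


def k_skyband(elements: Sequence[Element], k: int) -> List[Element]:
    """Elements dominated by no more than k other elements (naive O(n^2))."""
    return [p for p in elements if dominator_count(p, elements) <= k]


elements = [(1, 1), (2, 2), (3, 3), (1, 4)]
print(k_skyband(elements, 0))  # [(1, 1)]
print(k_skyband(elements, 1))  # [(1, 1), (2, 2), (1, 4)]
print(k_skyband(elements, 2))  # [(1, 1), (2, 2), (3, 3), (1, 4)]
```

The output grows with k, which hints at why a stream engine would want an estimate of how many elements the operator must retain per window.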

    I/O-Efficient Planar Range Skyline and Attrition Priority Queues

    In the planar range skyline reporting problem, we store a set P of n 2D points in a structure such that, given a query rectangle Q = [a_1, a_2] x [b_1, b_2], the maxima (a.k.a. skyline) of P \cap Q can be reported efficiently. The query is 3-sided if an edge of Q is grounded, giving rise to two variants: top-open (b_2 = \infty) and left-open (a_1 = -\infty) queries. All our results are in external memory under the O(n/B) space budget, for both the static and dynamic settings:
    * For static P, we give structures that answer top-open queries in O(log_B n + k/B), O(loglog_B U + k/B), and O(1 + k/B) I/Os when the universe is R^2, a U x U grid, and a rank space grid [O(n)]^2, respectively (where k is the number of reported points). The query complexity is optimal in all cases.
    * We show that the left-open case is harder, such that any linear-size structure must incur \Omega((n/B)^e + k/B) I/Os for a query. We show that this case is as difficult as the general 4-sided queries, for which we give a static structure with the optimal query cost O((n/B)^e + k/B).
    * We give a dynamic structure that supports top-open queries in O(log_{2B^e}(n/B) + k/B^{1-e}) I/Os, and updates in O(log_{2B^e}(n/B)) I/Os, for any e satisfying 0 \le e \le 1. This leads to a dynamic structure for 4-sided queries with optimal query cost O((n/B)^e + k/B), and amortized update cost O(log (n/B)).
    As a contribution of independent interest, we propose an I/O-efficient version of the fundamental structure priority queue with attrition (PQA). Our PQA supports FindMin, DeleteMin, and InsertAndAttrite all in O(1) worst case I/Os, and O(1/B) amortized I/Os per operation. We also add the new CatenateAndAttrite operation that catenates two PQAs in O(1) worst case and O(1/B) amortized I/Os. This operation is a non-trivial extension to the classic PQA of Sundar, even in internal memory.
    Comment: Appeared at PODS 2013, New York, 19 pages, 10 figures. arXiv admin note: text overlap with arXiv:1208.4511, arXiv:1207.234
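
For readers unfamiliar with the two objects named in this abstract, the following in-memory sketch shows what a top-open range skyline query returns and how a priority queue with attrition behaves. It makes no attempt at I/O efficiency and is not the paper's structure: the function and class names are hypothetical, the query is answered by filtering and sweeping rather than by an index, and the queue achieves only amortized O(1) time per operation, not the worst-case bounds cited above.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Point2D = Tuple[float, float]


def top_open_range_skyline(
    points: Sequence[Point2D], a1: float, a2: float, b1: float
) -> List[Point2D]:
    """Maxima of the points inside [a1, a2] x [b1, +inf), sorted by x.

    A point is a maximum if no other point in range has both coordinates
    at least as large. Plain filter plus right-to-left sweep.
    """
    in_range = [(x, y) for x, y in points if a1 <= x <= a2 and y >= b1]
    # Sweep from largest x; on equal x, visit the larger y first.
    in_range.sort(key=lambda p: (-p[0], -p[1]))
    maxima: List[Point2D] = []
    best_y = float("-inf")
    for x, y in in_range:
        if y > best_y:  # nothing to the right is at least as high
            maxima.append((x, y))
            best_y = y
    maxima.reverse()
    return maxima


class AttritionQueue:
    """Simple in-memory priority queue with attrition (amortized O(1))."""

    def __init__(self) -> None:
        # Invariant: stored elements are strictly increasing front to back.
        self._items: Deque[float] = deque()

    def insert_and_attrite(self, e: float) -> None:
        # Attrition: discard every stored element >= e, then insert e.
        while self._items and self._items[-1] >= e:
            self._items.pop()
        self._items.append(e)

    def find_min(self) -> float:
        return self._items[0]

    def delete_min(self) -> float:
        return self._items.popleft()

    def to_list(self) -> List[float]:
        return list(self._items)


pts = [(1, 9), (2, 3), (3, 7), (4, 5), (5, 1), (6, 8)]
print(top_open_range_skyline(pts, a1=1, a2=5, b1=2))  # [(1, 9), (3, 7), (4, 5)]

q = AttritionQueue()
for e in [5, 7, 9, 6, 8]:
    q.insert_and_attrite(e)
print(q.to_list())  # [5, 6, 8]  (inserting 6 attrited 7 and 9)
```

Each element enters and leaves the deque at most once, which is where the amortized constant cost comes from; obtaining worst-case constant cost, and doing so in external memory, is the harder problem the paper addresses.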

    Deriving skyline points over dynamic and incomplete databases

    The rapid growth of data is inevitable, and retrieving the best results that meet the user’s preferences is essential. To achieve this, skyline queries were introduced, in which data items that are not dominated by any other data item in the database are retrieved as results (skylines). Most existing skyline approaches assume that the database is static and complete. In real-world scenarios, however, databases are often incomplete, especially multidimensional databases in which some dimensions may have missing values. Databases may also be dynamic, with new data items inserted while existing data items are deleted or updated. Blindly performing pairwise comparisons over all data items after such changes is inappropriate, because not all data items need to be compared to identify the skylines. This study therefore proposes a novel skyline algorithm, DInSkyline, which finds the most relevant data items in dynamic and incomplete databases. Several experiments have been conducted, and the results show that DInSkyline outperforms previous work by reducing the number of pairwise comparisons by 52% to 73%.
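
The abstract does not spell out how dominance is defined when values are missing. A commonly used convention, assumed here, compares two items only on the dimensions where both have values. The sketch below implements that convention together with the blind pairwise baseline the abstract argues against, counting comparisons along the way. It is not DInSkyline; the function names, the use of `None` for missing values, and the "smaller is better" convention are all assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Item = Tuple[Optional[float], ...]


def dominates_incomplete(a: Item, b: Item) -> bool:
    """Dominance restricted to dimensions where both items have a value.

    Assumptions: None marks a missing value, smaller is better, and items
    with no dimension in common are treated as incomparable.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return (
        bool(shared)
        and all(x <= y for x, y in shared)
        and any(x < y for x, y in shared)
    )


def naive_incomplete_skyline(items: Sequence[Item]) -> Tuple[List[Item], int]:
    """Blind pairwise baseline; returns (skyline, number of comparisons)."""
    comparisons = 0
    result: List[Item] = []
    for i, p in enumerate(items):
        dominated = False
        for j, q in enumerate(items):
            if i == j:
                continue
            comparisons += 1
            if dominates_incomplete(q, p):
                dominated = True
                break  # one dominator is enough to discard p
        if not dominated:
            result.append(p)
    return result, comparisons


items = [(1, None, 3), (2, 2, None), (None, 1, 1)]
sky, n_cmp = naive_incomplete_skyline(items)
print(sky, n_cmp)  # [(None, 1, 1)] 5
```

Under this convention dominance is no longer transitive, so a dominated item cannot simply be discarded and forgotten, which is one reason incomplete data makes pruning comparisons harder than in the complete case.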