Distributed Indexing Schemes for k-Dominant Skyline Analytics on Uncertain Edge-IoT Data
Skyline queries typically search a Pareto-optimal set from a given data set
to solve the corresponding multiobjective optimization problem. As the number
of criteria increases, the skyline tends to include an excessive number of data
items, which makes the result meaningless. To address this curse of dimensionality, we proposed a
k-dominant skyline in which the number of skyline members was reduced by
relaxing the restriction on the number of dimensions, considering the
uncertainty of data. Specifically, each data item was associated with a
probability of appearance, which represented the probability of becoming a
member of the k-dominant skyline. As data items appear continuously in data
streams, the corresponding k-dominant skyline may vary with time. Therefore, an
effective and rapid mechanism of updating the k-dominant skyline becomes
crucial. Herein, we proposed two time-efficient schemes, Middle Indexing (MI)
and All Indexing (AI), for k-dominant skyline in distributed edge-computing
environments, where irrelevant data items can be effectively excluded from the
computation to reduce the processing duration. Furthermore, the proposed schemes
were validated with extensive experimental simulations. The experimental
results demonstrated that the proposed MI and AI schemes reduced the
computation time by approximately 13% and 56%, respectively, compared with the
existing method.
Comment: 13 pages, 8 figures, 12 tables, to appear in IEEE Transactions on
Emerging Topics in Computing
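The relaxation described in this abstract can be illustrated with a minimal sketch of k-dominance on certain (non-probabilistic) data. It assumes the commonly used definition, in which a point k-dominates another if it is no worse on at least k dimensions and strictly better on at least one of them, and it assumes that smaller values are better. The function names are hypothetical, and the sketch does not model the appearance probabilities or the MI and AI indexing schemes.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """Return True if p k-dominates q (assumption: smaller is better)."""
    # Dimensions on which p is at least as good as q.
    no_worse = sum(1 for a, b in zip(p, q) if a <= b)
    # Dimensions on which p is strictly better than q.
    strictly_better = sum(1 for a, b in zip(p, q) if a < b)
    # Every strictly-better dimension is also a no-worse dimension, so a set of
    # k qualifying dimensions containing a strict one exists iff both hold.
    return no_worse >= k and strictly_better >= 1


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Naive O(n^2) scan: keep the points that no other point k-dominates."""
    return [q for q in points if not any(k_dominates(p, q, k) for p in points)]


if __name__ == "__main__":
    data = [(1, 1, 5), (2, 2, 2), (3, 3, 3)]
    print(k_dominant_skyline(data, 3))  # k equal to the dimensionality: ordinary skyline
    print(k_dominant_skyline(data, 2))  # relaxed dominance: fewer members
```

With k equal to the number of dimensions, the function returns the ordinary skyline; lowering k makes dominance easier and shrinks the result. Because k-dominance can be cyclic, the result may even be empty for small k.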
Reporting Skyline on Uncertain Dimension with Query Interval
Naturally, users sometimes specify their preference in an imprecise way (i.e., they query with an interval or range). Reporting results that both satisfy the imprecise query and are interesting would be easy on a dataset with atomic values. The challenge arises when the dataset being queried contains both atomic values and continuous ranges of values. For a set of objects with uncertain dimension and given a query interval
A systematic literature review of skyline query processing over data stream
Recently, skyline query processing over data streams has gained a lot of attention, especially from the database community, owing to its unique challenges. Skyline queries aim at pruning the search space of a potentially large multi-dimensional set of objects by keeping only those objects that are not worse than any other. Although an abundance of skyline query processing techniques have been proposed, there is a lack of a Systematic Literature Review (SLR) of current research on skyline query processing over data streams. This paper therefore provides a comparative study of the state-of-the-art approaches published between 2000 and 2022, with the main aim of helping readers understand the key issues to consider when processing skyline queries over streaming data. Seven digital databases were reviewed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) procedures. After applying both the inclusion and exclusion criteria, 23 primary papers were further examined. The results show that the identified skyline approaches are driven by the need to expedite skyline query processing, mainly because data streams are time-varying (time-sensitive), continuous, real-time, volatile, and unrepeatable. Although these skyline approaches are tailor-made for data streams with a common aim, their solutions vary to suit the various aspects being considered, which include the type of skyline query, type of streaming data, type of sliding window, query processing technique, indexing technique, and the data stream environment employed. In this paper, a comprehensive taxonomy is developed along with the key aspects of each reported approach, and several open issues and challenges related to the topic are highlighted as recommendations for future research directions
Reverse Skyline Computation over Sliding Windows
Reverse skyline queries have been used in many real-world applications such as business planning, market analysis, and environmental monitoring. In this paper, we investigated how to efficiently evaluate continuous reverse skyline queries over sliding windows. We first theoretically analyzed the inherent properties of reverse skyline on data streams and proposed a novel pruning technique to reduce the number of data points preserved for processing continuous reverse skyline queries. Then, an efficient approach, called Semidominance Based Reverse Skyline (SDRS), was proposed to process continuous reverse skyline queries. Moreover, an extension was also proposed to handle n-of-N and (n1,n2)-of-N reverse skyline queries. Our extensive experimental studies have demonstrated the efficiency as well as effectiveness of the proposed approach with various experimental settings
Sliding windows over uncertain data streams
Uncertain data streams can have tuples with both value and existential uncertainty. A tuple has value uncertainty when it can assume multiple possible values. A tuple is existentially uncertain when the sum of the probabilities of its possible values is <1. A situation where existential uncertainty can arise is when applying relational operators to streams with value uncertainty. Several prior works have focused on querying and mining data streams with both value and existential uncertainty. However, none of them have studied, in depth, the implications of existential uncertainty on sliding window processing, even though it naturally arises when processing uncertain data. In this work, we study the challenges arising from existential uncertainty, more specifically the management of count-based sliding windows, which are a basic building block of stream processing applications. We extend the semantics of sliding window to define the novel concept of uncertain sliding windows and provide both exact and approximate algorithms for managing windows under existential uncertainty. We also show how current state-of-the-art techniques for answering similarity join queries can be easily adapted to be used with uncertain sliding windows. We evaluate our proposed techniques under a variety of configurations using real data. The results show that the algorithms used to maintain uncertain sliding windows can efficiently operate while providing a high-quality approximation in query answering. In addition, we show that sort-based similarity join algorithms can perform better than index-based techniques (on 17 real datasets) when the number of possible values per tuple is low, as in many real-world applications. © 2014, Springer-Verlag London
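The two kinds of uncertainty defined in this abstract can be made concrete with a small sketch. It assumes a tuple is represented as a list of (value, probability) alternatives, which is a hypothetical representation chosen for illustration. The expected-count helper relies only on linearity of expectation and is not the paper's window-maintenance algorithm.

```python
from typing import Hashable, List, Tuple

# Assumed representation: one uncertain tuple = its possible values with probabilities.
UncertainTuple = List[Tuple[Hashable, float]]


def existence_probability(t: UncertainTuple) -> float:
    """Probability that the tuple exists at all: the sum over its possible values."""
    return sum(prob for _, prob in t)


def has_value_uncertainty(t: UncertainTuple) -> bool:
    """A tuple has value uncertainty when it can assume more than one value."""
    return len(t) > 1


def is_existentially_uncertain(t: UncertainTuple, eps: float = 1e-9) -> bool:
    """A tuple is existentially uncertain when its probabilities sum to less than 1."""
    return existence_probability(t) < 1.0 - eps


def expected_existing_count(window: List[UncertainTuple]) -> float:
    """Expected number of tuples in the window that actually exist."""
    return sum(existence_probability(t) for t in window)


if __name__ == "__main__":
    certain_to_exist = [("a", 0.5), ("b", 0.5)]  # value uncertainty only
    may_be_absent = [("a", 0.3), ("b", 0.4)]     # also existentially uncertain
    print(is_existentially_uncertain(certain_to_exist))
    print(is_existentially_uncertain(may_be_absent))
    print(expected_existing_count([certain_to_exist, may_be_absent]))
```

The expected count hints at why count-based windows become harder under existential uncertainty: the number of tuples that actually exist in a window is itself a random quantity.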
Effective Space Usage Estimation for Sliding-Window Skybands
A skyline query computes all the “best” elements, which are
not dominated by any other element, and is thus very important
for decision-making applications. Recently, it has been generalized
to the skyband query: a k-skyband query returns
those elements dominated by no more than k other elements.
To incorporate the skyband operator into the stream engine
for monitoring skybands over sliding windows, space usage
estimation for the skyband operator becomes a critical issue in
the query optimizer. In this paper, we first introduce the
skyband sketch as the cost model. Based on the cost model,
we propose an approach for estimating the space usage of
the skyband operator over sliding windows of data streams under
the assumptions of statistical independence across dimensions,
no duplicate values over each dimension, and dimension
domains totally ordered. Experiments verify that
our approaches can estimate the space usage effectively over
arbitrarily distributed data. To the best of our knowledge,
this is the first work that attempts to address the issue and
proposes effective approaches to solve it
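The skyline and k-skyband definitions given in this abstract can be sketched as a naive quadratic scan. The sketch assumes that smaller values are better on every dimension and follows the abstract's wording, keeping elements dominated by no more than k others. The function names are hypothetical, and the skyband sketch cost model and the space-usage estimator are not reproduced.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse everywhere, strictly better somewhere (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_skyband(points: List[Point], k: int) -> List[Point]:
    """Keep the points dominated by no more than k other points."""
    result = []
    for q in points:
        # A point never dominates itself or an exact duplicate, so scanning all points is safe.
        dominators = sum(1 for p in points if dominates(p, q))
        if dominators <= k:
            result.append(q)
    return result


if __name__ == "__main__":
    data = [(1, 1), (2, 2), (3, 3), (1, 3)]
    print(k_skyband(data, 0))  # k = 0 gives the ordinary skyline
    print(k_skyband(data, 1))
```

Under this wording, k = 0 yields the ordinary skyline, and the result grows monotonically with k.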