8 research outputs found
Skyline computation of uncertain database: A survey
Advanced skyline analysis over certain and uncertain databases remains an evolving research area in the database field, despite the considerable body of work already conducted on it. This paper surveys research issues in skyline computation for uncertain databases, with the aim of giving interested researchers an overview of the most recent research directions in this area. It further suggests possible research directions for skyline processing on uncertain databases. A taxonomy of the existing approaches is also presented.
Computing All Restricted Skyline Probabilities on Uncertain Datasets
Restricted skyline (rskyline) query is widely used in multi-criteria decision
making. It generalizes the skyline query by additionally considering a set of
personalized scoring functions F. Since uncertainty is inherent in datasets for
multi-criteria decision making, we study rskyline queries on uncertain datasets
from both complexity and algorithmic perspectives. We formalize the problem of
computing rskyline probabilities of all data items and show that no algorithm
can solve this problem in truly subquadratic time unless the orthogonal
vectors conjecture fails. Considering that linear scoring functions are widely
used in practical applications, we propose two efficient algorithms for the
case where F is a set of linear scoring functions whose weights are
described by linear constraints, one with near-optimal time complexity and the
other with better expected time complexity. For special linear constraints
involving a series of weight ratios, we further devise an algorithm with
sublinear query time and polynomial preprocessing time. Extensive experiments
demonstrate the effectiveness, efficiency, scalability, and usefulness of our
proposed algorithms. Comment: Full version, a shorter version to appear in ICDE 202
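To make the restricted skyline idea concrete, the following is a minimal sketch on certain (non-probabilistic) data, not the paper's algorithm. It assumes that smaller attribute values are preferred, that an item t F-dominates an item s when every scoring function in F scores t no worse than s, and that the linear constraints on the weights describe a polytope supplied by its vertices. Because a linear function over a polytope attains its extremes at the vertices, checking the vertices is enough under these assumptions. The names `f_dominates`, `restricted_skyline`, and `weight_vertices` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def score(weights: Sequence[float], point: Sequence[float]) -> float:
    """Linear scoring function: weighted sum of the attributes."""
    return sum(w * x for w, x in zip(weights, point))


def f_dominates(t: Point, s: Point, weight_vertices: List[Point]) -> bool:
    """Assumed definition: t F-dominates s if t != s and t scores no worse
    than s (smaller is better) under every vertex of the weight polytope."""
    if t == s:
        return False
    return all(score(w, t) <= score(w, s) for w in weight_vertices)


def restricted_skyline(points: List[Point], weight_vertices: List[Point]) -> List[Point]:
    """Naive quadratic scan: keep the points that no other point F-dominates."""
    return [
        s for s in points
        if not any(f_dominates(t, s, weight_vertices) for t in points)
    ]


points = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0)]

# Unconstrained weights on the simplex: vertices (1, 0) and (0, 1).
unconstrained = restricted_skyline(points, [(1.0, 0.0), (0.0, 1.0)])

# Weights constrained by w1 >= w2: vertices (1, 0) and (0.5, 0.5).
constrained = restricted_skyline(points, [(1.0, 0.0), (0.5, 0.5)])
```

With unconstrained weights the test coincides with ordinary dominance, so the result is the ordinary skyline. Adding the constraint w1 >= w2 lets more items be F-dominated, which is how the restricted skyline narrows the result set.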
Finding Probabilistic k-Skyline Sets on Uncertain Data
The skyline is the set of points that are not dominated by any other point. For uncertain objects, the probabilistic skyline has been studied, which computes the objects with a high probability of being in the skyline. While useful for selecting individual objects, it is not sufficient for scenarios where we wish to compute a subset of skyline objects, i.e., a skyline set. In this paper, we generalize the notion of probabilistic skyline to probabilistic k-skyline sets (Pk-SkylineSets), which computes k-object sets with a high probability of being a skyline set. We present an efficient algorithm for computing probabilistic k-skyline sets. It uses two heuristic pruning strategies and a novel data structure based on the classic layered range tree to compute the skyline set probability for each instance set with a worst-case time bound. The experimental results on the real NBA dataset and the synthetic datasets show that Pk-SkylineSets is interesting and useful, and that our algorithms are efficient and scalable.
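The basic definition that this abstract starts from (points not dominated by any other point) can be sketched as follows for ordinary, certain data. This is a naive quadratic illustration only, not the paper's Pk-SkylineSets algorithm, and it assumes that smaller values are preferred in every dimension. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


sample = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
sky = skyline(sample)
```

In this sample, (3, 3) and (4, 4) are both dominated by (2, 2), so they drop out of the skyline.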
Algorithms and hardness results for geometric problems on stochastic datasets
University of Minnesota Ph.D. dissertation. July 2019. Major: Computer Science. Advisor: Ravi Janardan. 1 computer file (PDF); viii, 121 pages. Traditionally, geometric problems are studied on datasets in which each data object exists with probability 1 at its location in the underlying space. However, in many scenarios, there may be some uncertainty associated with the existence or the locations of the data points. Such uncertain datasets, called stochastic datasets, are often more realistic, as they are more expressive and can model the real data more precisely. For this reason, geometric problems on stochastic datasets have received significant attention in recent years. This thesis studies three sets of geometric problems on stochastic datasets equipped with existential uncertainty. The first set of problems addresses the linear separability of a bichromatic stochastic dataset. Specifically, these problems are concerned with how to compute the probability that a realization of a bichromatic stochastic dataset is linearly separable as well as how to compute the expected separation-margin of such a realization. The second set of problems deals with the stochastic convex hull, i.e., the convex hull of a stochastic dataset. This includes computing the expected measures of a stochastic convex hull, such as the expected diameter, width, and combinatorial complexity. The third set of problems considers the dominance relation in a colored stochastic dataset. These problems involve computing the probability that a realization of a colored stochastic dataset does not contain any dominance pair consisting of two different-colored points. New algorithmic and hardness results are provided for the three sets of problems.
(Approximate) uncertain skylines
Given a set of points with uncertain locations, we consider the problem of computing the probability of each point lying on the skyline, that is, the probability that it is not dominated by any other input point. If each point’s uncertainty is described as a probability distribution over a discrete set of locations, we improve the best known exact solution. We also suggest why we believe our solution might be optimal. Next, we describe simple, near-linear time approximation algorithms for computing the probability of each point lying on the skyline. In addition, some of our methods can be adapted to construct data structures that can efficiently determine the probability of a query point lying on the skyline.
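For the discrete model described in this abstract, the quantity being computed can be illustrated with a naive baseline. This is not the improved exact algorithm or the approximation schemes the abstract refers to. The sketch assumes that each uncertain point is a list of (location, probability) pairs whose probabilities sum to at most 1, that different uncertain points are independent, and that smaller values are preferred. Under those assumptions, an instance is on the skyline exactly when every other uncertain point realizes at a location that does not dominate it. The names used are hypothetical.

```python
from typing import List, Tuple

Location = Tuple[float, ...]
Instance = Tuple[Location, float]  # (location, probability of that location)


def dominates(a: Location, b: Location) -> bool:
    """Assumption: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probabilities(objects: List[List[Instance]]) -> List[float]:
    """For each uncertain point, return the probability that it lies on the skyline."""
    result = []
    for i, obj in enumerate(objects):
        total = 0.0
        for loc, prob in obj:
            not_dominated = 1.0
            for j, other in enumerate(objects):
                if j == i:
                    continue
                # Locations of one point are mutually exclusive, so the chance
                # that `other` dominates `loc` is the sum over its dominating
                # locations.
                dominating_mass = sum(q for loc2, q in other if dominates(loc2, loc))
                # Independence across points lets the survival chances multiply.
                not_dominated *= 1.0 - dominating_mass
            total += prob * not_dominated
        result.append(total)
    return result


objects = [
    [((1.0, 1.0), 0.5), ((3.0, 3.0), 0.5)],  # point A: two possible locations
    [((2.0, 2.0), 1.0)],                     # point B: one certain location
]
probs = skyline_probabilities(objects)
```

Here point A is on the skyline only when it realizes at (1, 1), and point B is on the skyline only when A realizes at (3, 3), so each has skyline probability 0.5. The running time is quadratic in the total number of locations, which is the baseline that faster exact and approximate methods aim to beat.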