Probabilistic Skyline Queries over Uncertain Moving Objects
Data uncertainty inherently exists in a large number of applications due to factors such as limitations of measuring equipment, update delay, and network bandwidth. Recently, modeling and querying uncertain data have attracted considerable attention from the database community. However, how to perform advanced analysis on uncertain data remains an interesting question. In this paper, we focus on the execution of skyline computation over uncertain moving objects. We propose a novel probabilistic skyline model in which an uncertain object has some probability of being in the skyline at a given time point; a p-t-skyline therefore contains those moving objects whose skyline probabilities are at least p at time point t. Computing the probabilistic skyline over a large number of uncertain moving objects is a daunting task in practice. In order to efficiently compute the probabilistic skyline query, we propose a discrete-and-conquer strategy, which follows the sampling-bounding-pruning-refining procedure. To further reduce the skyline computation cost, we propose an enhanced framework that is based on a multi-dimensional indexing structure combined with the discrete-and-conquer strategy. Through extensive experiments with synthetic datasets, we show that the framework can efficiently support skyline queries over uncertain moving objects and is scalable on large data sets.
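The p-t-skyline definition in the abstract above can be illustrated with a minimal sketch. It assumes that each uncertain object is represented at time t by a small set of equally likely sample positions, that objects are independent, and that smaller values are better in every dimension. None of these assumptions come from the abstract. The function names are hypothetical, and the brute-force evaluation stands in for the authors' sampling-bounding-pruning-refining procedure, which it does not attempt to reproduce.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one.

    Assumption: smaller values are better.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(name: str, objects: Dict[str, List[Point]]) -> float:
    """Probability that `name` is not dominated by any other object.

    Each object is a list of equally likely samples at one time point
    (an assumption made for this sketch).
    """
    samples = objects[name]
    total = 0.0
    for s in samples:
        prob_not_dominated = 1.0
        for other, other_samples in objects.items():
            if other == name:
                continue
            # Fraction of the other object's samples that dominate s.
            frac = sum(dominates(o, s) for o in other_samples) / len(other_samples)
            prob_not_dominated *= 1.0 - frac
        total += prob_not_dominated
    return total / len(samples)


def p_t_skyline(objects: Dict[str, List[Point]], p: float) -> List[str]:
    """Objects whose skyline probability at this time point is at least p."""
    return sorted(n for n in objects if skyline_probability(n, objects) >= p)


# Hypothetical snapshot of three uncertain objects at one time point t.
snapshot = {
    "A": [(1.0, 1.0), (3.0, 3.0)],
    "B": [(2.0, 2.0)],
    "C": [(4.0, 4.0)],
}
print(p_t_skyline(snapshot, 0.5))
```

In this toy snapshot, C is dominated by every sample of A and B, so it can never be in the skyline, while A and B each qualify half of the time.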
Threshold interval indexing techniques for complicated uncertain data
Uncertain data is an increasingly prevalent topic in database research, given the advance of instruments which inherently generate uncertainty in their data. In particular, the problem of indexing uncertain data for range queries has received considerable attention. To efficiently process range queries, existing approaches mainly focus on reducing the number of disk I/Os. However, due to the inherent complexity of uncertain data, processing a range query may incur high computational cost in addition to the I/O cost. In this paper, I present a novel indexing strategy focusing on one-dimensional uncertain continuous data, called threshold interval indexing. Threshold interval indexing is able to balance I/O cost and computational cost to achieve an optimal overall query performance. A key ingredient of the proposed indexing structure is a dynamic interval tree. The dynamic interval tree is much more resistant to skew than R-trees, which are widely used in other indexing structures. This interval tree optimizes pruning by storing x-bounds, or pre-calculated probability boundaries, at each node. In addition to the basic threshold interval index, I present two variants, called the strong threshold interval index and the hyper threshold interval index, which leverage x-bounds not only for pruning but also for accepting results. Furthermore, I present more efficient memory-loaded versions of these indexes, which reduce the storage size so the primary interval tree can be loaded into memory. Each index description includes methods for querying, parallelizing, updating, bulk loading, and externalizing. I perform an extensive set of experiments to demonstrate the effectiveness and efficiency of the proposed indexing strategies.
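The idea of pruning with x-bounds can be sketched for a single object. The sketch assumes a uniform distribution over a one-dimensional interval and a fixed, hypothetical set of x values. The class and function names are illustrative, and the interval tree, the variants, and the I/O behaviour described in the abstract are not modelled. A left x-bound is taken here to be the position with probability mass x to its left, and a right x-bound the position with mass x to its right.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class UniformObject:
    """Uncertain 1-D value, uniformly distributed on [lo, hi] (assumption)."""
    lo: float
    hi: float

    def left_bound(self, x: float) -> float:
        # Position with probability mass x to its left.
        return self.lo + x * (self.hi - self.lo)

    def right_bound(self, x: float) -> float:
        # Position with probability mass x to its right.
        return self.hi - x * (self.hi - self.lo)

    def range_probability(self, a: float, b: float) -> float:
        # Exact probability that the value falls in [a, b].
        overlap = max(0.0, min(b, self.hi) - max(a, self.lo))
        return overlap / (self.hi - self.lo)


def can_prune(obj: UniformObject, a: float, b: float, tau: float,
              xs: Iterable[float] = (0.1, 0.25, 0.5)) -> bool:
    """True if a stored x-bound proves P(value in [a, b]) < tau.

    If the query ends at or before the left x-bound, at most x of the mass
    can fall inside it; the same holds on the right side. With x < tau the
    object cannot qualify, so the exact probability is never computed.
    """
    for x in xs:
        if x < tau and (b <= obj.left_bound(x) or a >= obj.right_bound(x)):
            return True
    return False


def range_query(objects: List[UniformObject], a: float, b: float,
                tau: float) -> List[UniformObject]:
    """Objects whose probability of lying in [a, b] is at least tau."""
    return [o for o in objects
            if not can_prune(o, a, b, tau) and o.range_probability(a, b) >= tau]
```

For an object uniform on [0, 10], a query [0, 2] with threshold 0.3 is rejected by the 0.25 bound at position 2.5 without evaluating the integral, whereas a query [0, 5] passes the bounds check and is verified exactly.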
08421 Abstracts Collection -- Uncertainty Management in Information Systems
From October 12 to 17, 2008, the Dagstuhl Seminar 08421 "Uncertainty Management in Information Systems" was held in Schloss Dagstuhl – Leibniz Center for Informatics. The abstracts of the plenary and session talks given during the seminar, as well as those of the demos shown, are put together in this paper.
RFID-Based Indoor Spatial Query Evaluation with Bayesian Filtering Techniques
People spend a significant amount of time in indoor spaces (e.g., office
buildings, subway systems, etc.) in their daily lives. Therefore, it is
important to develop efficient indoor spatial query algorithms for supporting
various location-based applications. However, indoor spaces differ from outdoor
spaces because users have to follow the indoor floor plan for their movements.
In addition, positioning in indoor environments is mainly based on sensing
devices (e.g., RFID readers) rather than GPS devices. Consequently, we cannot
apply existing spatial query evaluation techniques devised for outdoor
environments for this new challenge. Because Bayesian filtering techniques can
be employed to estimate the state of a system that changes over time using a
sequence of noisy measurements made on the system, in this research, we propose
the Bayesian filtering-based location inference methods as the basis for
evaluating indoor spatial queries with noisy RFID raw data. Furthermore, two
novel models, indoor walking graph model and anchor point indexing model, are
created for tracking object locations in indoor environments. Based on the
inference method and tracking models, we develop innovative indoor range and k
nearest neighbor (kNN) query algorithms. We validate our solution through use
of both synthetic data and real-world data. Our experimental results show that
the proposed algorithms can evaluate indoor spatial queries effectively and
efficiently. We open-source the code, data, and floor plan at
https://github.com/DataScienceLab18/IndoorToolKit
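The Bayesian filtering idea mentioned in this abstract, estimating a changing state from a sequence of noisy measurements, can be illustrated with a generic discrete (histogram) Bayes filter. This is a textbook sketch, not the inference method of the paper or the code in the linked repository. The corridor of cells, the motion model, and the detection probabilities are all assumptions chosen for illustration.

```python
from typing import List


def predict(belief: List[float], p_move: float) -> List[float]:
    """Motion step: the object stays put or advances one cell.

    Assumption: a one-way corridor whose last cell is a dead end.
    """
    n = len(belief)
    new_belief = [0.0] * n
    for i, b in enumerate(belief):
        j = min(i + 1, n - 1)
        new_belief[i] += b * (1.0 - p_move)
        new_belief[j] += b * p_move
    return new_belief


def update(belief: List[float], reader_cell: int, detected: bool,
           p_hit: float = 0.9, p_false: float = 0.1) -> List[float]:
    """Measurement step for one noisy reader located at `reader_cell`.

    p_hit: probability of a detection when the object is in the reader's cell.
    p_false: probability of a detection when it is elsewhere (both assumed).
    """
    weighted = []
    for i, b in enumerate(belief):
        p_detect = p_hit if i == reader_cell else p_false
        likelihood = p_detect if detected else 1.0 - p_detect
        weighted.append(b * likelihood)
    total = sum(weighted)
    return [w / total for w in weighted]


# Start with no knowledge over four corridor cells, then observe a detection
# at the reader in cell 2.
belief = [0.25, 0.25, 0.25, 0.25]
belief = update(belief, reader_cell=2, detected=True)
belief = predict(belief, p_move=0.5)
print(belief)
```

Alternating the two steps keeps a full probability distribution over locations, which is what a range or kNN query over uncertain positions would then consume.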
Distributed Indexing Schemes for k-Dominant Skyline Analytics on Uncertain Edge-IoT Data
Skyline queries typically search a Pareto-optimal set from a given data set
to solve the corresponding multiobjective optimization problem. As the number
of criteria increases, the skyline comes to include an excessive number of data
items, which yields a meaningless result. To address this curse of dimensionality, we proposed a
k-dominant skyline in which the number of skyline members was reduced by
relaxing the restriction on the number of dimensions, considering the
uncertainty of data. Specifically, each data item was associated with a
probability of appearance, which represented the probability of becoming a
member of the k-dominant skyline. As data items appear continuously in data
streams, the corresponding k-dominant skyline may vary with time. Therefore, an
effective and rapid mechanism of updating the k-dominant skyline becomes
crucial. Herein, we proposed two time-efficient schemes, Middle Indexing (MI)
and All Indexing (AI), for k-dominant skyline in distributed edge-computing
environments, where irrelevant data items can be effectively excluded from the
computation to reduce the processing time. Furthermore, the proposed schemes
were validated with extensive experimental simulations. The experimental
results demonstrated that the proposed MI and AI schemes reduced the
computation time by approximately 13% and 56%, respectively, compared with the
existing method.
Comment: 13 pages, 8 figures, 12 tables, to appear in IEEE Transactions on
Emerging Topics in Computing
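The relaxation behind the k-dominant skyline can be shown with a small deterministic sketch. It uses the common definition in which a point k-dominates another if it is no worse in at least k dimensions and strictly better in at least one of them, and it assumes smaller values are better. The appearance probabilities, the streaming updates, and the MI and AI indexing schemes from the abstract are not modelled, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def k_dominates(a: Point, b: Point, k: int) -> bool:
    """a k-dominates b: no worse in at least k dimensions, strictly better in one.

    Any strictly better dimension is also a "no worse" dimension, so it can
    always be chosen as one of the k. Assumption: smaller values are better.
    """
    no_worse = sum(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse >= k and strictly_better


def k_dominant_skyline(points: Dict[str, Point], k: int) -> List[str]:
    """Names of points that no other point k-dominates (brute force)."""
    result = []
    for name, p in points.items():
        if not any(k_dominates(q, p, k)
                   for other, q in points.items() if other != name):
            result.append(name)
    return sorted(result)


# Hypothetical three-dimensional data set.
points = {
    "a": (1.0, 9.0, 9.0),
    "b": (9.0, 1.0, 9.0),
    "c": (5.0, 5.0, 5.0),
    "d": (1.0, 1.0, 9.0),
}
print(k_dominant_skyline(points, 3))  # conventional skyline (k = number of dimensions)
print(k_dominant_skyline(points, 2))  # relaxed to k = 2
```

With k equal to the number of dimensions the result is the conventional skyline, here c and d. Relaxing to k = 2 lets d eliminate c, which shows how the k-dominant skyline shrinks the result set as dimensionality grows.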