
2 research outputs found

Cleaning uncertain data for top-k queries

Authors: Cheng R, Cheung DWL, Li X, Mo L, Yang XS
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2013

The information managed in emerging applications, such as sensor networks, location-based services, and data integration, is inherently imprecise. To handle data uncertainty, probabilistic databases have recently been developed. In this paper, we study how to quantify the ambiguity of answers returned by a probabilistic top-k query. We develop efficient algorithms to compute the quality of this query under the possible world semantics. We further address the cleaning of a probabilistic database in order to improve top-k query quality. Cleaning involves reducing the ambiguity associated with the database entities. For example, the uncertainty of a temperature value acquired from a sensor can be reduced, or cleaned, by requesting its newest value from the sensor. While this 'cleaning operation' may produce a better query result, it incurs a cost and may fail. We investigate the problem of selecting entities to be cleaned under a limited budget. In particular, we propose an optimal solution and several heuristics. Experiments show that the greedy algorithm is efficient and close to optimal. © 2013 IEEE.
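To make the ideas in this abstract concrete, here is a minimal Python sketch, not the authors' algorithm. It assumes a toy data model in which each entity has a few alternative values with probabilities, enumerates every possible world by brute force, and uses Shannon entropy of the top-k answer distribution as the ambiguity measure. The entropy measure, the expected-gain formula, the gain-per-cost greedy rule, and all function and parameter names (`topk_answer_distribution`, `ambiguity`, `expected_gain`, `greedy_select`, `costs`, `success`) are assumptions made for illustration.

```python
from itertools import product
from math import log2

def topk_answer_distribution(db, k):
    """db maps entity -> list of (value, probability) alternatives summing to 1.
    Returns a dict mapping each ordered top-k answer to its probability."""
    names = list(db)
    dist = {}
    # Each possible world picks exactly one alternative per entity.
    for choice in product(*(db[n] for n in names)):
        prob = 1.0
        for _, p in choice:
            prob *= p
        # Rank entities by value, highest first; ties keep insertion order.
        ranked = sorted(zip(names, choice), key=lambda t: -t[1][0])
        answer = tuple(n for n, _ in ranked[:k])
        dist[answer] = dist.get(answer, 0.0) + prob
    return dist

def ambiguity(dist):
    """Shannon entropy (bits) of the answer distribution; 0 means no ambiguity."""
    return abs(-sum(p * log2(p) for p in dist.values() if p > 0))

def expected_gain(db, k, entity, success_prob):
    """Expected drop in ambiguity if we try to clean `entity` (assumed model:
    a successful cleaning reveals one alternative, a failure changes nothing)."""
    before = ambiguity(topk_answer_distribution(db, k))
    after = 0.0
    for value, p in db[entity]:
        cleaned = dict(db)
        cleaned[entity] = [(value, 1.0)]
        after += p * ambiguity(topk_answer_distribution(cleaned, k))
    return success_prob * (before - after)

def greedy_select(db, k, costs, success, budget):
    """Pick entities by expected gain per unit cost until the budget is spent."""
    gains = {e: expected_gain(db, k, e, success[e]) for e in costs}
    ranked = sorted(costs, key=lambda e: gains[e] / costs[e], reverse=True)
    chosen, spent = [], 0
    for e in ranked:
        if gains[e] > 0 and spent + costs[e] <= budget:
            chosen.append(e)
            spent += costs[e]
    return chosen

# Hypothetical sensor readings: only s1's uncertainty affects the top-1 answer.
db = {"s1": [(30, 0.5), (20, 0.5)], "s2": [(25, 1.0)], "s3": [(10, 0.5), (5, 0.5)]}
costs = {"s1": 1, "s2": 1, "s3": 1}
success = {"s1": 0.8, "s2": 0.8, "s3": 0.8}
selected = greedy_select(db, 1, costs, success, budget=1)
```

Brute-force enumeration is exponential in the number of uncertain entities, so this only illustrates the semantics on tiny inputs; the abstract's "efficient algorithms" refer to something more scalable than this sketch.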

CiteSeerX

HKU Scholars Hub

Cost-Efficient Processing of Min/Max Queries over Distributed Sensors with Uncertainty

Authors: Junghoo Cho, Ka Cheung Sia, Zhenyu Liu
Publication date: 01/01/2005

The rapid development of micro-sensors and wireless networks has made large-scale sensor networks possible. However, the wide deployment of such systems is still hindered by their limited energy, which quickly runs out under heavy communication. In this paper, we study the cost-efficient processing of aggregate queries, which are generally communication-intensive. In particular, we focus on MIN/MAX queries that require both identity and value in the answer. We study how to provide an error bound for such answers, and how to design an "optimal" sensor-contact policy that minimizes the communication cost of reducing the error to a user-tolerable level.
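The following Python sketch illustrates the kind of problem this abstract describes; it is not the paper's policy. It assumes each sensor's value is known only to lie in a cached interval `(lo, hi)`, reports the sensor with the largest lower bound as the MAX answer, and bounds the error by the gap between the largest upper bound and that lower bound. The contact rule used here (always read the sensor with the largest upper bound) is a simple stand-in heuristic, and the names `max_with_bound`, `read_sensor`, and `tolerance` are hypothetical.

```python
def max_with_bound(intervals, read_sensor, tolerance):
    """intervals maps sensor -> (lo, hi), assumed to contain the true value.
    read_sensor(sensor) returns the exact current value (one communication).
    tolerance must be >= 0. Returns (sensor, value, error_bound, contacted)."""
    bounds = dict(intervals)
    contacted = []
    while True:
        # Candidate answer: the sensor whose value is guaranteed to be largest.
        best = max(bounds, key=lambda s: bounds[s][0])
        max_lo = bounds[best][0]
        max_hi = max(hi for _, hi in bounds.values())
        # The true maximum can exceed the reported value by at most this much.
        error = max_hi - max_lo
        if error <= tolerance:
            return best, max_lo, error, contacted
        # Stand-in heuristic: contact the sensor that could be the largest.
        target = max(bounds, key=lambda s: bounds[s][1])
        value = read_sensor(target)
        bounds[target] = (value, value)
        contacted.append(target)

# Hypothetical cached intervals and true readings for three sensors.
intervals = {"a": (10, 20), "b": (15, 18), "c": (5, 9)}
truth = {"a": 12, "b": 17, "c": 6}
result = max_with_bound(intervals, truth.get, tolerance=1)
```

In this example, sensors `a` and `b` are contacted and `c` is never read, because its upper bound already rules it out. A MIN query can be handled symmetrically by negating the values.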

CiteSeerX

Crossref