Complexity and Efficient Algorithms for Data Inconsistency Evaluating and Repairing
Evaluating and repairing data inconsistency are major concerns in data
quality management. As a basic computing task, optimal subset repair is not
only applied for cost estimation during the process of database repairing, but
also directly used to derive the evaluation of database inconsistency.
Computing an optimal subset repair means finding a minimum set of tuples in an
inconsistent database whose removal leaves a consistent subset. Tight
bounds on the complexity and efficient algorithms are still unknown. In this
paper, we improve the existing complexity and algorithmic results, together
with a fast estimation of the size of an optimal subset repair. First, we
strengthen the dichotomy for the optimal subset repair computation problem: we show
that it is not only APX-complete, but also NP-hard to approximate an optimal
subset repair with a factor better than for most cases. Second, we show
a -approximation whenever given functional
dependencies, and a -approximation when an
-portion of tuples have the -quasi-Turán property
for some . Finally, we show a sublinear estimator for the size of an optimal
\textit{S}-repair for subset queries; it outputs an estimate of a ratio
with high probability, thus deriving an estimate of the
FD-inconsistency degree of a ratio . To support a variety of subset
queries for FD-inconsistency evaluation, we unify them as the
-oracle, which can answer membership queries and return tuples
uniformly sampled whenever given a number . Experiments are conducted on
range queries as an implementation of the -oracle, and the results show the
efficiency of our FD-inconsistency degree estimator.
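
For intuition only, the notion of a subset repair under functional dependencies can be sketched in a few lines of Python. This is not the algorithm of the paper: it uses the textbook matching-based 2-approximation for vertex cover on the conflict graph, next to an exhaustive search that is usable only on tiny inputs. The function names, the row encoding as dictionaries, and the sample `zip -> city` dependency are all assumptions made for this illustration.

```python
from itertools import combinations

def conflicts(rows, fds):
    """Return index pairs of tuples that jointly violate some FD.

    rows: list of dicts; fds: list of (lhs_attrs, rhs_attrs) pairs.
    (Hypothetical encoding, chosen for this sketch.)
    """
    edges = []
    for i, j in combinations(range(len(rows)), 2):
        for lhs, rhs in fds:
            same_lhs = all(rows[i][a] == rows[j][a] for a in lhs)
            same_rhs = all(rows[i][a] == rows[j][a] for a in rhs)
            if same_lhs and not same_rhs:
                edges.append((i, j))
                break
    return edges

def matching_repair(rows, fds):
    """Deletion set made of both endpoints of a greedy maximal matching.

    Classic 2-approximation for vertex cover, used here as a baseline.
    """
    removed = set()
    for i, j in conflicts(rows, fds):
        if i not in removed and j not in removed:
            removed.update((i, j))
    return removed

def optimal_repair(rows, fds):
    """Exhaustive search for a minimum deletion set (exponential time)."""
    edges = conflicts(rows, fds)
    for size in range(len(rows) + 1):
        for cand in combinations(range(len(rows)), size):
            chosen = set(cand)
            if all(i in chosen or j in chosen for i, j in edges):
                return chosen
    return set(range(len(rows)))

# Illustrative data: tuple 1 disagrees with tuples 0 and 2 on zip -> city.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "New York"},
    {"zip": "94105", "city": "San Francisco"},
]
fds = [(("zip",), ("city",))]

print(optimal_repair(rows, fds))   # minimum deletion set
print(matching_repair(rows, fds))  # at most twice as large
```

On this sample, deleting tuple 1 alone restores consistency, while the matching baseline deletes tuples 0 and 1, which stays within the factor-2 guarantee.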