137 research outputs found
Knowledge Refinement via Rule Selection
In several different applications, including data transformation and entity
resolution, rules are used to capture aspects of knowledge about the
application at hand. Often, a large set of such rules is generated
automatically or semi-automatically, and the challenge is to refine the
encapsulated knowledge by selecting a subset of rules based on the expected
operational behavior of the rules on available data. In this paper, we carry
out a systematic complexity-theoretic investigation of the following rule
selection problem: given a set of rules specified by Horn formulas, and a pair
of an input database and an output database, find a subset of the rules that
minimizes the total error, that is, the number of false positive and false
negative errors arising from the selected rules. We first establish
computational hardness results for the decision problems underlying this
minimization problem, as well as upper and lower bounds for its
approximability. We then investigate a bi-objective optimization version of the
rule selection problem in which both the total error and the size of the
selected rules are taken into account. We show that testing for membership in
the Pareto front of this bi-objective optimization problem is DP-complete.
Finally, we show that a similar DP-completeness result holds for a bi-level
optimization version of the rule selection problem, where one minimizes first
the total error and then the size
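The error measure described in this abstract can be illustrated with a minimal brute-force sketch. It makes a simplifying assumption that is not in the paper: each rule is represented directly by the set of facts it derives from the input database, so Horn-formula evaluation is not modeled. The names `total_error` and `best_subset` are hypothetical, and the exhaustive search is exponential in the number of rules, so it is only meant for toy instances.

```python
from itertools import combinations

def total_error(selected, derived_by_rule, target):
    """Count false positives plus false negatives for a chosen set of rules.

    Assumption: derived_by_rule[r] is the set of facts rule r produces on the
    input database; target is the set of facts in the output database.
    """
    derived = set()
    for rule in selected:
        derived |= derived_by_rule[rule]
    false_positives = len(derived - target)   # derived but not in the output
    false_negatives = len(target - derived)   # in the output but not derived
    return false_positives + false_negatives

def best_subset(derived_by_rule, target):
    """Exhaustively search all subsets; return (error, subset).

    Subsets are visited in order of increasing size and only a strictly
    smaller error replaces the incumbent, so among the subsets of minimum
    error one of minimum size is returned.
    """
    rules = sorted(derived_by_rule)
    best = None
    for size in range(len(rules) + 1):
        for subset in combinations(rules, size):
            err = total_error(subset, derived_by_rule, target)
            if best is None or err < best[0]:
                best = (err, subset)
    return best
```

Visiting subsets by increasing size mirrors the bi-level reading mentioned above: the error is minimized first, and the number of selected rules second.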
Structure and Complexity of Bag Consistency
Since the early days of relational databases, it was realized that acyclic
hypergraphs give rise to database schemas with desirable structural and
algorithmic properties. In a by-now classical paper, Beeri, Fagin, Maier, and
Yannakakis established several different equivalent characterizations of
acyclicity; in particular, they showed that the sets of attributes of a schema
form an acyclic hypergraph if and only if the local-to-global consistency
property for relations over that schema holds, which means that every
collection of pairwise consistent relations over the schema is globally
consistent. Even though real-life databases consist of bags (multisets), there
has not been a study of the interplay between local consistency and global
consistency for bags. We embark on such a study here and we first show that the
sets of attributes of a schema form an acyclic hypergraph if and only if the
local-to-global consistency property for bags over that schema holds. After
this, we explore algorithmic aspects of global consistency for bags by
analyzing the computational complexity of the global consistency problem for
bags: given a collection of bags, are these bags globally consistent? We show
that this problem is in NP, even when the schema is part of the input. We then
establish the following dichotomy theorem for fixed schemas: if the schema is
acyclic, then the global consistency problem for bags is solvable in polynomial
time, while if the schema is cyclic, then the global consistency problem for
bags is NP-complete. The latter result contrasts sharply with the state of
affairs for relations, where, for each fixed schema, the global consistency
problem for relations is solvable in polynomial time
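As a small illustration of what local consistency means for bags, the following sketch compares the marginals of two bags on their shared attributes. Agreement of these marginals is a necessary condition for the two bags to be marginals of a common bag; the sketch does not attempt the global consistency problem analyzed in the abstract. The representation (a `Counter` mapping tuples to multiplicities, plus an attribute list) and the names `marginal` and `marginals_agree` are assumptions made for this example.

```python
from collections import Counter

def marginal(bag, attrs, onto):
    """Project a bag onto the attributes in `onto`, summing multiplicities.

    Assumption: `bag` is a Counter whose keys are tuples ordered as in `attrs`
    and whose values are positive multiplicities.
    """
    positions = [attrs.index(a) for a in onto]
    result = Counter()
    for tup, multiplicity in bag.items():
        result[tuple(tup[i] for i in positions)] += multiplicity
    return result

def marginals_agree(bag_r, attrs_r, bag_s, attrs_s):
    """Check whether two bags have the same marginal on their shared attributes."""
    shared = [a for a in attrs_r if a in attrs_s]
    return marginal(bag_r, attrs_r, shared) == marginal(bag_s, attrs_s, shared)
```

For example, a bag over (A, B) containing (1, 2) twice and (3, 2) once has marginal multiplicity 3 for B = 2, so it agrees with a bag over (B, C) that contains (2, 5) three times, but not with one that contains (2, 5) only twice.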
Universal Solutions in Temporal Data Exchange
During the past fifteen years, data exchange has been explored in depth and in a variety of different settings. Even though temporal databases constitute a mature area of research studied over several decades, the investigation of temporal data exchange was initiated only very recently. We analyze the properties of universal solutions in temporal data exchange with emphasis on the relationship between universal solutions in the context of concrete time and universal solutions in the context of abstract time. We show that challenges arise even in the setting in which the data exchange specifications involve a single temporal variable. After this, we identify settings, including data exchange settings that involve multiple temporal variables, in which these challenges can be overcome
The Connectivity of Boolean Satisfiability: Computational and Structural Dichotomies
Boolean satisfiability problems are an important benchmark for questions
about complexity, algorithms, heuristics and threshold phenomena. Recent work
on heuristics and the satisfiability threshold has centered around the
structure and connectivity of the solution space. Motivated by this work, we
study structural and connectivity-related properties of the space of solutions
of Boolean satisfiability problems and establish various dichotomies in
Schaefer's framework.
On the structural side, we obtain dichotomies for the kinds of subgraphs of
the hypercube that can be induced by the solutions of Boolean formulas, as well
as for the diameter of the connected components of the solution space. On the
computational side, we establish dichotomy theorems for the complexity of the
connectivity and st-connectivity questions for the graph of solutions of
Boolean formulas. Our results assert that the intractable side of the
computational dichotomies is PSPACE-complete, while the tractable side (which
includes but is not limited to all problems with polynomial-time algorithms for
satisfiability) is in P for the st-connectivity question, and in coNP for the
connectivity question. The diameter of components can be exponential for the
PSPACE-complete cases, whereas in all other cases it is linear; thus, small
diameter and tractability of the connectivity problems are remarkably aligned.
The crux of our results is an expressibility theorem showing that in the
tractable cases, the subgraphs induced by the solution space possess certain
good structural properties, whereas in the intractable cases, the subgraphs can
be arbitrary
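The st-connectivity question can be made concrete with a brute-force sketch for very small formulas. It enumerates all assignments, keeps the satisfying ones, and runs a breadth-first search in the subgraph of the hypercube they induce, where two solutions are adjacent when they differ in exactly one variable. Representing a formula as a Python predicate over a tuple of bits is an assumption of this example, as are the names `solutions` and `st_connected`; the enumeration is exponential in the number of variables and is not the algorithm from the paper.

```python
from collections import deque
from itertools import product

def solutions(formula, n):
    """Return the set of satisfying assignments of an n-variable formula.

    Assumption: `formula` is a callable taking a tuple of n bits (0 or 1).
    """
    return {bits for bits in product((0, 1), repeat=n) if formula(bits)}

def st_connected(formula, n, s, t):
    """Decide whether solutions s and t lie in the same component of the
    solution graph, moving only between solutions at Hamming distance 1."""
    sols = solutions(formula, n)
    if s not in sols or t not in sols:
        return False
    seen = {s}
    queue = deque([s])
    while queue:
        current = queue.popleft()
        if current == t:
            return True
        for i in range(n):
            # Flip variable i and keep the neighbor only if it is a solution.
            neighbor = current[:i] + (1 - current[i],) + current[i + 1:]
            if neighbor in sols and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False
```

For instance, the two solutions of x XOR y are at Hamming distance 2 with no solution between them, so they are not connected, whereas for x OR y the assignments (0, 1) and (1, 0) are connected through (1, 1).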