Efficiently Enumerating Hitting Sets of Hypergraphs Arising in Data Profiling

Abstract

We devise an enumeration method for inclusion-wise minimal hitting sets in hypergraphs. It has delay $O(m^{k^*+1} \cdot n^2)$ and uses linear space. Here, $n$ is the number of vertices, $m$ the number of hyperedges, and $k^*$ the rank of the transversal hypergraph. In particular, on classes of hypergraphs for which the cardinality $k^*$ of the largest minimal hitting set is bounded, the delay is polynomial. The algorithm solves the extension problem for minimal hitting sets as a subroutine. We show that the extension problem is W[3]-complete when parameterised by the cardinality of the set which is to be extended. For the subroutine, we give an algorithm that is optimal under the exponential time hypothesis. Despite these lower bounds, we provide empirical evidence showing that the enumeration outperforms the theoretical worst-case guarantee on hypergraphs arising in the profiling of relational databases, namely, in the detection of unique column combinations.
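To make the objects in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the enumeration algorithm of the paper and has none of its delay or space guarantees; it only illustrates what a minimal hitting set and the extension problem are. The function names (`is_hitting_set`, `minimal_hitting_sets`, `is_extendable`) and the example hypergraph are assumptions made for illustration.

```python
from itertools import combinations


def is_hitting_set(candidate, edges):
    """True if the candidate shares at least one vertex with every hyperedge."""
    return all(candidate & edge for edge in edges)


def is_minimal_hitting_set(candidate, edges):
    """True if the candidate hits every edge and no single vertex can be dropped."""
    if not is_hitting_set(candidate, edges):
        return False
    return all(not is_hitting_set(candidate - {v}, edges) for v in candidate)


def minimal_hitting_sets(vertices, edges):
    """Brute force over all vertex subsets (exponential time, illustration only)."""
    ordered = sorted(vertices)
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            candidate = frozenset(combo)
            if is_minimal_hitting_set(candidate, edges):
                yield candidate


def is_extendable(partial, vertices, edges):
    """Extension problem: is `partial` contained in some minimal hitting set?"""
    partial = frozenset(partial)
    return any(partial <= h for h in minimal_hitting_sets(vertices, edges))


# Hypothetical example hypergraph: a path on four vertices.
vertices = {1, 2, 3, 4}
edges = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]

solutions = set(minimal_hitting_sets(vertices, edges))
k_star = max(len(h) for h in solutions)  # size of the largest minimal hitting set
```

In this example the minimal hitting sets are {1, 3}, {2, 3} and {2, 4}, so the largest one has cardinality 2. The set {1} can be extended to a minimal hitting set, whereas {1, 4} cannot: any hitting set containing both 1 and 4 must also contain 2 or 3, which makes 1 or 4 redundant.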
