Discovering Multi-Table Functional Dependencies Without Full Join Computation
In this paper, we study the problem of discovering join FDs, i.e., functional
dependencies (FDs) that hold on multiple joined tables. We leverage logical
inference, selective mining, and sampling, and show that we can discover most of
the exact join FDs from the single tables participating in the join and avoid
the full computation of the join result. We propose algorithms to speed up the
join FD discovery process and mine FDs on the fly only from necessary data
partitions. We introduce JEDI (Join dEpendency DIscovery), our solution to
discover join FDs without computation of the full join beforehand. Our
experiments on a range of real-world and synthetic data demonstrate the
benefits of our method over existing FD discovery methods that need to
precompute the join results before discovering the FDs. We show that the
performance depends on the cardinalities and coverage of the join attribute
values: for join operations with low coverage, JEDI with selective mining
outperforms the competing methods using the straightforward approach of full
join computation by one order of magnitude in terms of runtime and can discover
three-quarters of the exact join FDs using mainly logical inference in half of
its total execution time on average. For higher join coverage, JEDI with
sampling reaches a precision of 1 using only 63% of the table input size on
average.
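
To make the role of logical inference and join coverage concrete, here is a minimal sketch on two toy tables. It is not JEDI's algorithm: the tables, the attribute names, and the helpers `holds_fd` and `inner_join` are hypothetical and exist only for illustration. The sketch relies on one standard fact: an FD that holds on a single table also holds on any inner join involving that table, so it can be inferred without materializing the join. It also shows the case inference alone cannot cover, where an FD violated on a single table still holds on the join because the violating rows have no join partner (low coverage).

```python
from itertools import product


def holds_fd(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val for a new key, else returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True


def inner_join(left, right, on):
    """Naive nested-loop equi-join on one shared attribute (illustration only)."""
    return [{**l, **r} for l, r in product(left, right) if l[on] == r[on]]


# Hypothetical toy tables, not taken from the paper's experiments.
orders = [
    {"cust": 1, "item": "pen", "price": 2},
    {"cust": 1, "item": "ink", "price": 5},
    {"cust": 2, "item": "pen", "price": 2},
    {"cust": 9, "item": "pen", "price": 3},  # breaks item -> price; cust 9 never joins
]
customers = [
    {"cust": 1, "city": "Lyon"},
    {"cust": 2, "city": "Doha"},
]

# Materialized here only to check the claims; avoiding this is the paper's goal.
joined = inner_join(orders, customers, "cust")

# Inferred: holds on a single table, therefore holds on the inner join.
inferred = holds_fd(customers, ["cust"], ["city"]) and holds_fd(joined, ["cust"], ["city"])

# Join-only: violated on orders, but the violating row is filtered out by the join.
join_only = (not holds_fd(orders, ["item"], ["price"])) and holds_fd(joined, ["item"], ["price"])
```

In this toy setting, `cust -> city` is obtained from `customers` alone, whereas `item -> price` appears only on the join result. FDs of the second kind are the ones that need actual data inspection, which is where the selective mining and sampling described above come in.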