Efficient Discovery of Ontology Functional Dependencies

Baskaran, Sridevi; Chiang, Fei; Keller, Alexander; Golab, Lukasz; Szlichta, Jaroslaw

Authors: Sridevi Baskaran
Fei Chiang
Alexander Keller
Lukasz Golab
Jaroslaw Szlichta
Publication date: 23 May 2017

Abstract

Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Constraint-based data cleaning techniques rely on integrity constraints as a benchmark to identify and correct errors. Data values that do not satisfy the given set of constraints are flagged as dirty, and data updates are made to re-align the data with the constraints. However, many errors require user input to resolve, because domain expertise defines specific terminology and relationships. For example, in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen', a relationship that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they cannot model broader relationships (e.g., is-a) defined by an ontology. In this paper, we take a first step towards extending the set of data quality constraints used in data cleaning by defining and discovering Ontology Functional Dependencies (OFDs). We lay out theoretical and practical foundations for OFDs, including a set of sound and complete axioms and a linear inference procedure. We then develop effective algorithms for discovering OFDs, and a set of optimizations that efficiently prune the search space. Our experimental evaluation using real data shows the scalability and accuracy of our algorithms.

Comment: 12 pages
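To make the contrast in the abstract concrete, the following is a minimal illustrative sketch of how a traditional FD check differs from an ontology-aware one. It is not the discovery algorithm from the paper. The toy ontology, the sample rows, and the function names (`senses`, `holds_fd`, `holds_ofd`) are all hypothetical, and the rule used here (values in a group must share at least one concept) is an assumed simplification of a synonym-style dependency.

```python
from collections import defaultdict

# Hypothetical toy ontology: each concept maps to the names that denote it.
ONTOLOGY = {
    "ibuprofen": {"ibuprofen", "Advil", "Motrin"},
    "acetaminophen": {"acetaminophen", "Tylenol"},
}


def senses(value):
    """Return the set of concepts whose names include the given value."""
    return {concept for concept, names in ONTOLOGY.items() if value in names}


def group_by(rows, lhs):
    """Partition rows by their values on the left-hand-side attributes."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[attr] for attr in lhs)].append(row)
    return groups


def holds_fd(rows, lhs, rhs):
    """Traditional FD: rows that agree on lhs must have identical rhs values."""
    return all(
        len({row[rhs] for row in group}) == 1
        for group in group_by(rows, lhs).values()
    )


def holds_ofd(rows, lhs, rhs):
    """Assumed ontology-aware check: rhs values in a group must share a concept."""
    for group in group_by(rows, lhs).values():
        values = {row[rhs] for row in group}
        if len(values) == 1:
            continue  # syntactically equal values always agree
        if not set.intersection(*(senses(v) for v in values)):
            return False
    return True


ROWS = [
    {"symptom": "headache", "diagnosis": "tension", "medicine": "Advil"},
    {"symptom": "headache", "diagnosis": "tension", "medicine": "ibuprofen"},
    {"symptom": "fever", "diagnosis": "flu", "medicine": "Tylenol"},
]

LHS = ["symptom", "diagnosis"]
print("FD holds: ", holds_fd(ROWS, LHS, "medicine"))
print("OFD holds:", holds_ofd(ROWS, LHS, "medicine"))
```

On this sample, the plain FD check flags 'Advil' and 'ibuprofen' as a violation because the strings differ, while the ontology-aware check accepts them because both names denote the same concept in the toy ontology.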

Available Versions

Crossref

info:doi/10.1145%2F3132847.313...

Last time updated on 01/04/2019