Discovering Conditional Functional Dependencies

Author: Fan Wenfei
Geerts Floris
Li Jianzhong
Xiong Ming
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

A Uniform Dependency Language for Improving Data Quality

Author: Fan Wenfei
Geerts Floris
Publication date: 01/01/2011

arXiv.org e-Print Archive

Efficient Discovery of Ontology Functional Dependencies

Author: Baskaran Sridevi
Chiang Fei
Keller Alexander
Golab Lukasz
Szlichta Jaroslaw
Publication date: 23/05/2017

Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Constraint-based data cleaning techniques rely on integrity constraints as a benchmark to identify and correct errors. Data values that do not satisfy the given set of constraints are flagged as dirty, and data updates are made to re-align the data and the constraints. However, many errors require user input to resolve, because domain expertise defines specific terminology and relationships. For example, in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen', a relationship that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they are not able to model broader relationships (e.g., is-a) defined by an ontology. In this paper, we take a first step towards extending the set of data quality constraints used in data cleaning by defining and discovering Ontology Functional Dependencies (OFDs). We lay out theoretical and practical foundations for OFDs, including a set of sound and complete axioms, and a linear inference procedure. We then develop effective algorithms for discovering OFDs, and a set of optimizations that efficiently prune the search space. Our experimental evaluation using real data shows the scalability and accuracy of our algorithms. Comment: 12 pages
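The contrast the abstract draws between syntactic equivalence and ontology-aware equivalence can be illustrated with a small sketch. This is not the paper's discovery algorithm; it only checks a given dependency on toy data. The column names (`symptom`, `drug`), the `concepts` mapping, and the function names are all hypothetical, and the ontology check is a simplified reading in which values agree if they share at least one concept.

```python
from collections import defaultdict


def group_by(rows, lhs):
    """Group rows by their values on the left-hand-side attributes."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row)
    return groups


def fd_holds(rows, lhs, rhs):
    """Classical FD lhs -> rhs: rows agreeing on lhs must have identical rhs values."""
    return all(
        len({row[rhs] for row in group}) == 1
        for group in group_by(rows, lhs).values()
    )


def ofd_holds(rows, lhs, rhs, concepts):
    """Relaxed check: rows agreeing on lhs must share at least one ontology concept on rhs."""
    for group in group_by(rows, lhs).values():
        shared = None
        for row in group:
            value = row[rhs]
            # Assumption: a value missing from the ontology only matches itself.
            senses = concepts.get(value, {value})
            shared = set(senses) if shared is None else shared & senses
        if not shared:
            return False
    return True


# Hypothetical toy data: 'Advil' and 'ibuprofen' differ as strings
# but denote the same concept in the (made-up) ontology below.
rows = [
    {"symptom": "headache", "drug": "Advil"},
    {"symptom": "headache", "drug": "ibuprofen"},
    {"symptom": "fever", "drug": "Tylenol"},
]
concepts = {
    "Advil": {"ibuprofen"},
    "ibuprofen": {"ibuprofen"},
    "Tylenol": {"acetaminophen"},
}

print(fd_holds(rows, ["symptom"], "drug"))             # False
print(ofd_holds(rows, ["symptom"], "drug", concepts))  # True
```

On this data a plain FD check flags the two headache rows as a violation, while the ontology-aware check accepts them because both drug values map to the same concept.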

A revival of integrity constraints for data cleaning

Author: Fan Wenfei
Geerts Floris
Jia Xibei
Publication date: 01/01/2008

Integrity constraints, a.k.a. data dependencies, are being widely used for improving the quality of schema. Recently constraints have enjoyed a revival for improving the quality of data. The tutorial aims to provide an overview of recent advances in constraint-based data cleaning.