Data curation with ontology functional dependences
Date
2017-04-01
Abstract
Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Functional
dependencies (FDs) have been used in existing cleaning solutions to model syntactic equivalence, but they cannot model
semantic equivalence. We advance the state of data quality constraints by defining, discovering, and cleaning
Ontology Functional Dependencies. We define their theoretical foundations, including sound and complete axioms and a
linear inference procedure. We develop algorithms for data verification, constraint discovery, data cleaning, ontology
versus data inconsistency identification, and optimizations to each. Our experimental evaluation shows the scalability and
accuracy of our algorithms. We show that ontology FDs are useful to capture domain attribute relationships, and can
significantly reduce the number of false positive errors in data cleaning techniques that rely on traditional FDs.
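
To illustrate the contrast the abstract draws between syntactic and semantic equivalence, the following is a minimal sketch and not the algorithms from this work. It assumes a synonym-style reading of an ontology FD: tuples that agree on the left-hand side satisfy the dependency if their right-hand-side values share at least one common sense in the ontology. The toy ontology `SENSES`, the sample rows, and the function names `fd_violations` and `ofd_violations` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy ontology: each value maps to the set of senses it can denote.
SENSES = {
    "USA": {"united_states"},
    "United States": {"united_states"},
    "America": {"united_states"},
    "Canada": {"canada"},
}


def group_by(rows, lhs):
    """Partition rows into groups that agree on every attribute in lhs."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row)
    return groups


def fd_violations(rows, lhs, rhs):
    """Traditional FD check: rows agreeing on lhs must have identical rhs strings."""
    return [
        key
        for key, grp in group_by(rows, lhs).items()
        if len({r[rhs] for r in grp}) > 1
    ]


def ofd_violations(rows, lhs, rhs, senses):
    """Assumed synonym-style check: the rhs values in a group must share a sense."""
    bad = []
    for key, grp in group_by(rows, lhs).items():
        values = {r[rhs] for r in grp}
        if len(values) == 1:
            continue  # syntactically equal values always satisfy the dependency
        # Values missing from the ontology have no senses, so they cannot match.
        common = set.intersection(*(set(senses.get(v, ())) for v in values))
        if not common:
            bad.append(key)
    return bad


rows = [
    {"code": "US", "country": "USA"},
    {"code": "US", "country": "United States"},
    {"code": "CA", "country": "Canada"},
    {"code": "CA", "country": "America"},
]

print(fd_violations(rows, ["code"], "country"))           # [('US',), ('CA',)]
print(ofd_violations(rows, ["code"], "country", SENSES))  # [('CA',)]
```

In this toy data the traditional FD flags the `US` group even though "USA" and "United States" name the same country, which is the kind of false positive the abstract refers to; the synonym-aware check flags only the `CA` group.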
Keywords
Constraints, Data, Quality, Cleaning, Discovery