Data curation with ontology functional dependences

Date
2017-04-01
Abstract
Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Existing cleaning solutions use functional dependencies (FDs) to model syntactic equivalence. However, FDs cannot model semantic equivalence. We advance the state of data quality constraints by defining, discovering, and cleaning with Ontology Functional Dependencies (OFDs). We define their theoretical foundations, including a sound and complete set of axioms and a linear-time inference procedure. We develop algorithms for data verification, constraint discovery, data cleaning, and identification of inconsistencies between the ontology and the data, along with optimizations for each. Our experimental evaluation shows the scalability and accuracy of our algorithms. We show that OFDs are useful for capturing relationships among domain attributes, and that they can significantly reduce the number of false-positive errors reported by data cleaning techniques that rely on traditional FDs.
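To illustrate the contrast between syntactic and semantic equivalence, here is a minimal sketch of verifying one kind of OFD, a synonym OFD X -> Y. It assumes the ontology is modeled as a plain dictionary that maps each sense (concept) to its set of synonyms. Under this assumption the dependency holds when, for every group of tuples that agree on X, all of their Y values are synonyms under at least one common sense. The function names `holds_fd` and `holds_synonym_ofd`, the attributes `CC` and `CTRY`, and the example rows are hypothetical illustrations, not the verification algorithm developed in this work.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple

# Hypothetical toy ontology: each sense (concept) maps to its set of synonyms.
Ontology = Dict[str, Set[str]]


def _group(rows: Sequence[dict], lhs: Sequence[str], rhs: str) -> Dict[Tuple, Set[str]]:
    """Collect the distinct rhs values for each combination of lhs values."""
    groups: Dict[Tuple, Set[str]] = defaultdict(set)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].add(row[rhs])
    return groups


def holds_fd(rows: Sequence[dict], lhs: Sequence[str], rhs: str) -> bool:
    """Traditional FD: tuples that agree on lhs must have syntactically equal rhs."""
    return all(len(vals) == 1 for vals in _group(rows, lhs, rhs).values())


def holds_synonym_ofd(rows: Sequence[dict], lhs: Sequence[str], rhs: str,
                      ontology: Ontology) -> bool:
    """Synonym OFD (sketch): within each lhs group, every rhs value must
    belong to the synonym set of at least one common sense."""
    for vals in _group(rows, lhs, rhs).values():
        if len(vals) == 1:
            continue  # a single value trivially satisfies the dependency
        if not any(vals <= synonyms for synonyms in ontology.values()):
            return False
    return True


# Hypothetical example data: country codes and country names.
ontology: Ontology = {
    "United States": {"United States", "America", "USA"},
    "India": {"India", "Bharat"},
}
rows = [
    {"CC": "US", "CTRY": "United States"},
    {"CC": "US", "CTRY": "America"},
    {"CC": "US", "CTRY": "USA"},
    {"CC": "IN", "CTRY": "India"},
    {"CC": "IN", "CTRY": "Bharat"},
]

if __name__ == "__main__":
    print(holds_fd(rows, ["CC"], "CTRY"))                     # False
    print(holds_synonym_ofd(rows, ["CC"], "CTRY", ontology))  # True
```

In this toy data the traditional FD CC -> CTRY reports violations for both country codes, even though every name in each group refers to the same country. The synonym OFD accepts these tuples. It would still flag a genuine error, such as adding a row that pairs "US" with "Canada".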
Keywords
Constraints, Data, Quality, Cleaning, Discovery