Uncertainty in Automated Ontology Matching: Lessons Learned from an Empirical Experimentation
Data integration is considered a classic research field and a pressing need
within the information science community. Ontologies play a critical role in
such a process by providing well-consolidated support to link and semantically
integrate datasets via interoperability. This paper approaches data integration
from an application perspective, looking at techniques based on ontology
matching. An ontology-based process may only be considered adequate by assuming
manual matching of different sources of information. However, since the
approach becomes unrealistic once the system scales up, automation of the
matching process becomes a compelling need. Therefore, we have conducted
experiments on actual data with the support of existing tools for automatic
ontology matching from the scientific community. Even considering a relatively
simple case study (i.e., the spatio-temporal alignment of global indicators),
outcomes clearly show significant uncertainty resulting from errors and
inaccuracies along the automated matching process. More concretely, this paper
aims to test on real-world data a bottom-up knowledge-building approach,
discuss the lessons learned from the experimental results of the case study,
and draw conclusions about uncertainty and uncertainty management in an
automated ontology matching process. While the most common evaluation metrics
clearly demonstrate the unreliability of fully automated matching solutions,
properly designed semi-supervised approaches seem to be mature for a more
generalized application.
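As an illustration of the "most common evaluation metrics" mentioned in the abstract, the sketch below computes precision, recall, and F-measure for an automatically produced alignment against a manually curated reference alignment. It is a minimal sketch, not the paper's evaluation code: the function name `evaluate_alignment` and the representation of a correspondence as a pair of entity labels are assumptions made for this example.

```python
def evaluate_alignment(found, reference):
    """Score a set of correspondences against a reference alignment.

    Each correspondence is assumed to be a hashable pair
    (source_entity, target_entity); this representation is hypothetical.
    Returns (precision, recall, f_measure).
    """
    found, reference = set(found), set(reference)
    correct = found & reference  # correspondences present in both sets

    # Guard against empty sets to avoid division by zero.
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Illustrative, made-up correspondences (not data from the paper).
    found = {("Country", "Nation"), ("Year", "Period"), ("GDP", "Population")}
    reference = {("Country", "Nation"), ("Year", "Period"),
                 ("GDP", "GrossDomesticProduct"), ("Region", "Area")}
    print(evaluate_alignment(found, reference))
```

Low precision signals spurious matches and low recall signals missed ones, which are the two kinds of error that a semi-supervised workflow would hand to a human reviewer.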
Automatic generation of probabilistic relationships for improving schema matching
Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the "hidden meaning" associated with schema labels (i.e., class/attribute names), it is possible to discover lexical relationships among the elements of different schemata. In this work, we propose an automatic method aimed at discovering probabilistic lexical relationships in the environment of data integration "on the fly". Our method is based on a probabilistic lexical annotation technique, which automatically associates one or more meanings with schema elements with respect to a thesaurus or lexical resource. However, the accuracy of automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and abbreviations. We address this problem by including a method to perform schema label normalization, which increases the number of comparable labels. From the annotated schemata, we derive the probabilistic lexical relationships to be collected in the Probabilistic Common Thesaurus. The method is applied within the MOMIS data integration system but can easily be generalized to other data integration systems.
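The idea of schema label normalization can be sketched as follows: split compound labels into tokens and expand known abbreviations, so that more labels become comparable against a lexical resource. This is a minimal sketch under stated assumptions, not the method implemented in MOMIS: the function `normalize_label`, the splitting rules, and the `ABBREVIATIONS` table are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical abbreviation table; a real system would need a curated
# or automatically learned dictionary.
ABBREVIATIONS = {
    "qty": "quantity",
    "cust": "customer",
    "addr": "address",
    "no": "number",
}


def normalize_label(label, abbreviations=ABBREVIATIONS):
    """Split a schema label into lowercase tokens and expand abbreviations."""
    # Insert a space at each lowercase-or-digit to uppercase boundary,
    # so that "custAddr" becomes "cust Addr".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    # Split on whitespace, underscores, hyphens, and dots.
    tokens = re.split(r"[\s_\-.]+", spaced)
    # Lowercase each token and replace it if it is a known abbreviation.
    return [abbreviations.get(t.lower(), t.lower()) for t in tokens if t]


if __name__ == "__main__":
    for label in ("custAddr", "ORDER_QTY", "PurchaseOrderNo"):
        print(label, "->", normalize_label(label))
```

After normalization, each token is a candidate for lookup in the thesaurus, whereas the raw label `custAddr` would most likely not be found at all.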