A Scalable Approach to Learn Semantic Models of Structured Sources

Craig A Knoblock; José Luis Ambite; Mohsen Taheriyan; Pedro Szekely

Publication date: 1 January 2014

Abstract

Semantic models of data sources describe the meaning of the data in terms of the concepts and relationships defined by a domain ontology. Building such models is an important step toward integrating data from different sources, where we need to provide the user with a unified view of the underlying sources. In this paper, we present a scalable approach to automatically learn semantic models of a structured data source by exploiting the knowledge of previously modeled sources. Our evaluation shows that the approach generates expressive semantic models with minimal user input, and that it scales to large ontologies and to data sources with many attributes.

I. INTRODUCTION

A significant amount of information on the Web is available in sources such as relational databases, spreadsheets, XML, JSON, and Web APIs. A common approach to integrating these sources involves building a domain model and constructing source descriptions that represent the intended meaning of the data by specifying mappings between the sources and the domain model.

Manually constructing semantic models is a time-consuming task that requires significant effort and expertise. Automatically generating these models involves two steps. The first step is specifying the semantic types, i.e., labeling each data field, or source attribute, with a class or a data property of the domain ontology. However, simply annotating the attributes is not sufficient. A precise semantic model needs a second step that specifies the relationships between the source attributes in terms of the properties in the ontology.

In Semantic Web research, there are many studies on mapping data sources to ontologies [2]-

In our previous work [8], we presented a novel approach to learn semantic models of data sources from known semantic models, that is, the semantic models of sources that have already been modeled.
The work is inspired by the idea that different sources in the same domain often provide similar or overlapping data and have similar semantic models. Given sample data from the new source, we use an existing machine learnin
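
The two modeling steps described in the introduction (assigning semantic types, then specifying relationships) can be pictured with a small data structure. The following is only an illustrative sketch, not the authors' implementation: the class names `SemanticType`, `Relationship`, and `SemanticModel`, the source attributes `name` and `birth_city`, and the ontology terms `Person`, `City`, and `bornIn` are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SemanticType:
    """Step 1: label for one source attribute (hypothetical structure)."""
    ontology_class: str   # class of the domain ontology, e.g. "Person"
    data_property: str    # data property of that class, e.g. "name"


@dataclass(frozen=True)
class Relationship:
    """Step 2: an ontology property linking two classes (hypothetical structure)."""
    subject_class: str
    object_property: str
    object_class: str


@dataclass
class SemanticModel:
    # Maps each source attribute name to its semantic type.
    semantic_types: dict = field(default_factory=dict)
    # Relationships between the classes used by the semantic types.
    relationships: list = field(default_factory=list)

    def classes(self):
        """Return every ontology class referenced by a semantic type."""
        return {t.ontology_class for t in self.semantic_types.values()}

    def unlinked_classes(self):
        """Return classes that no relationship connects to any other class."""
        linked = set()
        for rel in self.relationships:
            linked.add(rel.subject_class)
            linked.add(rel.object_class)
        return self.classes() - linked


def build_example_model():
    """Build a model in which only step 1 (semantic typing) has been done."""
    model = SemanticModel()
    model.semantic_types["name"] = SemanticType("Person", "name")
    model.semantic_types["birth_city"] = SemanticType("City", "label")
    return model
```

After step 1 alone, `unlinked_classes()` reports both classes, which reflects the point that annotating attributes is not sufficient. Appending a `Relationship("Person", "bornIn", "City")` makes the model state how the two attributes relate.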
