J.L.: A Scalable Approach to Learn Semantic Models of Structured Sources

Abstract

Abstract-Semantic models of data sources describe the meaning of the data in terms of the concepts and relationships defined by a domain ontology. Building such models is an important step toward integrating data from different sources, where we need to provide the user with a unified view of underlying sources. In this paper, we present a scalable approach to automatically learn semantic models of a structured data source by exploiting the knowledge of previously modeled sources. Our evaluation shows that the approach generates expressive semantic models with minimal user input, and it is scalable to large ontologies and data sources with many attributes. I. INTRODUCTION A significant amount of information available on the Web is available in sources such as relational databases, spreadsheets, XML, JSON, and Web APIs. A common approach to integrate these sources involves building a domain model and constructing source descriptions that represent the intended meaning of the data by specifying mappings between the sources and the domain model Manually constructing semantic models is a timeconsuming task that requires significant effort and expertise. Automatically generating these models involves two steps. The first step is specifying the semantic types, i.e., labeling each data field, or source attribute with a class or a data property of the domain ontology. However, simply annotating the attributes is not sufficient. A precise semantic model needs a second step that specifies the relationships between the source attributes in terms of the properties in the ontology. In Semantic Web research, there are many studies on mapping data sources to ontologies [2]- In our previous work [8], we presented a novel approach to learn semantic models of data sources from known semantic models, semantic models of sources that have already been modeled. The work is inspired by the idea that different sources in the same domain often provide similar or overlapping data and have similar semantic models. Given sample data from the new source, we use an existing machine learnin

    Similar works

    Full text

    thumbnail-image

    Available Versions