
    Information Integration - the process of integration, evolution and versioning

    At present, many information sources are available wherever you are. Most of the time, the information needed is spread across several of those sources. Gathering this information is a tedious and time-consuming job, and automating the process would assist users in this task. Integrating the information sources yields a global information source that contains all the needed information. All of these information sources also change over time, and with each change the schema of a source can change as well. The data contained in the source, however, cannot be converted every time, because of the huge amount of data that would have to be transformed to conform to the most recent schema. In this report we describe the current methods for information integration, evolution and versioning. We distinguish between integration of schemas and integration of the actual data. We also show some key issues that arise when integrating XML data sources.

    ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

    Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called "matching dependencies" (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) building a classifier for duplicate/non-duplicate record pairs using machine learning (ML) techniques; (b) using MDs to support the blocking phase of ML; (c) merging records on the basis of the classifier results; and (d) using the declarative language "LogiQL", an extended form of Datalog supported by the "LogicBlox" platform, for all activities related to data processing and for the specification and enforcement of MDs. Comment: Final journal version, with some minor technical corrections. Extended version of arXiv:1508.0601
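The blocking, classification and merging steps named in this abstract can be illustrated with a minimal Python sketch. It is not the ERBlox implementation and does not use LogiQL or matching dependencies: the records, the city-based blocking key, the string-similarity threshold that stands in for a trained classifier, and the "keep the longest value" merge rule are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; the fields are illustrative only.
RECORDS = [
    {"id": 1, "name": "John Smith", "city": "Ottawa"},
    {"id": 2, "name": "Jon Smith", "city": "Ottawa"},
    {"id": 3, "name": "Mary Jones", "city": "Toronto"},
]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def block(records):
    """Blocking: only records sharing a key are compared later.
    Assumption: the key is the lower-cased city."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record["city"].lower()].append(record)
    return blocks


def is_duplicate(r1, r2, threshold=0.85):
    """Stand-in for a trained classifier: a fixed similarity threshold
    on names (an assumption, not a learned model)."""
    return similarity(r1["name"], r2["name"]) >= threshold


def resolve(records):
    """Cluster records judged duplicates, then merge each cluster."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for group in block(records).values():
        for r1, r2 in combinations(group, 2):
            if is_duplicate(r1, r2):
                parent[find(r2["id"])] = find(r1["id"])

    clusters = defaultdict(list)
    for record in records:
        clusters[find(record["id"])].append(record)

    # Merge rule (assumption): keep the longest value of each attribute.
    return [
        {
            "ids": sorted(m["id"] for m in members),
            "name": max((m["name"] for m in members), key=len),
            "city": max((m["city"] for m in members), key=len),
        }
        for members in clusters.values()
    ]


merged = resolve(RECORDS)
```

Blocking keeps the number of compared pairs small, at the cost of missing duplicates whose blocking keys differ; in this sketch, a misspelled city would prevent two records from ever being compared.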

    A formal foundation for ontology alignment interaction models

    Ontology alignment foundations are hard to find in the literature. The abstract nature of the topic and the diverse means of practice make it difficult to capture in a universal formal foundation. We argue that this lack of formality hinders further development and convergence of practices and, in particular, prevents us from achieving greater levels of automation. In this article we present a formal foundation for ontology alignment that is based on interaction models between heterogeneous agents on the Semantic Web. We use the mathematical notion of information flow in a distributed system to ground our three hypotheses of enabling semantic interoperability, and we use a motivating example throughout the article: how to progressively align two ontologies of research quality assessment through meaning coordination. We conclude the article by presenting such an ontology-alignment interaction model in an executable specification language.