
    Handling instance coreferencing in the KnoFuss architecture

    Finding RDF individuals that refer to the same real-world entities but have different URIs is necessary for the efficient use of data across sources. The requirements for such instance-level integration of RDF data differ from both database record linkage and ontology schema matching scenarios: flexible configuration and reuse of different methods are needed to achieve good performance. Our data integration architecture, KnoFuss, implements a component-based approach that allows flexible selection and tuning of methods and takes the ontological schemata into account to improve the reusability of methods.
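    The component-based approach the abstract describes can be illustrated with a small sketch: coreference methods are registered with the ontology classes they are tuned for, and the architecture picks an applicable method per instance pair. All names here (MatcherComponent, select_matcher, the registry, the token-overlap measure) are illustrative assumptions, not the actual KnoFuss API.

```python
# Minimal sketch of component-based method selection for instance
# coreference, in the spirit of the KnoFuss architecture (names assumed).
from dataclasses import dataclass
from typing import Callable

@dataclass
class MatcherComponent:
    name: str
    applicable_classes: set[str]           # ontology classes the method is tuned for
    match: Callable[[dict, dict], float]   # returns a similarity in [0, 1]

def token_overlap_match(a: dict, b: dict) -> float:
    # Placeholder string measure over label values.
    la, lb = a.get("label", ""), b.get("label", "")
    if not la or not lb:
        return 0.0
    common = len(set(la.lower().split()) & set(lb.lower().split()))
    return common / max(len(la.split()), len(lb.split()))

REGISTRY = [
    MatcherComponent("person-matcher", {"foaf:Person"}, token_overlap_match),
    MatcherComponent("generic-matcher", {"owl:Thing"}, token_overlap_match),
]

def select_matcher(rdf_class: str) -> MatcherComponent:
    """Return the first registered method applicable to the instances'
    class, falling back to a generic method."""
    for comp in REGISTRY:
        if rdf_class in comp.applicable_classes:
            return comp
    return REGISTRY[-1]

matcher = select_matcher("foaf:Person")
score = matcher.match({"label": "Ada Lovelace"}, {"label": "A. Lovelace"})
print(matcher.name, score)   # person-matcher 0.5
```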

    Fusing Automatically Extracted Annotations for the Semantic Web

    This research focuses on the problem of semantic data fusion. Although various solutions have been developed in the database and formal-logic research communities, the choice of an appropriate algorithm is non-trivial because the performance of each algorithm and its optimal configuration parameters depend on the type of data to which the algorithm is applied. To be reusable, a fusion system must be able to select appropriate techniques and use them in combination. Moreover, because of the varying reliability of data sources and of the algorithms performing fusion subtasks, uncertainty is an inherent feature of semantically annotated data and has to be taken into account by the fusion system. Finally, schema heterogeneity can have a negative impact on fusion performance. To address these issues, we propose KnoFuss: an architecture for Semantic Web data integration based on the principles of problem-solving methods. Algorithms dealing with different fusion subtasks are represented as components of a modular architecture, and their capabilities are described formally. This allows the architecture to select appropriate methods and configure them depending on the processed data. To handle uncertainty, we propose a novel algorithm based on Dempster-Shafer belief propagation. KnoFuss employs this algorithm to reason about uncertain data and method results in order to refine the fused knowledge base. Tests show that these solutions lead to improved fusion performance. Finally, we address the problem of data fusion in the presence of schema heterogeneity: we extend the KnoFuss framework to exploit the results of automatic schema alignment tools and propose our own schema matching algorithm aimed at facilitating data fusion in the Linked Data environment. Experiments with this approach show a substantial improvement in performance in comparison with public data repositories.
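    The core operation behind the Dempster-Shafer reasoning the abstract refers to is Dempster's rule of combination, which fuses mass functions from independent evidence sources. Below is a minimal sketch of that rule applied to two pieces of coreference evidence; the frame of discernment, mass values, and function names are illustrative assumptions, not the KnoFuss belief-propagation algorithm itself.

```python
# Minimal sketch of Dempster's rule of combination over the frame
# {same, different} for a candidate coreferent pair of URIs.
from itertools import product

def combine(m1: dict[frozenset, float], m2: dict[frozenset, float]) -> dict:
    """Dempster's rule: m(C) is proportional to the sum of m1(A)*m2(B)
    over all A, B with A & B == C, C non-empty. Assumes the two pieces
    of evidence are not totally conflicting (k > 0)."""
    combined: dict[frozenset, float] = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            combined[c] = combined.get(c, 0.0) + wa * wb
        else:
            conflict += wa * wb            # mass falling on the empty set
    k = 1.0 - conflict                     # normalisation constant
    return {s: w / k for s, w in combined.items()}

SAME, DIFF = frozenset({"same"}), frozenset({"different"})
THETA = SAME | DIFF                        # full frame = total ignorance

# Two illustrative evidence sources about one candidate pair:
m_string = {SAME: 0.7, THETA: 0.3}                 # string matcher
m_schema = {SAME: 0.5, DIFF: 0.2, THETA: 0.3}      # schema-based evidence

print(combine(m_string, m_schema))
# {SAME: ~0.826, DIFF: ~0.070, THETA: ~0.105} -- belief in "same" grows
```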

    MeLinDa: an interlinking framework for the web of data

    The web of data consists of data published on the web in such a way that they can be interpreted and connected together. It is thus critical to establish links between these data, both for the web of data and for the semantic web that it helps to feed. We consider the various techniques developed for that purpose and analyze their commonalities and differences. We propose a general framework and show how the diverse techniques fit into it. From this framework we consider the relation between data interlinking and ontology matching. Although they can be considered similar at a certain level (both relate formal entities), they serve different purposes, yet would benefit from collaborating. We therefore present a scheme under which data linking tools can take advantage of ontology alignments. Comment: N° RR-7691 (2011).
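    The common skeleton that such a framework abstracts from interlinking tools can be sketched simply: compare candidate resource pairs with a similarity measure and emit owl:sameAs links above a threshold. The function names, datasets, and the choice of SequenceMatcher as the measure are assumptions for illustration, not the framework's actual components.

```python
# Minimal sketch of a generic data-interlinking step: label-based
# similarity plus a threshold, yielding candidate owl:sameAs links.
from difflib import SequenceMatcher

def interlink(source: dict[str, str], target: dict[str, str],
              threshold: float = 0.85) -> list[tuple[str, str]]:
    """source/target map resource URIs to a comparable label."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            sim = SequenceMatcher(None, s_label.lower(), t_label.lower()).ratio()
            if sim >= threshold:
                links.append((s_uri, t_uri))   # candidate owl:sameAs pair
    return links

src = {"http://ex.org/a/Paris": "Paris"}
tgt = {"http://ex.org/b/paris_fr": "Paris, France"}
print(interlink(src, tgt, threshold=0.5))
```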

    Méthodes et outils pour lier le web des données

    The web of data consists of publishing data on the web in such a way that they can be interpreted and connected together. It is therefore vital to establish links between these data, both for the web of data and for the semantic web that it helps to feed. We propose a general framework that accommodates the different techniques used to establish these links and show how they fit into it. We then propose an architecture for associating the various data linking systems and making them collaborate with the systems developed for ontology matching, which has many points in common with link discovery.
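    One concrete way the proposed collaboration can work is for the linking system to consume an ontology alignment that maps source properties to target properties, so that instance values can be compared across heterogeneous schemas. The alignment table, property names, and data below are invented for illustration.

```python
# Minimal sketch: using an ontology alignment to decide which property
# values of two instances are comparable across vocabularies.
ALIGNMENT = {
    # source property -> target property (as produced by a matching tool)
    "foaf:name":      "schema:name",
    "dbo:birthDate":  "schema:birthDate",
}

def comparable_pairs(src_instance: dict, tgt_instance: dict):
    """Yield (property, source value, target value) for aligned properties."""
    for s_prop, t_prop in ALIGNMENT.items():
        if s_prop in src_instance and t_prop in tgt_instance:
            yield s_prop, src_instance[s_prop], tgt_instance[t_prop]

a = {"foaf:name": "Marie Curie", "dbo:birthDate": "1867-11-07"}
b = {"schema:name": "Marie Curie", "schema:birthDate": "1867-11-07"}
for prop, va, vb in comparable_pairs(a, b):
    print(prop, va == vb)
```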

    Cross-lingual knowledge linking across wiki knowledge bases

    Wikipedia has become one of the largest knowledge bases on the Web, attracting 513 million page views per day in January 2012. However, one critical issue for Wikipedia is that articles in different languages are very unbalanced. For example, the number of articles on the English Wikipedia has reached 3.8 million, while the number of Chinese articles is still less than half a million, and there are only 217 thousand cross-lingual links between articles of the two languages. On the other hand, there are more than 3.9 million Chinese wiki articles on Baidu Baike and Hudong.com, two popular encyclopedias in Chinese. One important question is how to link the knowledge entries distributed across different knowledge bases. This would immensely enrich the information in online knowledge bases and benefit many applications. In this paper, we study the problem of cross-lingual knowledge linking and present a linkage factor graph model. Features are defined according to some interesting observations. Experiments on the Wikipedia data set show that our approach achieves a high precision of 85.8% with a recall of 88.1%. The approach found 202,141 new cross-lingual links between the English Wikipedia and Baidu Baike.
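    The flavour of such feature-based linking can be sketched with a simplified scorer: the paper's actual model is a factor graph, so the linear combination below, the two features, and the weights are assumptions for illustration only.

```python
# Minimal sketch of feature-based scoring for a cross-lingual article
# pair (simplified linear scorer, not the paper's factor graph model).
def features(en_article: dict, zh_article: dict,
             known_links: set[tuple[str, str]]) -> list[float]:
    # Feature 1: fraction of the English article's outlinks whose
    # cross-lingual counterpart is an outlink of the Chinese article.
    shared = sum(1 for l in en_article["outlinks"]
                 for m in zh_article["outlinks"]
                 if (l, m) in known_links)
    f_links = shared / max(len(en_article["outlinks"]), 1)
    # Feature 2: whether the translated titles match exactly.
    f_title = 1.0 if en_article["title_zh"] == zh_article["title"] else 0.0
    return [f_links, f_title]

WEIGHTS = [0.6, 0.4]   # illustrative, not learned

def score(en: dict, zh: dict, known: set) -> float:
    return sum(w * f for w, f in zip(WEIGHTS, features(en, zh, known)))

en = {"title_zh": "巴黎", "outlinks": ["France"]}
zh = {"title": "巴黎", "outlinks": ["法国"]}
print(score(en, zh, {("France", "法国")}))   # 0.6*1.0 + 0.4*1.0 = 1.0
```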

    Détection de clefs pour l'interconnexion et le nettoyage de jeux de données

    This article proposes a method for analyzing RDF datasets published on the Web based on key dependencies. This particular type of functional dependency, extensively studied in database theory, makes it possible to evaluate whether a set of properties constitutes a key for the dataset under consideration. If so, no two instances will share the same values for these properties. After giving the necessary definitions, we propose an algorithm for detecting minimal keys in an RDF dataset. We then use this algorithm to detect keys in several datasets published on the Web and apply our approach to two applications: (1) reducing the number of properties to compare in order to detect identical resources across two datasets, and (2) detecting errors within a single dataset.
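    The key-detection idea in the abstract can be sketched directly: a property set is a key if no two instances share the same values for all of its properties, and exploring subsets in order of increasing size yields minimal keys only. The data and function names below are illustrative, and the sketch ignores refinements the paper would need (e.g., handling multi-valued or missing properties).

```python
# Minimal sketch of minimal-key detection over an RDF-like dataset.
from itertools import combinations

def is_key(instances: list[dict], props: tuple[str, ...]) -> bool:
    seen = set()
    for inst in instances:
        values = tuple(inst.get(p) for p in props)
        if values in seen:
            return False          # two instances collide on these props
        seen.add(values)
    return True

def minimal_keys(instances: list[dict], props: list[str]):
    keys = []
    for size in range(1, len(props) + 1):
        for combo in combinations(props, size):
            # Skip supersets of an already-found (smaller) key.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_key(instances, combo):
                keys.append(combo)
    return keys

data = [
    {"name": "Anna", "birth": "1980", "city": "Lyon"},
    {"name": "Anna", "birth": "1990", "city": "Lyon"},
    {"name": "Marc", "birth": "1980", "city": "Paris"},
]
print(minimal_keys(data, ["name", "birth", "city"]))
# -> [('name', 'birth'), ('birth', 'city')] : no single property is a key
```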

    Semantic data clouding over the Webs

    Very often, for business or personal needs, users need to retrieve, very quickly, all the available relevant information about a focused target entity in order to take decisions, organize business work, or plan future actions. To answer this kind of "entity"-driven user need, a huge multiplicity of web resources is available, coming from the Social Web and related user-centered services (e.g., news publishing, social networks, microblogging systems), from the Semantic Web and related ontologies and knowledge repositories, and from the conventional Web of Documents. This Ph.D. thesis is devoted to defining the notion of in-cloud and a semantic clouding approach for the construction of in-clouds that works over the Social Web, the Semantic Web, and the Web of Documents. In-clouds are built for a target entity of interest to organize all relevant web resources, modeled as web data items, into a graph, on the basis of their level of prominence and reciprocal closeness. Prominence captures the importance of a web resource within the in-cloud, distinguishing, also in a visual way "a la tag cloud", how relevant web resources are with respect to the target entity. The level of closeness between web resources is evaluated using matching and clustering techniques, with the goal of determining how similar web resources are to each other and with respect to the target entity.
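    The in-cloud structure the abstract describes can be sketched as a weighted graph: nodes are web resources weighted by prominence (relevance to the target entity), and edges carry pairwise closeness. The similarity measure, threshold, and data below are placeholder assumptions, not the thesis's actual matching and clustering techniques.

```python
# Minimal sketch of in-cloud construction: prominence-weighted nodes
# plus closeness edges between web resources (placeholder similarity).
from difflib import SequenceMatcher

def sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def build_in_cloud(target: str, resources: list[str],
                   edge_threshold: float = 0.3):
    # Prominence: how relevant each resource is to the target entity.
    nodes = {r: sim(target, r) for r in resources}
    # Closeness: pairwise similarity between resources, kept above a threshold.
    edges = []
    for i, a in enumerate(resources):
        for b in resources[i + 1:]:
            s = sim(a, b)
            if s >= edge_threshold:
                edges.append((a, b, s))
    return nodes, edges

nodes, edges = build_in_cloud(
    "Alan Turing",
    ["Alan Turing - Wikipedia", "Turing machine", "Enigma cipher"],
)
print(nodes)   # prominence per resource
print(edges)   # closeness edges above the threshold
```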