3 research outputs found
TruthDiscover: Resolving Object Conflicts on Massive Linked Data
Considerable effort has been made to increase the scale of Linked Data.
However, because of the openness of the Semantic Web and the ease of extracting
Linked Data from semi-structured sources (e.g., Wikipedia) and unstructured
sources, many Linked Data sources often provide conflicting objects for a
certain predicate of a real-world entity. Existing methods cannot be trivially
extended to resolve conflicts in Linked Data because Linked Data has a
scale-free property. In this demonstration, we present a novel system called
TruthDiscover, to identify the truth in Linked Data with a scale-free property.
First, TruthDiscover leverages the topological properties of the Source Belief
Graph to estimate the prior beliefs of sources, which are utilized to smooth
the trustworthiness of sources. Second, the Hidden Markov Random Field is
utilized to model interdependencies among objects for estimating the trust
values of objects accurately. TruthDiscover can visualize the process of
resolving conflicts in Linked Data. Experimental results on four datasets show
that TruthDiscover exhibits satisfactory accuracy when confronted with data
having a scale-free property.
Comment: This paper has been accepted by the Proceedings of the 26th International
Conference on World Wide Web Companion. International World Wide Web
Conferences Steering Committee, 2017, WWW201
Truth Discovery to Resolve Object Conflicts in Linked Data
In the community of Linked Data, anyone can publish their data as Linked Data
on the web because of the openness of the Semantic Web. As such, RDF (Resource
Description Framework) triples describing the same real-world entity can be
obtained from multiple sources; this inevitably results in conflicting objects
for a certain predicate of a real-world entity. The objective of this study is
to identify one truth from multiple conflicting objects for a certain predicate
of a real-world entity. An intuitive principle based on common sense is that an
object from a reliable source is trustworthy and, conversely, a source that provides
trustworthy objects is reliable. Many truth discovery methods based on this
principle have been proposed to estimate source reliability and identify the
truth. However, the effectiveness of existing truth discovery methods is
significantly affected by the number of objects provided by each source.
Therefore, these methods cannot be trivially extended to resolve conflicts in
Linked Data with a scale-free property, i.e., most of the sources provide few
conflicting objects, whereas only a few sources have many conflicting objects.
To address this challenge, we propose a novel approach called TruthDiscover to
identify the truth in Linked Data with a scale-free property. Two strategies
are adopted in TruthDiscover to reduce the effect of the scale-free property on
truth discovery. First, this approach leverages the topological properties of
the Source Belief Graph to estimate the prior beliefs of sources, which are
utilized to smooth the trustworthiness of sources. Second, this approach
utilizes the Hidden Markov Random Field to model the interdependencies between
objects to estimate the trust values of objects accurately. Experiments are
conducted on six datasets to evaluate TruthDiscover.
Comment: Have many crucial faults in this versio
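
The principle stated in this abstract (objects from reliable sources are trustworthy, and sources that provide trustworthy objects are reliable) can be illustrated with a small iterative sketch. This is not the TruthDiscover algorithm: it uses neither the Source Belief Graph nor a Hidden Markov Random Field. It is a generic reliability-weighted vote in which each source's reliability is shrunk toward a fixed prior, so that sources with very few claims are not judged on those claims alone. The function name `discover_truth`, the parameters `prior` and `prior_weight`, and the claim format are all assumptions made for illustration.

```python
from collections import defaultdict


def discover_truth(claims, prior=0.5, prior_weight=2.0, iterations=20):
    """Illustrative truth discovery with prior-smoothed source reliability.

    claims: list of (source, item, obj) tuples, where item is a hashable
    key such as (entity, predicate). All names here are hypothetical.
    """
    sources = {s for s, _, _ in claims}
    reliability = {s: prior for s in sources}
    trust = {}
    for _ in range(iterations):
        # Object trust: reliability-weighted vote, normalised per item.
        votes = defaultdict(float)
        totals = defaultdict(float)
        for s, item, obj in claims:
            votes[(item, obj)] += reliability[s]
            totals[item] += reliability[s]
        trust = {key: v / totals[key[0]] for key, v in votes.items()}

        # Source reliability: mean trust of the source's claims,
        # shrunk toward the prior (assumed smoothing, not the paper's).
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, item, obj in claims:
            sums[s] += trust[(item, obj)]
            counts[s] += 1
        reliability = {
            s: (prior_weight * prior + sums[s]) / (prior_weight + counts[s])
            for s in sources
        }

    # Pick the highest-trust object for each item.
    best = {}
    for (item, obj), t in trust.items():
        if item not in best or t > trust[(item, best[item])]:
            best[item] = obj
    return best, reliability
```

With `prior_weight` set to 0 the update reduces to a plain average, which is where a source with a single claim can swing to an extreme reliability value; the shrinkage term is one simple way to dampen that effect on data where most sources provide few objects.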
A new approach for interlinking and integrating semi-structured and linked data
This work focuses on improving data integration and interlinking systems targeting semi-structured
and Linked Data. It aims at facilitating the exploitation of semi-structured and Linked Data by addressing
the problems of heterogeneity, complexity, scalability and the degree of automation.
Technologies, such as the Resource Description Framework (RDF), enabled new data spaces and
concept descriptors to define an increasingly complex and heterogeneous web of data. Many data
providers, however, continue to publish their data using classic models and formats. In addition,
a significant amount of the data released before the Linked Data movement existed has
not been migrated and still has high value. Hence, as a long-term solution, an interlinking system
has been designed to contribute to the publishing of semi-structured data as Linked Data. Simultaneously,
to utilise these growing data resource spaces, a data integration middleware has been
proposed as an immediate solution.
The proposed interlinking system first verifies the existence of the Uniform Resource
Identifier (URI) of the resource being published in the cloud in order to establish links with it. It
uses domain information to define and match the datasets. Its main aim is to make it easier to follow
best-practice recommendations when publishing data into the Linked Data cloud. The results
of this interlinking approach show that it can target large amounts of data whilst preserving good
precision and recall.
The new approach for integrating semi-structured and Linked Data is a mediator-based architecture.
It enables the integration, on-the-fly, of semi-structured heterogeneous data sources with
large-scale Linked Data sources. Complexity is tackled through a usable and expressive interface.
The evaluation of the proposed architecture shows high performance, precision and adaptability