36,940 research outputs found
Collaboratively Patching Linked Data
Today's Web of Data is noisy. Linked Data often needs extensive preprocessing
to enable efficient use of heterogeneous resources. While consistent and valid
data provides the key to efficient data processing and aggregation we are
facing two main challenges: (1st) Identification of erroneous facts and
tracking their origins in dynamically connected datasets is a difficult task,
and (2nd) efforts in the curation of deficient facts in Linked Data are
exchanged rather rarely. Since erroneous data often is duplicated and
(re-)distributed by mashup applications it is not only the responsibility of a
few original publishers to keep their data tidy, but progresses to be a mission
for all distributers and consumers of Linked Data too. We present a new
approach to expose and to reuse patches on erroneous data to enhance and to add
quality information to the Web of Data. The feasibility of our approach is
demonstrated by example of a collaborative game that patches statements in
DBpedia data and provides notifications for relevant changes.Comment: 2nd International Workshop on Usage Analysis and the Web of Data
(USEWOD2012) in the 21st International World Wide Web Conference (WWW2012),
Lyon, France, April 17th, 201
Provenance in Linked Data Integration
The open world of the (Semantic) Web is a global information space offering diverse materials of disparate qualities, and the opportunity to re-use, aggregate, and integrate these materials in novel ways. The advent of Linked Data brings the potential to expose data on the Web, creating new challenges for data consumers who want to integrate these data. One challenge is the ability, for users, to elicit the reliability and/or the accuracy of the data they come across. In this paper, we describe a light-weight provenance extension for the voiD vocabulary that allows data publishers to add provenance metadata to their datasets. These provenance metadata can be queried by consumers and used as contextual information for integration and inter-operation of information resources on the Semantic Web
Linked Data - the story so far
The term âLinked Dataâ refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertionsâ the Web of Data. In this article, the authors present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward
Predicate Abstraction for Linked Data Structures
We present Alias Refinement Types (ART), a new approach to the verification
of correctness properties of linked data structures. While there are many
techniques for checking that a heap-manipulating program adheres to its
specification, they often require that the programmer annotate the behavior of
each procedure, for example, in the form of loop invariants and pre- and
post-conditions. Predicate abstraction would be an attractive abstract domain
for performing invariant inference, existing techniques are not able to reason
about the heap with enough precision to verify functional properties of data
structure manipulating programs. In this paper, we propose a technique that
lifts predicate abstraction to the heap by factoring the analysis of data
structures into two orthogonal components: (1) Alias Types, which reason about
the physical shape of heap structures, and (2) Refinement Types, which use
simple predicates from an SMT decidable theory to capture the logical or
semantic properties of the structures. We prove ART sound by translating types
into separation logic assertions, thus translating typing derivations in ART
into separation logic proofs. We evaluate ART by implementing a tool that
performs type inference for an imperative language, and empirically show, using
a suite of data-structure benchmarks, that ART requires only 21% of the
annotations needed by other state-of-the-art verification techniques
FishMark: A Linked Data Application Benchmark
Abstract. FishBase is an important species data collection produced by the FishBase Information and Research Group Inc (FIN), a not-forprofit NGO with the aim of collecting comprehensive information (from the taxonomic to the ecological) about all the worldâs finned fish species. FishBase is exposed as a MySQL backed website (supporting a range of canned, although complex queries) and serves over 33 million hits per month. FishDelish is a transformation of FishBase into LinkedData weighing in at 1.38 billion triples. We have ported a substantial number of FishBase SQL queries to FishDelish SPARQL query which form the basis of a new linked data application benchmark (using our derivative of the Berlin SPARQL Benchmark harness). We use this benchmarking framework to compare the performance of the native MySQL application, the Virtuoso RDF triple store, and the Quest OBDA system on a fishbase.org like application.
Distributed Holistic Clustering on Linked Data
Link discovery is an active field of research to support data integration in
the Web of Data. Due to the huge size and number of available data sources,
efficient and effective link discovery is a very challenging task. Common
pairwise link discovery approaches do not scale to many sources with very large
entity sets. We here propose a distributed holistic approach to link many data
sources based on a clustering of entities that represent the same real-world
object. Our clustering approach provides a compact and fused representation of
entities, and can identify errors in existing links as well as many new links.
We support a distributed execution of the clustering approach to achieve faster
execution times and scalability for large real-world data sets. We provide a
novel gold standard for multi-source clustering, and evaluate our methods with
respect to effectiveness and efficiency for large data sets from the geographic
and music domains
An intelligent linked data quality dashboard
This paper describes a new intelligent, data-driven dashboard for linked data quality assessment. The development goal was to assist data quality engineers to interpret data quality problems found when evaluating a dataset us-ing a metrics-based data quality assessment. This required construction of a graph linking the problematic things identified in the data, the assessment metrics and the source data. This context and supporting user interfaces help the user to un-derstand data quality problems. An analysis widget also helped the user identify the root cause multiple problems. This supported the user in identification and prioritization of the problems that need to be fixed and to improve data quality. The dashboard was shown to be useful for users to clean data. A user evaluation was performed with both expert and novice data quality engineers
- âŠ