1,730 research outputs found
Distributed Holistic Clustering on Linked Data
Link discovery is an active field of research to support data integration in
the Web of Data. Due to the huge size and number of available data sources,
efficient and effective link discovery is a very challenging task. Common
pairwise link discovery approaches do not scale to many sources with very large
entity sets. We here propose a distributed holistic approach to link many data
sources based on a clustering of entities that represent the same real-world
object. Our clustering approach provides a compact and fused representation of
entities, and can identify errors in existing links as well as many new links.
We support a distributed execution of the clustering approach to achieve faster
execution times and scalability for large real-world data sets. We provide a
novel gold standard for multi-source clustering, and evaluate our methods with
respect to effectiveness and efficiency for large data sets from the geographic
and music domains.
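The idea of clustering entities that represent the same real-world object can be illustrated with a minimal sketch. This is not the authors' distributed algorithm: it simply groups entities into connected components over existing equivalence links using union-find. The `source:id` naming convention, the sample identifiers, and the `suspicious` heuristic (which assumes each source is duplicate-free) are all hypothetical.

```python
from collections import defaultdict


def cluster_entities(links):
    """Group entities joined by equivalence links into connected components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for entity in parent:
        clusters[find(entity)].add(entity)
    return list(clusters.values())


def suspicious(cluster):
    """Assumption: a source holds no duplicates, so two entities from the
    same source inside one cluster hint at an erroneous link."""
    sources = [entity.split(":", 1)[0] for entity in cluster]
    return len(sources) != len(set(sources))


# Hypothetical links between three sources (identifiers are made up).
links = [
    ("dbpedia:Leipzig", "geonames:g1"),
    ("geonames:g1", "lgd:Leipzig"),
    ("lgd:Leipzig", "dbpedia:Leipzig_District"),  # likely a wrong link
    ("dbpedia:Paris", "geonames:g2"),
]

for cluster in cluster_entities(links):
    print(sorted(cluster), "suspicious" if suspicious(cluster) else "ok")
```

Each resulting cluster could then be fused into a single representative entity. A cluster holding two entities from the same source is a candidate for link repair.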
LODE: Linking Digital Humanities Content to the Web of Data
Numerous digital humanities projects maintain their data collections in the
form of text, images, and metadata. While data may be stored in many formats,
from plain text to XML to relational databases, the use of the resource
description framework (RDF) as a standardized representation has gained
considerable traction during the last five years. Almost every digital
humanities meeting has at least one session concerned with the topic of digital
humanities, RDF, and linked data. While most existing work in linked data has
focused on improving algorithms for entity matching, the aim of the
LinkedHumanities project is to build digital humanities tools that work "out of
the box," enabling their use by humanities scholars, computer scientists,
librarians, and information scientists alike. With this paper, we report on the
Linked Open Data Enhancer (LODE) framework developed as part of the
LinkedHumanities project. With LODE we help non-technical users enrich a
local RDF repository with high-quality data from the Linked Open Data cloud.
LODE links and enhances the local RDF repository without compromising the
quality of the data. In particular, LODE supports the user in the enhancement
and linking process by providing intuitive user-interfaces and by suggesting
high-quality linking candidates using tailored matching algorithms. We hope
that the LODE framework will be useful to digital humanities scholars,
complementing other digital humanities tools.
Statistical analysis of the owl:sameAs network for aligning concepts in the linking open data cloud
The massively distributed publication of linked data has drawn the scientific community's attention to the limitations of classic methods for achieving data integration, and to the opportunities for pushing the boundaries of the field by experimenting with the collective enterprise that is the Linking Open Data cloud. While reusing existing ontologies is the preferred choice, exploiting ontology alignments is still a required step for easing the burden of integrating heterogeneous data sets. Alignments, even between the most widely used vocabularies, are still poorly supported in current systems, whereas links between instances are the most widely used means of bridging the gap between different data sets. In this paper we give an account of our statistical and qualitative analysis of the network of instance-level equivalences in the Linking Open Data cloud (i.e. the sameAs network), with the aim of automatically computing alignments at the conceptual level. Moreover, we explore the effect of ontological information when applying classical Jaccard methods to the ontology alignment task. Automating this task will make it possible to achieve a clearer conceptual description of the data at the cloud level, while improving the level of integration between datasets.
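A Jaccard-based alignment over the sameAs network can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: each class is represented by its set of instances, instances are mapped to a canonical representative through a precomputed `same_as` dictionary, and two classes are proposed as aligned when the Jaccard coefficient of their instance sets reaches a threshold. All names, sample data, and the threshold value are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def canonical(instance, same_as):
    """Map an instance to the representative of its sameAs equivalence class."""
    return same_as.get(instance, instance)


def align(classes_a, classes_b, same_as, threshold=0.5):
    """Return (class_a, class_b, score) for class pairs whose canonicalised
    instance sets overlap at least `threshold` (a hypothetical cut-off)."""
    alignments = []
    for class_a, instances_a in classes_a.items():
        ext_a = {canonical(i, same_as) for i in instances_a}
        for class_b, instances_b in classes_b.items():
            ext_b = {canonical(i, same_as) for i in instances_b}
            score = jaccard(ext_a, ext_b)
            if score >= threshold:
                alignments.append((class_a, class_b, score))
    return alignments


# Hypothetical data: two datasets and the sameAs links between their instances.
classes_a = {"a:City": {"a:1", "a:2", "a:3"}}
classes_b = {"b:Town": {"b:1", "b:2"}, "b:Person": {"b:9"}}
same_as = {"b:1": "a:1", "b:2": "a:2"}

print(align(classes_a, classes_b, same_as))
```

Here `a:City` and `b:Town` share two of three canonical instances, giving a score of 2/3, while `a:City` and `b:Person` share none and are dropped.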
Linked Logainm: enhancing library metadata using linked data of Irish place names
Linked Logainm is the newly created Linked Data version of Logainm.ie, an online database holding the authoritative hierarchical list of Irish and English language place names in Ireland. As a use case to demonstrate the benefit of Linked Data to the library community, the Linked Logainm dataset was used to enhance the Longfield Map collection, a set of digitised 18th–19th century maps held by the National Library of Ireland. This paper describes the process of creating Linked Logainm, including the transformation of the data from XML to RDF, the generation of links to external geographic datasets like DBpedia and the Faceted Application of Subject Terminology, and the enhancement of the Library’s metadata records
Web
The Web of Linked Data grows rapidly and already contains data originating from hundreds of data sources. The quality of data from those sources is very diverse, as values may be out of date, incomplete or incorrect. Moreover, data sources may provide conflicting values for a single real-world object. In order for Linked Data applications to consume data from this global data space in an integrated fashion, a number of challenges have to be overcome. One of these challenges is to rate and to integrate data based on their quality. However, quality is a very subjective matter, and finding a canonic judgement that is suitable for each and every task is not feasible. To simplify the task of consuming high-quality data, we present Sieve, a framework for flexibly expressing quality assessment methods as well as fusion methods. Sieve is integrated into the Linked Data Integration Framework (LDIF), which handles Data Access, Schema Mapping and Identity Resolution, all crucial preliminaries for quality assessment and fusion. We demonstrate Sieve in a data integration scenario importing data from the English and Portuguese versions of DBpedia, and discuss how we increase completeness, conciseness and consistency through the use of our framework
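The separation between quality assessment and fusion can be sketched in a few lines. This is not Sieve's actual configuration format or API: it is a hypothetical illustration in which a scoring function rates each source by how recently it was updated, and a fusion function resolves a conflict by keeping the value from the best-scored source. The function names, the one-year horizon, and the sample values are assumptions.

```python
from datetime import date


def recency_score(last_updated, today, horizon_days=365):
    """Quality assessment: 1.0 for data updated today, falling linearly
    to 0.0 at `horizon_days` (a hypothetical cut-off) and beyond."""
    age = (today - last_updated).days
    return max(0.0, 1.0 - age / horizon_days)


def fuse(candidates, today):
    """Fusion: keep the single value whose source has the highest score."""
    best = max(candidates, key=lambda c: recency_score(c["last_updated"], today))
    return best["value"]


# Hypothetical conflicting values for one property of one entity.
candidates = [
    {"source": "en", "value": 100, "last_updated": date(2011, 1, 10)},
    {"source": "pt", "value": 120, "last_updated": date(2011, 11, 1)},
]

print(fuse(candidates, today=date(2012, 1, 1)))
```

Because quality is task-dependent, as the abstract notes, the scoring function is the part a user would swap out; the fusion step only consumes the resulting scores.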
Linked Data based Health Information Representation, Visualization and Retrieval System on the Semantic Web
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
To better facilitate health information dissemination, flexible ways to
represent, query and visualize health data are becoming increasingly important.
Semantic Web technologies, which provide a common framework by
allowing data to be shared and reused between applications, can be applied
to the management of health data. Linked open data, a new Semantic Web
standard for publishing and linking heterogeneous data, allows not only humans
but also machines to browse data without restriction.
Through a use case of World Health Organization HIV data for sub-Saharan
Africa, a region severely affected by the HIV epidemic, this thesis built a
linked data based health information representation, querying and
visualization system. All the data was represented in RDF and
interlinked with other related datasets that are already on the cloud.
Overall, the system has more than 21,000 triples, with a SPARQL
endpoint where users can download and use the data and a SPARQL
query interface where users can submit different types of queries and retrieve the
results. Additionally, it has a visualization interface where users can
visualize the SPARQL results with a tool of their preference. Users who
are not familiar with SPARQL queries can use the linked data search
engine interface to search and browse the data.
From this system we can conclude that current linked open data technologies
have great potential to represent heterogeneous health data in a flexible and
reusable manner, and that they can serve intelligent queries that
support decision-making. However, to get the best from these
technologies, improvements are needed both in triple store
performance and in domain-specific ontological vocabularies.
Multidimensional integration of RDF datasets
Data providers have been uploading RDF datasets on the web to aid researchers and analysts in finding insights. These datasets, made available by different data providers, contain common characteristics that enable their integration. However, since each provider has its own data dictionary, identifying common concepts is not trivial, and costly and complex entity resolution and transformation rules are required to perform such integration. In this paper, we propose a novel method that, given a set of independent RDF datasets, provides a multidimensional interpretation of these datasets and integrates them based on a common multidimensional space (if any) identified. To do so, our method first identifies potential dimensional and factual data in the input datasets and performs entity resolution to merge common dimensional and factual concepts. As a result, we generate a common multidimensional space and identify each input dataset as a cuboid of the resulting lattice. With such output, we are able to exploit open data with OLAP operators in a richer fashion than by dealing with the datasets separately.
This research has been funded by the European Commission through the Erasmus Mundus Joint Doctorate Information Technologies for Business Intelligence-Doctoral College (IT4BI-DC) program.
Peer Reviewed. Postprint (author's final draft)
Linked Open Data - Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project
Database Management; Artificial Intelligence (incl. Robotics); Information Systems and Communication Servic
Linked data authority records for Irish place names
Linked Data technologies are increasingly being implemented to enhance cataloguing workflows in libraries, archives and museums. We review current best practice in library cataloguing, how Linked Data is used to link collections and provide consistency in indexing, and briefly describe the relationship between Linked Data, library data models and descriptive standards. As an example we look at the Logainm.ie dataset, an online database holding the authoritative hierarchical list of Irish and English language place names in Ireland. This paper describes the process of creating the new Linked Logainm dataset, including the transformation of the data from XML to RDF and the generation of links to external geographic datasets like DBpedia and the Faceted Application of Subject Terminology. This dataset was then used to enhance the National Library of Ireland's MARCXML metadata records for its Longfield maps collection. We also describe the potential benefits of Linked Data for libraries, focusing on the use of the Linked Logainm dataset and its future potential for Irish heritage institutions.