
    Htab2RDF: Mapping HTML Tables to RDF Triples

    The Web has become a vast data source hidden behind linked documents. A significant number of Web documents contain HTML tables generated dynamically from relational databases, and often there is no direct public access to the databases themselves. RDF (Resource Description Framework), on the other hand, provides an efficient mechanism for representing data directly on the Web, based on a Web-scalable architecture for the identification and interpretation of terms; this leads to the concept of Linked Data on the Web. To allow direct access to such data as Linked Data, this paper proposes an approach for transforming HTML tables into RDF triples. It consists of three main phases: refining, pre-treatment, and mapping. The whole process is assisted by a domain ontology and the WordNet lexical database. A tool called Htab2RDF has been implemented, and experiments have been carried out to evaluate the approach and demonstrate its efficiency.
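
    The paper's pipeline (refining, pre-treatment, and ontology- and WordNet-assisted mapping) is not reproduced here, but the core row-to-subject, column-to-predicate idea can be sketched. A minimal illustration assuming rdflib and BeautifulSoup, with an invented example.org namespace and no ontology alignment:

    from bs4 import BeautifulSoup           # HTML parsing (assumed dependency)
    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")   # invented placeholder namespace

    def table_to_triples(html: str) -> Graph:
        """One subject per row, one predicate per column header, one literal per cell."""
        g = Graph()
        g.bind("ex", EX)
        table = BeautifulSoup(html, "html.parser").find("table")
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        for i, row in enumerate(table.find_all("tr")):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            if not cells:        # the header row contains <th> cells only
                continue
            subject = EX[f"row{i}"]
            for header, cell in zip(headers, cells):
                g.add((subject, EX[header.replace(" ", "_")], Literal(cell)))
        return g

    html = ("<table><tr><th>Country</th><th>Capital</th></tr>"
            "<tr><td>France</td><td>Paris</td></tr></table>")
    print(table_to_triples(html).serialize(format="turtle"))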

    Linked Data in Libraries: A Case Study of Harvesting and Sharing Bibliographic Metadata with BIBFRAME

    By way of a case study, this paper illustrates and evaluates the Bibliographic Framework (BIBFRAME) as a means for harvesting and sharing bibliographic metadata over the Web for libraries. BIBFRAME is an emerging framework developed by the Library of Congress for bibliographic description based on Linked Data. Much like the Semantic Web, the goal of Linked Data is to make the Web "data aware" and transform the existing Web of documents into a Web of data. Linked Data leverages the existing Web infrastructure and allows linking and sharing of structured data for human and machine consumption. The BIBFRAME model attempts to contextualize Linked Data technology for libraries. Library applications and systems contain high-quality structured metadata, but this data is generally static in its presentation and seldom integrated with other internal metadata sources or linked to external Web resources. With BIBFRAME, existing disparate library metadata sources such as catalogs and digital collections can be harvested and integrated over the Web. In addition, bibliographic data enriched with Linked Data could offer richer navigational control and access points for users. With Linked Data principles, metadata from libraries could also become harvestable by search engines, transforming dormant catalogs and digital collections into active knowledge repositories. Experimenting with Linked Data using existing bibliographic metadata thus holds the potential to empower libraries to harness the reach of commercial search engines to continuously discover, navigate, and obtain new domain-specific knowledge resources on the basis of their verified metadata. The initial part of the paper introduces BIBFRAME and discusses Linked Data in the context of libraries. The final part outlines a step-by-step process for implementing BIBFRAME with existing library metadata.
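
    For readers unfamiliar with the model, a minimal sketch of a BIBFRAME description may help. This is an illustrative rdflib example, not the paper's harvesting workflow: the URIs are invented, and real BIBFRAME data attaches a bf:Title resource rather than a bare literal.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    BF = Namespace("http://id.loc.gov/ontologies/bibframe/")  # BIBFRAME 2.0 namespace

    g = Graph()
    g.bind("bf", BF)

    work = URIRef("http://example.org/work/moby-dick")              # invented URIs
    instance = URIRef("http://example.org/instance/moby-dick-1851")

    g.add((work, RDF.type, BF.Work))           # the conceptual essence of the resource
    g.add((instance, RDF.type, BF.Instance))   # a concrete published embodiment
    g.add((instance, BF.instanceOf, work))
    # Simplified: real BIBFRAME models titles as bf:Title resources, not literals.
    g.add((instance, BF.title, Literal("Moby-Dick; or, The Whale")))

    print(g.serialize(format="turtle"))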

    Visualisation of Linked Data – Reprise

    Linked Data promises to serve as a disruptor of traditional approaches to data management and use, promoting the push from the traditional Web of documents to a Web of data. The ability for data consumers to adopt a "follow your nose" approach, traversing links defined within a dataset or across independently curated datasets, is an essential feature of this new Web of Data, enabling richer knowledge retrieval through synthesis across multiple sources of, and views on, inter-related datasets. But for the Web of Data to be successful, we must design novel ways of interacting with the corresponding very large amounts of complex, interlinked, multi-dimensional data throughout its management cycle. The design of user interfaces for Linked Data, and more specifically of interfaces that represent the data visually, plays a central role in this respect. Contributions to this special issue on Linked Data visualisation investigate different approaches to harnessing visualisation as a tool for exploratory discovery and basic-to-advanced analysis. The papers in this volume illustrate the design and construction of intuitive means for end users to obtain new insight and gather more knowledge as they follow links defined across datasets over the Web of Data.
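
    As an aside, the "follow your nose" traversal the editorial refers to can be sketched in a few lines: dereference a URI, parse the returned RDF, and fan out along the object URIs found there. A rough illustration assuming rdflib and live HTTP access; error handling is minimal and the start URI is only an example:

    from rdflib import Graph, URIRef

    def follow_your_nose(start: str, hops: int = 1) -> None:
        """Dereference a URI, then follow the object URIs in its description."""
        frontier, seen = [URIRef(start)], set()
        for _ in range(hops + 1):
            next_frontier = []
            for uri in frontier:
                if uri in seen:
                    continue
                seen.add(uri)
                g = Graph()
                try:
                    g.parse(uri)  # HTTP content negotiation retrieves RDF
                except Exception:
                    continue      # not every link dereferences to parsable RDF
                print(f"{uri}: {len(g)} triples")
                next_frontier += [o for _, _, o in g if isinstance(o, URIRef)]
            frontier = next_frontier

    follow_your_nose("http://dbpedia.org/resource/Linked_data")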

    Luzzu - A Framework for Linked Data Quality Assessment

    With the increasing adoption and growth of the Linked Open Data cloud [9], with RDFa, Microformats and other ways of embedding data into ordinary Web pages, and with initiatives such as schema.org, the Web is currently being complemented with a Web of Data. The Web of Data thus shares many characteristics with the original Web of Documents, including its highly variable quality. This heterogeneity makes it challenging to determine the quality of the data published on the Web and to subsequently make this information explicit to data consumers. The main contribution of this article is LUZZU, a quality assessment framework for Linked Open Data. Apart from providing quality metadata and quality problem reports that can be used for data cleaning, LUZZU is extensible: third-party metrics can easily be plugged into the framework. The framework does not rely on SPARQL endpoints, and is thus free of the problems that come with them, such as query timeouts. Another advantage over SPARQL-based quality assessment frameworks is that metrics implemented in LUZZU can have more complex functionality than triple matching. Using the framework, we performed a quality assessment of a number of statistical linked datasets available on the LOD cloud. For this evaluation, 25 metrics from ten different dimensions were implemented.
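
    LUZZU itself is a Java framework; the following is only a language-agnostic Python sketch of the pluggable-metric idea the abstract describes: metrics observe a stream of triples directly (no SPARQL endpoint involved) and report a value. The interface and the toy metric are invented for illustration:

    from abc import ABC, abstractmethod
    from rdflib import Graph

    class QualityMetric(ABC):
        """Third-party metrics plug in by implementing this interface (sketch)."""
        @abstractmethod
        def compute(self, s, p, o) -> None: ...
        @abstractmethod
        def value(self) -> float: ...

    class DereferenceableSubjects(QualityMetric):
        """Toy metric: the fraction of subjects that are HTTP(S) URIs."""
        def __init__(self):
            self.total = self.http = 0
        def compute(self, s, p, o):
            self.total += 1
            self.http += str(s).startswith(("http://", "https://"))
        def value(self):
            return self.http / self.total if self.total else 0.0

    def assess(graph: Graph, metrics: list) -> dict:
        """Stream every triple past every registered metric."""
        for triple in graph:
            for m in metrics:
                m.compute(*triple)
        return {type(m).__name__: m.value() for m in metrics}

    g = Graph()
    g.parse(data='<http://example.org/a> <http://example.org/p> "x" .',
            format="turtle")
    print(assess(g, [DereferenceableSubjects()]))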

    Exploring manuscripts: sharing ancient wisdoms across the semantic web

    Recent work in digital humanities has seen researchers increasingly producing online editions of texts and manuscripts, particularly through adoption of the TEI XML format for online publishing. The benefits of semantic web techniques remain underexplored in such research, however, with a lack of sharing and communication of research information. The Sharing Ancient Wisdoms (SAWS) project applies linked data practices to enhance and expand on what is possible with these digital text editions. Focussing on Greek and Arabic collections of ancient wise sayings, which are often related to each other, we annotate the TEI documents and extract semantic information from them as RDF triples. This allows researchers to explore the conceptual networks that arise from these interconnected sayings. The SAWS project advocates a semantic-web-based methodology, enhancing rather than replacing current workflow processes, for digital humanities researchers to share their findings and collectively benefit from each other's work.
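
    A toy illustration of the TEI-to-RDF step may be useful here. This sketch uses an invented placeholder vocabulary (not the actual SAWS ontology) and a fabricated two-saying TEI fragment; only the TEI namespace itself is real:

    import xml.etree.ElementTree as ET
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, RDFS

    TEI = "{http://www.tei-c.org/ns/1.0}"          # the real TEI namespace
    SAWS = Namespace("http://example.org/saws/")   # placeholder vocabulary
    XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

    tei = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
      <seg xml:id="saying1">Know thyself.</seg>
      <seg xml:id="saying2" corresp="#saying1">He who knows himself is wise.</seg>
    </body></text></TEI>"""

    g = Graph()
    for seg in ET.fromstring(tei).iter(TEI + "seg"):
        saying = URIRef(SAWS[seg.get(XML_ID)])
        g.add((saying, RDF.type, SAWS.Saying))
        g.add((saying, RDFS.label, Literal(seg.text.strip())))
        if seg.get("corresp"):  # a relationship between sayings becomes a typed link
            g.add((saying, SAWS.isVariantOf,
                   URIRef(SAWS[seg.get("corresp").lstrip("#")])))

    print(g.serialize(format="turtle"))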

    Enhancing Usability in Linked Data Editing in Web Applications

    Editing Linked Data documents represents an enormous challenge to users with limited technical expertise. These users struggle with language rules, relationships between entities, and interconnected concepts, and these issues can result in frustration and low data quality. In response to this challenge, we introduce a new editor designed to facilitate effortless editing of JSON-LD documents, catering to both newcomers and advanced users. It is made for easy and seamless integration into other web-based applications and can be used much like an HTML tag. The complexity of Linked Data arises from its graph-like structure, where entities are connected through relationships, forming a complex web of semantic connections. While this is advantageous for data integration and cross-platform compatibility, it presents significant barriers for those not well versed in technical aspects. Even with the rise of user-friendly interfaces, manually modifying JSON-LD documents can lead to mistakes in structure and unintended disruptions to valuable linkages. Our proposed solution is a reusable web component based on modern browser technologies. It offers a view of the data that is easier to perceive than typical graph visualizations: the data is shown as a list of named entities and their properties, simplifying the visual complexity without giving up the conceptual graph structure. The list view brings the conceptual entities to the front, but still supports more technical structural elements such as blank nodes, which continue to exist as properties. Using schema.org's machine-readable definitions, the editor understands how entities may or may not be connected; this is used to offer autocomplete functionality and to avoid invalid use of the schema.org vocabulary, and it can be extended through the integrated schema-loader concept. From a technical point of view, the web component is an HTML element that takes a (possibly empty) JSON-LD document and provides the modified document through a callback as soon as the user saves it from within the editor. It is therefore easily integrable into existing projects based on arbitrary web frameworks and does not require any special interface implementations. The component is based on StencilJS, which allows generating wrappers for popular frameworks for tighter integration. In conclusion, our web component empowers both new and experienced users to edit Linked Data seamlessly, overcoming the inherent challenges associated with manual JSON-LD modification. By simplifying the view of the graph structure and providing an intuitive, supportive interface, the component enhances the ease of use and accessibility of Linked Data editing. This holds significant potential for expediting data curation, collaboration, and integration, thus fostering a more inclusive and dynamic Linked Data ecosystem. This research has been supported by the Helmholtz Metadata Collaboration (HMC) Platform, the German National Research Data Infrastructure (NFDI) and the German Research Foundation (DFG).
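
    The editor described above is a browser component (StencilJS); purely as an illustration of how schema.org's machine-readable definitions can drive that kind of autocomplete, here is a small Python sketch. It assumes rdflib and network access; the vocabulary URL is schema.org's published Turtle file, and the incomplete subclass handling is noted in the docstring:

    from rdflib import Graph, Namespace

    SDO = Namespace("https://schema.org/")

    g = Graph()
    # schema.org publishes its full vocabulary as Turtle; parsing takes a moment.
    g.parse("https://schema.org/version/latest/schemaorg-current-https.ttl",
            format="turtle")

    def properties_for(type_name: str) -> list:
        """Properties declaring the given type in schema:domainIncludes.
        (A fuller check would also walk the rdfs:subClassOf hierarchy.)"""
        return sorted(
            str(p).removeprefix(str(SDO))
            for p in g.subjects(SDO.domainIncludes, SDO[type_name])
        )

    print(properties_for("Book")[:5])   # candidate autocomplete entries for a Book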

    Lifting user generated comments to SIOC

    HTML boilerplate code acts on webpages as presentation directives telling a browser how to display data to a human end user. For machines, our community has made tremendous efforts to provide querying endpoints using consensual schemas, protocols, and principles since the advent of the Linked Data paradigm. These data-lifting efforts have been the primary material for bootstrapping the Web of Data. Data lifting usually involves an original data structure from which the semantic architect has to produce a mapping to RDF vocabularies. Less effort has been made to lift data produced by a Web mining process, due to the difficulty of providing an efficient and scalable solution. Nonetheless, the Web of documents is mainly composed of natural language twisted into HTML boilerplate code, and few data schemas can be mapped into RDF. In this paper, we present CommentsLifter, a system that is able to lift SIOC data from user-generated comments in the Web 2.0.
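
    For concreteness, lifting one mined comment into SIOC might look roughly like this. The SIOC namespace and terms (sioc:Post, sioc:content, sioc:has_creator, sioc:reply_of) are real; the URIs and the helper function are illustrative, and CommentsLifter's actual mining pipeline is not shown:

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    SIOC = Namespace("http://rdfs.org/sioc/ns#")   # the SIOC core ontology

    def lift_comment(comment_uri, author_uri, text, in_reply_to=None):
        """Represent one mined user comment as SIOC triples (sketch)."""
        g = Graph()
        g.bind("sioc", SIOC)
        post = URIRef(comment_uri)
        g.add((post, RDF.type, SIOC.Post))
        g.add((post, SIOC.content, Literal(text)))
        g.add((post, SIOC.has_creator, URIRef(author_uri)))
        if in_reply_to:  # threads become sioc:reply_of links between posts
            g.add((post, SIOC.reply_of, URIRef(in_reply_to)))
        return g

    g = lift_comment("http://example.org/comment/42",
                     "http://example.org/user/alice",
                     "Great article!")
    print(g.serialize(format="turtle"))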

    In the age of the web of data: first open data, then big data

    A review of the concepts and technologies associated with the transition from a web of documents to a web of data. The role that public and academic libraries are playing, or may play, with regard to big data, open data, and linked open data is described. The strategic importance of open data and linked open data (LOD) for the future of libraries is emphasized.

    Knowledge extraction from unstructured data and classification through distributed ontologies

    The World Wide Web has changed the way humans use and share any kind of information. The Web removed several access barriers to published information and became an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships that are accessible only to human beings. Indeed, the Web of documents evolved towards a space of data silos, linked to each other only through untyped references (such as hypertext references) that only humans can understand. A growing desire to programmatically access the pieces of data implicitly enclosed in documents has characterized the recent efforts of the Web research community. Direct access means structured data, enabling computing machinery to easily exploit the linking of different data sources. It has become crucial for the Web community to provide a technology stack that eases data integration at large scale, first structuring the data using standard ontologies and afterwards linking it to external data. Ontologies became the best practice for defining axioms and relationships among classes, and the Resource Description Framework (RDF) became the basic data model chosen to represent ontology instances (i.e. an instance is a value of an axiom, class or attribute). Data has become the new oil; in particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. In the literature these problems have been addressed with several proposals and standards, which mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships. With the increasing volume of interconnected and serialized RDF data, RDF repositories may suffer from data overloading and may become a single point of failure for the overall Linked Data vision. One of the goals of this dissertation is to propose a thorough approach to managing large-scale RDF repositories and distributing them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge and of a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive and conjunctive SPARQL queries. The consistency of the results is increased by a data redundancy algorithm that replicates each RDF triple on multiple nodes so that, in the case of peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load-balancing algorithm maintains a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT; a toy sketch of this placement scheme follows the abstract. Recently, the process of data structuring has gained more and more attention when applied to the large volume of text spread on the Web, such as legacy data, newspapers, scientific papers or (micro-)blog posts. This process mainly consists of three steps: (i) the extraction from the text of atomic pieces of information, called named entities; (ii) the classification of these pieces of information through ontologies; (iii) their disambiguation through Uniform Resource Identifiers (URIs) identifying real-world objects. As a step towards interconnecting the Web to real-world objects via named entities, different techniques have been proposed. The second objective of this work is to compare these approaches in order to highlight their strengths and weaknesses in different scenarios, such as scientific papers, newspapers, or user-generated content. We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through a REST API and a web User Interface), which unifies several named entity extraction technologies, and we proposed the NERD ontology, a reference ontology for comparing the results of these technologies. Recently, the NERD ontology has been included in the NIF (Natural Language Processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project. Summarizing, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields defines the issues under investigation in this work. The dissertation then proposes an architecture to tackle the single-point-of-failure issue introduced by the RDF repositories spread across the Web. Although the use of ontologies enables a Web where data is structured and comprehensible by computing machinery, human users may take advantage of it, especially for the annotation task; hence, this work describes an annotation tool for web editing and audio and video annotation, with a web front-end User Interface powered by a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art in named entity technologies: the NERD framework is presented as a technology encompassing existing solutions in the named entity extraction field, and the NERD ontology as a reference ontology for the field. Finally, this work highlights three use cases aimed at reducing the number of data silos spread across the Web: a Linked Data approach to augment the automatic classification task in a systematic literature review, an application to lift educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of Data, and a scientific conference venue enhancer plugged on top of several live data collectors. Significant research efforts have been devoted to combining the efficiency of a reliable data structure with the importance of data extraction techniques. This dissertation opens several research doors that join two different research communities: the Semantic Web and the Natural Language Processing communities. The Web provides a considerable amount of data on which NLP techniques may shed light, and the use of URIs as unique identifiers provides one milestone for the materialization of entities lifted from raw text to real-world objects.
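
    To make the DHT-based distribution concrete, here is the toy placement sketch referred to above: a triple's key is hashed onto a ring, and the triple is replicated on a few successor peers so that queries survive node failures. This is an invented illustration, not the dissertation's algorithm; its load balancing and SPARQL query resolution are omitted:

    import hashlib
    from bisect import bisect_right

    def ring_position(key: str, bits: int = 32) -> int:
        """Hash a key onto the DHT ring."""
        return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** bits)

    class TripleDHT:
        """Toy ring: triples keyed by subject, replicated on successor nodes."""
        def __init__(self, node_ids, replicas=3):
            self.replicas = replicas
            self.ring = sorted(ring_position(n) for n in node_ids)
            self.nodes = {ring_position(n): n for n in node_ids}
            self.store = {n: [] for n in node_ids}

        def place(self, s, p, o):
            # The subject's hash picks the primary node; copies go to the
            # following successors, so queries still succeed if a peer fails.
            start = bisect_right(self.ring, ring_position(s)) % len(self.ring)
            for i in range(min(self.replicas, len(self.ring))):
                node = self.nodes[self.ring[(start + i) % len(self.ring)]]
                self.store[node].append((s, p, o))

    dht = TripleDHT(["peer-a", "peer-b", "peer-c", "peer-d"])
    dht.place("http://example.org/alice", "foaf:knows", "http://example.org/bob")
    print({n: len(t) for n, t in dht.store.items()})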

    From many records to one graph: Heterogeneity conflicts in the Linked data restructuring cycle

    Introduction. During the last couple of years, the library community has developed a number of comprehensive metadata standardization projects inspired by the idea of linked data, such as the BIBFRAME model. Linked data is a set of best-practice principles for publishing and exposing data on the Web using a graph-based data model powered with semantics and cross-domain relationships. In the light of the traditional metadata practices of libraries, the best practices of linked data imply a restructuring process from a collection of semi-structured bibliographic records to a semantic graph of unambiguously defined entities. Successful interlinking of entities in this graph to entities in external data sets requires a minimum level of semantic interoperability. Method. The examination is carried out through a review of the relevant research within the field and of the essential documents that describe the key concepts. Analysis. A high-level examination of the concepts of the semantic Web and linked data is provided, with a particular focus on the challenges they entail for libraries and their metadata practices in the perspective of the extensive restructuring process that has already started. Conclusion. We demonstrate that a set of heterogeneity conflicts, threatening the level of semantic interoperability, can be associated with various phases of this restructuring process, from analysis and modelling to conversion and external interlinking. We also claim that these conflicts and their potential solutions are mutually dependent across the phases.