96 research outputs found

    Materializing the editing history of Wikipedia as linked data in DBpedia

    We describe a DBpedia extractor materializing the editing history of Wikipedia pages as linked data, to support queries and indicators on that history.
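
    As a hedged illustration of the kind of query such an extractor is meant to support, the sketch below counts recent revisions per page via a SPARQL endpoint. The endpoint URL and the hist: vocabulary are assumptions for illustration, not the extractor's documented schema.

    ```python
    # Hypothetical sketch: querying per-page revision counts from a
    # linked-data editing history. The endpoint URL and the hist:
    # vocabulary are illustrative assumptions, not the documented schema.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "https://dbpedia-history.example.org/sparql"  # assumed endpoint

    QUERY = """
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
    PREFIX hist: <http://example.org/ontology/history#>
    SELECT ?page (COUNT(?rev) AS ?revisions)
    WHERE {
      ?rev hist:revisionOf ?page ;
           hist:timestamp  ?when .
      FILTER (?when >= "2023-01-01T00:00:00Z"^^xsd:dateTime)
    }
    GROUP BY ?page
    ORDER BY DESC(?revisions)
    LIMIT 10
    """

    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(QUERY)
    sparql.setReturnFormat(JSON)

    # Print the ten most heavily edited pages since 2023.
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["page"]["value"], row["revisions"]["value"])
    ```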

    A Framework for the Analysis and User-Driven Evaluation of Trust on the Semantic Web

    This project will examine the area of trust on the Semantic Web and develop a framework for publishing and verifying trusted Linked Data. Linked Data describes a method of publishing structured data, automatically readable by computers, which can be linked to other heterogeneous data with the purpose of becoming more useful. Trust plays a significant role in the adoption of new technologies, and even more so in a sphere with such vast amounts of publicly created data. Trust is paramount to the effective sharing and communication of tacit knowledge (Hislop, 2013). Up to now, the area of trust in Linked Data has not been adequately addressed, despite the Semantic Web stack having included a trust layer from the very beginning (Artz and Gil, 2007). Some of the most accurate data on the Semantic Web lies practically unused, while some of the most used linked data has high numbers of errors (Zaveri et al., 2013). Many of the datasets and links that exist on the Semantic Web are out of date and/or invalid, and this undermines the credibility, validity and, ultimately, the trustworthiness of both the dataset and the data provider (Rajabi et al., 2012). This research will examine a number of datasets to determine the quality metrics that a dataset is required to meet to be considered ‘trusted’. The key findings will be assessed and utilized in the creation of a learning tool and a framework for creating trusted Linked Data.
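
    One family of metrics such a framework could include is link validity. Below is a minimal sketch, under the assumption that plain HTTP dereferenceability is the quality criterion; the thesis's actual metrics are not specified here.

    ```python
    # Minimal sketch of one candidate trust metric: the share of resource
    # URIs in a dataset that still dereference. The metric choice is an
    # illustrative assumption; the framework's actual metrics may differ.
    import requests

    def dereferenceable_ratio(uris, timeout=5.0):
        """Return the fraction of URIs answering with an HTTP status < 400."""
        ok = 0
        for uri in uris:
            try:
                resp = requests.head(uri, timeout=timeout, allow_redirects=True)
                if resp.status_code < 400:
                    ok += 1
            except requests.RequestException:
                pass  # unreachable URIs count against the score
        return ok / len(uris) if uris else 0.0

    sample = [
        "http://dbpedia.org/resource/Berlin",
        "http://example.org/no-such-resource",  # hypothetical dead link
    ]
    print(f"dereferenceable: {dereferenceable_ratio(sample):.2f}")
    ```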

    Enabling automatic provenance-based trust assessment of web content

    The building and application of a semantic platform for an e-research society

    This thesis reviews the area of e-Research (the use of electronic infrastructure to support research) and considers how the insight gained from the development of social networking sites in the early 21st century might assist researchers in using this infrastructure. In particular it examines the myExperiment project, a website for e-Research that allows users to upload, share and annotate workflows and associated files, using a social networking framework. This Virtual Organisation (VO) supports many of the attributes required to allow a community of users to come together to build an e-Research society. The main focus of the thesis is how the emerging society that is developing out of myExperiment could use Semantic Web technologies to provide users with a significantly richer representation of their research and research processes, to better support reproducible research. One of the initial major contributions was building an ontology for myExperiment. Through this it became possible to build an API for generating and delivering this richer representation and an interface for querying it. With this richer representation in place, it has been possible to follow Linked Data principles to link up with other projects that publish this type of representation. Doing so has allowed additional data to be provided to the user and has begun to set in context the data produced by myExperiment. The way the myExperiment project has gone about this task, and its consideration of how changes may affect existing users, is another major contribution of this thesis. Adding a semantic representation to an emergent e-Research society like myExperiment has given it the potential to provide additional applications, in particular the capability to support Research Objects: encapsulations of a scientist's research or research process that support reproducibility. The insight gained by adding a semantic representation to myExperiment has allowed this thesis to contribute towards the design of the architecture for these Research Objects, which use similar Semantic Web technologies. The myExperiment ontology has been designed such that it can be aligned with other ontologies. Scientific Discourse, the collaborative argumentation of different claims and hypotheses, with the support of evidence from experiments, to construct, confirm or disprove theories, requires the capability to represent experiments carried out in silico. This thesis discusses how, as part of the HCLS Scientific Discourse subtask group, the myExperiment ontology has begun to be aligned with other scientific discourse ontologies to provide this capability. It also compares this alignment of ontologies with the architecture for Research Objects. The thesis also examines how myExperiment's Linked Data, and that of other projects, can be used in the design of novel interfaces. As a theoretical exercise, it considers how this Linked Data might be used to support a Question-Answering system that would allow users to query myExperiment's data in a more efficient and user-friendly way. It concludes by reviewing all the steps undertaken to provide a semantic platform for an emergent e-Research society to facilitate the sharing of research and its processes in support of reproducible research, and assesses their contribution to enhancing the features provided by myExperiment, as well as e-Research as a whole. Finally, it considers how the contributions of this thesis could be extended to produce additional tools that allow researchers to make greater use of the rich data now available, in a way that enhances their research process rather than significantly changing it or adding extra workload.
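
    As a loose illustration of the querying interface described above, the sketch below asks a myExperiment-style SPARQL endpoint for workflow titles. The endpoint URL and the ontology terms are assumptions for illustration, not the project's documented vocabulary.

    ```python
    # Illustrative sketch: listing shared workflows from a myExperiment-style
    # SPARQL interface. Endpoint URL and mebase: terms are assumptions.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://rdf.myexperiment.org/sparql")  # assumed
    sparql.setQuery("""
    PREFIX mebase:  <http://rdf.myexperiment.org/ontologies/base/>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?workflow ?title
    WHERE {
      ?workflow a mebase:Workflow ;
                dcterms:title ?title .
    }
    LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["workflow"]["value"], "-", row["title"]["value"])
    ```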

    A framework for the analysis and quality assessment of big and linked data

    Linking and publishing data in the Linked Open Data format increases the interoperability and discoverability of resources over the Web. To accomplish this, the process comprises several design decisions based on the Linked Data principles, which, on the one hand, recommend using standards for representing and accessing data on the Web and, on the other, setting hyperlinks between data from different sources. Despite the efforts of the World Wide Web Consortium (W3C), the main international standards organization for the World Wide Web, there is no single tailored formula for publishing data as Linked Data. In addition, the quality of the published Linked Open Data (LOD) is a fundamental issue that has yet to be thoroughly managed and considered. The main objective of this doctoral thesis is to design and implement a novel framework for selecting, analyzing, converting, interlinking, and publishing data from diverse sources, paying close attention to quality assessment throughout all steps and modules of the framework. The goal is to examine whether, and to what extent, Semantic Web technologies are applicable for merging data from different sources and enabling end users to obtain additional information that was not available in the individual datasets, in addition to integration into the Semantic Web community space. The thesis also validates the applicability of the process in a specific and demanding use case, namely creating and publishing an Arabic Linked Drug Dataset based on open drug datasets from selected Arab countries, and discusses the quality issues observed across the linked data life-cycle. To that end, a Semantic Data Lake was established in the pharmaceutical domain that allows further integration and the development of different business services on top of the integrated data sources. Through data representation in an open, machine-readable format, the approach offers an optimal solution for information and data dissemination, for building domain-specific applications, and for enriching and gaining value from the original dataset. The thesis showcases how the pharmaceutical domain benefits from these evolving research trends for building competitive advantages. However, as elaborated in the thesis, a better understanding of the specifics of the Arabic language is required to extend the utilization of linked data technologies in the targeted Arabic organizations.

    The dissertation investigates in detail the quality of big and linked data ecosystems (Linked Data Ecosystems), taking into account the possibility of reusing open data. The work is motivated by the need to enable researchers from Arab countries to use Semantic Web technologies to link their data with open data such as DBpedia. The goal is to examine whether open data from Arab countries enable end users to obtain additional information that is not available in the individual datasets, in addition to integration into the Semantic Web space. The dissertation proposes a methodology for developing Linked Data applications and implements a software solution that enables querying a consolidated dataset of drugs from selected Arab countries, implemented in the form of a Semantic Data Lake. The thesis shows how the pharmaceutical industry benefits from the application of innovative technologies and research trends in the field of semantic technologies. However, a better understanding of the specifics of the Arabic language is required to implement Linked Data tools and apply them to data from Arab countries.
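
    To make the conversion and interlinking steps concrete, the sketch below lifts one hypothetical tabular drug record into RDF and links it to DBpedia. The drug: namespace and property names are illustrative assumptions, not the thesis's actual schema.

    ```python
    # Sketch of the conversion/interlinking step: lifting a tabular drug
    # record into RDF and asserting an owl:sameAs link into the LOD cloud.
    # The drug: namespace and properties are illustrative assumptions.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import OWL, RDF, RDFS

    DRUG = Namespace("http://example.org/arabic-drugs/")  # assumed namespace

    g = Graph()
    g.bind("drug", DRUG)
    g.bind("owl", OWL)

    record = {"id": "JO-001", "name": "Paracetamol", "country": "Jordan"}

    subject = DRUG[record["id"]]
    g.add((subject, RDF.type, DRUG.Drug))
    g.add((subject, RDFS.label, Literal(record["name"])))
    g.add((subject, DRUG.registeredIn, Literal(record["country"])))
    # Interlinking: connect the local record to an external dataset.
    g.add((subject, OWL.sameAs,
           URIRef("http://dbpedia.org/resource/Paracetamol")))

    print(g.serialize(format="turtle"))
    ```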

    The euBusinessGraph ontology: A lightweight ontology for harmonizing basic company information

    Company data, ranging from basic company information such as company name(s) and incorporation date to complex balance sheets and personal data about directors and shareholders, is the foundation that many data value chains depend upon in various sectors (e.g., business information, marketing and sales). Company data becomes a valuable asset when it is collected and integrated from a variety of sources, both authoritative (e.g., national business registers) and non-authoritative (e.g., company websites). Company data integration is, however, a difficult task, primarily due to the heterogeneity and complexity of company data and the lack of generally agreed-upon semantic descriptions of the concepts in this domain. In this article, we introduce the euBusinessGraph ontology as a lightweight mechanism for harmonizing company data for the purpose of aggregating, linking, provisioning, and analyzing basic company data. The article provides an overview of the related work, ontology scope, ontology development process, explanations of core concepts and relationships, and the implementation of the ontology. Furthermore, we present scenarios where the ontology was used, among others, for publishing company data (a business knowledge graph) and for comparing data from various company data providers. The euBusinessGraph ontology serves as an asset not only for enabling various tasks related to company data but also as a foundation on which various extensions can be built.
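
    As a rough illustration of describing basic company information with such an ontology, the sketch below combines the W3C Registered Organization vocabulary with an assumed ebg: namespace and property; consult the published ontology for the actual terms.

    ```python
    # Minimal sketch of one company description in euBusinessGraph style.
    # The ebg: namespace/property are assumptions; rov: is the real W3C
    # Registered Organization vocabulary.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, XSD

    EBG = Namespace("http://data.businessgraph.io/ontology#")  # assumed
    ROV = Namespace("http://www.w3.org/ns/regorg#")

    g = Graph()
    g.bind("ebg", EBG)
    g.bind("rov", ROV)

    company = URIRef("http://example.org/company/12345")  # hypothetical ID
    g.add((company, RDF.type, ROV.RegisteredOrganization))
    g.add((company, ROV.legalName, Literal("Acme GmbH")))
    g.add((company, EBG.incorporationDate,
           Literal("2001-05-14", datatype=XSD.date)))

    print(g.serialize(format="turtle"))
    ```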

    A finder and representation system for knowledge carriers based on granular computing

    In one of his publications, Aristotle states that "all human beings by their nature desire to know" [Kraut 1991]. This desire is initiated the day we are born and accompanies us for the rest of our lives. While at a young age our parents serve as one of the principal sources of knowledge, this changes over the course of time. Technological advances, and particularly the introduction of the Internet, have given us new possibilities to share and access knowledge from almost anywhere at any given time. Being able to access and share large collections of written-down knowledge is only one part of the equation; just as important is its internalization, which in many cases can prove difficult to accomplish. Hence, being able to request assistance from someone who holds the necessary knowledge is of great importance, as it can positively stimulate the internalization process. However, digitalization not only provides a larger pool of knowledge sources to choose from but also more people who can potentially be activated to provide personalized assistance with a given problem statement or question. While this is beneficial, it raises the issue that it becomes hard to keep track of who knows what. For this task, so-called Expert Finder Systems have been introduced, which are designed to identify and suggest the most suitable candidates to provide assistance. Throughout this Ph.D. thesis, a novel type of Expert Finder System is introduced that is capable of capturing the knowledge users within a community hold, from explicit and implicit data sources. This is accomplished with the use of granular computing, natural language processing, and a set of metrics that have been introduced to measure and compare the suitability of candidates. Furthermore, the knowledge requirements of a problem statement or question are assessed, in order to ensure that only the most suitable candidates are recommended to provide assistance.
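
    To give a concrete flavour of candidate scoring, the sketch below ranks users by cosine similarity between a question's term profile and per-user knowledge profiles. This simple metric is a stand-in assumption, not the granular-computing approach the thesis develops.

    ```python
    # Illustrative suitability metric: cosine similarity between a question's
    # term profile and each user's knowledge profile (an assumption standing
    # in for the thesis's own metrics).
    import math
    from collections import Counter

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in set(a) & set(b))
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    # Hypothetical knowledge profiles mined from explicit/implicit sources.
    profiles = {
        "alice": Counter({"sparql": 8, "rdf": 5, "python": 2}),
        "bob":   Counter({"python": 9, "pandas": 4}),
    }
    question = Counter({"sparql": 1, "rdf": 1})

    ranking = sorted(profiles, key=lambda u: cosine(question, profiles[u]),
                     reverse=True)
    print(ranking)  # candidates ordered by estimated suitability
    ```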

    Assessing the quality of Wikidata referencing

    Wikidata is a versatile and broad-based Knowledge Graph (KG) that leverages the power of collaborative contributions via an open wiki, augmented by bot accounts, to curate its content. Wikidata represents over 102 million interlinked data entities, accompanied by over 1.4 billion statements about the items, accessible to the public via a SPARQL endpoint and diverse dump formats. The Wikidata data model enables assigning references to every single statement. While the quality of Wikidata statements has been assessed, the quality of references in this knowledge graph is not well covered in the literature. To fill this gap, we develop and implement a comprehensive referencing quality assessment framework based on Linked Data quality dimensions and criteria. We implement the objective metrics of the assessment framework as the Referencing Quality Scoring System (RQSS). RQSS provides quantified scores by which the referencing quality can be analyzed and compared. Due to the scale of Wikidata, we developed a subsetting approach to create a comparison platform that systematically samples Wikidata. We have used both well-defined subsets and random samples to evaluate the quality of references in Wikidata using RQSS. Based on RQSS, the overall referencing quality in Wikidata subsets is 0.58 out of 1. Random subsets (representative of Wikidata) have higher overall scores than topical subsets by 0.05, with Gene Wiki having the highest score among topical subsets. Regarding referencing quality dimensions, all subsets score highly on accuracy, availability, security, and understandability, but more weakly on completeness, verifiability, objectivity, and versatility. RQSS scripts can be reused to monitor referencing quality over time. The evaluation shows that RQSS is practical and provides valuable information that can be used by Wikidata contributors and WikiProject owners to identify referencing quality gaps. Although RQSS was developed based on the Wikidata RDF model, its referencing quality assessment framework can be generalized to any RDF KG.
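
    One objective metric in the spirit of RQSS can be sketched against the live Wikidata endpoint: the fraction of an item's statements that carry at least one reference, using the prov:wasDerivedFrom link of the Wikidata RDF model. The one-number metric below is a simplified assumption, not RQSS itself.

    ```python
    # Simplified referencing-coverage sketch for one item (Douglas Adams,
    # wd:Q42). Statement nodes hang off p: properties; references attach
    # via prov:wasDerivedFrom. The single-score metric is an assumption.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
    sparql.setQuery("""
    PREFIX wd:   <http://www.wikidata.org/entity/>
    PREFIX prov: <http://www.w3.org/ns/prov#>
    SELECT (COUNT(DISTINCT ?st) AS ?total) (COUNT(DISTINCT ?refst) AS ?referenced)
    WHERE {
      wd:Q42 ?p ?st .
      FILTER (STRSTARTS(STR(?p), "http://www.wikidata.org/prop/P"))
      OPTIONAL { ?st prov:wasDerivedFrom ?ref . BIND(?st AS ?refst) }
    }
    """)
    sparql.setReturnFormat(JSON)

    row = sparql.query().convert()["results"]["bindings"][0]
    total = int(row["total"]["value"])
    referenced = int(row["referenced"]["value"])
    print(f"reference coverage: {referenced / total:.2f}" if total else "no statements")
    ```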

    Knowledge-Driven Harmonization of Sensor Observations: Exploiting Linked Open Data for IoT Data Streams

    The rise of the Internet of Things leads to an unprecedented number of continuous sensor observations that are available as IoT data streams. Harmonization of such observations is a labor-intensive task due to heterogeneity in format, syntax, and semantics. We aim to reduce the effort for such harmonization tasks by employing a knowledge-driven approach. To this end, we pursue the idea of exploiting the large body of formalized public knowledge represented as statements in Linked Open Data.
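
    As a hedged sketch of this harmonization idea, the code below lifts one raw sensor reading into the W3C SOSA vocabulary so that heterogeneous streams share a common schema; the input record layout and the example URIs are assumptions.

    ```python
    # Sketch: harmonizing one raw reading into SOSA (a real W3C vocabulary).
    # The raw record layout and example.org URIs are illustrative assumptions.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, XSD

    SOSA = Namespace("http://www.w3.org/ns/sosa/")

    raw = {"sensor": "dht22-01", "property": "airTemperature",
           "value": 21.4, "time": "2024-03-01T12:00:00Z"}

    g = Graph()
    g.bind("sosa", SOSA)

    obs = URIRef(f"http://example.org/obs/{raw['sensor']}/{raw['time']}")
    g.add((obs, RDF.type, SOSA.Observation))
    g.add((obs, SOSA.madeBySensor,
           URIRef(f"http://example.org/sensor/{raw['sensor']}")))
    g.add((obs, SOSA.observedProperty,
           URIRef(f"http://example.org/property/{raw['property']}")))
    g.add((obs, SOSA.hasSimpleResult, Literal(raw["value"], datatype=XSD.double)))
    g.add((obs, SOSA.resultTime, Literal(raw["time"], datatype=XSD.dateTime)))

    print(g.serialize(format="turtle"))
    ```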