4 research outputs found

    Linked Data Quality Assessment and its Application to Societal Progress Measurement

    In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration, where both documents and data are linked. Moreover, the semantics and structure of the underlying data are kept intact, making this an embodiment of the Semantic Web. LD essentially entails a set of best practices for publishing and connecting structured data on the Web, which allows publishing and exchanging information in an interoperable and reusable fashion. Many different communities on the Internet, such as the geographic, media, life sciences and government communities, have already adopted these LD principles. This is confirmed by the dramatically growing Linked Data Web, where currently more than 50 billion facts are represented. With the emergence of the Web of Linked Data, several use cases become possible thanks to the rich and disparate data integrated into one global information space. Linked Data, in these cases, not only assists in building mashups by interlinking heterogeneous and dispersed data from multiple sources but also empowers the uncovering of meaningful and impactful relationships. These discoveries have paved the way for scientists to explore the existing data and uncover meaningful outcomes that they might not have been aware of previously.

    In all these use cases utilizing LD, one crippling problem is the underlying data quality. Incomplete, inconsistent or inaccurate data affects the end results gravely, making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case. There are cases in which datasets containing quality problems are still useful for certain applications; quality thus depends on the use case at hand. LD consumption therefore has to deal with the problem of getting the data into a state in which it can be exploited for real use cases. Insufficient data quality can be caused either by the LD publication process or can be intrinsic to the data source itself. A key challenge is to assess the quality of datasets published on the Web and make this quality information explicit. Assessing data quality is particularly challenging for LD, as the underlying data stems from a set of multiple, autonomous and evolving data sources. Moreover, the dynamic nature of LD makes assessing quality crucial for measuring how accurately the real world is represented. On the document Web, data quality can only be indirectly or vaguely defined, but there is a need for more concrete and measurable data quality metrics for LD. Such data quality metrics include correctness of facts with respect to the real world, adequacy of semantic representation, quality of interlinks, interoperability, timeliness, and consistency with regard to implicit information.

    Even though data quality is an important concept in LD, few methodologies have been proposed to assess the quality of these datasets. Thus, in this thesis, we first unify 18 data quality dimensions and provide a total of 69 metrics for the assessment of LD. The first methodology employs LD experts for the assessment. This assessment is performed with the help of the TripleCheckMate tool, which was developed specifically to assist LD experts in assessing the quality of a dataset, in this case DBpedia. The second methodology is a semi-automatic process, in which the first phase involves the detection of common quality problems through the automatic creation of an extended schema for DBpedia. The second phase involves the manual verification of the generated schema axioms. Thereafter, we employ the wisdom of the crowd, i.e. workers on online crowdsourcing platforms such as Amazon Mechanical Turk (MTurk), to assess the quality of DBpedia. We then compare the two approaches (the previous assessment by LD experts and the assessment by MTurk workers in this study) in order to measure the feasibility of each type of user-driven data quality assessment methodology. Additionally, we evaluate another semi-automated methodology for LD quality assessment, which also involves human judgement. In this semi-automated methodology, selected metrics are formally defined and implemented as part of a tool, namely R2RLint. The user is provided not only with the results of the assessment but also with the specific entities that cause the errors, which helps users understand the quality issues and fix them.

    Finally, we consider a domain-specific use case that consumes LD and relies on data quality. In particular, we identify four LD sources, assess their quality using the R2RLint tool and then utilize them in building the Health Economic Research (HER) Observatory. We show the advantages of this semi-automated assessment over the other types of quality assessment methodologies discussed earlier. The Observatory aims at evaluating the impact of research development on the economic and healthcare performance of each country per year. We illustrate the usefulness of LD in this use case and the importance of quality assessment for any data analysis.
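    To make the notion of a measurable LD quality metric more concrete, the following is a minimal sketch (not taken from the thesis) of how one completeness-style metric, the share of resources that carry an rdf:type statement, could be computed over a local RDF dump with the Python rdflib library; the file name is a placeholder.

        # Illustrative sketch: one completeness-style quality metric over an RDF graph,
        # namely the share of distinct subjects that have at least one rdf:type.
        from rdflib import Graph, RDF

        g = Graph()
        g.parse("dataset.ttl", format="turtle")   # placeholder local dump of the dataset

        subjects = set(g.subjects())              # all distinct subject resources
        typed = set(g.subjects(RDF.type, None))   # subjects that are explicitly typed

        coverage = len(typed) / len(subjects) if subjects else 0.0
        print(f"rdf:type coverage: {coverage:.2%}")   # 100% means every resource is typed

    A tool such as R2RLint, as described above, implements many formally defined metrics of this general kind and additionally reports the concrete entities that cause the errors.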

    Medical Semantic Web Applications: Approaches to Norms and Architectures for Establishing Trust

    A problem of today's information society is that physicians still approach new technical systems with great suspicion, and not without reason, since our trust in such systems is shaken time and again by major abuse scandals. Yet technical systems such as medical Semantic Web applications are a next step towards improved medical care. The goal of this thesis is therefore to find approaches to norms and architectures that establish trust in medical Semantic Web applications. To this end, trust is first examined from different perspectives, and medical Semantic Web applications are then considered as a socio-technical system. For this purpose, the social context of the German healthcare system is examined, and it is additionally investigated how a technical system could change this context. From these three categories, norms are defined. Building on these norms, approaches for architectures intended to increase trust are formulated. Existing medical ontologies are reviewed to give these approaches a foundation. The architectural approaches are then assembled as individual building blocks into a larger overall approach. This larger context is presented first, after which several of the building blocks are described in more detail. These building blocks include supervisory bodies and their services as well as certification authorities with different kinds of certificates. Most of the building blocks, however, are agents with a wide variety of tasks, which are discussed in more detail. On the one hand, these agents are meant to improve and monitor the quality of the ontologies as an important aspect of trust; on the other hand, further agents serve the communication among one another or the usual acquisition of information. Furthermore, these agents build a trust network among themselves. Trust in other agents is represented by different attributes and is held decentrally by each agent, or it can also be queried from central services. An exchange of this information among the agents is likewise possible. This architecture, with its multitude of agents and the resulting trust network, is ultimately intended to create a basic level of trust on which medical Semantic Web applications can be built.
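    As a purely illustrative aside, a minimal sketch of the kind of decentralised trust record the abstract describes, in which each agent holds trust attributes about its peers locally and can be asked about them, might look as follows; all names, attributes and thresholds are assumptions, not taken from the thesis.

        # Illustrative sketch only: each agent stores its own trust attributes about
        # peer agents (decentralised) and answers simple queries about them.
        from dataclasses import dataclass, field

        @dataclass
        class TrustEntry:
            competence: float    # assumed attribute, e.g. observed quality of ontology work
            reliability: float   # assumed attribute, e.g. share of requests answered
            certified: bool      # assumed attribute, e.g. backed by a certification authority

        @dataclass
        class Agent:
            name: str
            trust: dict[str, TrustEntry] = field(default_factory=dict)

            def rate(self, peer: str, entry: TrustEntry) -> None:
                self.trust[peer] = entry   # kept locally by this agent, not in a central store

            def trusts(self, peer: str, threshold: float = 0.5) -> bool:
                entry = self.trust.get(peer)
                return entry is not None and min(entry.competence, entry.reliability) >= threshold

        monitor = Agent("ontology-monitor")
        monitor.rate("terminology-agent", TrustEntry(0.9, 0.8, certified=True))
        print(monitor.trusts("terminology-agent"))   # True

    In the architecture sketched in the abstract, such local records would additionally be exchangeable between agents or queryable via central services.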

    An Environment for the Analysis and Quality Assessment of Big and Linked Data

    Linking and publishing data in the Linked Open Data format increases the interoperability and discoverability of resources over the Web. To accomplish this, the process comprises several design decisions based on the Linked Data principles, which, on one hand, recommend using standards for the representation of and access to data on the Web and, on the other hand, recommend setting hyperlinks between data from different sources. Despite the efforts of the World Wide Web Consortium (W3C), the main international standards organization for the World Wide Web, there is no single tailored formula for publishing data as Linked Data. In addition, the quality of the published Linked Open Data (LOD) is a fundamental issue, and it is yet to be thoroughly managed and considered. The main objective of this doctoral thesis is to design and implement a novel framework for selecting, analyzing, converting, interlinking, and publishing data from diverse sources, while paying great attention to quality assessment throughout all steps and modules of the framework. The goal is to examine whether and to what extent Semantic Web technologies are applicable for merging data from different sources and enabling end-users to obtain additional information that was not available in the individual datasets, in addition to the integration into the Semantic Web community space. Additionally, the Ph.D. thesis intends to validate the applicability of the process in a specific and demanding use case, i.e. creating and publishing an Arabic Linked Drug Dataset based on open drug datasets from selected Arabic countries, and to discuss the quality issues observed in the linked data life-cycle. To that end, this doctoral thesis establishes a Semantic Data Lake in the pharmaceutical domain that allows further integration and the development of different business services on top of the integrated data sources. Through data representation in an open, machine-readable format, the approach offers an optimum solution for information and data dissemination, for building domain-specific applications, and for enriching and gaining value from the original dataset. This thesis showcases how the pharmaceutical domain benefits from the evolving research trends for building competitive advantages. However, as elaborated in this thesis, a better understanding of the specifics of the Arabic language is required to extend the utilization of Linked Data technologies in the targeted Arabic organizations.

    Linking and publishing data in the Linked Open Data format increases the interoperability and searchability of resources over the Web. The process is based on the Linked Data principles (W3C, 2006), which on one hand elaborate standards for the representation of and access to data on the Web (RDF, OWL, SPARQL) and on the other hand suggest the use of hyperlinks between data from different sources. Despite the efforts of the W3C consortium, the main international standards organization for the Web, there is no single formula for implementing the process of publishing data in the Linked Data format. Considering that the quality of published Linked Open Data is decisive for the future development of the Web, the main goals of this doctoral dissertation are (1) the design and implementation of an innovative framework for the selection, analysis, conversion, interlinking and publishing of data from different sources and (2) an analysis of the application of this approach in the pharmaceutical domain. The dissertation investigates in detail the question of the quality of big and linked data ecosystems (Linked Data Ecosystems), taking into account the possibility of reusing open data. The work is motivated by the need to enable researchers from Arab countries to link their data with open data, such as DBpedia, using Semantic Web technologies. The goal is to examine whether open data from Arab countries enables end-users to obtain additional information that is not available in the individual datasets, in addition to the integration into the Semantic Web space. The dissertation proposes a methodology for developing an application for working with Linked Data and implements a software solution that enables querying of a consolidated dataset of drugs from selected Arab countries. The consolidated dataset is implemented in the form of a Semantic Data Lake. This thesis shows how the pharmaceutical industry benefits from the application of innovative technologies and research trends from the field of semantic technologies. However, as elaborated in this thesis, a better understanding of the specifics of the Arabic language is required for the implementation of Linked Data tools and their application to data from Arab countries.
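    To illustrate the convert-and-interlink steps of such a framework, the following minimal sketch (Python with the rdflib library; all URIs, names and values are invented for illustration and are not taken from the thesis) represents a single drug record as RDF and sets an owl:sameAs link to an external resource before serialising it in a standard LOD format.

        # Illustrative sketch of converting one record to RDF, interlinking it and
        # serialising it as Turtle. All identifiers below are placeholders.
        from rdflib import Graph, Namespace, Literal
        from rdflib.namespace import OWL, RDF, RDFS

        EX = Namespace("http://example.org/drug/")       # placeholder local namespace
        DBR = Namespace("http://dbpedia.org/resource/")  # external dataset to link against

        g = Graph()
        g.bind("owl", OWL)
        g.bind("ex", EX)

        drug = EX["paracetamol-500"]
        g.add((drug, RDF.type, EX.Drug))
        g.add((drug, RDFS.label, Literal("Paracetamol 500 mg", lang="en")))
        g.add((drug, OWL.sameAs, DBR.Paracetamol))       # hyperlink into another data source

        print(g.serialize(format="turtle"))              # "publish" step: a standard LOD format

    Once such triples from several sources are loaded into one store, a single SPARQL query can return information that none of the individual datasets contains on its own, which is exactly the end-user benefit the thesis sets out to examine.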