22 research outputs found

    Vom Datenkatalog zum Wissensgraph — Forschungsdaten im konzeptuellen Modell von FRBR

    The description of research data in research data management is often imprecise, incomplete, or inconsistent, or suffers from metadata curation that is not carried out consistently. This contribution presents an approach motivated by library and information science for improving metadata about research data by means of the conceptual model of FRBR (Functional Requirements for Bibliographic Records). The concrete goal is the construction of a knowledge graph that integrates FRBRised metadata from a data catalogue with metadata from a library catalogue and with research information. The method builds on a data catalogue whose metadata schema is an application profile based on DCAT (Data Catalog Vocabulary) and Disco (DDI-RDF Discovery Vocabulary). The metadata in the data catalogue are validated with SHACL (Shapes Constraint Language) and thus serve as the basis for FRBRisation when building the knowledge graph, with FaBiO (FRBR-aligned Bibliographic Ontology) as the FRBR-based data model. Thanks to the improved metadata quality and the procedure for linking entities from the data catalogue, the library catalogue, and the research information system, the FRBRised and integrated metadata in the knowledge graph support in particular the versioning and provenance information of research data and, not least, data citation. The FRBRisation approach thereby contributes to improving information retrieval in discovery systems for research data.
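
    To make the described workflow concrete, the following Python sketch shows how such a pipeline could look with off-the-shelf libraries: catalogue metadata is validated against a SHACL shape with pySHACL, and only conforming records are FRBRised into FaBiO work/expression entities with rdflib. The shape, the example IRIs, and the choice of classes are illustrative assumptions and are not taken from the application profile described in the paper.

    from rdflib import Graph, Literal, Namespace, RDF
    from pyshacl import validate

    DCAT = Namespace("http://www.w3.org/ns/dcat#")
    DCT = Namespace("http://purl.org/dc/terms/")
    FABIO = Namespace("http://purl.org/spar/fabio/")
    FRBR = Namespace("http://purl.org/vocab/frbr/core#")
    EX = Namespace("http://example.org/")  # hypothetical namespace for the example

    # Hypothetical catalogue record: a dcat:Dataset with a single title.
    data = Graph()
    data.add((EX.dataset1, RDF.type, DCAT.Dataset))
    data.add((EX.dataset1, DCT.title, Literal("Survey on research data reuse")))

    # Minimal SHACL shape: every dcat:Dataset must carry exactly one dct:title.
    shapes = Graph().parse(format="turtle", data="""
        @prefix sh:   <http://www.w3.org/ns/shacl#> .
        @prefix dcat: <http://www.w3.org/ns/dcat#> .
        @prefix dct:  <http://purl.org/dc/terms/> .
        [] a sh:NodeShape ;
           sh:targetClass dcat:Dataset ;
           sh:property [ sh:path dct:title ; sh:minCount 1 ; sh:maxCount 1 ] .
    """)

    conforms, _, report = validate(data, shacl_graph=shapes)
    print(conforms, report)

    # Only validated records are FRBRised: the dataset becomes a FaBiO work
    # realised by an expression, so later versions can hang off the same work.
    if conforms:
        kg = Graph()
        kg.add((EX.work1, RDF.type, FABIO.Work))
        kg.add((EX.expression1, RDF.type, FABIO.Expression))
        kg.add((EX.work1, FRBR.realization, EX.expression1))
        kg.add((EX.expression1, DCT.title, Literal("Survey on research data reuse")))
        print(kg.serialize(format="turtle"))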

    ONE APPROACH FOR ELIMINATING COMMUTATIVE PAIRS OF ELEMENTS IN WEB APPLICATIONS

    Modern web applications contain graphical elements that are often connected in a way that reflects the relationships between them. The structure of these connections varies with the needs of the application. A common structure used to describe the relationship between two elements consists of four members: the two elements themselves and two lines representing the relationship between them. This paper presents an approach for eliminating commutative pairs in an array of elements with mutually exclusive connections. The ECOMPAIR (Elimination of COMmutative PAIRs) algorithm allows two elements to be linked by a single line with associated descriptions. Experimental results obtained by applying the ECOMPAIR algorithm with the jsPlumb visualization library show a significant reduction in the number of elements necessary for visualization.
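
    The abstract does not give ECOMPAIR's data structures, so the following Python sketch only illustrates the underlying idea of commutative-pair elimination: connections are assumed to be (source, target, label) tuples, and a pair A-to-B / B-to-A is collapsed into one line that carries both labels. The function name and the tuple layout are assumptions for illustration, not the algorithm's actual interface.

    from collections import OrderedDict

    def eliminate_commutative_pairs(connections):
        """Collapse commutative pairs (A->B and B->A) into a single line.

        `connections` is an assumed list of (source, target, label) tuples;
        the result keeps one entry per unordered element pair, with the
        labels of both directions attached to it.
        """
        merged = OrderedDict()
        for source, target, label in connections:
            key = frozenset((source, target))  # direction-insensitive key
            merged.setdefault(key, {"endpoints": (source, target), "labels": []})
            merged[key]["labels"].append(label)
        return list(merged.values())

    # Example: four members (two elements, two directed lines) become one line.
    lines = [("A", "B", "A depends on B"), ("B", "A", "B notifies A")]
    print(eliminate_commutative_pairs(lines))
    # [{'endpoints': ('A', 'B'), 'labels': ['A depends on B', 'B notifies A']}]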

    Mining Authoritativeness in Art Historical Photo Archives. Semantic Web Applications for Connoisseurship

    The purpose of this work is threefold: (i) to facilitate knowledge discovery in art historical photo archives, (ii) to support users' decision-making process when evaluating contradictory artwork attributions, and (iii) to provide policies for information quality improvement in art historical photo archives. The approach is to leverage Semantic Web technologies in order to aggregate, assess, and recommend the most documented authorship attributions. In particular, the findings of this work offer art historians an aid for retrieving relevant sources, assessing the textual authoritativeness (i.e. internal grounds) of sources of attribution, and evaluating the cognitive authoritativeness of cited scholars. At the same time, the retrieval process allows art historical data providers to define a low-cost data integration process to update and enrich their collection data. The contributions of this thesis are the following: (1) a methodology for representing questionable information by means of ontologies; (2) a conceptual framework of Information Quality measures addressing the dimensions of textual and cognitive authoritativeness that characterise art historical data; (3) a number of policies for metadata quality improvement in art historical photo archives, derived from the application of the framework; (4) a ranking model leveraging the conceptual framework; (5) a semantic crawler, called mAuth, which harvests authorship attributions in the Web of Data; and (6) an API and a Web application that serve this information to applications and end users. Although the findings are limited to a restricted number of photo archives and datasets, the research is relevant to a broader set of stakeholders, such as archives, museums, and libraries, which can reuse the conceptual framework for assessing questionable information, mutatis mutandis, in other nearby fields in the Humanities.
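
    Neither the ranking model nor the mAuth API is specified in this abstract, so the sketch below only illustrates the general idea of ranking contradictory attributions by how well documented they are: each attribution is scored by its number of citing sources plus a weighted count of endorsing scholars. The data layout and the weights are illustrative assumptions, not the thesis's actual model.

    from dataclasses import dataclass, field

    @dataclass
    class Attribution:
        """One claimed authorship of an artwork, as recorded by a photo archive."""
        artist: str
        cited_sources: list = field(default_factory=list)  # bibliographic sources
        scholars: list = field(default_factory=list)       # scholars endorsing the claim

    # Assumed per-scholar weights standing in for cognitive authoritativeness.
    SCHOLAR_WEIGHT = {"Scholar A": 2.0}

    def score(attribution: Attribution) -> float:
        """Documentation score: citing sources plus weighted scholar endorsements."""
        textual = len(attribution.cited_sources)
        cognitive = sum(SCHOLAR_WEIGHT.get(s, 1.0) for s in attribution.scholars)
        return textual + cognitive

    def rank(attributions):
        """Return contradictory attributions, best documented first."""
        return sorted(attributions, key=score, reverse=True)

    claims = [
        Attribution("Painter X", cited_sources=["Catalogue, 1912"], scholars=["Scholar A"]),
        Attribution("Painter Y", scholars=["Scholar B"]),
    ]
    for claim in rank(claims):
        print(claim.artist, score(claim))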

    Engineering Agile Big-Data Systems

    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design. Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems.

    Linked Data Quality Assessment and its Application to Societal Progress Measurement

    In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration, where both documents and data are linked. Moreover, the semantics and structure of the underlying data are kept intact, making this the Semantic Web. LD essentially entails a set of best practices for publishing and connecting structured data on the Web, which allows information to be published and exchanged in an interoperable and reusable fashion. Many different communities on the Internet, such as the geographic, media, life sciences, and government communities, have already adopted these LD principles. This is confirmed by the dramatically growing Linked Data Web, where currently more than 50 billion facts are represented. With the emergence of the Web of Linked Data, several use cases become possible thanks to the rich and disparate data integrated into one global information space. Linked Data, in these cases, not only assists in building mashups by interlinking heterogeneous and dispersed data from multiple sources but also empowers the uncovering of meaningful and impactful relationships. These discoveries have paved the way for scientists to explore the existing data and uncover meaningful outcomes that they might not have been aware of previously.

    In all these use cases utilizing LD, one crippling problem is the underlying data quality. Incomplete, inconsistent, or inaccurate data gravely affects the end results, making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case; there are cases in which datasets containing quality problems are still useful for certain applications, so quality depends on the use case at hand. Thus, LD consumption has to deal with the problem of getting the data into a state in which it can be exploited for real use cases. Insufficient data quality can be caused either by the LD publication process or can be intrinsic to the data source itself. A key challenge is to assess the quality of datasets published on the Web and to make this quality information explicit. Assessing data quality is a particular challenge in LD because the underlying data stems from a set of multiple, autonomous, and evolving data sources. Moreover, the dynamic nature of LD makes quality assessment crucial for measuring how accurately the real world is represented. On the document Web, data quality can only be defined indirectly or vaguely, but there is a need for more concrete and measurable data quality metrics for LD. Such metrics include correctness of facts with respect to the real world, adequacy of the semantic representation, quality of interlinks, interoperability, timeliness, and consistency with regard to implicit information. Even though data quality is an important concept in LD, few methodologies have been proposed to assess the quality of these datasets.

    Thus, in this thesis, we first unify 18 data quality dimensions and provide a total of 69 metrics for the assessment of LD. The first methodology involves the employment of LD experts for the assessment. This assessment is performed with the help of the TripleCheckMate tool, which was developed specifically to assist LD experts in assessing the quality of a dataset, in this case DBpedia. The second methodology is a semi-automatic process, in which the first phase involves the detection of common quality problems through the automatic creation of an extended schema for DBpedia. The second phase involves the manual verification of the generated schema axioms. Thereafter, we employ the wisdom of the crowd, i.e. workers on online crowdsourcing platforms such as Amazon Mechanical Turk (MTurk), to assess the quality of DBpedia. We then compare the two approaches (the previous assessment by LD experts and the assessment by MTurk workers in this study) in order to evaluate the feasibility of each type of user-driven data quality assessment methodology. Additionally, we evaluate another semi-automated methodology for LD quality assessment, which also involves human judgement. In this semi-automated methodology, selected metrics are formally defined and implemented as part of a tool, namely R2RLint. The user is provided not only with the results of the assessment but also with the specific entities that cause the errors, which helps users understand the quality issues and fix them. Finally, we consider a domain-specific use case that consumes LD and depends on its quality. In particular, we identify four LD sources, assess their quality using the R2RLint tool, and then use them in building the Health Economic Research (HER) Observatory. We show the advantages of this semi-automated assessment over the other types of quality assessment methodologies discussed earlier. The Observatory aims at evaluating the impact of research development on the economic and healthcare performance of each country per year. We illustrate the usefulness of LD in this use case and the importance of quality assessment for any data analysis.
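
    The 69 metrics themselves are not reproduced in this abstract, so the following Python sketch only shows what a simple, automatable LD quality metric looks like in practice: an interlinking measure, computed here as the fraction of subjects carrying at least one owl:sameAs link. The metric choice and the toy data are illustrative assumptions and are not taken from the R2RLint tool.

    from rdflib import Graph
    from rdflib.namespace import OWL

    def interlinking_degree(graph: Graph) -> float:
        """Fraction of distinct subjects that have at least one owl:sameAs link."""
        subjects = set(graph.subjects())
        if not subjects:
            return 0.0
        linked = {s for s in subjects if (s, OWL.sameAs, None) in graph}
        return len(linked) / len(subjects)

    # Hypothetical toy dataset: two resources, only one linked to an external IRI.
    g = Graph().parse(format="turtle", data="""
        @prefix ex:  <http://example.org/> .
        @prefix owl: <http://www.w3.org/2002/07/owl#> .
        ex:city1 a ex:City ; owl:sameAs <http://dbpedia.org/resource/Leipzig> .
        ex:city2 a ex:City .
    """)

    print(f"interlinking degree: {interlinking_degree(g):.2f}")  # 0.50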

    Linked Open Data - Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project

    Database Management; Artificial Intelligence (incl. Robotics); Information Systems and Communication Service

    2019 EC3 July 10-12, 2019 Chania, Crete, Greece


    Linked Research on the Decentralised Web

    This thesis is about research communication in the context of the Web. I analyse literature which reveals how researchers are making use of Web technologies for knowledge dissemination, as well as how individuals are disempowered by the centralisation of certain systems, such as academic publishing platforms and social media. I share my findings on the feasibility of a decentralised and interoperable information space where researchers can control their identifiers whilst fulfilling the core functions of scientific communication: registration, awareness, certification, and archiving. The contemporary research communication paradigm operates under a diverse set of sociotechnical constraints, which influence how units of research information and personal data are created and exchanged. Economic forces and non-interoperable system designs mean that researcher identifiers and research contributions are largely shaped and controlled by third-party entities, and participation requires the use of proprietary systems.

    From a technical standpoint, this thesis takes a deep look at the semantic structure of research artifacts and at how they can be stored, linked, and shared in a way that is controlled by individual researchers or delegated to trusted parties. Further, I find that the ecosystem lacks a technical Web standard able to fulfill the awareness function of research communication. Thus, I contribute a new communication protocol, Linked Data Notifications (published as a W3C Recommendation), which enables decentralised notifications on the Web, and I provide implementations pertinent to the academic publishing use case. So far, decentralised notifications have been applied in research dissemination and collaboration scenarios, as well as in archival activities and scientific experiments. Another core contribution of this work is a Web standards-based implementation of a client-side tool, dokieli, for decentralised article publishing, annotations, and social interactions. dokieli can be used to fulfill the scholarly functions of registration, awareness, certification, and archiving, all in a decentralised manner, returning control of research contributions and discourse to individual researchers.

    The overarching conclusion of the thesis is that Web technologies can be used to create a fully functioning ecosystem for research communication. Using the framework of Web architecture, and loosely coupling the four functions, an accessible and inclusive ecosystem can be realised whereby users are able to use and switch between interoperable applications without interfering with existing data. Technical solutions alone do not suffice, of course, so this thesis also takes into account the need for a change in the traditional mode of thinking amongst scholars, and presents the Linked Research initiative as an ongoing effort toward researcher autonomy in a social system and universal access to human- and machine-readable information. Outcomes of this outreach work so far include an increase in the number of individuals self-hosting their research artifacts, workshops publishing accessible proceedings on the Web, in-the-wild experiments with open and public peer review, and semantic graphs of contributions to conference proceedings and journals (the Linked Open Research Cloud). Some of the future challenges include addressing the social implications of decentralised Web publishing and the design of ethically grounded interoperable mechanisms; cultivating privacy-aware information spaces; personal or community-controlled on-demand archiving services; and further design of decentralised applications that are aware of the core functions of scientific communication.
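
    Since Linked Data Notifications is a published W3C Recommendation, its basic flow can be illustrated: a sender discovers the receiver's inbox via the ldp:inbox link relation and POSTs a JSON-LD notification to it. The Python sketch below is a minimal version of that flow; the article and annotation URLs are hypothetical placeholders, and both error handling and inbox discovery via the resource body (rather than the Link header) are omitted.

    import requests

    LDP_INBOX = "http://www.w3.org/ns/ldp#inbox"

    def discover_inbox(target_url: str) -> str | None:
        """Find the LDN inbox advertised by `target_url` via its Link header."""
        resp = requests.head(target_url)
        for link in resp.headers.get("Link", "").split(","):
            if LDP_INBOX in link:
                return link.split(";")[0].strip().strip("<>")
        return None

    def send_notification(inbox_url: str, notification: dict) -> requests.Response:
        """POST a JSON-LD notification to the discovered inbox."""
        return requests.post(
            inbox_url,
            json=notification,
            headers={"Content-Type": "application/ld+json"},
        )

    # Hypothetical article announcing a new annotation to its author's inbox.
    article = "https://example.org/articles/decentralised-web"  # placeholder URL
    inbox = discover_inbox(article)
    if inbox:
        send_notification(inbox, {
            "@context": "https://www.w3.org/ns/activitystreams",
            "type": "Announce",
            "object": "https://example.org/annotations/42",     # placeholder
            "target": article,
        })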