
    Semantic Data Management in Data Lakes

    In recent years, data lakes emerged as a way to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Some approaches propose the linkage of metadata to knowledge graphs based on the Linked Data principles to provide more meaning and semantics to the data in the lake. Such a semantic layer may be utilized not only for data management but also to tackle the problem of data integration from heterogeneous sources, in order to make data access more expressive and interoperable. In this survey, we review recent approaches with a specific focus on their application within data lake systems and their scalability to Big Data. We classify the approaches into (i) basic semantic data management, (ii) semantic modeling approaches for enriching metadata in data lakes, and (iii) methods for ontology-based data access. In each category, we cover the main techniques and their background, and compare the latest research. Finally, we point out challenges for future work in this research area, which needs a closer integration of Big Data and Semantic Web technologies.
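    As a hedged illustration of the semantic-layer idea described in this abstract, the sketch below attaches RDF metadata to a single data lake file and links it to an external vocabulary, following the Linked Data principles. It is a minimal Python example using rdflib; the lake namespace, file path, and linked concept URI are all illustrative, not taken from the survey.

```python
# Minimal sketch: a semantic layer over a data lake, expressed as RDF
# metadata that links a raw lake object to external vocabularies.
# The lake namespace and the linked concept URI are illustrative only.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import DCTERMS, DCMITYPE

LAKE = Namespace("http://example.org/lake/")     # hypothetical lake namespace
DBO = Namespace("http://dbpedia.org/ontology/")  # external knowledge graph

g = Graph()
dataset = LAKE["sales/2023/orders.parquet"]      # a raw file in the lake

# Describe the file with machine-readable semantics.
g.add((dataset, RDF.type, DCMITYPE.Dataset))
g.add((dataset, DCTERMS.title, Literal("Customer orders, 2023")))
# Link the dataset's topic to a concept in an external knowledge graph.
g.add((dataset, DCTERMS.subject, DBO.Sale))

print(g.serialize(format="turtle"))
```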

    A Survey on Mapping Semi-Structured Data and Graph Data to Relational Data

    The data produced by various services should be stored and managed in an appropriate format so that valuable knowledge can be gained from it conveniently. This has led to the emergence of various data models, including the relational, semi-structured, and graph models. Given that mature relational databases built on the relational data model still dominate today's market, there is growing interest in storing and processing semi-structured data and graph data in relational databases, so that the capabilities of mature and powerful relational databases can be applied to these varied kinds of data. In this survey, we review existing methods for mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and demerits of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches by drawing lessons from each model's mapping strategies, as well as a new research topic: mapping multi-model data into relational tables.
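    The following sketch illustrates one classic mapping strategy of the kind such surveys classify: a schema-oblivious mapping that stores a property graph in generic node and edge tables, with traversals expressed as self-joins. It is a minimal Python/SQLite example; the table layout and data are illustrative, not drawn from the paper.

```python
# Sketch: mapping a graph into relational tables via generic node/edge
# tables (schema-oblivious strategy). Names and data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE edges (src INTEGER REFERENCES nodes(id),
                        dst INTEGER REFERENCES nodes(id),
                        type TEXT);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carol")])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)",
                 [(1, 2, "knows"), (2, 3, "knows")])

# A fixed-length graph traversal becomes a self-join over the edge table.
rows = conn.execute("""
    SELECT a.label, c.label
    FROM edges e1 JOIN edges e2 ON e1.dst = e2.src
    JOIN nodes a ON a.id = e1.src
    JOIN nodes c ON c.id = e2.dst
""").fetchall()
print(rows)  # [('Alice', 'Carol')] -- a friends-of-friends query
```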

    Digitale Edition in Österreich

    Between 2016 and 2020 the federally funded project "KONDE - Kompetenznetzwerk Digitale Edition" created a network of collaboration between Austrian institutions and researchers working on digital scholarly editions. With the present volume the editors provide a space where researchers and editors from Austrian institutions could theorize on their work and present their editing projects. The collection creates a snapshot of the interests and main research areas regarding digital scholarly editing in Austria at the time of the project.

    CLARIN

    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published ten years after the establishment of CLARIN as a European Research Infrastructure Consortium.

    Assessing the quality of Wikidata referencing

    Wikidata is a versatile and broad-based Knowledge Graph (KG) that leverages the power of collaborative contributions via an open wiki, augmented by bot accounts, to curate its content. Wikidata represents over 102 million interlinked data entities, accompanied by over 1.4 billion statements about the items, accessible to the public via a SPARQL endpoint and diverse dump formats. The Wikidata data model enables assigning references to every single statement. While the quality of Wikidata statements has been assessed, the quality of references in this knowledge graph is not well covered in the literature. To close this gap, we develop and implement a comprehensive referencing quality assessment framework based on Linked Data quality dimensions and criteria. We implement the objective metrics of the assessment framework as the Referencing Quality Scoring System (RQSS). RQSS provides quantified scores by which the referencing quality can be analyzed and compared. Due to the scale of Wikidata, we developed a subsetting approach to create a comparison platform that systematically samples Wikidata. We have used both well-defined subsets and random samples to evaluate the quality of references in Wikidata using RQSS. Based on RQSS, the overall referencing quality across Wikidata subsets is 0.58 out of 1. Random subsets (representative of Wikidata) have overall scores 0.05 higher than topical subsets, with Gene Wiki having the highest scores among the topical subsets. Regarding the referencing quality dimensions, all subsets score highly in accuracy, availability, security, and understandability, but more weakly in completeness, verifiability, objectivity, and versatility. The RQSS scripts can be reused to monitor referencing quality over time. The evaluation shows that RQSS is practical and provides valuable information that Wikidata contributors and WikiProject owners can use to identify referencing quality gaps. Although RQSS was developed for the Wikidata RDF model, its referencing quality assessment framework can be generalized to any RDF KG.
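    As a concrete illustration of the data RQSS works over, the sketch below retrieves statement references from the public Wikidata SPARQL endpoint mentioned in the abstract. The query uses the standard Wikidata reification pattern (p:/prov:wasDerivedFrom); the chosen item and property are arbitrary examples, and the snippet computes none of the RQSS metrics itself.

```python
# Sketch: inspecting statement references via the Wikidata SPARQL
# endpoint. Item Q42 and property P69 are arbitrary examples.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?statement ?refProperty ?refValue WHERE {
  wd:Q42 p:P69 ?statement .              # educated-at statements of Douglas Adams
  ?statement prov:wasDerivedFrom ?ref .  # each reference node
  ?ref ?refProperty ?refValue .          # reference details (e.g. a source URL)
} LIMIT 10
"""

url = ENDPOINT + "?" + urllib.parse.urlencode({"query": QUERY, "format": "json"})
req = urllib.request.Request(url, headers={"User-Agent": "rqss-example/0.1"})
with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

for row in data["results"]["bindings"]:
    print(row["refProperty"]["value"], "->", row["refValue"]["value"])
```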

    Migration Research in a Digitized World: Using Innovative Technology to Tackle Methodological Challenges

    This open access book explores implications of the digital revolution for migration scholars’ methodological toolkit. New information and communication technologies hold considerable potential to improve the quality of migration research by enabling previously non-viable solutions to a myriad of methodological challenges in this field of study. Combining cutting-edge migration scholarship and methodological expertise, the book addresses a range of crucial issues related to both researcher-designed data collections and the secondary use of “big data”, highlighting opportunities as well as challenges and limitations. A valuable source for students and scholars engaged in migration research, the book will also be of keen interest to policymakers.

    Data Mapping for XBRL: A Systematic Literature Review

    The use of eXtensible Business Reporting Language (XBRL) technology for financial reporting on the Internet is growing visibly, whether because of its advantages and benefits or because of government mandates. However, the data to be transported by this language are mostly stored in database structures, some relational and others NoSQL. The need to integrate XBRL technology with other data storage technologies has been growing continuously, and research is needed to find a solution for mapping data between these environments. The difficulties in integrating XBRL with other technologies, whether relational or NoSQL databases, CSV files, or JSON, need to be mapped and overcome. Generating XBRL documents from a database can be costly, since database management systems offer no native mechanism for exporting their data as XBRL; specific third-party systems are needed to generate XBRL documents, and these systems are generally proprietary and expensive. Integrating these different technologies adds complexity, since the generated documents are not connected to the database management system. These difficulties cause performance and storage problems, and in cases involving large volumes of data, such as data delivery to government agencies, the complexity increases. It is therefore essential to study techniques and methods that point toward a solution for this integration and/or mapping, preferably a generic one that covers the XBRL data structure and the main data models currently in use, i.e., relational DBMSs, NoSQL databases, and JSON or CSV files. Through a systematic literature review, this work aims to identify the state of the art concerning the mapping of XBRL data.
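    To make the mapping problem concrete, the sketch below serializes rows from a relational source as facts in a heavily simplified XBRL-style instance document. The concept names and taxonomy namespace are hypothetical, and a real filing would need a proper taxonomy schema plus entity and period elements; this illustrates the shape of the mapping, not a compliant XBRL generator.

```python
# Sketch: emitting rows from a data source as facts in a simplified
# XBRL-style instance document. The "acme" taxonomy is hypothetical.
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"
ACME = "http://example.com/taxonomy"  # hypothetical taxonomy namespace
ET.register_namespace("xbrli", XBRLI)
ET.register_namespace("acme", ACME)

# Rows as they might come from a relational, NoSQL, or CSV source.
rows = [("Revenue", "1000000"), ("NetIncome", "250000")]

root = ET.Element(f"{{{XBRLI}}}xbrl")
ET.SubElement(root, f"{{{XBRLI}}}context", id="FY2023")
# (entity and period children of the context are omitted for brevity)
for concept, value in rows:
    fact = ET.SubElement(root, f"{{{ACME}}}{concept}", contextRef="FY2023")
    fact.text = value

print(ET.tostring(root, encoding="unicode"))
```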

    AIUCD 2022 - Proceedings

    The eleventh edition of the National Conference of the AIUCD - Associazione di Informatica Umanistica is titled Culture digitali. Intersezioni: filosofia, arti, media. The title explicitly calls for a methodological and theoretical reflection on the interrelation between digital technologies, information sciences, philosophical disciplines, the world of the arts, and cultural studies.