Semantic Data Management in Data Lakes
In recent years, data lakes emerged as a way to manage large amounts of
heterogeneous data for modern data analytics. One way to prevent data lakes
from turning into inoperable data swamps is semantic data management. Some
approaches propose the linkage of metadata to knowledge graphs based on the
Linked Data principles to provide more meaning and semantics to the data in the
lake. Such a semantic layer may be utilized not only for data management but
also to tackle the problem of data integration from heterogeneous sources, in
order to make data access more expressive and interoperable. In this survey, we
review recent approaches with a specific focus on the application within data
lake systems and scalability to Big Data. We classify the approaches into (i)
basic semantic data management, (ii) semantic modeling approaches for enriching
metadata in data lakes, and (iii) methods for ontology-based data access. In
each category, we cover the main techniques and their background, and compare
the latest research. Finally, we point out challenges for future work in this
research area, which needs a closer integration of Big Data and Semantic Web
technologies.
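The idea of a semantic layer described above can be illustrated with a small sketch: dataset files in the lake are described by (subject, predicate, object) triples that link them to knowledge-graph concepts, following the Linked Data principles. The URIs, the `annotate` helper, and the `lake://` scheme below are hypothetical examples, not taken from any of the surveyed systems; only the vocabulary terms (`rdf:type`, `dcat:Dataset`, `dcat:theme`) are real.

```python
# Illustrative sketch: attach semantic metadata to a data lake file by linking
# it to a knowledge-graph concept via (subject, predicate, object) triples.
# All URIs and the helper name are hypothetical examples.

def annotate(dataset_uri, concept_uri):
    """Return triples linking a raw dataset to a knowledge-graph concept."""
    return [
        (dataset_uri, "rdf:type", "dcat:Dataset"),   # the file is a dataset
        (dataset_uri, "dcat:theme", concept_uri),    # its topic in the KG
    ]

triples = annotate("lake://sales/2023.parquet",
                   "http://dbpedia.org/resource/Revenue")
for s, p, o in triples:
    print(s, p, o)
```

A query engine can then resolve "all datasets about Revenue" over these triples instead of guessing from file names, which is what makes access more expressive and interoperable.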
A Survey on Mapping Semi-Structured Data and Graph Data to Relational Data
The data produced by various services should be stored and managed in an appropriate format for gaining valuable knowledge conveniently. This has led to the emergence of various data models, including the relational, semi-structured, and graph models. Considering the fact that mature relational databases built on the relational data model are still predominant in today's market, there is strong interest in storing and processing semi-structured data and graph data in relational databases, so that the capabilities of mature and powerful relational databases can be applied to these various kinds of data. In this survey, we review existing methods for mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and demerits of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches by drawing lessons from each model's mapping strategies, as well as a new research topic: mapping multi-model data into relational tables.
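One generic family of methods in the scope sketched above is the schema-oblivious ("edge table") mapping, which shreds a tree or graph into generic node and edge relations that fit a fixed relational schema. The sketch below is a minimal hypothetical illustration of that idea, not a method from the survey; the `shred` function and the NODE/EDGE column layout are assumptions for the example.

```python
# Minimal sketch of a schema-oblivious ("edge table") mapping: shred a nested
# JSON-like document into generic node and edge relations. Table and column
# layouts (NODE(id, label), EDGE(parent_id, key, child_id)) are illustrative.

def shred(doc, nodes=None, edges=None, parent=None):
    """Flatten a dict/list/scalar tree into NODE rows and parent-child EDGE rows."""
    if nodes is None:
        nodes, edges = [], []
    node_id = len(nodes)
    # Inner nodes are labeled by their kind, leaves by their value.
    label = doc if not isinstance(doc, (dict, list)) else type(doc).__name__
    nodes.append((node_id, label))
    if parent is not None:
        edges.append(parent + (node_id,))
    if isinstance(doc, dict):
        for key, child in doc.items():
            shred(child, nodes, edges, parent=(node_id, key))
    elif isinstance(doc, list):
        for i, child in enumerate(doc):
            shred(child, nodes, edges, parent=(node_id, str(i)))
    return nodes, edges

nodes, edges = shred({"name": "Ada", "langs": ["SQL", "XML"]})
print(nodes)   # rows for the NODE table
print(edges)   # rows for the EDGE table
```

Schema-conscious alternatives instead derive one table per record type from a schema or from the data's structure; the trade-off between the two families is exactly the kind of comparison such a survey classifies.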
Digitale Edition in Österreich
Between 2016 and 2020 the federally funded project "KONDE - Kompetenznetzwerk Digitale Edition" created a network of collaboration between Austrian institutions and researchers working on digital scholarly editions. With the present volume the editors provide a space where researchers and editors from Austrian institutions could theorize on their work and present their editing projects. The collection creates a snapshot of the interests and main research areas regarding digital scholarly editing in Austria at the time of the project.
CLARIN
The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure (CLARIN) for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium.
Assessing the quality of Wikidata referencing
Wikidata is a versatile and broad-based Knowledge Graph (KG) that leverages the
power of collaborative contributions via an open wiki, augmented by bot accounts,
to curate the content. Wikidata represents over 102 million interlinked data entities,
accompanied by over 1.4 billion statements about the items, accessible to the public
via a SPARQL endpoint and diverse dump formats. The Wikidata data model enables assigning references to every single statement. While the quality of Wikidata
statements has been assessed, the quality of references in this knowledge graph is
not well covered in the literature. To fill this gap, we develop and implement
a comprehensive referencing quality assessment framework based on Linked Data
quality dimensions and criteria. We implement the objective metrics of the assessment framework as the Referencing Quality Scoring System - RQSS. RQSS provides
quantified scores by which the referencing quality can be analyzed and compared.
Given the scale of Wikidata, we developed a subsetting approach to create
a comparison platform that systematically samples Wikidata. We have used both
well-defined subsets and random samples to evaluate the quality of references in
Wikidata using RQSS. Based on RQSS, the overall referencing quality in Wikidata
subsets is 0.58 out of 1. Random subsets (representative of Wikidata) have higher
overall scores than topical subsets by 0.05, with Gene Wiki having the highest scores
amongst topical subsets. Regarding referencing quality dimensions, all subsets have
high scores in accuracy, availability, security, and understandability, but have weaker
scores in completeness, verifiability, objectivity, and versatility. RQSS scripts can
be reused to monitor the referencing quality over time. The evaluation shows that
RQSS is practical and provides valuable information, which can be used by Wikidata contributors and WikiProject owners to identify the referencing quality gaps.
Although RQSS is developed based on the Wikidata RDF model, its referencing
quality assessment framework can be generalized to any RDF KG.
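The abstract's "objective metrics" over statement references can be pictured with a toy example. The sketch below is a hypothetical completeness-style score in the spirit of RQSS, not the actual RQSS implementation: the fraction of an item's statements that carry at least one reference. The statement tuples use real Wikidata property/item identifiers (P31, P569, P106), but the data and the reference URLs are invented.

```python
# Hypothetical sketch of one objective referencing metric (not RQSS itself):
# the share of statements that have at least one reference.
# Statements are modeled as (property, value, list_of_references).

def reference_coverage(statements):
    """Return the fraction of statements with at least one reference, in [0, 1]."""
    if not statements:
        return 0.0
    referenced = sum(1 for _, _, refs in statements if refs)
    return referenced / len(statements)

# Toy item with three statements, two of them referenced.
statements = [
    ("P31",  "Q5",         ["https://example.org/source1"]),  # instance of: human
    ("P569", "1815-12-10", ["https://example.org/source2"]),  # date of birth
    ("P106", "Q82594",     []),                               # unreferenced
]
print(round(reference_coverage(statements), 2))  # 0.67
```

RQSS aggregates many such dimension-specific scores (accuracy, verifiability, versatility, and so on) into the 0-to-1 values the abstract reports; this sketch shows only the shape of one such metric.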
Migration Research in a Digitized World: Using Innovative Technology to Tackle Methodological Challenges
This open access book explores implications of the digital revolution for migration scholars' methodological toolkit. New information and communication technologies hold considerable potential to improve the quality of migration research by originating previously non-viable solutions to a myriad of methodological challenges in this field of study. Combining cutting-edge migration scholarship and methodological expertise, the book addresses a range of crucial issues related to both researcher-designed data collections and the secondary use of "big data", highlighting opportunities as well as challenges and limitations. A valuable source for students and scholars engaged in migration research, the book will also be of keen interest to policymakers.
Data Mapping for XBRL: A Systematic Literature Review
The use of eXtensible Business Reporting Language (XBRL) technology for financial reports on the Internet is clearly growing, whether because of its advantages and benefits or because of government mandates. However, the data to be carried by this language are mostly stored in databases, some relational and others NoSQL. The need to integrate XBRL technology with other data storage technologies has been growing continuously, and research is needed to find solutions for mapping data between these environments. The difficulties of integrating XBRL with other technologies, whether relational or NoSQL databases, CSV files, or JSON, need to be mapped and overcome. Generating XBRL documents from a database can be costly, since database management systems offer no native way to export their data as XBRL; specific third-party systems, generally proprietary and expensive, are needed to generate the documents. Integrating these different technologies adds complexity, since the generated documents are not connected to the database management system. These difficulties cause performance and storage problems, and when data volumes are large, as in deliveries to government agencies, the complexity increases further. It is therefore essential to study techniques and methods for performing this integration and/or mapping, preferably in a generic way that covers the XBRL data structure and the main data models currently in use, i.e., relational DBMSs, NoSQL, JSON, and CSV files. Through a systematic literature review, this work aims to identify the state of the art concerning the mapping of XBRL data.
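The relational-to-XBRL direction discussed above can be sketched minimally: serialize rows of a fact table into an XBRL-style instance document. This is a deliberately simplified illustration; a real XBRL instance additionally requires taxonomy schema references, context and unit declarations, and namespace-qualified concept names, all omitted here. The `rows` table and concept names are hypothetical; only the `http://www.xbrl.org/2003/instance` namespace is the real XBRL instance namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical relational fact rows: (concept, context_ref, value).
rows = [
    ("Assets", "FY2023", "1000"),
    ("Liabilities", "FY2023", "400"),
]

def rows_to_instance(rows):
    """Serialize relational fact rows into a simplified XBRL-style instance."""
    xbrli = "http://www.xbrl.org/2003/instance"
    root = ET.Element(f"{{{xbrli}}}xbrl")
    for concept, context_ref, value in rows:
        # Real XBRL would namespace-qualify the concept against a taxonomy
        # and also require matching <context> and <unit> elements.
        fact = ET.SubElement(root, concept)
        fact.set("contextRef", context_ref)
        fact.text = value
    return ET.tostring(root, encoding="unicode")

print(rows_to_instance(rows))
```

The cost the abstract describes comes from everything this sketch leaves out: validating facts against a taxonomy and keeping the generated documents synchronized with the source database.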
AIUCD 2022 - Proceedings
Lâundicesima edizione del Convegno Nazionale dellâAIUCD-Associazione di Informatica Umanistica ha per titolo Culture digitali. Intersezioni: filosofia, arti, media. Nel titolo Ăš presente, in maniera esplicita, la richiesta di una riflessione, metodologica e teorica, sullâinterrelazione tra tecnologie digitali, scienze dellâinformazione, discipline filosofiche, mondo delle arti e cultural studies
- âŠ