    The challenges of German archival document categorization on insufficient labeled data

    Document exploration in archives is often challenging because records are not organized into topic-based categories. Moreover, archival records provide only short texts, which are often insufficient for capturing their semantics. This paper proposes and explores a dataless categorization approach that uses word embeddings and TF-IDF to categorize archival documents. Additionally, it introduces a visual approach built on top of the word embeddings to enhance the exploration of data. Preliminary results suggest that current vector representations alone do not provide enough external knowledge to solve this task.

    Metadata schema and ontologies for FAIR research data in plasma technology

    The findability (F), accessibility (A), interoperability (I), and re-usability (R) of research data are essential and acknowledged factors for the efficient re-use of data, e.g. for data-driven science. However, in the field of plasma technology, there is currently a lack of common standards and tools to publish data according to these FAIR data principles. To address this issue, the present contribution reports on the development of a modular metadata model for the representation of subject- and method-specific metadata in the field of plasma technology, which is based on the core plasma metadata schema, Plasma-MDS (https://arxiv.org/abs/1907.07744). The linking and semantic description of the metadata modules are carried out via ontologies. The developed tools and services are made available via a plasma technology knowledge graph and the data platform https://www.inptdat.de/. They are intended to be reviewed and further developed by the low-temperature plasma community to provide a common basis for open science and research data management according to the FAIR principles. Bundesministerium für Bildung und Forschun