20 research outputs found

    Holocaust and World War Two Linked Open Data Developments in the Netherlands

    NIOD, Network War Collections (Netwerk Oorlogsbronnen) and EHRI all work on connecting war and Holocaust collections and making them findable and (re-)usable. All three use new technology and Linked Open Data to reach these goals. This paper gives an overview of the latest developments in this work in the Netherlands. It is organized around the axes of What, Where, Who & When.

    Report on Standards

    This document describes mechanisms by which the interoperability of data is ensured through the use of standards. The standards covered are both domain-specific (the archival standards in XML formats such as EAD, EAC-CPF and EAG) and transversal (standards whose use is recommended in the context of any digital project, in particular the ISO standards for the representation of languages, scripts and countries). Interoperability of archival descriptions expressed in EAD is made possible by the specification of a specific EAD profile for EHRI. This profile is built and maintained using the TEI-ODD framework, which is explained in the first section of the report. Interoperability and reusability of EHRI resources are also ensured by the design of more consistent URLs, composed with standardised methods and using ISO reference codes. This design has to be seen as a first step towards a persistent identifier system. The work initiated in WP11 and presented in this document will be continued, enhanced and developed by other EHRI work packages: WP7 Virtual Access to EHRI Virtual Observatory, WP10 Resource Identification and Integration Workflows, and WP13 Research Data Infrastructures for Holocaust Material.

    Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done

    This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how one can place Ground Truth data in a repository and, subsequently, inform others through HTR-United. Furthermore, we want to suggest appropriate citation methods for ATR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of machine learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance

    EHRI in Boekarest


    EHRI Vocabularies and Linked Open Data: An Enrichment?


    First-Hand Accounts of War: War Letters (1935-1950) from NIOD Digitised

    Introduction
    This dataset collection was created within the context of the digitisation project ‘First-Hand Accounts of War: War Letters (1935-1950) from NIOD Digitised’, which ran for three years (2020-2023) and was funded by the Mondriaan Fund, the Dutch Ministry of Health, Welfare, and Sport, and the NIOD Institute for War, Holocaust, and Genocide Studies in Amsterdam. The aim of the project was to preserve, digitise, and transcribe the NIOD’s war letters collection and to enhance access to these historical records in various ways.

    Creator
    The NIOD Institute for War, Holocaust, and Genocide Studies was founded in 1945. The NIOD is a national and international archival institution and research institute. NIOD researchers conduct interdisciplinary academic research into the history of wars, mass violence and genocides. The institute holds over 400 archives and collections (2500 meters) about various topics related to World War II and mass violence in the 20th century.

    War letters
    The project ‘First-Hand Accounts of War: War Letters (1935-1950) from NIOD Digitised’ digitised NIOD’s paper archival letter collection, also known as ‘247 Collectie Correspondentie’. The collection contains personal correspondence written and received in the context of the German Occupation of the Netherlands (1940-1945) and the War of Independence in Indonesia (in the late 1940s). People have been donating personal correspondence to NIOD since 1945, and new documents are acquired on a regular basis. The vast majority of the letters are written in Dutch and originate from the period 1935-1950. The archival collection ‘247 Collectie Correspondentie’ currently measures 14.1 meters and is divided into different inventory numbers. The collection comprises a wide variety of personal correspondence from various letter-writers.

    Contents
    The dataset ‘First-Hand Accounts of War: War Letters (1935-1950) from NIOD Digitised’ contains machine-readable data in plain text and structured file formats. The data is aimed particularly at researchers interested in the (computational or quantitative) analysis of personal correspondence (‘egodocuments’) in bulk. The dataset consists of four folders:

    1. Handwritten Text Recognition (HTR) model ‘NIOD_WarLet_1935-1950’
    The Handwritten Text Recognition (HTR) model ‘NIOD_WarLet_1935-1950’ was trained using READ COOP’s Transkribus software (PyLaia HTR). The model is based on ‘Ground Truth’ transcriptions of high-resolution (600 dpi) scans of handwritten correspondence.
    Contents:
    - README (.txt) with the URL of the web page hosting the trained public HTR model and an online demo version (building on READ COOP’s Transkribus server)

    2. Ground Truth War Letters Transcriptions
    The folder ‘Ground Truth War Letters Transcriptions’ includes 966 manually generated and checked transcriptions of high-resolution (600 dpi) scans of handwritten wartime correspondence.
    Contents:
    - README (.txt)
    - Ground Truth Transcriptions in ALTO-XML, 966 files (.xml)

    3. War Letters (1935-1950) Transcriptions
    The folder ‘War Letters (1935-1950) Transcriptions’ includes a large number of automatically generated transcriptions of historical handwritten correspondence. The dataset also includes the 966 manually transcribed and checked transcriptions in the Ground Truth War Letters Transcriptions Dataset.
    Contents:
    - README (.txt)
    - 3.1 War Letters (1935-1950) Transcriptions Dataset (per inventory number): transcriptions in plain text format, 1480 files (.txt)

    4. War Letters (1935-1950) Metadata
    The folder ‘War Letters (1935-1950) Metadata’ contains a matrix with the metadata (inventory-number level). This metadata set contains additional information about the data, linked to identifiers (ISIL codes) related to inventory numbers.
    The metadata scheme includes the following features:
    - isil: International Standard Identifier for Libraries and Related Organizations (ISIL) code as used by the NIOD Institute for War, Holocaust, and Genocide Studies, with additional information about the collection (247) and the inventory number of the sub-collection. Example: ‘NL-AsdNIOD_247_0310’.
    - inv_no: The inventory number of the particular sub-collection. Example: ‘310’.
    - no_of_scans: The number of scans (retrieved from the Transkribus folder containing all scans of each inventory number). Example: ‘1’.
    - pid: Persistent identifier of the particular sub-collection. Example: ‘https://proxy.archieven.nl/0/21FE75B1551287C9E0538A77ABC22FE9’.
    - description: A short description of the contents of the sub-collection, derived from the archival index of the NIOD. Example: ‘Brief van Dolly aan haar vader en moeder’.
    - date_range: Date of writing or sending a letter, or a date range in the case of a correspondence. Example: ‘15 augustus 1943’.
    - keyword_inventory_1: A keyword referring to the contents of the inventory number, derived from the archival inventory index of the NIOD. Some keyword terms include synonyms (in Dutch). Keywords and index categories sometimes overlap. Where both an index category and a keyword were available, the index category is stored as keyword_inventory_1 and the keyword as keyword_inventory_2 (see below). Example: ‘Duitse burgers en militairen’.
    - keyword_inventory_2: For some sub-collections, a second keyword referring to the contents of the sub-collection could be derived from the archival inventory index of the NIOD. Example: ‘Militairen - Zie ook: Officieren, Soldaten’.
    Contents:
    - README (.txt)
    - Metadata dataset (.csv)

    Access and Terms and Conditions

    1. Access
    NIOD Institute for War, Holocaust, and Genocide Studies (hereinafter ‘NIOD’) strives to enhance access to and use of data that it collects and publishes for academic research purposes. NIOD makes every effort to protect the privacy of living people and to track down copyright owners.
    If you wish to use these datasets you must ask permission via the button on the right (‘Access File’) or via: [email protected] as Copyright, GDPR or other legal restrictions may apply.

    2. Terms and Conditions
    NIOD Institute for War, Holocaust, and Genocide Studies (hereinafter ‘NIOD’) strives to enhance access to and use of data that it collects and publishes for academic research purposes. NIOD provides any individual or (legal) entity (hereinafter ‘you’) with access to this Dataset free of charge. The use of this dataset is subject to the terms of this agreement (hereinafter “Terms and Conditions”). Use of any data derived from the dataset, which may appear in any other format than text, such as tables and charts, is also subject to these Terms and Conditions.
    1. Use of this dataset is permitted on condition that the dataset is used for non-commercial academic research purposes;
    2. This dataset may contain personal information relating to living persons or data that is protected by the Copyright Act or other applicable legislation and/or regulations. Data that are protected by copyright and/or relate to living persons, or persons who may still be alive, are intended for personal study and are not to be published or otherwise made public or distributed, unless explicit consent has been obtained;
    3. NIOD is not liable for (the consequences of) the unlawful use of the data (referred to under 2.1. and 2.2) by you.
    NIOD aims to protect the privacy of living people and to track down copyright owners. See our privacy policy for more details. If you have any privacy concerns, including requests to remove your name or any other personal information, please contact us via [email protected]. If you think you have rights to materials that are used in this dataset, you can also contact us via the email address listed above.
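
    The metadata fields listed in this description lend themselves to a small consistency check. The following is a minimal Python sketch, not part of the dataset: the names `ISIL_PATTERN`, `parse_identifier`, `load_rows` and `SAMPLE_CSV` are hypothetical, the identifier pattern is inferred from the single documented example ‘NL-AsdNIOD_247_0310’, and the comma delimiter and column order of the real CSV are assumptions. The sample row combines the per-field example values quoted above and may not correspond to one actual record.

```python
import csv
import io
import re

# Pattern inferred from the one documented example 'NL-AsdNIOD_247_0310':
# ISIL code, collection number, zero-padded inventory number (assumption).
ISIL_PATTERN = re.compile(
    r"^(?P<isil>[A-Za-z]{2}-[A-Za-z0-9]+)_(?P<collection>\d+)_(?P<inv_no>\d+)$"
)


def parse_identifier(value):
    """Split an identifier such as 'NL-AsdNIOD_247_0310' into its parts."""
    match = ISIL_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unexpected identifier format: {value!r}")
    return {
        "isil": match.group("isil"),
        "collection": match.group("collection"),
        # Drop the zero padding so the value is comparable with 'inv_no'.
        "inv_no": str(int(match.group("inv_no"))),
    }


# Illustrative row assembled from the example values in the description;
# the real file's delimiter and column order are not documented here.
SAMPLE_CSV = (
    "isil,inv_no,no_of_scans,description,date_range\n"
    "NL-AsdNIOD_247_0310,310,1,"
    "Brief van Dolly aan haar vader en moeder,15 augustus 1943\n"
)


def load_rows(text):
    """Read CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE_CSV)
parts = parse_identifier(rows[0]["isil"])
# Check that the identifier and the separate inv_no column agree.
consistent = parts["inv_no"] == rows[0]["inv_no"]
```

    Real inventory numbers may not always be purely numeric; in that case the pattern would need to be relaxed before use.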

    The eXtensible past : the relevance of the XML data format for access to historical datasets and a strategy for digital preservation

    This article reports on the X-past project carried out by the Netherlands Historical Data Archive (NHDA). The main goal of the project has been to investigate how the XML data format can improve the durability of and access to historical datasets. The X-past project furthermore investigated whether it would be possible to provide access to historical datasets by means of the "Open Archives Initiative—Protocol for Metadata Harvesting" (OAI-PMH). Within the framework of the X-past project a prototype information system has been developed and a number of users have been asked to report on usability issues concerning this system
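
    For readers unfamiliar with OAI-PMH, a harvesting request is an ordinary HTTP GET with a `verb` argument. The Python sketch below only builds such request URLs; it is not taken from the X-past prototype. The endpoint `https://example.org/oai` and the helper name `build_request` are hypothetical, while `Identify`, `ListRecords` and the `oai_dc` metadata prefix are defined by the protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the article does not give the prototype's address.
BASE_URL = "https://example.org/oai"


def build_request(verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    # Skip arguments that were not supplied.
    params.update({key: value for key, value in arguments.items() if value is not None})
    return f"{BASE_URL}?{urlencode(params)}"


# Ask the repository to describe itself.
identify_url = build_request("Identify")

# Harvest all records as unqualified Dublin Core.
list_records_url = build_request("ListRecords", metadataPrefix="oai_dc")
```

    A harvester would fetch these URLs and parse the XML responses; that step is omitted here because it depends on the repository being harvested.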

    NIOD_WarLet_1945-1950_NoBasemodel: Een openbaar HTR-model in Transkribus voor handgeschreven Nederlands uit het midden van de twintigste eeuw

    The HTR model ‘NIOD_WarLet_1935-1950_NoBasemodel’ was trained using 968 ‘Ground Truth’ transcriptions of high-resolution scans of various handwritten letters. These letters are all written in Dutch and originate from the period 1935-1950. The training set contains personal correspondence from a wide variety of letter writers (e.g., children, soldiers, Jewish people in hiding). This correspondence is part of the archival collection known as ‘247 Correspondentie’ held by the NIOD Institute for War, Holocaust, and Genocide Studies in Amsterdam. The model was created as part of the project ‘First-Hand Accounts of War: War letters (1935-1950) from NIOD digitised’. All documents used for training and validation were scanned and transcribed within this project, which ran from 2020 to 2023 and was funded by the Mondriaan Fund, the Dutch Ministry of Health, Welfare, and Sport, and the NIOD Institute for War, Holocaust, and Genocide Studies in Amsterdam. The ‘Ground Truth’ training set was created by project members Annelies van Nispen, Carlijn Keijzer and Milan van Lange. Additional transcription and correction of ‘Ground Truth’ transcriptions was performed under the supervision of Muriël Bouman by citizen scientists Hillebrand Verkroost, Bart Cohen, Evelien Bachrach, Marjo Janssens, and Cocky Sietses. The validation set contains a sample of 17 ‘Ground Truth’ transcriptions from various writers and sub-collections. The model was trained using PyLaia HTR, with a maximum of 500 epochs (321 epochs trained) and a learning rate of 0.0003. The character error rate (CER) on the validation set is 5.40%. No base model was used.
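
    The CER reported for the validation set is a character error rate. The Python sketch below uses the conventional definition, the Levenshtein edit distance between the reference and the recognised text divided by the length of the reference. How Transkribus normalises text before computing its figure is not stated in this abstract, so this is an illustration of the metric and not a reproduction of the reported value. The function names are hypothetical.

```python
def levenshtein(reference, hypothesis):
    """Return the minimum number of character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # previous[j] holds the distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One wrong character in a four-character reference gives a CER of 0.25.
example_cer = character_error_rate("abcd", "abxd")
```

    Under this definition, a CER of 5.40% corresponds to roughly one character edit per 18 to 19 reference characters.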