Search CORE

6,832 research outputs found

‘It’s not about the catalogue, it’s about the data’ Catalogue 2.0: the future of the library catalogue

Author: Chambers Sally
Publication date: 01/01/2017

Ghent University Academic Bibliography

The role of the library when computers can read: Critically adopting Handwritten Text Recognition (HTR) technologies to support research

Author: Terras Melissa
Publication date: 01/03/2022

Edinburgh Research Explorer

Inviting AI into the archives: The reception of handwritten recognition technology into historical manuscript transcription

Author: Terras Melissa
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 21/04/2022

Edinburgh Research Explorer

Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study

Author: Ares Oliveira Sofia
Bryan Maximilian
Colutto Sebastian
Diem Markus
Déjean Hervé
Fiel Stefan
Gatos Basilis
Greinoecker Albert
Grüning Tobias
Hackl Guenter
Haukkovaara Vili
Heyer Gerhard
Hirvonen Lauri
Hodel Tobias
Jokinen Matti
Jokinen Philip
Kallio Mario
Kaplan Frederic
Kleber Florian
Labahn Roger
Lang Eva Maria
Laube Sören
Leifert Gundram
Louloudis Georgios
McNicholl Rory
Meunier Jean-Luc
Michael Johannes
Muehlberger Guenter
Mühlbauer Elena
Philipp Nathanael
Pratikakis Ioannis
Puigcerver Pérez Joan
Putz Hannelore
Retsinas George
Romero Verónica
Sablatnig Robert
Schofield Philip
Seaward Louise
Sfikas Georgios
Sieber Christian
Stamatopoulos Nikolaos
Strauss Tobias
Sánchez Joan Andreu
Terbul Tamara
Terras Melissa
Toselli Alejandro Hector
Ulreich Berthold
Vicente Bosch
Vidal Enrique
Villega Mauricio
Walcher Johanna
Weidemann Max
Wurster Herbert
Zagoris Konstantinos
Publication venue: 'Emerald'
Publication date: 09/09/2019

Purpose: An overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences this turning point of the advanced use of digitised heritage content. The paper aims to discuss these issues.
Design/methodology/approach: This paper adopts a case study approach, using the development and delivery of the one openly available HTR platform for manuscript material.
Findings: Transkribus has demonstrated that HTR is now a useable technology that can be employed in conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are demonstrated, and a cooperative model is suggested as a way to ensure sustainability and scaling of the platform. However, funding and resourcing issues are identified.
Research limitations/implications: The paper presents results from projects; further user studies could be undertaken involving interviews, surveys, etc.
Practical implications: Only HTR provided via Transkribus is covered; however, this is the only publicly available platform for HTR on individual collections of historical documents at time of writing, and it represents the current state of the art in this field.
Social implications: The increased access to information contained within historical texts has the potential to be transformational for both institutions and individuals.
Originality/value: This is the first published overview of how HTR is used by a wide archival studies community, reporting and showcasing current application of handwriting technology in the cultural heritage sector.

Infoscience - École polytechnique fédérale de Lausanne

UCL Discovery

Edinburgh Research Explorer

ZORA

Bern Open Repository and Information System (BORIS)

SIMARA: a database for key-value information extraction from full pages

Author: Boillet Mélodie
Kermorvant Christopher
Moufflet Jean-François
Tarride Solène
Publication date: 26/04/2023

We propose a new database for information extraction from historical handwritten documents. The corpus includes 5,393 finding aids from six different series, dating from the 18th to the 20th centuries. Finding aids are handwritten documents that contain metadata describing older archives. They are stored in the National Archives of France and are used by archivists to identify and find archival documents. Each document is annotated at page level and contains seven fields to retrieve. The localization of each field is not provided, so this dataset encourages research on segmentation-free systems for information extraction. We propose a model based on the Transformer architecture trained for end-to-end information extraction and provide three sets for training, validation and testing, to ensure fair comparison with future works. The database is freely accessible at https://zenodo.org/record/7868059

arXiv.org e-Print Archive

Transkribus and IIIF: Beneficial possibilities between image sharing and Handwritten Text Recognition frameworks

Author: Krull Florian
Muehlberger Guenter
Terras Melissa
Publication date: 25/06/2019

Edinburgh Research Explorer

A plea for an upgrade to the digital craft of the historian and digital methodology for discovering the past

Author: Spina Salvatore
Publication date: 21/11/2022

This essay urges analogue historians to accept that digitisation is the first step towards creating a historical heritage based on the new language of science: computer science. As we know, humanities disciplines cannot easily be encapsulated in a few understandable numbers and names. However, historians must advance Artificial Intelligence (such as Transkribus) and neural networks so that the machine can infer meaning from digitised historical primary sources and become the most powerful tool for helping historians understand what happened in the past. Historians (collaborating with data scientists, expert annotators, librarians, archivists, and others who are crucial to the successful management of digital data collection) have to create the primary ontology, starting from coding manuscripts into digital text, as in the Biscari Archive (Italy) case study.

arXiv.org e-Print Archive

Just Because We Can Doesn’t Mean We Should: On Knowing and Protecting Data Produced by the Jewish Consumptives’ Relief Society

Author: Maness Jack M.
Pham Kim
Publication venue: Digital Commons @ DU
Publication date: 01/01/2022

A recent project at the University of Denver Libraries used handwritten text recognition (HTR) software to create transcriptions of records from the Jewish Consumptives’ Relief Society (JCRS), a tuberculosis sanatorium located in Denver, Colorado from 1904 to 1954. Among a great many other potential uses, these type- and hand-written records give insight into the human experience of disease and epidemic, its treatment, its effect on cultures, and of Jewish immigration to and early life in the American West. Our intent is to provide these transcripts as data so the text may be computationally analyzed, pursuant to a larger effort in developing capacity in services and infrastructure to support digital humanities as a library, and to contribute to the emerging HTR ecosystem in archival work. Just because we can, however, doesn’t always mean we should: the realities of publishing large datasets online that contain medical and personal histories of potentially vulnerable people and communities introduce serious ethical considerations. This paper both underscores the value of HTR and frames ethical considerations related to protecting data derived from it. It suggests a terms-of-use intervention perhaps valuable to similar projects, one that balances meeting the research needs of digital scholars with the care and respect of persons, their communities and inheritors, whose lives produced the very data now valuable to those researchers.

University of Denver

OJS at Oregondigital.org (Oregon State University / University of Oregon)

Just Because We Can Doesn’t Mean We Should: On Knowing and Protecting Data Produced by the Jewish Consumptives’ Relief Society

Author: Maness Jack
Pham Kim
Publication venue: 'Oregon State University'
Publication date: 20/05/2022

A recent project at the University of Denver Libraries used handwritten text recognition (HTR) software to create transcriptions of records from the Jewish Consumptives’ Relief Society (JCRS), a tuberculosis sanatorium located in Denver, Colorado from 1904 to 1954. Among a great many other potential uses, these type- and hand-written records give insight into the human experience of disease and epidemic, its treatment, its effect on cultures, and of Jewish immigration to and early life in the American West. Our intent is to provide these transcripts as data so the text may be computationally analyzed, pursuant to a larger effort in developing capacity in services and infrastructure to support digital humanities as a library, and to contribute to the emerging HTR ecosystem in archival work. Just because we can, however, doesn’t always mean we should: the realities of publishing large datasets online that contain medical and personal histories of potentially vulnerable people and communities introduce serious ethical considerations. This paper both underscores the value of HTR and frames ethical considerations related to protecting data derived from it. It suggests a terms-of-use intervention perhaps valuable to similar projects, one that balances meeting the research needs of digital scholars with the care and respect of persons, their communities and inheritors, whose lives produced the very data now valuable to those researchers.

OJS at Oregondigital.org (Oregon State University / University of Oregon)

Towards the Corpus of Latvian Romani Texts: Deciphering the Manuscripts in Jānis Leimanis' Archive

Author: Kozhanov Kirill
Perkova Natalia
Publication date: 01/01/2022

Publisher Copyright: © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Latvian Romani is a Northeastern Romani dialect with a limited number of publicly available sources. Two large archival collections of texts in Latvian Romani, compiled primarily in the 1930s in Latvia and Estonia, have recently been digitized as images and made available online to a wider public. In our study, we focus on one of these collections, the Latvian Romani folklore texts collected by Jānis Leimanis in interwar Latvia. In this paper, we describe how initial manual transcriptions, most of which were created with the help of a special crowdsourcing platform, were integrated into the handwritten text recognition (HTR) workflow in Transkribus. We present two HTR models trained on the basis of Leimanis' collection and discuss various issues related to the work on these texts. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto