
8 research outputs found

TransBank: Metadata as the Missing Link Between NLP and Traditional Translation Studies

Author: Stauder Andy
Ustaszewski Michael
Publication venue
Publication date
Field of study

Despite the growing importance of data in translation, there is no data repository that equally meets the requirements of the translation industry and academia. Therefore, we plan to develop a freely available, multilingual and expandable bank of translations and their source texts aligned at the sentence level. Special emphasis will be placed on the labelling of metadata that precisely describe the relations between translated texts and their originals. This metadata-centric approach gives users the opportunity to compile and download custom corpora on demand. Such a general-purpose data repository may help to bridge the gap between translation theory and the language industry, including translation technology providers and NLP. (VLID)2371561

University of Innsbruck Digital Library

Exploring data provenance in handwritten text recognition infrastructure: Sharing and reusing ground truth data, referencing models, and acknowledging contributions. Starting the conversation on how we could get it done

Author: Afolabi Mary Aderonke
Anikina Anastasiia
Bastianello Elisa
Benzinger Lukas Vincent
Bhatia Aakriti
Bosse Arno
Brown David
Chagué Alix
Charlton Ashleigh
Depuydt Katrien
Go Sabine C. P. J.
Goh Marcus J.C.
Gordijn Femke
Gstrein Silvia
Hasan Sewa
Hindermann Maximilian
Hodel Tobias
Huff Dorothee
Huysman Ineke
Idris Ali
Keijser Liesbeth
Keijzer Carlijn
Kemper Simon
Koenders Sanne
Kuijpers Erika
Lepa Sven
Link Tommy O.
Nilsson Dannevig André
Nockels Joe
Oosterhuis Joost Johannes
Popken Vivien
Puertollano María Estrella
Purcell Jake
Puusaag Joosep J.
Rabus Achim
Romein C. Annemieke
Rønsig Larsen Lisette
Sheta Ahmed
Sitaram Chantal
Stauder Andy
Stoop Lex
Strandgaard Jensen Helle
Strutzenbladh Ebba
Terras Melissa
Trouw Barry Benaissa
van den Heuvel Pauline
van der Sijs Nicoline
van der Spek Jan Paul
van Gelder Klaas
van Lange Milan
van Nispen Annalies
van Noort Laura M.
Van Synghel Geertrui
van Zundert Joris
von der Heide Stefan
Vuckovic Vladimir
Weiss Sonia
Wilbrink Heleen
Wrisley David Joseph
Zweistra Riet
Publication venue
Publication date: 18/03/2024
Field of study

This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, and ways to reference and acknowledge contributions to the creation and enrichment of data within these Machine Learning systems. We discuss how one can publish Ground Truth data in a repository and, subsequently, inform others. Furthermore, we suggest appropriate citation methods for HTR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of Machine Learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance.

Edinburgh Research Explorer

Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done

Author: Afolabi-Adeolu Mary Aderonke
Anikina Anastasiia
Bastianello Elisa
Benzinger Lukas Vincent
Bhatia Aakriti
Bosse Arno
Brown David
Chagué Alix
Charlton Ash
Dannevig André Nilsson
Depuydt Katrien
Estrella Puertollano María
Gelder Klaas van
Go Sabine C.P.J.
Goh Marcus J.C.
Gordijn Femke
Gstrein Silvia
Hasan Sewa
Heide Stefan von der
Heuvel Pauline van den
Hindermann Maximilian
Hodel Tobias
Huff Dorothee
Huysman Ineke
Idris Ali
Jensen Helle Strandgaard
Keijzer Carlijn
Keijzer Liesbeth
Kemper Simon
Koenders Sanne
Kuijpers Erika
Lange Milan van
Lepa Sven
Link Tommy O.
Nispen Annelies van
Nockels Joe
Noort Laura M. van
Oosterhuis Joost Johannes
Popken Vivien
Purcell Jake
Puusaag Joosep J.
Rabus Achim
Romein C. Annemieke
Rønsig Larsen Lisette
Sheta Ahmed
Sijs Nicoline van der
Sitaram Chantal
Spek Jan Paul van der
Stauder Andy
Stoop Lex
Strutzenbladh Ebba
Terras Melissa M.
Trouw Barry Benaissa
Van Synghel Geertrui
Vučković Vladimir
Weiss Sonia
Wilbrink Heleen
Wrisley David Joseph
Zundert Joris J. van
Zweistra Riet
Publication venue: Episciences
Publication date: 01/01/2024
Field of study

This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how one can place Ground Truth data in a repository and, subsequently, inform others through HTR-United. Furthermore, we want to suggest appropriate citation methods for ATR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of machine learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance

Bern Open Repository and Information System (BORIS)

2012 survey of the preservation, management, and use of audiovisual media in European higher education institutions

Author: Andy Stauder
Publication venue: Emerald
Publication date
Field of study

Crossref

Syntactic complexity as a stylistic feature of subtitles

Author: Stauder Andy
Ustaszewski Michael
Publication venue: Uniwersytet Wrocławski. Oficyna Wydawnicza ATUT – Wrocławskie Wydawnictwo Oświatowe
Publication date: 01/01/2020
Field of study

In audiovisual translation, stylometry can be used to measure formal-aesthetic fidelity. We present a corpus-based measure of syntactic complexity as a feature of language style. The methodology considers hierarchical dimensions of syntactic complexity, using syllable counting and dependency parsing. The test material consists of dialogues of several characters from the TV show “Two and a Half Men”. The results show that characters do not differ syntactically among themselves as much as might be expected, and that, despite a general tendency to level differences even more in translation, the changes in syntactic complexity between the original and translation depend mostly on the respective character-feature combination.

Biblioteka Nauki - repozytorium artykułów

AV-Digitalisierung zwischen zwei Stühlen. Ein Werkstattbericht zur digitalen Archivierung im Hochschulbereich

Author: Mühlberger Günther
Stauder Andy
Publication venue: Neugebauer
Publication date: 01/01/2012
Field of study

AV Digitisation between two Stools – a Progress Report on Digital Preservation in Higher Education (translation of the title). The deterioration and decay of analogue AV media present a considerable problem that is not limited to commercial environments but also affects public organisations such as higher education institutions, libraries and archives. In light of this, and because there are no notable affordable solutions to this problem as far as the scenario in hand is concerned, a pertinent pilot project within the framework of the "PrestoPRIME" (see 2009) EU project has been initiated by the University of Innsbruck. The project deals with mass digitisation of AV media of so-called consumer grade, whose characteristics differ significantly from those of professional settings such as broadcasting. The main focus of the project is mass digitisation, as certain issues only arise in connection with larger quantities of material.

E-LIS

Are Digital Humanities Platforms Sufficiently Facilitating Diversity in Research? A Study of Transkribus Free Processing Requests

Author: Gooding Paul
Muehlberger Guenter
Nockels Joseph
Stauder Andy
Terras Melissa
Publication venue
Publication date: 27/07/2022
Field of study

No abstract available

Enlighten

Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done

Author: Afolabi-Adeolu Mary Aderonke
Anikina Anastasiia
Bastianello Elisa
Benzinger Lukas Vincent
Bhatia Aakriti
Bosse Arno
Brown David
Chagué Alix
Charlton Ash
Dannevig André Nilsson
Depuydt Katrien
Estrella Puertollano María
Gelder Klaas Van
Go Sabine C.P.J.
Goh Marcus J.C.
Gordijn Femke
Gstrein Silvia
Hasan Sewa
Heide Stefan von Der
Heuvel Pauline van Den
Hindermann Maximilian
Hodel Tobias
Huff Dorothee
Huysman Ineke
Idris Ali
Jensen Helle Strandgaard
Keijzer Carlijn
Keijzer Liesbeth
Kemper Simon
Koenders Sanne
Kuijpers Erika
Lange Milan Van
Lepa Sven
Link Tommy
Nispen Annelies Van
Nockels Joe
Noort Laura
Oosterhuis Joost Johannes
Popken Vivien
Purcell Jake
Puusaag Joosep
Rabus Achim
Romein C. Annemieke
Rønsig Larsen Lisette
Sheta Ahmed
Sijs Nicoline van Der
Sitaram Chantal
Spek Jan Paul van Der
Stauder Andy
Stoop Lex
Strutzenbladh Ebba
Terras Melissa
Trouw Barry Benaissa
van Synghel Geertrui
Vučković Vladimir
Weiss Sonia
Wilbrink Heleen
Wrisley David Joseph
Zundert Joris
Zweistra Riet
Publication venue: HAL CCSD
Publication date: 30/11/2022
Field of study

INRIA a CCSD electronic archive server
