8 research outputs found

    TransBank: Metadata as the Missing Link Between NLP and Traditional Translation Studies

    Despite the growing importance of data in translation, there is no data repository that meets the requirements of the translation industry and academia alike. Therefore, we plan to develop a freely available, multilingual and expandable bank of translations and their source texts aligned at the sentence level. Special emphasis will be placed on the labelling of metadata that precisely describe the relations between translated texts and their originals. This metadata-centric approach gives users the opportunity to compile and download custom corpora on demand. Such a general-purpose data repository may help to bridge the gap between translation theory and the language industry, including translation technology providers and NLP.

    Exploring data provenance in handwritten text recognition infrastructure: Sharing and reusing ground truth data, referencing models, and acknowledging contributions. Starting the conversation on how we could get it done

    This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, and ways to reference and acknowledge contributions to the creation and enrichment of data within these Machine Learning systems. We discuss how one can publish Ground Truth data in a repository and, subsequently, inform others. Furthermore, we suggest appropriate citation methods for HTR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of Machine Learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance.

    Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done

    This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how one can place Ground Truth data in a repository and, subsequently, inform others through HTR-United. Furthermore, we suggest appropriate citation methods for ATR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of machine learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance.

    Syntactic complexity as a stylistic feature of subtitles

    In audiovisual translation, stylometry can be used to measure formal-aesthetic fidelity. We present a corpus-based measure of syntactic complexity as a feature of language style. The methodology considers hierarchical dimensions of syntactic complexity, using syllable counting and dependency parsing. The test material consists of dialogues of several characters from the TV show “Two and a Half Men”. The results show that characters do not differ syntactically among themselves as much as might be expected, and that, despite a general tendency to level differences even more in translation, the changes in syntactic complexity between the original and translation depend mostly on the respective character-feature combination.

    AV-Digitalisierung zwischen zwei Stühlen. Ein Werkstattbericht zur digitalen Archivierung im Hochschulbereich

    AV Digitisation between two Stools – a Progress Report on Digital Preservation in Higher Education (translation of the title). The deterioration and decay of analogue AV media present a considerable problem that is not limited to commercial environments but also affects public organisations such as higher education institutions, libraries and archives. In light of this, and because there are no notable affordable solutions to this problem as far as the scenario in hand is concerned, a pertinent pilot project within the framework of the "PrestoPRIME" (see 2009) EU project has been initiated by the University of Innsbruck. The project deals with mass digitisation of so-called consumer-grade AV media, whose characteristics differ significantly from those of professional settings such as broadcasting. The main focus of the project is mass digitisation, as certain issues only arise in connection with larger quantities of material.
