Improving the translation environment for professional translators
When computer-aided translation systems are used in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one.
This paper describes the SCATE research on improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
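As background for the first of these topics: fuzzy matching in translation-memory systems is commonly based on an edit-distance-style similarity between the query segment and stored source segments. The sketch below is a minimal illustration of that general technique, not the SCATE implementation; the 0.7 threshold and the (source, target) tuple format are assumptions for the example.

```python
from difflib import SequenceMatcher

# Minimal fuzzy-match sketch for a translation memory (TM).
# Illustrates the general technique only, not SCATE's method; the 0.7
# threshold and the (source, target) tuple format are assumptions.

def fuzzy_matches(query, tm_entries, threshold=0.7):
    """Return TM entries whose source segment is similar to the query.

    tm_entries: list of (source_segment, target_segment) tuples.
    Similarity is a character-level ratio in [0, 1]; production TM
    systems typically use token-level edit distance instead.
    """
    results = []
    for source, target in tm_entries:
        score = SequenceMatcher(None, query, source).ratio()
        if score >= threshold:
            results.append((score, source, target))
    return sorted(results, reverse=True)

tm = [("The cat sat on the mat.", "De kat zat op de mat."),
      ("Dogs bark loudly.", "Honden blaffen luid.")]
print(fuzzy_matches("The cat sat on a mat.", tm))
```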
The Gutenberg English Poetry Corpus: Exemplary Quantitative Narrative Analyses
This paper describes a corpus of about 3,000 English literary texts with about 250 million words extracted from the Gutenberg project, spanning a range of genres from both fiction and non-fiction and written by more than 130 authors (e.g., Darwin, Dickens, Shakespeare). Quantitative narrative analysis (QNA) is used to explore a cleaned subcorpus, the Gutenberg English Poetry Corpus (GEPC), which comprises over 100 poetic texts with around two million words from about 50 authors (e.g., Keats, Joyce, Wordsworth). Exemplary QNA studies show author similarities based on latent semantic analysis, significant topics for each author, and various text-analytic metrics for George Eliot’s poem “How Lisa Loved the King” and James Joyce’s “Chamber Music,” concerning, e.g., lexical diversity and sentiment. The GEPC is particularly suited for research in Digital Humanities, Computational Stylistics, or Neurocognitive Poetics, e.g., as a training and test corpus for stimulus development and control in empirical studies.
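To make one of the mentioned metrics concrete, lexical diversity is often measured as the type-token ratio (TTR): the number of distinct word types divided by the total number of tokens. The sketch below is a generic illustration of that measure, not the paper's exact pipeline; the lowercasing and regex tokenization are assumptions.

```python
import re

# Type-token ratio (TTR) as a simple lexical-diversity measure.
# Generic illustration, not the GEPC paper's exact pipeline; the
# lowercasing and \w+ tokenization are assumptions.

def type_token_ratio(text):
    tokens = re.findall(r"\w+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

sample = ("I love thee with the breath, smiles, tears, of all my life; "
          "and, if God choose, I shall but love thee better after death.")
print(f"TTR = {type_token_ratio(sample):.3f}")
```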
Term-community-based topic detection with variable resolution
Network-based procedures for topic detection in huge text collections offer an intuitive alternative to probabilistic topic models. We present in detail a method that is designed specifically with the requirements of domain experts in mind. Like similar methods, it employs community detection in term co-occurrence graphs, but it is enhanced by a resolution parameter that can be used to change the targeted topic granularity. We also establish a term ranking and use semantic word embeddings to present term communities in a way that facilitates their interpretation. We demonstrate the application of our method on a widely used corpus of general news articles and show the results of detailed social-sciences expert evaluations of detected topics at various resolutions. A comparison with topics detected by Latent Dirichlet Allocation is also included. Finally, we discuss factors that influence topic interpretation.
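The resolution-parameter idea can be illustrated with standard graph tooling: Louvain community detection on a term co-occurrence graph, where a higher resolution yields more, smaller communities (finer-grained topics). The sketch below uses networkx's built-in Louvain implementation as a stand-in; the toy graph and edge weights are invented, and the paper's actual procedure additionally includes term ranking and embedding-based presentation not shown here.

```python
import networkx as nx

# Louvain community detection on a toy term co-occurrence graph.
# Stand-in illustration: the paper's full method adds term ranking and
# embedding-based presentation; the graph and edge weights are made up.
G = nx.Graph()
G.add_weighted_edges_from([
    ("market", "stocks", 5), ("stocks", "trading", 4),
    ("market", "trading", 3), ("election", "vote", 6),
    ("vote", "ballot", 4), ("election", "ballot", 3),
    ("stocks", "election", 1),  # weak cross-topic link
])

# A higher resolution favors more, smaller communities (finer topics).
for resolution in (0.5, 1.0, 2.0):
    communities = nx.community.louvain_communities(
        G, weight="weight", resolution=resolution, seed=42
    )
    print(resolution, [sorted(c) for c in communities])
```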
CrossNER: Evaluating Cross-Domain Named Entity Recognition
Cross-domain named entity recognition (NER) models can cope with the scarcity of NER samples in target domains. However, most existing NER benchmarks lack domain-specialized entity types or do not focus on a specific domain, leading to less effective cross-domain evaluation. To address these obstacles, we introduce CrossNER, a fully labeled cross-domain NER dataset spanning five diverse domains with specialized entity categories for each domain. We also provide a domain-related corpus, since using it to continue pre-training language models (domain-adaptive pre-training) is effective for domain adaptation. We then conduct comprehensive experiments to explore the effectiveness of leveraging different levels of the domain corpus and different pre-training strategies for domain-adaptive pre-training on the cross-domain task. Results show that focusing on the fraction of the corpus containing domain-specialized entities and using a more challenging pre-training strategy during domain-adaptive pre-training are beneficial for NER domain adaptation, and our proposed method consistently outperforms existing cross-domain NER baselines. Nevertheless, the experiments also illustrate the difficulty of this cross-domain NER task. We hope that our dataset and baselines will catalyze research in NER domain adaptation. The code and data are available at https://github.com/zliucr/CrossNER.
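Domain-adaptive pre-training, as described here, generally means continuing a model's masked-language-modeling objective on in-domain text before fine-tuning on the downstream task. The sketch below shows that generic recipe with Hugging Face transformers; it is not the authors' exact training setup, and the corpus file name "domain_corpus.txt" and all hyperparameters are placeholders.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Generic domain-adaptive pre-training recipe (continued MLM training);
# not the CrossNER authors' exact setup. The file "domain_corpus.txt"
# and all hyperparameters are placeholders.
model_name = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dapt-out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()  # afterwards, fine-tune the adapted encoder for NER
```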