
    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
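    The fuzzy matching mentioned above can be illustrated with a minimal sketch: scoring a new segment against translation-memory entries by normalised edit distance. The TM entries and the 70% threshold below are invented for illustration and do not reflect SCATE's actual matching metrics.

    ```python
    def levenshtein(a: str, b: str) -> int:
        """Classic dynamic-programming edit distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def fuzzy_match(segment: str, memory: dict, threshold: float = 0.7):
        """Return the best TM entry whose similarity meets the threshold."""
        best = None
        for source, target in memory.items():
            dist = levenshtein(segment.lower(), source.lower())
            score = 1 - dist / max(len(segment), len(source))
            if score >= threshold and (best is None or score > best[2]):
                best = (source, target, score)
        return best

    # Illustrative two-entry translation memory (English -> Dutch).
    tm = {"Close the door.": "Sluit de deur.",
          "Open the window.": "Open het raam."}
    print(fuzzy_match("Close the doors.", tm))
    ```

    Real CAT systems refine this with token-level weighting and linguistic features, which is precisely the direction the SCATE fuzzy-matching work explores.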

    SPOD: Syntactic Profiler of Dutch

    SPOD is a tool for Dutch syntax in which a given corpus is analysed according to a large number of predefined syntactic characteristics. SPOD is an extension of the PaQu ("Parse and Query") tool (Odijk et al. 2017). SPOD is available for a number of standard Dutch corpora and treebanks. In addition, you can upload your own texts, which will then be syntactically analysed. SPOD runs a potentially large number of syntactic queries in order to show a variety of corpus properties, such as the number of main and subordinate clauses, the types of main and subordinate clauses and their frequencies, and the average length of clauses per clause type (e.g. relative clauses, indirect questions, finite complement clauses, infinitival clauses, finite adverbial clauses, etc.). Other syntactic constructions include comparatives, correlatives, various types of verb clusters, separable verb prefixes, depth of embedding, etc. SPOD allows linguists to obtain a quick overview of the syntactic properties of texts, for instance with the goal of finding interesting differences between text types, or between authors with different backgrounds or of different ages. In the paper, we describe the SPOD tool in some more detail and provide a case study illustrating the type of investigations enabled and facilitated by SPOD. Most of the syntactic properties are implemented in SPOD by means of relatively complicated XPath 2.0 queries, and as such SPOD also provides examples of relevant syntactic queries, which may otherwise be relatively hard to define for non-technical linguists.
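    The idea of running a battery of XPath queries over a treebank can be sketched in a few lines. The toy XML below uses simplified, Alpino-like category labels ("smain", "rel", "cp") that are assumptions for illustration; SPOD's real queries are XPath 2.0 expressions over the full Alpino treebank schema, and Python's standard library only supports a small XPath subset.

    ```python
    import xml.etree.ElementTree as ET

    # A toy two-sentence treebank with simplified node categories.
    treebank = ET.fromstring("""
    <treebank>
      <sentence>
        <node cat="smain">
          <node cat="np">
            <node cat="rel"/>
          </node>
        </node>
      </sentence>
      <sentence>
        <node cat="smain">
          <node cat="cp"/>
        </node>
      </sentence>
    </treebank>
    """)

    # One query per syntactic property, in the spirit of SPOD's query battery.
    queries = {
        "relative clauses":   ".//node[@cat='rel']",
        "complement clauses": ".//node[@cat='cp']",
        "main clauses":       ".//node[@cat='smain']",
    }
    counts = {name: len(treebank.findall(q)) for name, q in queries.items()}
    print(counts)
    ```

    Aggregating such counts per text is essentially how a syntactic profile is built, after which clause types or clause lengths can be compared across text types or author groups.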

    Enriching a Scientific Grammar with Links to Linguistic Resources: The Taalportaal

    Scientific research within the humanities is different from what it was a few decades ago. For instance, new sources of information, such as digital grammars, lexical databases and large corpora of real-language data, offer new opportunities for linguistics. The Taalportaal grammatical database, with its links to other linguistic resources via the CLARIN infrastructure, is a prime example of a new type of tool for linguistic research.

    Corpus linguistics as digital scholarship: Big data, rich data and uncharted data

    This introductory chapter begins by considering how the fields of corpus linguistics, digital linguistics and digital humanities overlap, intertwine and feed off each other when it comes to making use of the increasing variety of resources available for linguistic research today. We then move on to discuss the benefits and challenges of three partly overlapping approaches to the use of digital data sources: (1) increasing data size to create “big data”, (2) supplying multi-faceted co(n)textual information and analyses to produce “rich data”, and (3) adapting existing data sets to new uses by drawing on hitherto “uncharted data”. All of them also call for new digital tools and methodologies that, in Tim Hitchcock’s words, “allow us to think small; at the same time as we are generating tools to imagine big.” We conclude the chapter by briefly describing how the contributions in this volume make use of their various data sources to answer new research questions about language use and to revisit old questions in new ways.

    Extending automatic transcripts in a unified data representation towards a prosodic-based metadata annotation and evaluation

    This paper describes a framework that extends automatic speech transcripts in order to accommodate relevant information coming from manual transcripts, the speech signal itself, and other resources, such as lexica. The proposed framework automatically collects, relates, computes, and stores all relevant information together in a self-contained data source, making it possible to easily provide a wide range of interconnected information suitable for speech analysis and for training and evaluating a number of automatic speech processing tasks. The main goal of this framework is to integrate different linguistic and paralinguistic layers of knowledge for a more complete view of their representation and interactions in several domains and languages. The processing chain is composed of two main stages: the first consists of integrating the relevant manual annotations into the speech recognition data, and the second consists of further enriching the previous output in order to accommodate prosodic information. The described framework has been used for the identification and analysis of structural metadata in automatic speech transcripts. Initially put to use for automatic detection of punctuation marks and for capitalization recovery from speech data, it has also recently been used for studying the characterization of disfluencies in speech. It has already been applied to several domains of Portuguese corpora, and also to English and Spanish Broadcast News corpora.
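    The "self-contained data source" idea, merging parallel annotation layers into one record per token, can be sketched as follows. The field names ("punct_after", "pitch_slope") and the sample values are invented for illustration; the paper's actual representation and feature set are richer.

    ```python
    def merge_layers(asr_words, punctuation, prosody):
        """Combine parallel annotation layers keyed by token index:
        ASR words with timings, manual punctuation, and prosodic features."""
        unified = []
        for i, (word, start, end) in enumerate(asr_words):
            unified.append({
                "word": word,
                "start": start,
                "end": end,
                "punct_after": punctuation.get(i, ""),  # from manual transcript
                "pitch_slope": prosody.get(i),          # from the speech signal
            })
        return unified

    # Two-token toy example: word-level ASR output plus two extra layers.
    asr = [("hello", 0.00, 0.41), ("world", 0.45, 0.90)]
    punct = {1: "."}             # sentence-final period after token 1
    pros = {0: 2.1, 1: -3.4}     # illustrative pitch-slope values
    print(merge_layers(asr, punct, pros))
    ```

    With all layers in one structure, downstream tasks such as punctuation detection or disfluency analysis can draw on lexical, temporal, and prosodic cues at once, which is the point of the unified representation.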