Search CORE

9,762 research outputs found

An automatic part-of-speech tagger for Middle Low German

Author: Breitbarth Anne
Desmet Bart
Farasyn Melissa
Hoste Veronique
Koleva Mariya
Publication venue: 'John Benjamins Publishing Company'
Publication date: 01/01/2017
Field of study

Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed for a variety of historical languages, or are still under development. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which is aimed at facilitating research into the highly under-researched diachronic syntax of Low German. The present paper reports on a crucial step in creating the corpus, viz. the creation of a part-of-speech tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalized spelling. We outline the major issues faced in the creation of the tagger and present our solutions to them

Crossref

Ghent University Academic Bibliography

Usage-based and emergentist approaches to language acquisition

Author: Behrens Heike
Childers Jane B.
Freudenthal Daniel
Freudenthal Daniel
Freudenthal Daniel
Goldberg Adele E
Heike Behrens
Küntay Aylin C.
Lidz Je¤rey
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2009
Field of study

It was long considered to be impossible to learn grammar based on linguistic experience alone. In the past decade, however, advances in usage-based linguistic theory, computational linguistics, and developmental psychology changed the view on this matter. So-called usage-based and emergentist approaches to language acquisition state that language can be learned from language use itself, by means of social skills like joint attention, and by means of powerful generalization mechanisms. This paper first summarizes the assumptions regarding the nature of linguistic representations and processing. Usage-based theories are nonmodular and nonreductionist, i.e., they emphasize the form-function relationships, and deal with all of language, not just selected levels of representations. Furthermore, storage and processing is considered to be analytic as well as holistic, such that there is a continuum between children's unanalyzed chunks and abstract units found in adult language. In the second part, the empirical evidence is reviewed. Children's linguistic competence is shown to be limited initially, and it is demonstrated how children can generalize knowledge based on direct and indirect positive evidence. It is argued that with these general learning mechanisms, the usage-based paradigm can be extended to multilingual language situations and to language acquisition under special circumstances

Crossref

edoc

DARIAH and the Benelux

Author: Backes Marianne
Chambers Sally
Hoogerwerf Maarten
Van der West Jan
Publication venue: Department of Applied Linguistics, Translators and Interpreters, University of Antwerp
Publication date: 01/01/2015
Field of study

Ghent University Academic Bibliography

Natural language processing and cognitive science : proceedings 2018

Author: Lubaszewski Wiesław
Sedes Florence
Sharp Bernadette
Publication venue: Jagiellonian Library
Publication date: 01/01/2018
Field of study

Jagiellonian Univeristy Repository

Review of Laurence R. Horn and Yasuhiko Kato (eds) (2000) Negation and polarity: syntactic and semantic perspectives. (Oxford University Press.)

Author: Rowlett PA
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2002
Field of study

University of Salford Institutional Repository

Crossref

Improving the translation environment for professional translators

Author: Augustinus Liesbeth
Bulté Bram
Buysschaert Joost
Coppers Sven
Daems Joke
Heyman Geert
Hoste Veronique
Lefever Els
Luyten Kris
Macken Lieve
Moens Marie-Francine
Pelemans Joris
Rigouts Terryn Ayla
Steurs Frieda
Tezcan Arda
Van den Bergh Jan
van der Lek-Ciudin Iulianna
Van Eynde Frank
Vanallemeersch Tom
Vandeghinste Vincent
Verwimp Lyan
Wambacq Patrick
Publication venue: 'MDPI AG'
Publication date: 01/01/2019
Field of study

When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project

Multidisciplinary Digital Publishing Institute

Ghent University Academic Bibliography

Introduction: looking for meaning: methodological issues in translation studies

Author: Vandepitte Sonia
Publication venue: Artesis
Publication date: 01/01/2008
Field of study

Ghent University Academic Bibliography

Dating Texts without Explicit Temporal Cues

Author: Baldridge Jason
Ghosh Joydeep
Kumar Abhimanu
Lease Matthew
Publication venue
Publication date: 10/11/2012
Field of study

This paper tackles temporal resolution of documents, such as determining when a document is about or when it was written, based only on its text. We apply techniques from information retrieval that predict dates via language models over a discretized timeline. Unlike most previous works, we rely {\it solely} on temporal cues implicit in the text. We consider both document-likelihood and divergence based techniques and several smoothing methods for both of them. Our best model predicts the mid-point of individuals' lives with a median of 22 and mean error of 36 years for Wikipedia biographies from 3800 B.C. to the present day. We also show that this approach works well when training on such biographies and predicting dates both for non-biographical Wikipedia pages about specific years (500 B.C. to 2010 A.D.) and for publication dates of short stories (1798 to 2008). Together, our work shows that, even in absence of temporal extraction resources, it is possible to achieve remarkable temporal locality across a diverse set of texts

arXiv.org e-Print Archive

CiteSeerX

Unsupervised Discovery of Phonological Categories through Supervised Learning of Morphological Rules

Author: Berck Peter
Daelemans Walter
Gillis Steven
Publication venue
Publication date: 01/01/1996
Field of study

We describe a case study in the application of {\em symbolic machine learning} techniques for the discovery of linguistic rules and categories. A supervised rule induction algorithm is used to learn to predict the correct diminutive suffix given the phonological representation of Dutch nouns. The system produces rules which are comparable to rules proposed by linguists. Furthermore, in the process of learning this morphological task, the phonemes used are grouped into phonologically relevant categories. We discuss the relevance of our method for linguistics and language technology

arXiv.org e-Print Archive

CiteSeerX

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

Presenting GECO : an eyetracking corpus of monolingual and bilingual sentence reading

Author: Cop Uschi
Dirix Nicolas
Drieghe Denis
Duyck Wouter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

This paper introduces GECO, the Ghent Eye-tracking Corpus, a monolingual and bilingual corpus of eye-tracking data of participants reading a complete novel. English monolinguals and Dutch-English bilinguals read an entire novel, which was presented in paragraphs on the screen. The bilinguals read half of the novel in their first language, and the other half in their second language. In this paper we describe the distributions and descriptive statistics of the most important reading time measures for the two groups of participants. This large eye-tracking corpus is perfectly suited for both exploratory purposes as well as more directed hypothesis testing, and it can guide the formulation of ideas and theories about naturalistic reading processes in a meaningful context. Most importantly, this corpus has the potential to evaluate the generalizability of monolingual and bilingual language theories and models to reading of long texts and narratives

Southampton (e-Prints Soton)

Ghent University Academic Bibliography

Archivsystem Ask23