Strong domain variation and treebank-induced LFG resources
In this paper we present a number of experiments to test the portability of existing treebank-induced LFG resources. We test the LFG parsing resources of Cahill et al. (2004) on the ATIS corpus, which represents a considerably different domain from the Penn-II Treebank Wall Street Journal sections from which the resources were induced. This testing shows under-performance at both c- and f-structure level as a result of the domain variation. We show that in order to adapt the LFG resources of Cahill et al. (2004) to this new domain, all that is necessary is to retrain the c-structure parser on data from the new domain.
Phrase extraction for machine translation
Statistical Machine Translation (SMT) developed in the late 1980s, based initially on a word-to-word translation process. However, such processes have difficulties when a good-quality translation is not strictly word-to-word. Easy cases can be handled by allowing insertion and deletion of single words, but more general word-reordering phenomena require a more general translation process. There is currently much interest in phrase-to-phrase models, which can overcome this problem but require that candidate phrases, together with their translations, be identified in the training corpora. Since phrase delimiters are not explicit, this gives rise to a new problem: that of phrase pair extraction. The current project proposes a phrase extraction algorithm which uses a window of n words around source and target words to extract equivalent phrases. The extracted phrases, together with their probabilities, are used as input to an existing Machine Translation system for the purpose of evaluating the phrase extraction algorithm.
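The window-based extraction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the window size n, the tuple-based alignment input, and the sub-span enumeration are all assumptions made for the sake of a runnable example.

```python
def extract_phrase_pairs(src, tgt, links, n=2):
    """Extract candidate phrase pairs using a window of up to n words
    around each aligned source/target word position.

    src, tgt: lists of tokens.
    links: list of (src_idx, tgt_idx) word-alignment pairs.
    Returns a set of (source_phrase, target_phrase) string tuples.
    """
    pairs = set()
    for i, j in links:
        for w in range(1, n + 1):
            # window boundaries around the aligned positions, clipped to the sentence
            s_lo, s_hi = max(0, i - w + 1), min(len(src), i + w)
            t_lo, t_hi = max(0, j - w + 1), min(len(tgt), j + w)
            # emit every sub-span of each window that contains the aligned word
            for a in range(s_lo, i + 1):
                for b in range(i + 1, s_hi + 1):
                    for c in range(t_lo, j + 1):
                        for d in range(j + 1, t_hi + 1):
                            pairs.add((" ".join(src[a:b]), " ".join(tgt[c:d])))
    return pairs
```

In practice the extracted candidates would then be scored (e.g. by relative frequency) to produce the phrase-pair probabilities the abstract mentions.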
Exploring probabilistic grammars of symbolic music using PRISM
In this paper we describe how we used the logic-based probabilistic
programming language PRISM to conduct a systematic comparison
of several probabilistic models of symbolic music, including 0th and
1st order Markov models over pitches and intervals, and a probabilistic
grammar with two parameterisations. Using PRISM allows us to take
advantage of variational Bayesian methods for assessing the goodness of
fit of the models. When applied to a corpus of Bach chorales and the Essen
folk song collection, we found that, depending on various parameters, the
probabilistic grammars sometimes but not always out-perform the simple
Markov models. Examining how the models perform on smaller subsets
of pieces, we find that the simpler Markov models do out-perform the
best grammar-based model at the small end of the scale.
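As an illustration of the simplest models in the comparison above, a first-order Markov model over melodic intervals can be estimated by counting transitions. The sketch below uses plain maximum-likelihood estimation and is only a toy stand-in: it does not reproduce the paper's PRISM programs or its variational Bayesian model assessment.

```python
from collections import Counter, defaultdict

def fit_first_order(sequences):
    """Maximum-likelihood first-order Markov model over event symbols.

    sequences: list of lists of symbols (e.g. pitch intervals in semitones).
    Returns transition probabilities P(next | prev) as nested dicts.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {prev: {nxt: c / sum(cs.values()) for nxt, c in cs.items()}
            for prev, cs in counts.items()}
```

A 0th-order model would simply be the unconditional frequency of each symbol; the grammar-based models compared in the paper condition on richer structure than the previous event alone.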
Improving Machine Translation of Educational Content via Crowdsourcing
The limited availability of in-domain training data is a major issue in the training of application-specific neural machine translation models. Professional outsourcing of bilingual data collection is costly and often not feasible. In this paper we analyze the influence of using crowdsourcing as a scalable way to obtain translations of target in-domain data, bearing in mind that the translations can be of lower quality. We apply crowdsourcing with carefully designed quality controls to create parallel corpora for the educational domain by collecting translations of MOOC texts from English into eleven languages, which we then use to fine-tune neural machine translation models previously trained on general-domain data. The results of our research indicate that crowdsourced data collected with proper quality controls consistently yields performance gains over both general-domain baseline systems and systems fine-tuned with pre-existing in-domain corpora.
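One common quality-control mechanism for crowdsourced translation is to embed gold-standard items among the tasks and filter out workers who answer them poorly. The sketch below illustrates that general idea only; the exact-match scoring, the threshold, and the data layout are assumptions for illustration, not the quality controls actually used in the paper.

```python
from collections import Counter

def filter_workers(submissions, gold, min_accuracy=0.8):
    """Keep translations only from workers who meet a minimum accuracy
    on embedded gold-standard items.

    submissions: list of (worker_id, item_id, translation) tuples.
    gold: dict mapping gold item_id -> accepted reference translation.
    Returns the retained non-gold submissions, in input order.
    """
    correct, seen = Counter(), Counter()
    for worker, item, text in submissions:
        if item in gold:
            seen[worker] += 1
            correct[worker] += int(text == gold[item])
    trusted = {w for w in seen if correct[w] / seen[w] >= min_accuracy}
    return [(w, i, t) for w, i, t in submissions
            if w in trusted and i not in gold]
```

Real pipelines typically use softer scoring than exact string match (e.g. edit distance to the reference or manual adjudication), since multiple translations can be acceptable.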
libcloudph++ 0.2: single-moment bulk, double-moment bulk, and particle-based warm-rain microphysics library in C++
This paper introduces a library of algorithms for representing cloud
microphysics in numerical models. The library is written in C++, hence the name
libcloudph++. In the current release, the library covers three warm-rain
schemes: the single- and double-moment bulk schemes, and the particle-based
scheme with Monte-Carlo coalescence. The three schemes are intended for
modelling frameworks of different dimensionality and complexity ranging from
parcel models to multi-dimensional cloud-resolving (e.g. large-eddy)
simulations. A two-dimensional prescribed-flow framework is used in example
simulations presented in the paper with the aim of highlighting the library
features. The libcloudph++ and all its mandatory dependencies are free and
open-source software. The Boost.units library is used for zero-overhead
dimensional analysis of the code at compile time. The particle-based scheme is
implemented using the Thrust library, which makes it possible to leverage the
power of graphics processing units (GPUs) while retaining the possibility of
compiling the unchanged code for execution on single or multiple standard
processors (CPUs). The paper includes a complete description of the programming
interface (API) of the library and a performance analysis, including a
comparison of GPU and CPU setups.

Comment: The library description has been updated to the new library API (i.e.
the v0.1 -> v0.2 update). The key difference is that the model state variables
are now mixing ratios as opposed to densities. The particle-based scheme was
supplemented with the "particle recycling" process. Numerous editorial
corrections were made.