1,316 research outputs found

Marker-based filtering of bilingual phrase pairs for SMT

Author: Sánchez-Martínez Felipe
Way Andy
Publication venue: European Association for Machine Translation
Publication date: 01/01/2009

State-of-the-art statistical machine translation systems make use of a large translation table obtained after scoring a set of bilingual phrase pairs automatically extracted from a parallel corpus. The number of bilingual phrase pairs extracted from a pair of aligned sentences grows exponentially as the length of the sentences increases; therefore, the number of entries in the phrase table used to carry out the translation may become unmanageable, especially when online, 'on demand' translation is required in real time. We describe the use of closed-class words to filter the set of bilingual phrase pairs extracted from the parallel corpus by taking into account the alignment information and the type of the words involved in the alignments. On four European language pairs, we show that our simple yet novel approach can filter the phrase table by up to a third yet still provide competitive results compared to the baseline. Furthermore, it provides a nice balance between the unfiltered approach and pruning using stop words, where the deterioration in translation quality is unacceptably high.

Repositorio Institucional de la Universidad de Alicante

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

DCU Online Research Access Service

Lost in translation: the problems of using mainstream MT evaluation metrics for sign language translation

Author: Morrissey Sara
Way Andy
Publication date: 01/01/2006

In this paper we consider the problems of applying corpus-based techniques to minority languages that are neither politically recognised nor have a formally accepted writing system, namely sign languages. We discuss the adoption of an annotated form of sign language data as a suitable corpus for the development of a data-driven machine translation (MT) system, and deal with issues that arise from its use. Useful software tools that facilitate easy annotation of video data are also discussed. Furthermore, we address the problems of using traditional MT evaluation metrics for sign language translation. Based on the candidate translations produced from our example-based machine translation system, we discuss why standard metrics fall short of providing an accurate evaluation and suggest more suitable evaluation methods.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Comparing Classifier use in Chinese and Japanese

Author: Bond Francis
Yue Hui Ting
Publication venue: Faculty of Computer Science, Universitas Indonesia
Publication date: 01/01/2012

Waseda University Repository

Task complexity and grammatical development in ESL

Author: Ma Yuan
Publication venue: American Psychological Association (APA)
Publication date: 01/01/2017

This study investigates whether second language learners’ interlanguage (IL) systems change according to the tasks they perform. This is a long-debated issue in the fields of SLA and language learning pedagogy, as it may have implications for language assessment and syllabus design. Pienemann’s (1998) Steadiness Hypothesis states that the basic nature of an IL system does not change across different communication tasks, provided they involve the same skill type. Pienemann claims that it is the learner’s L2 developmental stage, rather than the nature of different tasks, that influences linguistic competence. Tarone (1985, 1988, 2014) and Bayley and Tarone (2012), on the other hand, claim that IL is systematically and predictably variable across tasks as a result of style shifting due to shifts in social and contextual variables such as the topic of the interaction. One important element missing in the debate is the definition of ‘different tasks’. In recent years, however, Robinson’s Cognition Hypothesis (2003) has provided some clear criteria for classifying tasks according to their cognitive complexity. The present study, therefore, tests the two competing positions on IL variability by applying the Cognition Hypothesis for task evaluation. The main research question in the study is whether learners’ IL systems vary with tasks of different degrees of cognitive complexity. In order to answer the question, tasks were designed by manipulating the task complexity variables of ± few elements, ± here-and-now and ± planning time. Tasks used in studies which apply Pienemann’s Processability Theory (PT, 1998) are used in this research. In this study, 30 adult Chinese L1-English L2 learners in Australia were recruited based on their IELTS scores: 10 were from IELTS band 7.0 or above, 10 were from IELTS bands 5.0–5.5, and 10 were from IELTS band 4.5.
First, the issue of competence in relation to tasks was approached by assessing learners’ competence with traditional PT profiling tasks, such as ‘spot the difference’ tasks. The second step was to use Robinson’s (2007) cognitive complexity criteria to assess learners in each group while they performed a set of picture description tasks. Each learner’s performance was measured in terms of its accuracy and syntactic complexity based on the learner’s PT developmental stage to check whether the IL system across tasks was invariant. The results of this experimental study showed that each learner was quite stable across tasks in terms of morphological and syntactic complexity. The results of the accuracy analysis showed some, but not significant, differences between variables. This suggests that a learner’s IL system remains steady across tasks and within tasks of different degrees of cognitive complexity. The results of this study thus support Pienemann’s Steadiness Hypothesis (1998).

Western Sydney ResearchDirect

Bibliographie

Publication venue: Consortium Erudit
Publication date: 01/01/1973

Érudit

Modeling information structure in a cross-linguistic perspective

Author: Song Sanghoun
Publication venue: Language Science Press
Publication date: 01/01/2017

This study makes substantial contributions to both the theoretical and computational treatment of information structure, with a specific focus on creating natural language processing applications such as multilingual machine translation systems. The present study first provides cross-linguistic findings with regard to information structure meanings and markings. Building upon such findings, the current model represents information structure within the HPSG/MRS framework using Individual Constraints. The primary goal of the present study is to create a multilingual grammar model of information structure for the LinGO Grammar Matrix system. The present study explores the construction of a grammar library for creating customized grammars incorporating information structure and illustrates how the information structure-based model improves performance of transfer-based machine translation.

OAPEN Library

Institutional Repository of the Freie Universität Berlin

ZENODO

Language Science Press

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Directory of Open Access Books (DOAB)

Morphosyntactic Linguistic Wavelets for Knowledge Management

Author: Daniela López De Luise
Publication venue: IntechOpen
Publication date: 02/03/2012

IntechOpen

Crossref

Natural language software registry (second edition)

Author: Hinkelman Elizabeth
Jung Christoph
Vonerden Markus
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1993

Universaar

Acronym

Head finalization reordering for Chinese-to-Japanese machine translation

Author: Hajime Tsukada
Han Dan
Katsuhito Sudoh
Kevin Duh
Masaaki Nagata
Xianchao Wu
Publication date: 01/01/2012

In Statistical Machine Translation, reordering rules have proved useful in extracting bilingual phrases and in decoding during translation between languages that are structurally different. Linguistically motivated rules have been incorporated into Chinese-to-Englis

CiteSeerX