Marker-based filtering of bilingual phrase pairs for SMT
State-of-the-art statistical machine translation systems make use of a large translation table obtained after scoring a set of bilingual phrase pairs automatically extracted from a parallel corpus. The number of bilingual phrase pairs extracted from a pair of aligned sentences grows exponentially as the length of the sentences increases; therefore, the number of entries in the phrase table used to carry out the translation may become unmanageable, especially when online, 'on demand' translation is required in real time. We describe the use of closed-class words to filter the set of bilingual phrase pairs extracted from the parallel corpus by taking into account the alignment information and the type of the words involved in the alignments. On four European language pairs, we show that our simple yet novel approach can filter the phrase table by up to a third yet still provide competitive results compared to the baseline. Furthermore, it provides a nice balance between the unfiltered approach and pruning using stop words, where the deterioration in translation quality is unacceptably high
Lost in translation: the problems of using mainstream MT evaluation metrics for sign language translation
In this paper we consider the problems of applying corpus-based techniques to minority languages that are neither politically recognised nor have a formally accepted writing system, namely sign languages. We discuss the adoption of an annotated form of sign language data as a suitable corpus for the development of a data-driven machine translation (MT) system, and deal with issues that arise from its use. Useful software tools that facilitate easy annotation of video data are also discussed. Furthermore, we address the problems of using traditional MT evaluation metrics for sign language translation. Based on the candidate translations produced from our example-based machine translation system, we discuss why standard metrics fall short of providing an accurate evaluation and suggest more suitable evaluation methods
Task complexity and grammatical development in ESL
This study investigates whether second language learners' interlanguage (IL) systems change according to the tasks they perform. This is a long-debated issue in the fields of SLA and language learning pedagogy, as it may have implications for language assessment and syllabus design. Pienemann's (1998) Steadiness Hypothesis states that the basic nature of an IL system does not change across different communication tasks, provided they involve the same skill type. Pienemann claims that it is the learner's L2 developmental stage rather than the nature of different tasks, which influences linguistic competence. Tarone (1985, 1988, 2014) and Bayley and Tarone (2012), on the other hand, claim that IL is systematically and predictably variable across tasks as a result of style shifting due to shifts in social and contextual variables such as the topic of the interaction. One important element missing in the debate is the definition of 'different tasks'. In recent years, however, Robinson's Cognition Hypothesis (2003) has provided some clear criteria for classifying tasks according to their cognitive complexity. The present study, therefore, tests the two competing positions on IL variability by applying the Cognition Hypothesis for task evaluation. The main research question in the study is whether learners' IL systems vary with tasks of different degrees of cognitive complexity. In order to answer the question, tasks were designed by manipulating the task complexity variable of ± few elements, ± here-and-now and ± planning time. Tasks used in studies which apply Pienemann's Processability Theory (PT, 1998) are used in this research. In this study, 30 adult Chinese L1-English L2 learners in Australia were recruited based on their IELTS scores: 10 were from IELTS band 7.0 or above; 10 were from IELTS bands 5.0–5.5 and 10 were from IELTS band 4.5.
First, the issue of competence in relation to tasks was approached by assessing the competence of learners by using traditional PT profiling tasks, such as 'spot the difference' tasks. The second step was to use Robinson's (2007) cognitive complexity criteria to assess learners in each group while they performed a set of picture description tasks. Each learner's performance was measured in terms of its accuracy and syntactic complexity based on the learner's PT developmental stage to check whether the IL system across tasks was invariant. The results of this experimental study showed that each learner was quite stable across tasks in terms of morphological and syntactic complexity. The results of the accuracy analysis showed some, but not significant, differences between variables. This suggests that a learner's IL system remains steady across tasks and within tasks of different degrees of cognitive complexity. The results of this study thus support Pienemann's Steadiness Hypothesis (1998)
Modeling information structure in a cross-linguistic perspective
This study makes substantial contributions to both the theoretical and computational treatment of information structure, with a specific focus on creating natural language processing applications such as multilingual machine translation systems. The present study first provides cross-linguistic findings with regard to information structure meanings and markings. Building upon such findings, the current model represents information structure within the HPSG/MRS framework using Individual Constraints. The primary goal of the present study is to create a multilingual grammar model of information structure for the LinGO Grammar Matrix system. The present study explores the construction of a grammar library for creating customized grammars that incorporate information structure and illustrates how the information structure-based model improves the performance of transfer-based machine translation
Head finalization reordering for Chinese-to-Japanese machine translation
In Statistical Machine Translation, reordering rules have proved useful in extracting bilingual phrases and in decoding during translation between languages that are structurally different. Linguistically motivated rules have been incorporated into Chinese-to-Englis