1,316 research outputs found

    Marker-based filtering of bilingual phrase pairs for SMT

    State-of-the-art statistical machine translation systems make use of a large translation table obtained after scoring a set of bilingual phrase pairs automatically extracted from a parallel corpus. The number of bilingual phrase pairs extracted from a pair of aligned sentences grows exponentially as the length of the sentences increases; therefore, the number of entries in the phrase table used to carry out the translation may become unmanageable, especially when online, 'on demand' translation is required in real time. We describe the use of closed-class words to filter the set of bilingual phrase pairs extracted from the parallel corpus by taking into account the alignment information and the type of the words involved in the alignments. On four European language pairs, we show that our simple yet novel approach can filter the phrase table by up to a third yet still provide competitive results compared to the baseline. Furthermore, it provides a good balance between the unfiltered approach and pruning using stop words, where the deterioration in translation quality is unacceptably high.

    Lost in translation: the problems of using mainstream MT evaluation metrics for sign language translation

    In this paper we consider the problems of applying corpus-based techniques to minority languages that are neither politically recognised nor have a formally accepted writing system, namely sign languages. We discuss the adoption of an annotated form of sign language data as a suitable corpus for the development of a data-driven machine translation (MT) system, and deal with issues that arise from its use. Useful software tools that facilitate easy annotation of video data are also discussed. Furthermore, we address the problems of using traditional MT evaluation metrics for sign language translation. Based on the candidate translations produced from our example-based machine translation system, we discuss why standard metrics fall short of providing an accurate evaluation and suggest more suitable evaluation methods.

    Comparing Classifier use in Chinese and Japanese


    Task complexity and grammatical development in ESL

    This study investigates whether second language learners’ interlanguage (IL) systems change according to the tasks they perform. This is a long-debated issue in the fields of SLA and language learning pedagogy, as it may have implications for language assessment and syllabus design. Pienemann’s (1998) Steadiness Hypothesis states that the basic nature of an IL system does not change across different communication tasks, provided they involve the same skill type. Pienemann claims that it is the learner’s L2 developmental stage rather than the nature of different tasks, which influences linguistic competence. Tarone (1985, 1988, 2014) and Bayley and Tarone (2012), on the other hand, claim that IL is systematically and predictably variable across tasks as a result of style shifting due to shifts in social and contextual variables such as the topic of the interaction. One important element missing in the debate is the definition of ‘different tasks’. In recent years, however, Robinson’s Cognition Hypothesis (2003) has provided some clear criteria for classifying tasks according to their cognitive complexity. The present study, therefore, tests the two competing positions on IL variability by applying the Cognition Hypothesis for task evaluation. The main research question in the study is whether learners’ IL systems vary with tasks of different degrees of cognitive complexity. In order to answer the question, tasks were designed by manipulating the task complexity variable of ± few elements, ± here-and-now and ± planning time. Tasks used in studies which apply Pienemann’s Processability Theory (PT, 1998) are used in this research. In this study, 30 adult Chinese L1-English L2 learners in Australia were recruited based on their IELTS scores: 10 were from IELTS band 7.0 or above; 10 were from IELTS bands 5.0–5.5 and 10 were from IELTS band 4.5.
    First, the issue of competence in relation to tasks was approached by assessing the competence of learners by using traditional PT profiling tasks, such as ‘spot the difference’ tasks. The second step was to use Robinson’s (2007) cognitive complexity criteria to assess learners in each group while they performed a set of picture description tasks. Each learner’s performance was measured in terms of its accuracy and syntactic complexity based on the learner’s PT developmental stage to check whether the IL system across tasks was invariant. The results of this experimental study showed that each learner was quite stable across tasks in terms of morphological and syntactic complexity. The results of the accuracy analysis showed some, but not significant, differences between variables. This suggests that a learner’s IL system remains steady across tasks and within tasks of different degrees of cognitive complexity. The results of this study thus support Pienemann’s Steadiness Hypothesis (1998).

    Bibliographie


    Modeling information structure in a cross-linguistic perspective

    This study makes substantial contributions to both the theoretical and computational treatment of information structure, with a specific focus on creating natural language processing applications such as multilingual machine translation systems. The present study first provides cross-linguistic findings with regard to information structure meanings and markings. Building upon such findings, the current model represents information structure within the HPSG/MRS framework using Individual Constraints. The primary goal of the present study is to create a multilingual grammar model of information structure for the LinGO Grammar Matrix system. The present study explores the construction of a grammar library for creating customized grammars incorporating information structure and illustrates how the information structure-based model improves the performance of transfer-based machine translation.

    Morphosyntactic Linguistic Wavelets for Knowledge Management


    Natural language software registry (second edition)


    Head finalization reordering for Chinese-to-Japanese machine translation

    In Statistical Machine Translation, reordering rules have proved useful in extracting bilingual phrases and in decoding during translation between languages that are structurally different. Linguistically motivated rules have been incorporated into Chinese-to-English…