    A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

    Word reordering is one of the most difficult aspects of statistical machine translation (SMT) and an important factor in its quality and efficiency. Despite the vast amount of research published to date, the community's interest in this problem has not decreased, and no single method appears to be strongly dominant across language pairs. Instead, the choice of the optimal approach for a new translation task still seems to be driven mostly by empirical trials. To orient the reader in this vast and complex research area, we present a comprehensive survey of word reordering viewed both as a statistical modeling challenge and as a natural language phenomenon. The survey describes in detail how word reordering is modeled within different string-based and tree-based SMT frameworks and as a stand-alone task, including systematic overviews of the literature on advanced reordering modeling. We then ask why some approaches are more successful than others in different language pairs. We argue that, besides measuring the amount of reordering, it is important to understand which kinds of reordering occur in a given language pair. To this end, we conduct a qualitative analysis of word reordering phenomena in a diverse sample of language pairs, based on a large collection of linguistic knowledge. Empirical results in the SMT literature are shown to support the hypothesis that a few linguistic facts can be very useful for anticipating the reordering characteristics of a language pair and for selecting the SMT framework that best suits them. Comment: 44 pages, to appear in Computational Linguistics
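
    As a rough illustration of what measuring the amount of reordering can mean, the following Python sketch (not taken from the survey) scores a sentence pair by the fraction of aligned word pairs whose relative order flips between source and target, i.e. a normalized Kendall's tau distance. It assumes a one-to-one word alignment given as (source_index, target_index) pairs; the function name crossing_ratio is hypothetical. A single number like this says how much reordering occurs but not which kinds, which is the distinction the survey argues matters.

```python
from itertools import combinations


def crossing_ratio(alignment):
    """Fraction of aligned-word pairs whose order differs between source and target.

    `alignment` is a list of (source_index, target_index) links and is assumed
    to be one-to-one (each source and target word appears in at most one link).
    """
    links = sorted(alignment)  # order links by source position
    n = len(links)
    if n < 2:
        return 0.0  # with fewer than two links there is nothing to reorder
    # After sorting, s1 < s2 for every pair, so t1 > t2 marks a crossing.
    crossings = sum(1 for (s1, t1), (s2, t2) in combinations(links, 2) if t1 > t2)
    return crossings / (n * (n - 1) / 2)
```

    For an SVO-to-SOV pair such as "John eats apples" / "John apples eats", the alignment [(0, 0), (1, 2), (2, 1)] has one crossing out of three pairs, giving 1/3; a fully monotone alignment gives 0.0 and a fully inverted one gives 1.0.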

    Raising students' awareness of cross-cultural contrastive rhetoric in English writing via an e-learning course

    This study investigated the potential impact of e-learning on raising overseas students' cultural awareness and explored the possibility of creating an interactive learning environment in which they could improve their English academic writing. The study was based on a comparison of Chinese and English rhetoric in academic writing, covering Chinese students' writing in Chinese, native English speakers' writing in English, and Chinese students' writing in English produced with and without the help of an e-course. Five features of contrastive rhetoric were used as criteria for the comparison. The experimental results show that the group using the e-course successfully learned the defined aspects of English rhetoric in academic writing, reaching a level of performance equal to that of native English speakers. Data analysis also revealed that the e-learning resources helped students compare rhetorical styles across cultures and that the interactive learning environment was effective in improving overseas students' English academic writing.

    Structured Prediction of Sequences and Trees using Infinite Contexts

    Linguistic structures exhibit a rich array of global phenomena; however, commonly used Markov models are unable to describe these phenomena adequately because of their strong locality assumptions. We propose a novel hierarchical model for structured prediction over sequences and trees that exploits global context by conditioning each generation decision on an unbounded context of prior decisions. This builds on the success of Markov models but, to better represent global phenomena, does not impose a fixed bound on the context. To facilitate learning of this large and unbounded model, we use a hierarchical Pitman-Yor process prior, which provides a recursive form of smoothing. We propose prediction algorithms based on A* search and Markov chain Monte Carlo sampling. Empirical results demonstrate the potential of our model compared with baseline finite-context Markov models on part-of-speech tagging and syntactic parsing.
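
    A minimal Python sketch of the recursive smoothing that a hierarchical Pitman-Yor prior provides, written under stated assumptions rather than reproducing the paper's model. Contexts are tuples of prior decisions ordered oldest to newest, and the parent of a context drops its oldest decision, so the empty context backs off to a uniform base distribution. The predictive probability of w after context u is

    P(w | u) = [c(u, w) - d * t(u, w) + (theta + d * t(u)) * P(w | parent(u))] / (theta + c(u)),

    where c counts customers and t counts tables at a context (c(u) and t(u) are totals over all words). The class name HPYSketch, the single global discount d and concentration theta, and the one-table-per-word-type approximation (used in place of sampling seating arrangements, which relates it to interpolated Kneser-Ney smoothing) are all assumptions of this sketch.

```python
from collections import Counter, defaultdict


class HPYSketch:
    """Toy hierarchical Pitman-Yor predictor over unbounded contexts (hypothetical)."""

    def __init__(self, vocab_size, discount=0.5, concentration=1.0):
        self.vocab_size = vocab_size        # size of the uniform base distribution
        self.d = discount                   # assumed single discount, 0 <= d < 1
        self.theta = concentration          # assumed single concentration, theta > -d
        self.counts = defaultdict(Counter)  # context tuple -> customer counts per word

    def observe(self, context, word):
        # Seat one customer at the full (unbounded) context. Under the
        # one-table-per-type approximation, only a new table sends a
        # customer on to the parent context, which drops the oldest decision.
        u = tuple(context)
        while True:
            self.counts[u][word] += 1
            if self.counts[u][word] > 1 or not u:
                break  # joined an existing table, or reached the root
            u = u[1:]

    def prob(self, context, word):
        u = tuple(context)
        # The root's parent is the uniform base; otherwise recurse on the shorter context.
        parent_p = 1.0 / self.vocab_size if not u else self.prob(u[1:], word)
        c = self.counts.get(u)
        if not c:
            return parent_p  # unseen context: defer entirely to the parent
        c_u = sum(c.values())
        t_u = len(c)  # one table per observed word type
        c_uw = c.get(word, 0)
        t_uw = 1 if c_uw > 0 else 0
        return (c_uw - self.d * t_uw + (self.theta + self.d * t_u) * parent_p) / (
            self.theta + c_u
        )


# Usage: condition each decision on the entire history of prior decisions.
sequence = ["a", "b", "a", "b", "a"]
model = HPYSketch(vocab_size=3)
for i, symbol in enumerate(sequence):
    model.observe(sequence[:i], symbol)
```

    Because each level mixes its own discounted counts with its parent's distribution, an unseen long context still receives a proper distribution from its shorter suffixes, which is what makes an unbounded context usable in practice.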

    Chengyu in Chinese Language Teaching: A preliminary analysis of Italian learners’ data

    Chengyu, also known as Chinese four-character idioms, are a type of traditional Chinese idiom, most of which consist of four characters. They commonly derive from classical Chinese literary sources, including those of the three great philosophical and religious traditions that influenced the entire East Asian cultural sphere: Confucianism, Daoism and Buddhism. Chengyu therefore carry a wide range of cultural references, and they have spread from Chinese to the languages of other countries in the Sinosphere, such as Japan and Korea. Although many scholars have emphasized the importance of acquiring chengyu, little attention has so far been paid to chengyu learning in Chinese Language Teaching research. As a preliminary attempt to address this gap, this paper reports the results of two small-scale, exploratory experiments aimed at investigating Italian learners’ general knowledge of chengyu and their main interpretation strategies, as well as comparing the effectiveness of direct and indirect instruction in chengyu teaching. The experiments involved participants from Bachelor’s and Master’s programs at Roma Tre University. The results show a predominant effect of negative transfer from Italian, as well as better performance by the participants who received indirect instruction.

    Ensuring Readability and Data-fidelity using Head-modifier Templates in Deep Type Description Generation

    A type description is a succinct noun compound that helps humans and machines quickly grasp the informative and distinctive information about an entity. Entities in most knowledge graphs (KGs) still lack such descriptions, which calls for automatic methods to supply this information. However, existing generative methods either overlook grammatical structure or make factual mistakes in the generated text. To solve these problems, we propose a head-modifier template-based method to ensure the readability and data fidelity of generated type descriptions. We also propose a new dataset and two automatic metrics for this task. Experiments show that our method substantially outperforms the baselines and achieves state-of-the-art performance on both datasets. Comment: ACL 201
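
    The abstract does not describe the model's internals, and the paper's method is a learned (deep) generator. As a purely hypothetical toy illustration of the idea behind head-modifier templates, the Python sketch below represents a template as a head plus modifiers placed before it (as in English noun compounds such as "Italian footballer") and fills slots only with values copied from a dictionary of knowledge-graph facts, returning None when a fact is missing instead of inventing one. The class name HeadModifierTemplate, the "{relation}" slot syntax, and the fact keys are all assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical slot syntax: a token such as "{occupation}" names a KG relation.
SLOT = re.compile(r"\{(\w+)\}")


@dataclass
class HeadModifierTemplate:
    head: str                                           # head slot or literal noun, e.g. "{occupation}"
    modifiers: List[str] = field(default_factory=list)  # placed before the head, e.g. ["{nationality}"]


def fill_token(token: str, facts: Dict[str, str]) -> Optional[str]:
    match = SLOT.fullmatch(token)
    if match is None:
        return token                  # literal word supplied by the template
    return facts.get(match.group(1))  # copy the KG value verbatim, or None if absent


def realize(template: HeadModifierTemplate, facts: Dict[str, str]) -> Optional[str]:
    """Render the modifiers followed by the head, as in an English noun compound."""
    words = []
    for token in template.modifiers + [template.head]:
        value = fill_token(token, facts)
        if value is None:
            return None  # refuse to produce a description the facts do not support
        words.append(value)
    return " ".join(words)
```

    For example, realize(HeadModifierTemplate("{occupation}", ["{nationality}"]), {"nationality": "Italian", "occupation": "footballer"}) gives "Italian footballer", while dropping the occupation fact makes it return None. Fixing the head-final order in the template addresses readability, and copying slot values from the facts addresses data fidelity.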