3 research outputs found

    Integrated chinese word segmentation in statistical machine translation

    2. introduction
    3. baseline statistical machine translation system
    4. word segmentation model and segmentation approaches
    5. computational steps
    6. corpus statistics
    7. translation results and computational requirements
    8. conclusion

    Xu, Matusov, Zens, Ney: Integrated Chinese Word Segmentation in SMT
    IWSLT05 Pittsburgh Oct 25, 2005

    Problem Description

    Words are not separated by white space in Chinese sentences.

    Standard approach in statistical machine translation:
    • segmentation of Chinese character sequences into words
    • training and translation are performed afterwards

    The problems of the standard approach:
    1. segmentation may contain errors
    2. for a given character sequence, the best segmentation depends on its context
    3. manual segmentation is not necessarily the best segmentation for translation (e.g. into English