
    An Enhancement Method for Japanese-English Automated Translation

    We present a method for improving existing statistical machine translation methods using a knowledge base compiled from a bilingual corpus, together with sequence alignment and pattern matching techniques from the areas of machine learning and bioinformatics. An alignment algorithm identifies similar sentences, which are then used to construct a better word order for the translation. Our preliminary test results indicate a significant improvement in translation quality.
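
    The abstract does not say which alignment algorithm is used. As a minimal sketch only, the following assumes a token-level global alignment in the style of Needleman-Wunsch, a standard bioinformatics technique, to pick the knowledge-base sentence most similar to a query. The scoring values and the function names `alignment_score` and `most_similar` are illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence


def alignment_score(a: Sequence[str], b: Sequence[str],
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of two token sequences (Needleman-Wunsch style).

    The match/mismatch/gap values are assumed for illustration.
    """
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def most_similar(query: Sequence[str],
                 knowledge_base: List[Sequence[str]]) -> Sequence[str]:
    """Return the knowledge-base sentence with the highest alignment score."""
    return max(knowledge_base, key=lambda s: alignment_score(query, s))


# Toy data, invented for this example.
query = "the cat sat on the mat".split()
kb = ["the dog sat on the mat".split(), "I like green tea".split()]
best = most_similar(query, kb)
```

    In this toy run the first knowledge-base sentence is selected, because it differs from the query in a single token. How the selected sentence is then used to reorder the translation is not described in the abstract and is not modelled here.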

    Integrated Use of Internal and External Evidence in the Alignment of Multi-Word Named Entities

    This paper proposes a method of extracting English multi-word named entities and their Japanese equivalents from a parallel corpus. The aim of our research is to extract multi-word named entities that are not listed in the dictionary of an English-to-Japanese MT system and that appear infrequently in a parallel corpus. Our method performs alignment on the basis of two kinds of external evidence, provided by the context in which a bilingual pair appears, and two kinds of internal evidence within the pair. Each piece of evidence is assigned a score, and the aggregate score is computed as a weighted sum of these scores. The appropriate weights are estimated using logistic regression analysis. In an experiment on a parallel corpus of the Yomiuri Shimbun and The Daily Yomiuri, 86.36% of the highest-scoring extracted bilingual pairs were judged to be correct.
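
    The weighted-sum scoring described above can be sketched as follows. This is a hedged illustration only: the four evidence scores, the weights, and the bias are invented values, and the abstract does not specify how each evidence score is computed. In a logistic regression model the weighted sum is the linear predictor, and the logistic function maps it to a probability.

```python
import math
from typing import Sequence


def aggregate_score(scores: Sequence[float], weights: Sequence[float],
                    bias: float = 0.0) -> float:
    """Weighted sum of evidence scores (the logistic regression linear predictor)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return bias + sum(w * s for w, s in zip(weights, scores))


def correctness_probability(z: float) -> float:
    """Logistic function: maps an aggregate score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for two external and two internal kinds of evidence.
# In the paper these would be estimated from labelled pairs; here they are made up.
weights = [1.0, 0.5, 2.0, 1.5]

# Hypothetical evidence scores for two candidate bilingual pairs.
candidates = {
    "pair_a": [0.9, 0.8, 0.7, 0.6],
    "pair_b": [0.1, 0.2, 0.3, 0.1],
}

# Rank candidates from highest to lowest aggregate score.
ranked = sorted(candidates,
                key=lambda name: aggregate_score(candidates[name], weights),
                reverse=True)
```

    With these invented numbers `pair_a` ranks above `pair_b`; only the ranking mechanism, not the values, reflects the description in the abstract.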

    Extraction of Broad-Scale, High-Precision Japanese-English Parallel Translation Expressions Using Lexical Information and Rules

    Exploring the Effectiveness of Combined Web-based Corpus Tools for Beginner EFL DDL

    The purpose of this study is to investigate the effectiveness of combining two newly developed web-based tools for the foreign language DDL classroom. One is a KWIC concordance tool, WebParaNews, and the other is a lexical profiling tool, the LagoWordProfiler. Both are freeware and are based on the same parallel corpus, ParaNews, which consists of newspaper texts in English along with their aligned translations in Japanese. The same syllabus was used to teach various types of noun phrases for ten weeks: only one tool was used with the 2013 group, and both tools were used in combination with the 2014 group. To reconfirm the effectiveness of combining the two tools, both were also used with the 2015 group. In each year the teaching effect was measured using a pre- and post-test, and students’ feedback was collected using a 31-item questionnaire. The groups using both tools showed larger gains between the pre- and post-test than the single-tool group and gave more positive feedback. This combined-resource approach, which draws on different types of information from two corpus tools, may be more helpful for understanding the targeted grammar items than a more traditional single-tool approach.

    Towards Bilingual Term Extraction in Comparable Patents

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Integrated Parallel Sentence and Fragment Extraction from Comparable Corpora: A Case Study on Chinese–Japanese Wikipedia

    Parallel corpora are crucial for statistical machine translation (SMT); however, they are quite scarce for most language pairs and domains. As comparable corpora are far more widely available, many studies have been conducted to extract either parallel sentences or fragments from them for SMT. In this article, we propose an integrated system to extract both parallel sentences and fragments from comparable corpora. We first apply parallel sentence extraction to identify parallel sentences among comparable sentences. We then extract parallel fragments from the comparable sentences. Parallel sentence extraction is based on a parallel sentence candidate filter and a classifier for parallel sentence identification. We improve it by proposing a novel filtering strategy and three novel feature sets for classification. Previous studies have found it difficult to accurately extract parallel fragments from comparable sentences. We propose an accurate parallel fragment extraction method that uses an alignment model to locate the parallel fragment candidates and an accurate lexicon-based filter to identify the truly parallel fragments. A case study on the Chinese–Japanese Wikipedia indicates that our proposed methods outperform previously proposed methods, and the parallel data extracted by our system significantly improves SMT performance.
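
    The abstract does not give the details of the lexicon-based filter. One simple way such a filter could work, shown here purely as an assumed sketch, is to keep a candidate fragment pair only when enough source tokens have a lexicon translation present in the target fragment. The toy lexicon, the placeholder tokens, the 0.5 threshold, and the function names are all hypothetical.

```python
from typing import Dict, Sequence, Set


def lexicon_coverage(src_tokens: Sequence[str], tgt_tokens: Sequence[str],
                     lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of source tokens with at least one lexicon translation in the target."""
    if not src_tokens:
        return 0.0
    tgt = set(tgt_tokens)
    covered = sum(1 for w in src_tokens if lexicon.get(w, set()) & tgt)
    return covered / len(src_tokens)


def is_parallel_fragment(src_tokens: Sequence[str], tgt_tokens: Sequence[str],
                         lexicon: Dict[str, Set[str]],
                         threshold: float = 0.5) -> bool:
    """Accept a candidate pair when coverage reaches the (assumed) threshold."""
    return lexicon_coverage(src_tokens, tgt_tokens, lexicon) >= threshold


# Toy bilingual lexicon with placeholder tokens standing in for real words.
toy_lexicon = {
    "zh_cat": {"ja_cat"},
    "zh_like": {"ja_like"},
    "zh_fish": {"ja_fish"},
}

src = ["zh_cat", "zh_like", "zh_fish"]
good_tgt = ["ja_cat", "ja_topic", "ja_fish", "ja_subject", "ja_like"]
bad_tgt = ["ja_dog", "ja_topic", "ja_run"]

accepted = is_parallel_fragment(src, good_tgt, toy_lexicon)
rejected = not is_parallel_fragment(src, bad_tgt, toy_lexicon)
```

    In the described system the candidates passed to the filter come from an alignment model; that step is outside the scope of this sketch.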