An Enhancement Method for Japanese-English Automated Translation
We present a method for improving existing statistical machine translation methods using a knowledge base compiled from a bilingual corpus, together with sequence alignment and pattern matching techniques from the areas of machine learning and bioinformatics. An alignment algorithm identifies similar sentences, which are then used to construct a better word order for the translation. Our preliminary test results indicate a significant improvement in translation quality.
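The abstract does not say which alignment algorithm is used. As a rough illustration only, the sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in to find the knowledge-base sentence most similar to an input sentence at the token level. The sentences and the function name `most_similar` are hypothetical and not taken from the paper.

```python
from difflib import SequenceMatcher


def most_similar(query, knowledge_base):
    """Return the knowledge-base sentence whose tokens align best with the query.

    SequenceMatcher is a generic stand-in here, not the paper's algorithm.
    """
    query_tokens = query.split()

    def similarity(sentence):
        # ratio() is 2 * (matched tokens) / (total tokens in both sequences)
        return SequenceMatcher(None, query_tokens, sentence.split()).ratio()

    return max(knowledge_base, key=similarity)


# Hypothetical example data
knowledge_base = [
    "the cat sat on the mat",
    "he reads a book every day",
    "she went to the market",
]
best = most_similar("the cat sat on a mat", knowledge_base)
print(best)
```

In a system like the one described, the word order of the retrieved sentence's translation would then guide reordering; that step is not shown here.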
Integrated Use of Internal and External Evidence in the Alignment of Multi-Word Named Entities
This paper proposes a method of extracting English multi-word named entities and their Japanese equivalents from a parallel corpus. The aim of our research is to extract multi-word named entities which are not listed in the dictionary of an English-to-Japanese MT system and which appear infrequently in a parallel corpus. Our method performs alignment on the basis of two kinds of external evidence provided by the context in which a bilingual pair appears, as well as two kinds of internal evidence within the pair. Each piece of evidence is assigned a score, and the aggregate score is computed as a weighted sum of these scores. The appropriate weights are estimated by logistic regression analysis. In an experiment using a parallel corpus of Yomiuri Shimbun and The Daily Yomiuri, 86.36% of the extracted bilingual pairs with the highest scores were judged to be correct.
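The scoring scheme described above can be sketched as follows. This is a minimal illustration, assuming four evidence scores per candidate pair (two external, two internal) and made-up weights; the paper's actual features, weights, and any bias term are not given in the abstract. Passing the weighted sum through the logistic function is the usual logistic-regression reading and is an assumption here.

```python
import math


def aggregate_score(evidence_scores, weights, bias=0.0):
    """Weighted sum of evidence scores, mapped to (0, 1) by the logistic function."""
    if len(evidence_scores) != len(weights):
        raise ValueError("need exactly one weight per evidence score")
    z = bias + sum(w * s for w, s in zip(weights, evidence_scores))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights; in the paper they are estimated by logistic regression.
weights = [1.2, 0.8, 1.5, 0.5]

# Hypothetical candidate pairs: [external_1, external_2, internal_1, internal_2]
candidates = {
    "pair_a": [0.9, 0.7, 0.8, 0.6],
    "pair_b": [0.2, 0.1, 0.3, 0.4],
}

# Rank candidate pairs from highest to lowest aggregate score
ranked = sorted(
    candidates,
    key=lambda name: aggregate_score(candidates[name], weights),
    reverse=True,
)
print(ranked)
```

Because the logistic function is monotonic, ranking by the squashed score gives the same order as ranking by the raw weighted sum.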
Exploring the Effectiveness of Combined Web-based Corpus Tools for Beginner EFL DDL
The purpose of this study is to investigate the effectiveness of combining two newly developed web-based tools for the foreign language DDL classroom. One is a KWIC concordance tool, WebParaNews, and the other is a lexical profiling tool, the LagoWordProfiler. Both are freeware and are based on the same parallel corpus, ParaNews, which consists of newspaper texts in English along with their aligned translations in Japanese. The same ten-week syllabus on various types of noun phrases was taught each year: only one tool was used with the 2013 group, and both tools were used in combination with the 2014 group. To reconfirm the effectiveness of combining the two tools, both were also used with the 2015 group. In each year the teaching effect was measured using a pre- and post-test, and students’ feedback was collected using a 31-item questionnaire. The groups using both tools showed larger gains between the pre- and post-test than the single-tool group and gave more positive feedback. This combined-resource approach, which draws on different types of information from two corpus tools, may be more helpful for understanding the targeted grammar items than a more traditional single-tool approach.
Towards Bilingual Term Extraction in Comparable Patents
PACLIC 23 / City University of Hong Kong / 3-5 December 200
Integrated Parallel Sentence and Fragment Extraction from Comparable Corpora: A Case Study on Chinese–Japanese Wikipedia
Parallel corpora are crucial for statistical machine translation (SMT); however, they are quite scarce for most language pairs and domains. As comparable corpora are far more available, many studies have been conducted to extract either parallel sentences or fragments from them for SMT. In this article, we propose an integrated system to extract both parallel sentences and fragments from comparable corpora. We first apply parallel sentence extraction to identify parallel sentences from comparable sentences. We then extract parallel fragments from the comparable sentences. Parallel sentence extraction is based on a parallel sentence candidate filter and classifier for parallel sentence identification. We improve it by proposing a novel filtering strategy and three novel feature sets for classification. Previous studies have found it difficult to accurately extract parallel fragments from comparable sentences. We propose an accurate parallel fragment extraction method that uses an alignment model to locate the parallel fragment candidates and an accurate lexicon-based filter to identify the truly parallel fragments. A case study on the Chinese–Japanese Wikipedia indicates that our proposed methods outperform previously proposed methods, and the parallel data extracted by our system significantly improves SMT performance.
Pattern Matching for Translating Domain-Specific Terms from Large Corpora
Translating domain-specific terms is a significant component of machine translation and machine-aided translation systems. These terms are often not found in standard dictionaries. Human translators, not being experts in every technical or regional domain, cannot produce their translations effectively. Automatic translation of domain-specific terms is therefore highly desirable. Most other work on automatic term translation uses statistical information about words from parallel corpora. Parallel corpora of clean-translated texts are hard to come by, whereas noisy-translated texts are more plentiful and monolingual texts in various domains are more plentiful still. We propose using noisy parallel texts and same-domain texts of a pair of languages to translate terms. In our work, we propose a novel paradigm of pattern matching of statistical signals of word features. These features are robust to the syntactic structure, character sets, and language of the text, and to the domain. We obtain statistical information which is related to the lexical properties of a word and its translation in any other language of the same domain. These lexical properties are extracted from the corpora and represented in vector form. We propose using signal processing techniques for matching the feature vectors of a word to those of its translation. Another matching technique we propose is applying discriminative analysis of the word features. For each word, the various features are combined into a single vector, which is then transformed into a lower-dimensional eigenvector for matching. Since most domain-specific terms are nouns and noun phrases, we concentrate on translating English nouns and noun phrases into other languages. We study the relationship between English noun phrases and their translations in Chinese, Japanese and French in parallel corpora. The results of this study are used in our system for translating English noun phrases into these other languages from noisy parallel and non-parallel corpora.