4,541 research outputs found

An Enhancement Method for Japanese-English Automated Translation

Author: Winiwarter Werner
Wloka Bartholomäus
Publication venue: Adam Mickiewicz University Poznan
Publication date: 15/09/2010

We present a method for improving existing statistical machine translation methods using a knowledge base compiled from a bilingual corpus, together with sequence alignment and pattern matching techniques from the areas of machine learning and bioinformatics. An alignment algorithm identifies similar sentences, which are then used to construct a better word order for the translation. Our preliminary test results indicate a significant improvement in translation quality.

Biblioteka Nauki - repozytorium artykułów

Investigationes Linguisticae

Integrated Use of Internal and External Evidence in the Alignment of Multi-Word Named Entities

Author: 九津見毅
井佐原均
佐田いち子
吉見毅彦
小谷克則
Publication venue: Logico-Linguistic Society of Japan
Publication date: 16/11/2005

This paper proposes a method of extracting English multi-word named entities and their Japanese equivalents from a parallel corpus. The aim of our research is to extract multi-word named entities that are not listed in the dictionary of an English-to-Japanese MT system and that appear infrequently in a parallel corpus. Our method performs alignment on the basis of two kinds of external evidence provided by the context in which a bilingual pair appears, as well as two kinds of internal evidence within the pair. Each piece of evidence is assigned a score, and the aggregate score is computed as a weighted sum of these scores. The appropriate weights are estimated by logistic regression analysis. In an experiment using a parallel corpus of the Yomiuri Shimbun and The Daily Yomiuri, 86.36% of the extracted bilingual pairs with the highest scores were judged to be correct.

Waseda University Repository

Extraction of Broad-Scale, High-Precision Japanese-English Parallel Translation Expressions Using Lexical Information and Rules

Author: Ma Qing
Murata Masaki
Sakagami Shinya
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011

Waseda University Repository

Exploring the Effectiveness of Combined Web-based Corpus Tools for Beginner EFL DDL

Author: Chujo Kiyomi
Kobayashi Yuichiro
Mizumoto Atsushi
Oghigian Kathryn
中條清美
小林雄一郎
水本篤
Publication venue: Horizon Research Publishing Co., Ltd.
Publication date: 01/01/2016

The purpose of this study is to investigate the effectiveness of combining two newly developed web-based tools for the foreign language DDL classroom. One is a KWIC concordance tool, WebParaNews, and the other is a lexical profiling tool, the LagoWordProfiler. Both are freeware and are based on the same parallel corpus, ParaNews, which consists of newspaper texts in English along with their aligned translations in Japanese. The same syllabus was used to teach various types of noun phrases for ten weeks: only one tool was used with the 2013 group, and both tools were used in combination with the 2014 group. To reconfirm the effectiveness of combining the two tools, both were also used with the 2015 group. In each year the teaching effect was measured using a pre- and post-test, and students’ feedback was collected using a 31-item questionnaire. Groups using both tools showed larger gains between the pre- and post-test than the single-tool group and gave more positive feedback. This combined-resource approach, which draws on different types of information from two corpus tools, may be more helpful for understanding the targeted grammar items than a more traditional single-tool approach.

Kansai University Repository

Towards Bilingual Term Extraction in Comparable Patents

Author: Lu Bin
Tsou Benjamin K.
Publication venue: City University of Hong Kong
Publication date: 01/01/2009

PACLIC 23 / City University of Hong Kong / 3-5 December 200

Waseda University Repository

AN INVESTIGATION INTO THE CROSS-LINGUISTIC ROBUSTNESS OF TEXTUAL EQUIVALENCE TECHNIQUES

Author: Alshahrani Amal
Publication venue
Publication date: 31/12/2018

The University of Manchester - Institutional Repository

Integrated Parallel Sentence and Fragment Extraction from Comparable Corpora: A Case Study on Chinese--Japanese Wikipedia

Author: Chu Chenhui
Kurohashi Sadao
Nakazawa Toshiaki
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/02/2016

Parallel corpora are crucial for statistical machine translation (SMT); however, they are quite scarce for most language pairs and domains. As comparable corpora are far more available, many studies have been conducted to extract either parallel sentences or fragments from them for SMT. In this article, we propose an integrated system to extract both parallel sentences and fragments from comparable corpora. We first apply parallel sentence extraction to identify parallel sentences from comparable sentences. We then extract parallel fragments from the comparable sentences. Parallel sentence extraction is based on a parallel sentence candidate filter and classifier for parallel sentence identification. We improve it by proposing a novel filtering strategy and three novel feature sets for classification. Previous studies have found it difficult to accurately extract parallel fragments from comparable sentences. We propose an accurate parallel fragment extraction method that uses an alignment model to locate the parallel fragment candidates and an accurate lexicon-based filter to identify the truly parallel fragments. A case study on the Chinese--Japanese Wikipedia indicates that our proposed methods outperform previously proposed methods, and the parallel data extracted by our system significantly improves SMT performance.

Kyoto University Research Information Repository

Pattern Matching for Translating Domain-Specific Terms from Large Corpora

Author: Fung Pascale
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/1995

Translating domain-specific terms is one significant component of machine translation and machine-aided translation systems. These terms are often not found in standard dictionaries. Human translators, not being experts in every technical or regional domain, cannot produce their translations effectively. Automatic translation of domain-specific terms is therefore highly desirable. Most other work on automatic term translation uses statistical information of words from parallel corpora. Parallel corpora of clean-translated texts are hard to come by, whereas there are more noisy-translated texts and many more monolingual texts in various domains. We propose using noisy parallel texts and same-domain texts of a pair of languages to translate terms. In our work, we propose using a novel paradigm of pattern matching of statistical signals of word features. These features are robust to the syntactic structure, character sets, language of the text, and to the domain. We obtain statistical information which is related to the lexical properties of a word and its translation in any other language of the same domain. These lexical properties are extracted from the corpora and represented in vector form. We propose using signal processing techniques for matching these feature vectors of a word to those of its translation. Another matching technique we propose is applying discriminative analysis of the word features. For each word, the various features are combined into a single vector which is then transformed into a smaller dimension eigenvector for matching. Since most domain-specific terms are nouns and noun phrases, we concentrate on translating English nouns and noun phrases into other languages. We study the relationship between English noun phrases and their translations in Chinese, Japanese and French in parallel corpora. The result of this study is used in our system for translation of English noun phrases into these other languages from noisy parallel and non-parallel corpora.

Columbia University Academic Commons