A Comparison of Concept-base Model and Word Distributed Model as Word Association System
Abstract: We construct a Concept-base, based on the concept-chain model, and word vector spaces, based on Word2Vec, using the EDR electronic dictionary and Japanese Wikipedia data. This paper describes verification experiments on these models for a word association system built on an association-frequency table. In these experiments, we investigate the tendencies of the associative words each model produces for the evaluation-basis words. In the Concept-base model, we observed a tendency for synonyms, superordinate words, and subordinate words to be obtained as associative words. In the Word2Vec model, by contrast, the associative words tend to be words that form compounds or co-occurrence phrases when joined with the headwords of the association-frequency table. Moreover, the evaluation results showed a tendency for associative words in the Word2Vec model to be mostly category words.
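The abstract above compares associative words retrieved from word vector spaces. As a rough illustration only, the following minimal sketch shows how associative words can be ranked by cosine similarity over word vectors, the standard retrieval step for a Word2Vec-style model; the toy vectors and words are hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def associative_words(vectors, headword, top_n=3):
    # Rank all other words by cosine similarity to the headword's vector.
    target = vectors[headword]
    scored = [(w, cosine(target, vec)) for w, vec in vectors.items() if w != headword]
    return sorted(scored, key=lambda x: -x[1])[:top_n]

# Toy 3-dimensional vectors (hypothetical values, for illustration only).
toy = {
    "dog":  [0.9, 0.1, 0.0],
    "cat":  [0.8, 0.2, 0.1],
    "car":  [0.0, 0.9, 0.4],
    "road": [0.1, 0.8, 0.5],
}
print(associative_words(toy, "dog", top_n=2))
```

In practice the vectors would come from a trained Word2Vec model rather than a hand-written table; the ranking step itself is unchanged.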
Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources
The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, by instead using external morphological resources. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source, target, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. JRC.G.2-Global security and crisis management
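The abstract describes generating new phrase associations from morphological variants filtered by a string similarity score. A minimal sketch of that idea, assuming a normalised edit-distance similarity and a hypothetical lexicon of surface forms (the threshold, lexicon, and function names below are illustrative, not the paper's actual method):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(a, b):
    # Normalised string similarity in [0, 1].
    m = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / m if m else 1.0

def expand_phrase(source, variants, threshold=0.6):
    # Propose new source phrases by replacing each word with a
    # morphological variant that is string-similar enough to the original.
    words = source.split()
    proposals = []
    for i, w in enumerate(words):
        for v in variants.get(w, []):
            if v != w and similarity(w, v) >= threshold:
                proposals.append(" ".join(words[:i] + [v] + words[i + 1:]))
    return proposals

# Hypothetical lexicon of lemma-sharing surface forms.
lex = {"house": ["house", "houses"], "red": ["red", "redder"]}
print(expand_phrase("red house", lex))
```

Each proposed phrase would then be paired with the corresponding variant of its translation and added to the phrase table, inheriting or rescoring the original association's probabilities.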
Pattern Matching for Translating Domain-Specific Terms from Large Corpora
Translating domain-specific terms is a significant component of machine translation and machine-aided translation systems. These terms are often not found in standard dictionaries, and human translators, not being experts in every technical or regional domain, cannot produce their translations effectively. Automatic translation of domain-specific terms is therefore highly desirable. Most other work on automatic term translation uses statistical information about words from parallel corpora. Parallel corpora of cleanly translated texts are hard to come by, whereas there are more noisily translated texts and many more monolingual texts in various domains. We propose using noisy parallel texts and same-domain texts of a pair of languages to translate terms. In our work, we propose a novel paradigm of pattern matching over statistical signals of word features. These features are robust to the syntactic structure, character set, and language of the text, and to the domain. We obtain statistical information related to the lexical properties of a word and its translation in any other language of the same domain. These lexical properties are extracted from the corpora and represented in vector form. We propose using signal-processing techniques to match the feature vectors of a word to those of its translation. Another matching technique we propose is discriminative analysis of the word features: for each word, the various features are combined into a single vector, which is then transformed into a smaller-dimension eigenvector for matching. Since most domain-specific terms are nouns and noun phrases, we concentrate on translating English nouns and noun phrases into other languages. We study the relationship between English noun phrases and their translations in Chinese, Japanese, and French in parallel corpora. The result of this study is used in our system for translating English noun phrases into these other languages from noisy parallel and non-parallel corpora.
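The abstract treats a word's occurrence pattern across a corpus as a signal to be matched against candidate translations. As a loose sketch of that idea only (the binning scheme, Pearson correlation, and all toy data below are assumptions, not the authors' actual features or matching technique):

```python
def binned_signal(positions, corpus_len, bins=4):
    # Count a word's occurrences per equal-width segment of the corpus,
    # turning its position list into a coarse distributional signal.
    sig = [0] * bins
    for p in positions:
        sig[min(p * bins // corpus_len, bins - 1)] += 1
    return sig

def correlation(x, y):
    # Pearson correlation of two equal-length signals.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def best_translation(src_positions, src_len, candidates, tgt_len, bins=4):
    # Pick the target word whose distributional signal best mirrors
    # the source word's signal.
    src = binned_signal(src_positions, src_len, bins)
    scored = {w: correlation(src, binned_signal(pos, tgt_len, bins))
              for w, pos in candidates.items()}
    return max(scored, key=scored.get)

# Hypothetical token positions in a 100-token text and its noisy translation.
src = [5, 10, 60, 62]
cands = {"chien": [4, 12, 58, 66], "voiture": [30, 35, 40, 90]}
print(best_translation(src, 100, cands, 100))
```

The intuition is that a term and its translation tend to occur in corresponding regions of a noisy parallel text, so their binned signals correlate even without sentence alignment.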
Improving the precision of example-based machine translation by learning from user feedback
Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2007. Thesis (Master's), Bilkent University, 2007. Includes bibliographical references (leaves 110-113). Daybelge, Turhan Osman, M.S.

Example-Based Machine Translation (EBMT) is a corpus-based approach to Machine Translation (MT) that utilizes the translation-by-analogy concept. In our EBMT system, translation templates are extracted automatically from bilingual aligned corpora by substituting the similarities and differences in pairs of translation examples with variables. As this process is done on the lexical-level forms of the translation examples, and words in natural-language texts are often morphologically ambiguous, a need for morphological disambiguation arises. Therefore, we present here a rule-based morphological disambiguator for Turkish. In earlier versions of the discussed system, the translation results were ranked solely using confidence factors of the translation templates. In this study, however, we introduce an improved ranking mechanism that dynamically learns from user feedback. When a user, such as a professional human translator, submits an evaluation of the generated translation results, the system learns “context-dependent co-occurrence rules” from this feedback. The newly learned rules are then consulted while ranking the results of subsequent translations. Through successive translation-evaluation cycles, we expect the output of the ranking mechanism to comply better with user expectations, listing the preferred results at higher ranks. An evaluation of our ranking method, using the precision at the top 1, 3, and 5 results and the BLEU metric, is also presented.
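The ranking mechanism described above combines static template confidence with rules learned from user feedback. A minimal sketch of that feedback loop, assuming a simplified context key and an additive per-template bonus (the class, learning rate, and scoring scheme below are illustrative, not the thesis's actual rule formalism):

```python
class FeedbackRanker:
    # Ranks candidate translations by a template confidence factor plus
    # a bonus learned from earlier user feedback on the same template
    # in the same (hypothetical, simplified) context.
    def __init__(self, learning_rate=0.1):
        self.bonus = {}          # (context, template) -> learned adjustment
        self.lr = learning_rate

    def score(self, context, template, confidence):
        return confidence + self.bonus.get((context, template), 0.0)

    def rank(self, context, candidates):
        # candidates: list of (template, confidence) pairs, best first.
        return sorted(candidates,
                      key=lambda c: -self.score(context, c[0], c[1]))

    def feedback(self, context, template, approved):
        # Move the bonus up on approval, down on rejection.
        key = (context, template)
        delta = self.lr if approved else -self.lr
        self.bonus[key] = self.bonus.get(key, 0.0) + delta

r = FeedbackRanker()
cands = [("T1", 0.50), ("T2", 0.48)]
print(r.rank("ctx", cands)[0][0])       # ranking before any feedback
r.feedback("ctx", "T2", approved=True)  # the user prefers T2's output
print(r.rank("ctx", cands)[0][0])       # ranking after feedback
```

Over successive translation-evaluation cycles, the learned bonuses accumulate, so templates the user consistently approves climb in the ranking, which is the behaviour the thesis evaluates with precision at top 1, 3, and 5.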
Can human association norms evaluate latent semantic analysis?
This paper presents a comparison of a word association norm, created in a psycholinguistic experiment, with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm with lists generated by the LSA algorithm. An argument is presented on how well those automatically generated lists reflect real semantic relations.
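The Church and Hanks approach mentioned above ranks word associations by pointwise mutual information (PMI) over co-occurrence counts. A minimal self-contained sketch, using a toy token list and a symmetric co-occurrence window (window size and corpus are illustrative assumptions):

```python
import math
from collections import Counter

def association_list(tokens, headword, window=2, top_n=3):
    # Church & Hanks-style pointwise mutual information:
    #   PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ),
    # with co-occurrence counted inside a +/- `window` token span.
    n = len(tokens)
    unigram = Counter(tokens)
    pair = Counter()
    for i, t in enumerate(tokens):
        if t != headword:
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair[tokens[j]] += 1
    scores = {}
    for w, c in pair.items():
        p_xy = c / n
        p_x, p_y = unigram[headword] / n, unigram[w] / n
        scores[w] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]

toy = "strong tea strong coffee weak tea strong tea".split()
print(association_list(toy, "strong"))
```

Note how PMI favours words that co-occur with the headword more often than their overall frequency predicts, which is why rarer collocates can outrank frequent ones; this behaviour is exactly what such lists are compared against human association norms for.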
Towards Multilingual Coreference Resolution
The current work investigates the problems that occur when coreference resolution is considered as a multilingual task. We assess the issues that arise when a framework using the mention-pair coreference resolution model and memory-based learning is used for the resolution process. Along the way, we revise three essential subtasks of coreference resolution: mention detection, mention head detection, and feature selection. For each of these aspects we propose various multilingual solutions, including heuristic, rule-based, and machine-learning methods. We carry out a detailed analysis that includes eight different languages (Arabic, Catalan, Chinese, Dutch, English, German, Italian, and Spanish), for which datasets were provided by the only two multilingual shared tasks on coreference resolution held so far: SemEval-2 and CoNLL-2012. Our investigation shows that, although complex, the coreference resolution task can be targeted in a multilingual and even language-independent way. We proposed machine-learning methods for each of the subtasks affected by the transition, evaluated them, and compared them to the performance of rule-based and heuristic approaches. Our results confirmed that machine learning provides the needed flexibility for the multilingual task and that the minimal requirement for a language-independent system is a part-of-speech annotation layer provided for each of the approached languages. We also showed that the performance of the system can be improved by introducing other layers of linguistic annotation, such as syntactic parses (in the form of either constituency or dependency parses), named-entity information, predicate-argument structure, etc. Additionally, we discuss the problems occurring in the proposed approaches and suggest possibilities for their improvement.
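The mention-pair model used above casts coreference as binary classification over pairs of mentions. As a minimal sketch of how such training instances are built (the mentions, chains, and pairing policy below are toy assumptions; real systems add linguistic features to each pair before classification):

```python
def mention_pairs(mentions, chains):
    # Build training instances for the mention-pair model: every pair
    # (antecedent, anaphor) in textual order, labelled positive when
    # both mentions belong to the same gold coreference chain.
    gold = {m: cid for cid, chain in enumerate(chains) for m in chain}
    pairs = []
    for j in range(1, len(mentions)):
        for i in range(j):
            a, b = mentions[i], mentions[j]
            label = gold.get(a) is not None and gold.get(a) == gold.get(b)
            pairs.append((a, b, label))
    return pairs

# Toy document: mentions in textual order, plus gold chains.
mentions = ["Obama", "the president", "Congress", "he"]
chains = [["Obama", "the president", "he"], ["Congress"]]
for p in mention_pairs(mentions, chains):
    print(p)
```

A classifier (memory-based or otherwise) is then trained on feature vectors derived from these labelled pairs, and at resolution time its pairwise decisions are clustered back into entity chains.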
Representation and parsing of multiword expressions
This book consists of contributions related to the definition, representation, and parsing of MWEs. These reflect current trends in the representation and processing of MWEs. They cover various categories of MWEs, such as verbal, adverbial, and nominal MWEs; various linguistic frameworks (e.g. tree-based and unification-based grammars); various languages (including English, French, Modern Greek, Hebrew, and Norwegian); and various applications (namely MWE detection, parsing, and automatic translation), using both symbolic and statistical approaches.