A Comparison of Concept-base Model and Word Distributed Model as Word Association System
Abstract: We construct a Concept-base, based on the concept-chain model, and word vector spaces, based on Word2Vec, using the EDR electronic dictionary and Japanese Wikipedia data. This paper describes verification experiments on these models for a word association system built on an association-frequency table. In these experiments, we investigate the tendencies of the associative words each model produces for the evaluation-basis words. In the Concept-base model, we observed a tendency for synonyms, superordinate words, and subordinate words to be obtained as associative words. In the Word2Vec model, by contrast, the associative words tend to be words that form compounds or co-occurrence phrases when joined with the headwords of the association-frequency table. Moreover, the evaluation results showed a tendency for associative words in the Word2Vec model to be mostly category words.
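The abstract above compares associative words retrieved from word vector spaces. As a rough illustration only, the following minimal sketch shows how associative words can be ranked by cosine similarity over word vectors, the standard retrieval step for a Word2Vec-style model; the toy vectors and words are hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def associative_words(vectors, headword, top_n=3):
    # Rank all other words by cosine similarity to the headword's vector.
    target = vectors[headword]
    scored = [(w, cosine(target, vec)) for w, vec in vectors.items() if w != headword]
    return sorted(scored, key=lambda x: -x[1])[:top_n]

# Toy 3-dimensional vectors (hypothetical values, for illustration only).
toy = {
    "dog":  [0.9, 0.1, 0.0],
    "cat":  [0.8, 0.2, 0.1],
    "car":  [0.0, 0.9, 0.4],
    "road": [0.1, 0.8, 0.5],
}
print(associative_words(toy, "dog", top_n=2))
```

In practice the vectors would come from a trained Word2Vec model rather than a hand-written table; the ranking step itself is unchanged.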
Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources
The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, by instead using external morphological resources. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source, target, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. JRC.G.2-Global security and crisis management
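The abstract describes generating new phrase associations from morphological variants filtered by a string similarity score. A minimal sketch of that idea, assuming a normalised edit-distance similarity and a hypothetical lexicon of surface forms (the threshold, lexicon, and function names below are illustrative, not the paper's actual method):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(a, b):
    # Normalised string similarity in [0, 1].
    m = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / m if m else 1.0

def expand_phrase(source, variants, threshold=0.6):
    # Propose new source phrases by replacing each word with a
    # morphological variant that is string-similar enough to the original.
    words = source.split()
    proposals = []
    for i, w in enumerate(words):
        for v in variants.get(w, []):
            if v != w and similarity(w, v) >= threshold:
                proposals.append(" ".join(words[:i] + [v] + words[i + 1:]))
    return proposals

# Hypothetical lexicon of lemma-sharing surface forms.
lex = {"house": ["house", "houses"], "red": ["red", "redder"]}
print(expand_phrase("red house", lex))
```

Each proposed phrase would then be paired with the corresponding variant of its translation and added to the phrase table, inheriting or rescoring the original association's probabilities.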
Pattern Matching for Translating Domain-Specific Terms from Large Corpora
Translating domain-specific terms is a significant component of machine translation and machine-aided translation systems. These terms are often not found in standard dictionaries, and human translators, not being experts in every technical or regional domain, cannot produce their translations effectively. Automatic translation of domain-specific terms is therefore highly desirable. Most other work on automatic term translation uses statistical information about words from parallel corpora. Parallel corpora of cleanly translated texts are hard to come by, whereas there are more noisily translated texts and many more monolingual texts in various domains. We propose using noisy parallel texts and same-domain texts of a pair of languages to translate terms. In our work, we propose a novel paradigm of pattern matching over statistical signals of word features. These features are robust to the syntactic structure, character set, and language of the text, and to the domain. We obtain statistical information related to the lexical properties of a word and its translation in any other language of the same domain. These lexical properties are extracted from the corpora and represented in vector form. We propose using signal-processing techniques to match the feature vectors of a word to those of its translation. Another matching technique we propose is discriminative analysis of the word features: for each word, the various features are combined into a single vector, which is then transformed into a smaller-dimension eigenvector for matching. Since most domain-specific terms are nouns and noun phrases, we concentrate on translating English nouns and noun phrases into other languages. We study the relationship between English noun phrases and their translations in Chinese, Japanese, and French in parallel corpora. The result of this study is used in our system for translating English noun phrases into these other languages from noisy parallel and non-parallel corpora.
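The abstract treats a word's occurrence pattern across a corpus as a signal to be matched against candidate translations. As a loose sketch of that idea only (the binning scheme, Pearson correlation, and all toy data below are assumptions, not the authors' actual features or matching technique):

```python
def binned_signal(positions, corpus_len, bins=4):
    # Count a word's occurrences per equal-width segment of the corpus,
    # turning its position list into a coarse distributional signal.
    sig = [0] * bins
    for p in positions:
        sig[min(p * bins // corpus_len, bins - 1)] += 1
    return sig

def correlation(x, y):
    # Pearson correlation of two equal-length signals.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def best_translation(src_positions, src_len, candidates, tgt_len, bins=4):
    # Pick the target word whose distributional signal best mirrors
    # the source word's signal.
    src = binned_signal(src_positions, src_len, bins)
    scored = {w: correlation(src, binned_signal(pos, tgt_len, bins))
              for w, pos in candidates.items()}
    return max(scored, key=scored.get)

# Hypothetical token positions in a 100-token text and its noisy translation.
src = [5, 10, 60, 62]
cands = {"chien": [4, 12, 58, 66], "voiture": [30, 35, 40, 90]}
print(best_translation(src, 100, cands, 100))
```

The intuition is that a term and its translation tend to occur in corresponding regions of a noisy parallel text, so their binned signals correlate even without sentence alignment.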
Improving the precision of example-based machine translation by learning from user feedback
Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2007. Thesis (Master's), Bilkent University, 2007. Includes bibliographical references (leaves 110-113). Daybelge, Turhan Osman, M.S.

Example-Based Machine Translation (EBMT) is a corpus-based approach to Machine Translation (MT) that utilizes the translation-by-analogy concept. In our EBMT system, translation templates are extracted automatically from bilingual aligned corpora by substituting the similarities and differences in pairs of translation examples with variables. As this process is done on the lexical-level forms of the translation examples, and words in natural-language texts are often morphologically ambiguous, a need for morphological disambiguation arises. Therefore, we present here a rule-based morphological disambiguator for Turkish. In earlier versions of the discussed system, the translation results were ranked solely using confidence factors of the translation templates. In this study, however, we introduce an improved ranking mechanism that dynamically learns from user feedback. When a user, such as a professional human translator, submits an evaluation of the generated translation results, the system learns “context-dependent co-occurrence rules” from this feedback. The newly learned rules are then consulted while ranking the results of subsequent translations. Through successive translation-evaluation cycles, we expect the output of the ranking mechanism to comply better with user expectations, listing the preferred results at higher ranks. An evaluation of our ranking method, using the precision at the top 1, 3, and 5 results and the BLEU metric, is also presented.
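The ranking mechanism described above combines static template confidence with rules learned from user feedback. A minimal sketch of that feedback loop, assuming a simplified context key and an additive per-template bonus (the class, learning rate, and scoring scheme below are illustrative, not the thesis's actual rule formalism):

```python
class FeedbackRanker:
    # Ranks candidate translations by a template confidence factor plus
    # a bonus learned from earlier user feedback on the same template
    # in the same (hypothetical, simplified) context.
    def __init__(self, learning_rate=0.1):
        self.bonus = {}          # (context, template) -> learned adjustment
        self.lr = learning_rate

    def score(self, context, template, confidence):
        return confidence + self.bonus.get((context, template), 0.0)

    def rank(self, context, candidates):
        # candidates: list of (template, confidence) pairs, best first.
        return sorted(candidates,
                      key=lambda c: -self.score(context, c[0], c[1]))

    def feedback(self, context, template, approved):
        # Move the bonus up on approval, down on rejection.
        key = (context, template)
        delta = self.lr if approved else -self.lr
        self.bonus[key] = self.bonus.get(key, 0.0) + delta

r = FeedbackRanker()
cands = [("T1", 0.50), ("T2", 0.48)]
print(r.rank("ctx", cands)[0][0])       # ranking before any feedback
r.feedback("ctx", "T2", approved=True)  # the user prefers T2's output
print(r.rank("ctx", cands)[0][0])       # ranking after feedback
```

Over successive translation-evaluation cycles, the learned bonuses accumulate, so templates the user consistently approves climb in the ranking, which is the behaviour the thesis evaluates with precision at top 1, 3, and 5.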
Can human association norms evaluate latent semantic analysis?
This paper presents a comparison of a word association norm, created in a psycholinguistic experiment, with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm with lists generated by the LSA algorithm. An argument is presented on how well those automatically generated lists reflect real semantic relations.
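The Church and Hanks approach mentioned above ranks word associations by pointwise mutual information (PMI) over co-occurrence counts. A minimal self-contained sketch, using a toy token list and a symmetric co-occurrence window (window size and corpus are illustrative assumptions):

```python
import math
from collections import Counter

def association_list(tokens, headword, window=2, top_n=3):
    # Church & Hanks-style pointwise mutual information:
    #   PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ),
    # with co-occurrence counted inside a +/- `window` token span.
    n = len(tokens)
    unigram = Counter(tokens)
    pair = Counter()
    for i, t in enumerate(tokens):
        if t != headword:
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair[tokens[j]] += 1
    scores = {}
    for w, c in pair.items():
        p_xy = c / n
        p_x, p_y = unigram[headword] / n, unigram[w] / n
        scores[w] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]

toy = "strong tea strong coffee weak tea strong tea".split()
print(association_list(toy, "strong"))
```

Note how PMI favours words that co-occur with the headword more often than their overall frequency predicts, which is why rarer collocates can outrank frequent ones; this behaviour is exactly what such lists are compared against human association norms for.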
Towards Multilingual Coreference Resolution
The current work investigates the problems that occur when coreference resolution is considered as a multilingual task. We assess the issues that arise when a framework using the mention-pair coreference resolution model and memory-based learning is used for the resolution process. Along the way, we revise three essential subtasks of coreference resolution: mention detection, mention head detection, and feature selection. For each of these aspects we propose various multilingual solutions, including heuristic, rule-based, and machine-learning methods. We carry out a detailed analysis that includes eight different languages (Arabic, Catalan, Chinese, Dutch, English, German, Italian, and Spanish), for which datasets were provided by the only two multilingual shared tasks on coreference resolution held so far: SemEval-2 and CoNLL-2012. Our investigation shows that, although complex, the coreference resolution task can be targeted in a multilingual and even language-independent way. We proposed machine-learning methods for each of the subtasks affected by the transition, evaluated them, and compared them to the performance of rule-based and heuristic approaches. Our results confirmed that machine learning provides the needed flexibility for the multilingual task and that the minimal requirement for a language-independent system is a part-of-speech annotation layer provided for each of the approached languages. We also showed that the performance of the system can be improved by introducing other layers of linguistic annotation, such as syntactic parses (in the form of either constituency or dependency parses), named-entity information, predicate-argument structure, etc. Additionally, we discuss the problems occurring in the proposed approaches and suggest possibilities for their improvement.
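The mention-pair model used above casts coreference as binary classification over pairs of mentions. As a minimal sketch of how such training instances are built (the mentions, chains, and pairing policy below are toy assumptions; real systems add linguistic features to each pair before classification):

```python
def mention_pairs(mentions, chains):
    # Build training instances for the mention-pair model: every pair
    # (antecedent, anaphor) in textual order, labelled positive when
    # both mentions belong to the same gold coreference chain.
    gold = {m: cid for cid, chain in enumerate(chains) for m in chain}
    pairs = []
    for j in range(1, len(mentions)):
        for i in range(j):
            a, b = mentions[i], mentions[j]
            label = gold.get(a) is not None and gold.get(a) == gold.get(b)
            pairs.append((a, b, label))
    return pairs

# Toy document: mentions in textual order, plus gold chains.
mentions = ["Obama", "the president", "Congress", "he"]
chains = [["Obama", "the president", "he"], ["Congress"]]
for p in mention_pairs(mentions, chains):
    print(p)
```

A classifier (memory-based or otherwise) is then trained on feature vectors derived from these labelled pairs, and at resolution time its pairwise decisions are clustered back into entity chains.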
Representation and parsing of multiword expressions
This book consists of contributions related to the definition, representation, and parsing of MWEs. These reflect current trends in the representation and processing of MWEs. They cover various categories of MWEs, such as verbal, adverbial, and nominal MWEs; various linguistic frameworks (e.g. tree-based and unification-based grammars); various languages (including English, French, Modern Greek, Hebrew, and Norwegian); and various applications (namely MWE detection, parsing, and automatic translation), using both symbolic and statistical approaches.