66 research outputs found

A Statistical Word-Level Translation Model for Comparable Corpora

Author: Diab Mona
Finch Steve
Publication date: 01/01/2000

In this paper, we present a model of statistical word-level mapping for comparable corpora. The approach is based on the assumption that if two terms have close distributional profiles, their corresponding translations' distributional profiles should be close in a comparable corpus. The proposed model is described. A preliminary investigation on intralanguage comparable corpora is laid out. The preliminary results are >92% accurate, suggesting the feasibility of the model. The model needs to undergo some improvements and should be tested cross-linguistically before assessing its significance. (Also cross-referenced as UMIACS-TR-2000-41, LAMP-TR-048)

CiteSeerX

Digital Repository at the University of Maryland

Contrastive Approach towards Text Source Classification based on Top-Bag-Word Similarity

Author: Huang Chu-Ren
Lee Lung-Hao
Publication venue: De La Salle University - Dasmarinas
Publication date: 01/01/2008

PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

Waseda University Repository

Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora

Author: Ananiadou Sophia
Kontonatsios Georgios
Korkontzelos Yannis
Tsujii Jun'ichi
Publication date: 01/04/2014

Edge Hill University Research Information Repository

Semi-Automatic Identification of Bilingual Synonymous Technical Terms from Phrase Tables and Parallel Patent Sentences

Author: Liang Bing
Utsuro Takehito
Yamamoto Mikio
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011

Waseda University Repository

METRICC: Harnessing Comparable Corpora for Multilingual Lexicon Development

Author: Alonso Araceli
Blancafort Helena
De Groc Clément
Million Chrystel
Williams Geoffrey
Publication venue: HAL CCSD
Publication date: 07/08/2012

Research on comparable corpora has grown in recent years, bringing about the possibility of developing multilingual lexicons through the exploitation of comparable corpora to create corpus-driven multilingual dictionaries. To date, this issue has not been widely addressed. This paper focuses on the use of the mechanism of collocational networks proposed by Williams (1998) for exploiting comparable corpora. The paper first provides a description of the METRICC project, which is aimed at the automatic creation of comparable corpora, and describes one of the crawlers developed for comparable corpora building, and then discusses the power of collocational networks for multilingual corpus-driven dictionary development.

Hal - Université Grenoble Alpes

HAL-Université de Bretagne Occidentale