Unsupervised sense induction methods offer a solution to the scarcity of semantic resources. These methods automatically extract semantic information from textual data and create resources adapted to specific applications and domains of interest. In this paper, we present a clustering algorithm for cross-lingual sense induction which generates bilingual semantic inventories from parallel corpora. We describe the clustering procedure and the resulting resources. We then carry out a large-scale evaluation by integrating the resources into the Machine Translation (MT) evaluation metric METEOR. We show that using the data-driven sense-cluster inventories leads to better correlation with human judgments of translation quality than precision-based metrics, and to improvements similar to those obtained when a handcrafted semantic resource is used.
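The abstract states that the induced sense clusters are integrated into METEOR. As a rough illustration only, the Python sketch below shows how a sense-cluster inventory could extend METEOR-style unigram matching: after exact matching, words that share a cluster are also counted as matches, much as METEOR's WordNet synonym stage treats synonyms. The inventory contents, the function names `cluster_index`, `align` and `fmean`, and the greedy alignment are hypothetical assumptions, not the authors' implementation. Real METEOR searches for the alignment with the fewest crossing links and applies a fragmentation penalty, both of which are omitted here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sense-cluster inventory: each set holds target-language words
# assumed to express the same sense (illustrative data, not from the paper).
SenseInventory = List[Set[str]]


def cluster_index(inventory: SenseInventory) -> Dict[str, Set[int]]:
    """Map each word to the ids of every cluster it belongs to."""
    index: Dict[str, Set[int]] = {}
    for cid, cluster in enumerate(inventory):
        for word in cluster:
            index.setdefault(word, set()).add(cid)
    return index


def align(hyp: List[str], ref: List[str],
          inventory: SenseInventory) -> List[Tuple[int, int, str]]:
    """Greedy one-to-one unigram alignment (a simplification of METEOR):
    exact matches first, then matches between words sharing a sense cluster."""
    index = cluster_index(inventory)
    used_h: Set[int] = set()
    used_r: Set[int] = set()
    alignment: List[Tuple[int, int, str]] = []
    # Stage 1: identical surface forms.
    for i, h in enumerate(hyp):
        for j, r in enumerate(ref):
            if j not in used_r and h == r:
                alignment.append((i, j, "exact"))
                used_h.add(i)
                used_r.add(j)
                break
    # Stage 2: still-unmatched words that share at least one sense cluster.
    for i, h in enumerate(hyp):
        if i in used_h:
            continue
        for j, r in enumerate(ref):
            if j not in used_r and index.get(h, set()) & index.get(r, set()):
                alignment.append((i, j, "cluster"))
                used_h.add(i)
                used_r.add(j)
                break
    return alignment


def fmean(hyp: List[str], ref: List[str], inventory: SenseInventory) -> float:
    """Recall-weighted harmonic mean used by the original METEOR: 10PR / (R + 9P)."""
    m = len(align(hyp, ref, inventory))
    if m == 0:
        return 0.0
    p = m / len(hyp)
    r = m / len(ref)
    return 10 * p * r / (r + 9 * p)
```

With the inventory `[{"car", "automobile"}, {"big", "large"}]`, the hypothesis "the big automobile" is fully aligned to the reference "the large car" (Fmean 1.0), whereas exact matching alone aligns only "the" (Fmean 1/3). This is the kind of lexical variation that a sense-cluster resource lets the metric reward.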

Apidianaki, Marianna

He, Yifan

English

DCU Online Research Access Service


An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting


INRIA a CCSD electronic archive server


Hal-Diderot


Irish Universities

http://doras.dcu.ie/16414/1/An_algorithm_for_cross-lingual_sense-clustering_tested_in_a_MT_evaluation_setting.pdf
