
Bridging SMT and TM with translation recommendation

Author: He Yifan
Ma Yanjun
van Genabith Josef
Way Andy
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 11/07/2010

We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding exact matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
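
To illustrate the precision/recall trade-off mentioned in this abstract, here is a minimal sketch that is not taken from the paper. It assumes a classifier that emits a confidence score per segment and recommends the SMT output when the score reaches a threshold. The function name `precision_recall`, the scores and the labels are all hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when recommending every item with score >= threshold.

    scores: hypothetical classifier confidences, one per segment.
    labels: True where the SMT output really was the better choice.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Convention (an assumption): precision is 1.0 when nothing is recommended.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data, invented purely for illustration.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]

for threshold in (0.2, 0.5, 0.7):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold from 0.2 to 0.7 increases precision and lowers recall, which is the kind of balance a user could tune.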

Integrating N-best SMT outputs into a TM system

Author: He Yifan
Ma Yanjun
van Genabith Josef
Way Andy
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2010

In this paper, we propose a novel framework to enrich Translation Memory (TM) systems with Statistical Machine Translation (SMT) outputs using ranking. In order to offer the human translators multiple choices, instead of only using the top SMT output and top TM hit, we merge the N-best output from the SMT system and the k-best hits with highest fuzzy match scores from the TM system. The merged list is then ranked according to the prospective post-editing effort and provided to the translators to aid their work. Experiments show that our ranked output achieves 0.8747 precision at top 1 and 0.8134 precision at top 5. Our framework facilitates a tight integration between SMT and TM, where full advantage is taken of TM while high-quality SMT output is used to improve the productivity of human translators.
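
The merge-then-rank step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Candidate` type, the `predicted_effort` field (lower means less post-editing) and the example values are hypothetical, and the paper's actual ranking model is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    source: str              # "SMT" or "TM"
    predicted_effort: float  # hypothetical score: lower = less post-editing


def merge_and_rank(smt_nbest, tm_kbest, top=5):
    """Merge the SMT N-best list with the TM k-best hits and return the
    `top` candidates ordered by predicted post-editing effort (ascending)."""
    merged = list(smt_nbest) + list(tm_kbest)
    return sorted(merged, key=lambda c: c.predicted_effort)[:top]


# Invented example candidates.
smt = [Candidate("smt hypothesis 1", "SMT", 0.30),
       Candidate("smt hypothesis 2", "SMT", 0.45)]
tm = [Candidate("tm fuzzy match 1", "TM", 0.25),
      Candidate("tm fuzzy match 2", "TM", 0.60)]

ranked = merge_and_rank(smt, tm, top=3)
for c in ranked:
    print(c.source, c.predicted_effort, c.text)
```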

Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data

Author: De Clercq Orphée
Desmet Bart
Hoste Veronique
Lefever Els
Van de Kauter Marjan
Van Hee Cynthia
Publication date: 01/01/2017

In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable to extract opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocessing and rarely investigate its impact on the task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model’s ability to generalise to other data genres.

Sentence-level quality estimation for MT system combination

Author: Okita Tsuyoshi
Rubino Raphael
van Genabith Josef
Publication date: 09/12/2012

This paper provides the system description of the Dublin City University system combination module for our participation in the system combination task in the Second Workshop on Applying Machine Learning Techniques to Optimize the Division of Labour in Hybrid MT (ML4HMT-12). We incorporated a sentence-level quality score, obtained by sentence-level Quality Estimation (QE), as meta information guiding system combination. Instead of using BLEU or (minimum average) TER, we select a backbone for the confusion network using the estimated quality score. For the Spanish-English data, our strategy achieved an absolute improvement of 0.89 BLEU points compared to the best single score and 0.20 BLEU points compared to the standard system combination strategy.

CiteSeerX

Discourse Structure in Machine Translation Evaluation

Author: Guzmán Francisco
Joty Shafiq
Màrquez Lluís
Nakov Preslav
Publication date: 01/01/2017

In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory (RST). Then, we show that a simple linear combination with these measures can help improve various existing machine translation evaluation metrics regarding correlation with human judgments at both the segment level and the system level. This suggests that discourse information is complementary to the information used by many of the existing evaluation metrics, and thus it could be taken into account when developing richer evaluation metrics, such as the WMT-14 winning combined metric DiscoTKparty. We also provide a detailed analysis of the relevance of various discourse elements and relations from the RST parse trees for machine translation evaluation. In particular, we show that: (i) all aspects of the RST tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) the similarity of the translation RST tree to the reference tree is positively correlated with translation quality. Comment: machine translation, machine translation evaluation, discourse analysis. Computational Linguistics, 201

arXiv.org e-Print Archive

Directory of Open Access Journals

Statistical inference for time-varying ARCH processes

Author: Dahlhaus Rainer
Rao Suhasini Subba
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2006

In this paper the class of ARCH$(\infty)$ models is generalized to the nonstationary class of ARCH$(\infty)$ models with time-varying coefficients. For fixed time points, a stationary approximation is given leading to the notation "locally stationary ARCH$(\infty)$ process." The asymptotic properties of weighted quasi-likelihood estimators of time-varying ARCH$(p)$ processes ($p<\infty$) are studied, including asymptotic normality. In particular, the extra bias due to nonstationarity of the process is investigated. Moreover, a Taylor expansion of the nonstationary ARCH process in terms of stationary processes is given and it is proved that the time-varying ARCH process can be written as a time-varying Volterra series. Comment: Published at http://dx.doi.org/10.1214/009053606000000227 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
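
For orientation, a time-varying ARCH$(p)$ process and its stationary approximation are commonly written as below. The abstract itself gives no formulas, so the notation is an assumption: $Z_t$ denotes i.i.d. innovations with mean zero and unit variance, $a_j(\cdot)$ are coefficient functions on $[0,1]$, $N$ is the sample size, and $u_0$ is a fixed rescaled time point.

```latex
% Time-varying ARCH(p): coefficients depend on rescaled time t/N
X_{t,N} = \sigma_{t,N} Z_t, \qquad
\sigma_{t,N}^2 = a_0\!\left(\tfrac{t}{N}\right)
  + \sum_{j=1}^{p} a_j\!\left(\tfrac{t}{N}\right) X_{t-j,N}^2,
\qquad t = 1, \dots, N.

% Stationary approximation near a fixed rescaled time u_0
\tilde{X}_t(u_0) = \tilde{\sigma}_t(u_0) Z_t, \qquad
\tilde{\sigma}_t(u_0)^2 = a_0(u_0)
  + \sum_{j=1}^{p} a_j(u_0)\, \tilde{X}_{t-j}(u_0)^2.
```

When $t/N$ is close to $u_0$, the coefficients $a_j(t/N)$ are close to $a_j(u_0)$, which is the sense in which the process is locally stationary.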

arXiv.org e-Print Archive

CiteSeerX

The Single-mindedness theory: empirical evidence from the U.K.

Author: Emanuele Canegrati

In this paper I exploit answers from the British Election Study in order to assess the validity of the Single Mindedness Theory. In particular, I evaluate whether voters' preferences for political candidates depend on their age and on other characteristics such as gender, education, religion, and social and economic conditions. Using LOGIT and PROBIT regressions, I show that the age variable is statistically significant, which indicates that the assumptions of the Single Mindedness Theory hold in the UK political environment. Keywords: Single-mindedness; political survey; electorate preferences; Logit; Probit