1,703 research outputs found

    Bridging SMT and TM with translation recommendation

    Get PDF
    We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding exact matches. futhermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels

    Integrating N-best SMT outputs into a TM system

    Get PDF
    In this paper, we propose a novel frame- work to enrich Translation Memory (TM) systems with Statistical Machine Translation (SMT) outputs using ranking. In order to offer the human translators multiple choices, instead of only using the top SMT output and top TM hit, we merge the N-best output from the SMT system and the k-best hits with highest fuzzy match scores from the TM system. The merged list is then ranked according to the prospective post-editing effort and provided to the translators to aid their work. Experiments show that our ranked output achieve 0.8747 precision at top 1 and 0.8134 precision at top 5. Our framework facilitates a tight integration between SMT and TM, where full advantage is taken of TM while high quality SMT output is availed of to improve the productivity of human translators

    Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data

    Get PDF
    In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable to extract opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocess- ing and rarely investigate its impact on the task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model’s ability to generalise to other data genres

    Sentence-level quality estimation for MT system combination

    Get PDF
    This paper provides the system description of the Dublin City University system combination module for our participation in the system combination task in the Second Workshop on Applying Machine Learning Techniques to Optimize the Division of Labour in Hybrid MT (ML4HMT- 12). We incorporated a sentence-level quality score, obtained by sentence-level Quality Estimation (QE), as meta information guiding system combination. Instead of using BLEU or (minimum average) TER, we select a backbone for the confusion network using the estimated quality score. For the Spanish-English data, our strategy improved 0.89 BLEU points absolute compared to the best single score and 0.20 BLEU points absolute compared to the standard system combination strateg

    Discourse Structure in Machine Translation Evaluation

    Full text link
    In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory (RST). Then, we show that a simple linear combination with these measures can help improve various existing machine translation evaluation metrics regarding correlation with human judgments both at the segment- and at the system-level. This suggests that discourse information is complementary to the information used by many of the existing evaluation metrics, and thus it could be taken into account when developing richer evaluation metrics, such as the WMT-14 winning combined metric DiscoTKparty. We also provide a detailed analysis of the relevance of various discourse elements and relations from the RST parse trees for machine translation evaluation. In particular we show that: (i) all aspects of the RST tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) the similarity of the translation RST tree to the reference tree is positively correlated with translation quality.Comment: machine translation, machine translation evaluation, discourse analysis. Computational Linguistics, 201

    Statistical inference for time-varying ARCH processes

    Full text link
    In this paper the class of ARCH()(\infty) models is generalized to the nonstationary class of ARCH()(\infty) models with time-varying coefficients. For fixed time points, a stationary approximation is given leading to the notation ``locally stationary ARCH()(\infty) process.'' The asymptotic properties of weighted quasi-likelihood estimators of time-varying ARCH(p)(p) processes (p<p<\infty) are studied, including asymptotic normality. In particular, the extra bias due to nonstationarity of the process is investigated. Moreover, a Taylor expansion of the nonstationary ARCH process in terms of stationary processes is given and it is proved that the time-varying ARCH process can be written as a time-varying Volterra series.Comment: Published at http://dx.doi.org/10.1214/009053606000000227 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    The Single-mindedness theory: empirical evidence from the U.K.

    Get PDF
    In this paper I will exploit answers coming from the British Election Study in order to assess the validity of the Single Mindedness Theory. In particular, I will evaluate whether political preferences of voters for political candidates depend on their age and some other characteristics such as gender, education, religion, social and economic conditions. Performing LOGIT and PROBIT regression I will demonstrate that variable age is statistically significant, demonstrating that Single Mindedness Theory assumptions hold in the UK political environment.Single-mindedness; political survey; electorate preferences; Logit; Probit
    corecore