929 research outputs found

    Proceedings of the 17th Annual Conference of the European Association for Machine Translation


    A machine learning-based approach to predicting success of questions on social question-answering sites

    While social question-answering (SQA) services are becoming increasingly popular, information seekers often receive unsatisfactory answers, or none at all. This study creates a model to predict question failure, i.e. a question that does not receive an answer, on the social Q&A site Yahoo! Answers. To do so, observed shared characteristics of failed questions were translated into empirical features, both textual and non-textual in nature, and measured using automatic extraction methods. A classifier was then trained on these features and tested on a data set of 400 questions, half of them successful and half not, to determine its accuracy in identifying failed questions. The results show that the approach can reliably estimate the likelihood of a question's success or failure, making it a promising tool for automatically flagging ill-formed questions and/or questions that are likely to fail, and for suggesting how to revise them.
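
    As a minimal sketch of the kind of pipeline this abstract describes, assuming Python with scikit-learn: the toy data, feature set, and classifier choice below are illustrative assumptions, not the study's actual setup.

    ```python
    # Illustrative question-failure classifier in the spirit of the abstract
    # above; features and model are assumptions, not the paper's own setup.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy data: 1 = question received an answer (success), 0 = failed.
    questions = [
        "How do I replace the battery in a 2012 Honda Civic?",
        "thoughts????",
        "What is a good beginner book on statistics?",
        "anyone???",
    ]
    labels = np.array([1, 0, 1, 0])

    # Textual features only; a real system would also add non-textual
    # features such as posting time, category, and asker history.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(questions, labels)

    print(model.predict(["Which laptop is best for programming?"]))
    ```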

    The integration of machine translation and translation memory

    We design and evaluate several models for integrating Machine Translation (MT) output into a Translation Memory (TM) environment to facilitate the adoption of MT technology in the localization industry. We begin with integration at the segment level via translation recommendation and translation reranking. Given an input to be translated, our translation recommendation model compares the output from the MT and TM systems and presents the better one to the post-editor. Our translation reranking model combines k-best lists from both systems and generates a new list according to estimated post-editing effort. We perform both automatic and human evaluation of these models. When measured against the consensus of human judgement, the recommendation model obtains 0.91 precision at 0.93 recall, and the reranking model obtains 0.86 precision at 0.59 recall. The high precision of these models indicates that they can be integrated into TM environments without the risk of degrading the quality of the post-editing candidate, thereby preserving TM assets and the established cost-estimation methods associated with TMs. We then explore methods for a deeper integration of translation memory and machine translation at the sub-segment level. We predict whether phrase pairs derived from fuzzy matches can be used to constrain the translation of an input segment. Using a series of novel linguistically motivated features, our constraints lead both to more consistent translation output and to improved translation quality, reflected in a 1.2-point improvement in BLEU score and a 0.72-point reduction in TER score, both statistically significant (p < 0.01). In sum, we present our work in three aspects: 1) translation recommendation and translation reranking models that give post-editors access to high-quality MT outputs within the TM environment, 2) a sub-segment translation memory and machine translation integration model that improves both translation consistency and translation quality, and 3) a human evaluation pipeline to validate the effectiveness of our models with human judgements.
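
    A minimal sketch of the segment-level recommendation idea under stated assumptions: a binary classifier estimates whether the MT output is likely to need less post-editing than the TM fuzzy match, and MT is shown only above a confidence threshold. The feature set, training data, and threshold below are hypothetical, not the thesis's actual model.

    ```python
    # Sketch of segment-level translation recommendation: choose between the
    # MT output and the TM fuzzy match. Features and threshold are assumptions.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def segment_features(src, mt_out, tm_match, fuzzy_score):
        """Illustrative system-level features for one segment."""
        return [
            fuzzy_score,                       # TM fuzzy-match level (0-1)
            len(mt_out) / max(len(src), 1),    # MT/source length ratio
            len(tm_match) / max(len(src), 1),  # TM/source length ratio
        ]

    # Toy training data: y = 1 if MT required less post-editing than TM.
    X = np.array([[0.95, 1.05, 1.00], [0.40, 0.90, 1.30], [0.60, 1.10, 1.20]])
    y = np.array([0, 1, 1])
    recommender = LogisticRegression().fit(X, y)

    # Recommend MT only when confident, favouring precision over recall so
    # TM assets and established cost estimation are not put at risk.
    x = segment_features("Das ist ein Test.", "This is a test.",
                         "This is one test.", 0.8)
    p_mt = recommender.predict_proba([x])[0, 1]
    print("show MT" if p_mt > 0.8 else "show TM match")
    ```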

    Training Machine Translation for Human Acceptability

    Discriminative training, a.k.a. tuning, is an important part of Statistical Machine Translation. This step optimises weights for the several statistical models and heuristics used in a machine translation system, in order to balance their relative effect on the translation output. Different weights lead to significant changes in the quality of translation outputs, so selecting appropriate weights is of key importance. This thesis addresses three major problems with current discriminative training methods in order to improve translation quality. First, we design more accurate automatic machine translation evaluation metrics that have better correlation with human judgements. An automatic evaluation metric is used in the loss function in most discriminative training methods, but which metric is best for this purpose is still an open question. In this thesis we propose two novel evaluation metrics that achieve better correlation with human judgements than the current de facto standard, the BLEU metric. We show that these metrics can improve translation quality when used in discriminative training. Second, we design an algorithm to select sentence pairs for training the discriminative learner from large pools of freely available parallel sentences. These resources tend to be noisy and include translations of varying degrees of quality and suitability for the translation task at hand, especially if obtained using crowdsourcing methods. Nevertheless, they are crucial when professionally created training data is scarce or unavailable. There is very little previous research on data selection for discriminative training. Our novel data selection algorithm requires no knowledge of the test set and no decoding outputs, and is thus more generally useful and efficient. Our experiments show that with this data selection algorithm, translation quality consistently improves over strong baselines. Finally, the third component of the thesis is a novel weighted ranking-based optimisation algorithm for discriminative training. In contrast to previous approaches, this technique assigns a different weight to each training instance according to its reachability and its relationship to the test sentence being decoded, a form of transductive learning. Our experimental results show improvements over a modern state-of-the-art method across different language pairs. Overall, the proposed approaches lead to better translation quality than strong baselines in our experiments, both in isolation and in combination, and can be easily applied to most existing statistical machine translation approaches.
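
    As a rough illustration of the ranking-based tuning family this thesis builds on, here is a pairwise ranking optimisation (PRO-style) sketch. The toy features and uniform pair weighting are simplifying assumptions; the thesis's method additionally weights each training instance by reachability and similarity to the test sentence.

    ```python
    # PRO-style pairwise ranking optimisation for tuning SMT feature weights.
    # Toy data; a weighted variant would attach a weight to each pair.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # One n-best list: 20 candidate translations with 4 model features each,
    # plus an automatic-metric score (e.g. sentence-level BLEU) per candidate.
    feats = rng.normal(size=(20, 4))
    metric = rng.random(20)

    # Sample candidate pairs and learn weights such that the candidate with
    # the higher metric score also receives the higher model score.
    X, y = [], []
    for _ in range(500):
        i, j = rng.choice(20, size=2, replace=False)
        if abs(metric[i] - metric[j]) < 0.05:
            continue  # skip near-ties, as PRO does
        X.append(feats[i] - feats[j])
        y.append(int(metric[i] > metric[j]))

    ranker = LogisticRegression(fit_intercept=False).fit(np.array(X), y)
    print("tuned weights:", ranker.coef_[0])  # fed back to the decoder
    ```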

    Argumentation Mining in User-Generated Web Discourse

    The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source code, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.
    Cite as: Habernal, I., & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics, 43(1), pp. 125–179.
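
    A toy sketch of argument-component identification framed as sentence classification, one simplified reading of the task described above; the labels, features, and examples are illustrative, not the authors' annotation scheme or models.

    ```python
    # Toy argument-component classifier; labels and features are illustrative.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    sentences = [
        "Smoking should be banned in public parks.",   # claim
        "Second-hand smoke harms bystanders.",         # premise
        "I visited the park last Sunday.",             # non-argumentative
        "Schools must teach financial literacy.",      # claim
        "Many graduates cannot manage a budget.",      # premise
        "The weather was nice that day.",              # non-argumentative
    ]
    labels = ["claim", "premise", "none", "claim", "premise", "none"]

    clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(sentences, labels)
    print(clf.predict(["Air pollution shortens lives."]))
    ```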
