TermEval: an automatic metric for evaluating terminology translation in MT
Terminology translation plays a crucial role in domain-specific machine translation (MT). Preservation of domain knowledge from source to target is arguably the most important concern for customers in the translation industry, especially in critical domains such as medical, transportation, military, legal and aerospace. However, despite its great importance to the translation industry, the evaluation of terminology translation has been a less examined area in MT research. Term translation quality in MT is usually assessed by domain experts, either in academia or in industry. To the best of our knowledge, there is as yet no publicly available solution for automatically evaluating terminology translation in MT. In particular, evaluating terminology translation in MT often requires manual intervention, which is by nature a time-consuming and highly expensive task. This is impractical in an industrial setting, where customised MT systems often need to be updated for many reasons (e.g. the availability of new training data or leading MT techniques). Hence, there is a genuine need for a faster and less expensive solution to this problem, one that could help end-users instantly identify term translation problems in MT.
In this study, we propose an automatic evaluation metric, TermEval, for evaluating terminology translation in MT. To the best of our knowledge, no gold-standard dataset is available for measuring terminology translation quality in MT. In its absence, we semi-automatically create a gold-standard dataset from an English-Hindi judicial-domain parallel corpus.
We trained state-of-the-art phrase-based SMT (PB-SMT) and neural MT (NMT) models in two translation directions, English-to-Hindi and Hindi-to-English, and used TermEval to evaluate their terminology translation performance on the gold-standard test set we created. To measure the correlation between TermEval scores and human judgements, the translation of each source term in the gold-standard test set was validated by a human evaluator. The high correlation between TermEval and human judgements demonstrates the effectiveness of the proposed terminology translation evaluation metric. We also carry out a comprehensive manual evaluation of terminology translation and present our observations.
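As a rough illustration of what an automatic term-level check can look like, the sketch below scores MT output by the fraction of gold-standard terms whose reference translation, or an accepted variant, appears in the hypothesis. It is a minimal sketch under stated assumptions: the `TermEntry` structure, the exact token-sequence matching in `contains_phrase` and the `term_accuracy` aggregation are all hypothetical and are not claimed to be TermEval's actual formulation, which the abstract does not spell out. The toy sentences are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """One gold-standard entry: a source term and its acceptable target renderings.

    Hypothetical structure for illustration; TermEval's real gold-standard format may differ.
    """
    source: str
    references: list[str] = field(default_factory=list)  # reference translation plus accepted variants


def contains_phrase(hypothesis: str, phrase: str) -> bool:
    """Return True if `phrase` occurs in `hypothesis` as a contiguous, case-insensitive token sequence."""
    hyp = hypothesis.lower().split()
    tgt = phrase.lower().split()
    n = len(tgt)
    return n > 0 and any(hyp[i:i + n] == tgt for i in range(len(hyp) - n + 1))


def term_accuracy(entries_per_sentence, hypotheses) -> float:
    """Fraction of gold terms whose translation (or an accepted variant) appears in the MT output."""
    correct = total = 0
    for entries, hyp in zip(entries_per_sentence, hypotheses):
        for entry in entries:
            total += 1
            if any(contains_phrase(hyp, ref) for ref in entry.references):
                correct += 1
    return correct / total if total else 0.0


# Toy example (invented data, Hindi-to-English direction so the targets are English).
gold = [
    [TermEntry("source term 1", ["high court"])],
    [TermEntry("source term 2", ["appellant", "the appellant"]),
     TermEntry("source term 3", ["judgment", "judgement"])],
]
hyps = [
    "the high court dismissed the petition",
    "the appellant challenged the order",
]
print(term_accuracy(gold, hyps))  # 2 of the 3 gold terms are matched
```

Exact matching misses inflected or reordered term forms; here that is approximated only by listing variants explicitly in `references`, which is a simplification.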
Response Retrieval in Information-seeking Conversations
The increasing popularity of the mobile Internet has led to several crucial changes in the way people use search engines compared with traditional Web search on desktops. On the one hand, the small screens of most mobile devices limit output bandwidth, so mobile Internet users prefer direct answers on the search engine result page (SERP). On the other hand, voice-based and text-based conversational interfaces are becoming increasingly popular, as shown by the wide adoption of intelligent assistant services and devices such as Amazon Echo, Microsoft Cortana and Google Assistant around the world. These changes have raised several new challenges that search engines have had to address in order to better satisfy the information needs of mobile Internet users. In this dissertation, we investigate several aspects of single-turn answer retrieval and multi-turn information-seeking conversations to handle these new challenges of search on the mobile Internet.
We start with single-turn answer retrieval and analyze the weaknesses of existing deep learning architectures for answer ranking. We then propose an attention-based neural matching model with a value-shared weighting scheme and an attention mechanism to improve existing deep neural answer ranking models. Our proposed model achieves state-of-the-art performance for answer sentence retrieval compared with both feature-engineering-based methods and other neural models.
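The value-shared weighting idea can be sketched as follows, assuming cosine similarity between term embeddings, equal-width bins over [-1, 1] and a softmax question-attention gate. Every similarity that falls into the same bin is multiplied by the same weight regardless of its position in the answer, which is what distinguishes value-shared from position-specific weights. The function names, the binning scheme, the per-bin sum of similarities and the toy vectors are illustrative assumptions; the dissertation's model learns these weights and may differ in bin layout and nonlinearities.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def bin_index(sim, num_bins):
    """Map a similarity in [-1, 1] to one of num_bins equal-width bins (assumed binning scheme)."""
    idx = int((sim + 1.0) / 2.0 * num_bins)
    return max(0, min(idx, num_bins - 1))  # clamp so sim == 1.0 lands in the last bin


def value_shared_score(q_vecs, a_vecs, bin_weights, gate_vec):
    """Score an answer against a question with value-shared bin weights and question attention."""
    num_bins = len(bin_weights)
    h = []
    for q in q_vecs:
        # x[k]: sum of the similarities between this question term and all answer terms in bin k
        x = [0.0] * num_bins
        for a in a_vecs:
            sim = cosine(q, a)
            x[bin_index(sim, num_bins)] += sim
        # value-shared weighting: bin k uses the same weight for every answer position
        h.append(sum(w * xk for w, xk in zip(bin_weights, x)))
    # question attention: softmax over dot(gate_vec, q_j), computed stably by subtracting the max
    logits = [sum(g * v for g, v in zip(gate_vec, q)) for q in q_vecs]
    top = max(logits)
    exps = [math.exp(s - top) for s in logits]
    total = sum(exps)
    gates = [e / total for e in exps]
    return sum(g * hj for g, hj in zip(gates, h))


# Toy 2-d embeddings and hand-picked weights; a real model would learn bin_weights and gate_vec.
q_vecs = [[1.0, 0.0], [0.0, 1.0]]
a_vecs = [[1.0, 0.0], [0.6, 0.8]]
bin_weights = [-0.5, 0.0, 0.5, 1.0]
gate_vec = [1.0, 0.0]
score = value_shared_score(q_vecs, a_vecs, bin_weights, gate_vec)
print(score)
```

The gate makes the first question term dominate here, so the final score is a weighted average of the per-term values with most of the weight on that term.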
Next, we move beyond single-turn interactions to study response retrieval in multi-turn information-seeking conversations. Much research on response selection in conversation systems models the matching patterns between the user input message (with or without context) and response candidates, but ignores external knowledge beyond the dialog utterances. We propose a learning framework on top of deep neural matching networks that leverages external knowledge through pseudo-relevance feedback and QA correspondence knowledge distillation for response retrieval. We also study how to integrate user intent modeling into neural ranking models to improve response retrieval performance. Finally, we investigate hybrid models of response retrieval and generation in order to combine the merits of these two paradigms of conversation models.
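Pseudo-relevance feedback can be illustrated by expanding a response candidate with frequent terms from the top documents it retrieves from an external collection. In this minimal sketch the retriever is a naive term-overlap ranker standing in for a real search engine, and `expand_candidate`, its parameters and the toy collection are hypothetical. In the framework described above, expanded candidates would then be fed to a neural matching network, which is not shown here.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def retrieve_top_k(query_terms, collection, k):
    """Stand-in retriever: rank documents by how many distinct query terms they contain."""
    q = set(query_terms)
    scored = [(len(q & set(tokenize(doc))), i) for i, doc in enumerate(collection)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best overlap first, ties by position
    return [collection[i] for score, i in scored[:k] if score > 0]


def expand_candidate(response, collection, k=2, num_terms=3, stopwords=frozenset()):
    """Append the most frequent new terms from the top-k feedback documents to the response."""
    terms = tokenize(response)
    feedback_docs = retrieve_top_k(terms, collection, k)
    counts = Counter(
        t for doc in feedback_docs for t in tokenize(doc)
        if t not in terms and t not in stopwords
    )
    expansion = [t for t, _ in counts.most_common(num_terms)]
    return terms + expansion


# Toy external collection (invented for illustration).
collection = [
    "restart the router to fix the wifi connection",
    "update the wifi driver if the connection drops",
    "bake the bread at high heat",
]
response = "try to restart the router"
stopwords = {"the", "to", "if", "at"}
expanded = expand_candidate(response, collection, k=2, num_terms=3, stopwords=stopwords)
print(expanded)
```

The expansion adds terms such as "wifi" and "connection" that never appear in the short candidate itself, which is the kind of external signal pseudo-relevance feedback is meant to supply to the matching model.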
Our goal is to develop effective learning models for answer retrieval and information-seeking conversations, in order to improve effectiveness and user experience when accessing information through a touch-screen interface or a conversational interface, as commonly adopted on millions of mobile Internet devices.
Multilingual Name Entity Recognition and Intent Classification Employing Deep Learning Architectures
Named Entity Recognition and Intent Classification are among the most important subfields of Natural Language Processing. Recent research has led to the development of faster, more sophisticated and more efficient models for these two tasks. In this work we explore the effectiveness of two separate families of Deep Learning networks for these tasks: Bidirectional Long Short-Term Memory networks and Transformer-based networks. The models were trained and tested on the ATIS benchmark dataset for both English and Greek. The purpose of this paper is to present a comparative study of the two groups of networks for both languages and to showcase the results of our experiments. The models, which represent the current state of the art, yielded impressive results and achieved high performance.
Comment: 24 pages, 5 figures, 11 tables, dataset availabl
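Slot labels in ATIS are commonly distributed in BIO format (e.g. `B-fromloc.city_name`, `I-fromloc.city_name`, `O`), so a tagging model's per-token output has to be grouped into entity spans before it can be read or scored. The abstract does not describe this post-processing. The sketch below, with the hypothetical helper `bio_to_spans`, shows one common convention under that assumption: an `I-` tag that does not continue the current entity starts a new one.

```python
def bio_to_spans(tokens, tags):
    """Group per-token BIO tags into (label, text) entity spans."""
    spans = []
    label, start = None, None
    # A sentinel "O" closes any span that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if starts_new or tag == "O":
            if label is not None:
                spans.append((label, " ".join(tokens[start:i])))
                label, start = None, None
            if starts_new:
                # B- always opens a span; a stray I- is treated as opening one too
                label, start = tag[2:], i
        # otherwise: an I- tag continuing the current label, so the span keeps growing
    return spans


# Toy ATIS-style example (tokens and tags invented for illustration).
tokens = ["flights", "from", "new", "york", "to", "denver"]
tags = ["O", "O", "B-fromloc.city_name", "I-fromloc.city_name", "O", "B-toloc.city_name"]
print(bio_to_spans(tokens, tags))
```

Entity-level evaluation of NER models is usually computed over spans like these rather than over individual tags, which is why the grouping convention matters when comparing systems.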