Does BLEU Score Work for Code Migration?
Statistical machine translation (SMT) is a fast-growing sub-field of
computational linguistics. To date, the most popular automatic metric for
measuring the quality of SMT is the BiLingual Evaluation Understudy (BLEU) score.
Lately, SMT along with the BLEU metric has been applied to a Software
Engineering task named code migration. (In)Validating the use of BLEU score
could advance the research and development of SMT-based code migration tools.
Unfortunately, no study has yet confirmed or refuted the suitability of BLEU score
for source code. In this paper, we conduct an empirical study of BLEU score
to (in)validate its suitability for the code migration task, motivated by its
inability to reflect the semantics of source code. In our work, we use human
judgment as the ground truth to measure the semantic correctness of the
migrated code. Our empirical study demonstrates that BLEU does not reflect
translation quality due to its weak correlation with the semantic correctness
of translated code. We provide counter-examples to show that BLEU is
ineffective in comparing the translation quality between SMT-based models. Given
BLEU's ineffectiveness for the code migration task, we propose an alternative
metric RUBY, which considers lexical, syntactical, and semantic representations
of source code. We verify that RUBY achieves a higher correlation coefficient
with the semantic correctness of migrated code than BLEU score does (0.775
versus 0.583). We also confirm the effectiveness of RUBY in reflecting the
changes in translation quality of SMT-based translation models. With its
advantages, RUBY can be used to evaluate SMT-based code migration models.
Comment: 12 pages, 5 figures, ICPC '19 Proceedings of the 27th International
Conference on Program Comprehension
Temporal latent topic user profiles for search personalisation
The performance of search personalisation largely depends on how effectively user profiles are built. Many approaches build user profiles from the topics discussed in relevant documents, where the topics are usually obtained from a human-generated online ontology such as the Open Directory Project. One limitation of these approaches is that many documents may not contain the topics covered in the ontology. Moreover, human-generated topics require expensive manual effort to determine the correct categories for each document. This paper addresses these problems by using Latent Dirichlet Allocation for unsupervised extraction of topics from documents. With the learned topics, we observe that search intent and user interests are dynamic, i.e., they change over time. To evaluate the effectiveness of temporal aspects in personalisation, we build profiles at three typical time scales: a long-term profile, a daily profile and a session profile. In the experiments, we utilise the profiles to re-rank search results returned by a commercial web search engine. Our experimental results demonstrate that our temporal profiles can significantly improve the ranking quality. The results further show a promising effect of temporal features in correlation with click entropy and query position in a search session.