Quality and Quantity of Machine Translation References for Automatic Metrics
Automatic machine translation metrics typically rely on human translations to
determine the quality of system translations. Common wisdom in the field
dictates that the human references should be of very high quality. However,
there are no cost-benefit analyses that could be used to guide practitioners
who plan to collect references for machine translation evaluation. We find that
higher-quality references lead to better metric correlations with humans at the
segment-level. Having up to 7 references per segment and taking their average
(or maximum) helps all metrics. Interestingly, the references from vendors of
different quality levels can be mixed together to improve metric success.
Higher-quality references, however, cost more to create, and we frame this as
an optimization problem: given a specific budget, what references should be
collected to maximize metric success. These findings can be used by evaluators
of shared tasks when references need to be created under a certain budget.
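The budget-constrained reference collection described above can be sketched as a small combinatorial search. All vendor names, costs, and utility numbers below are illustrative assumptions, not figures from the paper; the diminishing-returns weighting is likewise a stand-in for measured metric correlations.

```python
from itertools import combinations_with_replacement

# Hypothetical vendors: (name, cost per reference, estimated utility toward
# metric-human correlation). Values are illustrative, not from the paper.
VENDORS = [("premium", 12.0, 1.0), ("standard", 6.0, 0.7), ("budget", 3.0, 0.45)]

def best_reference_mix(budget, max_refs=7):
    """Exhaustively pick up to `max_refs` references (vendors may repeat,
    i.e. mixed-quality references are allowed), maximizing total utility
    under diminishing returns, subject to the budget."""
    best = ((), 0.0)
    for n in range(1, max_refs + 1):
        for combo in combinations_with_replacement(VENDORS, n):
            cost = sum(c for _, c, _ in combo)
            if cost > budget:
                continue
            # Diminishing returns: the k-th best reference contributes u/(k+1).
            utils = sorted((u for _, _, u in combo), reverse=True)
            utility = sum(u / (k + 1) for k, u in enumerate(utils))
            if utility > best[1]:
                best = (tuple(name for name, _, _ in combo), utility)
    return best

mix, utility = best_reference_mix(budget=20.0)
# With these toy numbers, mixing one premium with two budget references
# beats buying same-quality references only.
```

Note how the optimum mixes vendor tiers, echoing the finding that references of different qualities can be combined profitably.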
Enhancing Textbooks with Visuals from the Web for Improved Learning
Textbooks are the primary vehicle for delivering quality education to
students. It has been shown that explanatory or illustrative visuals play a key
role in the retention, comprehension and the general transfer of knowledge.
However, many textbooks, especially in the developing world, are low quality
and lack interesting visuals to support student learning. In this paper, we
investigate the effectiveness of vision-language models to automatically
enhance textbooks with images from the web. Specifically, we collect a dataset
of e-textbooks from one of the largest free online publishers in the world. We
rigorously analyse the dataset, and use the resulting analysis to motivate a
task that involves retrieving and appropriately assigning web images to
textbooks, which we frame as a novel optimization problem. Through a
crowd-sourced evaluation, we verify that (1) while the original textbook images
are rated higher, automatically assigned ones are not far behind, and (2) the
choice of the optimization problem matters. We release the dataset of textbooks
with an associated image bank to spur further research in this area.
Comment: 17 pages, 27 figures
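At its core, assigning web images to textbook sections is a matching problem. The sketch below is a minimal, assumed formulation: similarity scores from a vision-language model are made up, and the brute-force search stands in for whatever solver the paper actually uses (a real instance would use e.g. the Hungarian algorithm).

```python
from itertools import permutations

# Hypothetical similarity scores between textbook sections (rows) and
# candidate web images (columns), e.g. from a vision-language model.
SIM = [
    [0.9, 0.2, 0.4],
    [0.3, 0.8, 0.1],
    [0.5, 0.6, 0.7],
]

def assign_images(sim):
    """Brute-force the section -> image assignment maximizing total
    similarity, with each section receiving a distinct image."""
    n = len(sim)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(sim[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score

perm, score = assign_images(SIM)
```

Different objective functions (e.g. adding diversity or relevance-threshold terms) would change the assignment, which is why the abstract notes that the choice of optimization problem matters.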
Revisiting Automated Topic Model Evaluation with Large Language Models
Topic models are used to make sense of large text collections. However,
automatically evaluating topic model output and determining the optimal number
of topics both have been longstanding challenges, with no effective automated
solutions to date. This paper proposes using large language models to evaluate
such output. We find that large language models appropriately assess the
resulting topics, correlating more strongly with human judgments than existing
automated metrics. We then investigate whether we can use large language models
to automatically determine the optimal number of topics. We automatically
assign labels to documents, and choosing the configuration with the purest
labels returns reasonable values for the optimal number of topics.
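The purity-based selection of the number of topics can be sketched as follows. The label sets and topic assignments are invented for illustration, and this simple majority-label purity is an assumed instantiation of "most pure labels", not necessarily the paper's exact criterion.

```python
from collections import Counter

def purity(topic_ids, labels):
    """Fraction of documents whose (automatically assigned) label matches
    the majority label of their topic -- higher means 'purer' topics."""
    by_topic = {}
    for t, lab in zip(topic_ids, labels):
        by_topic.setdefault(t, []).append(lab)
    matched = sum(Counter(labs).most_common(1)[0][1]
                  for labs in by_topic.values())
    return matched / len(labels)

# Hypothetical outputs of two topic-model runs over six documents,
# with document labels assigned automatically (e.g. by an LLM):
labels = ["sports", "sports", "politics", "politics", "tech", "tech"]
run_k2 = [0, 0, 1, 1, 1, 1]   # 2 topics: politics and tech are mixed
run_k3 = [0, 0, 1, 1, 2, 2]   # 3 topics: clean separation

best_k = max([(purity(run_k2, labels), 2), (purity(run_k3, labels), 3)])[1]
```

Here the three-topic run yields perfect purity, so k = 3 is selected.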
Tokenization and the Noiseless Channel
Subword tokenization is a key part of many NLP pipelines. However, little is
known about why some tokenizer and hyperparameter combinations lead to better
downstream model performance than others. We propose that good tokenizers lead
to \emph{efficient} channel usage, where the channel is the means by which some
input is conveyed to the model and efficiency can be quantified in
information-theoretic terms as the ratio of the Shannon entropy to the maximum
possible entropy of the token distribution. Yet, an optimal encoding according
to Shannon entropy assigns extremely long codes to low-frequency tokens and
very short codes to high-frequency tokens. Defining efficiency in terms of
R\'enyi entropy, on the other hand, penalizes distributions with either very
high or very low-frequency tokens. In machine translation, we find that across
multiple tokenizers, the R\'enyi entropy (at a suitable choice of $\alpha$)
has a very strong correlation with \textsc{Bleu}, in comparison to a much
weaker correlation for compressed length.
Comment: ACL 202
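The efficiency notion above can be made concrete. The sketch below computes the Rényi entropy of a token frequency distribution, normalized by the maximum possible entropy log2(V); the choice alpha = 2.5 and the toy counts are illustrative assumptions, not values taken from the paper.

```python
import math

def renyi_efficiency(counts, alpha=2.5):
    """Renyi entropy of the token distribution divided by the maximum
    possible entropy log2(vocab size). alpha -> 1 recovers the
    Shannon-entropy version; alpha = 2.5 is an illustrative choice."""
    total = sum(counts)
    probs = [c / total for c in counts]
    if alpha == 1.0:
        h = -sum(p * math.log2(p) for p in probs if p > 0)
    else:
        h = math.log2(sum(p ** alpha for p in probs)) / (1 - alpha)
    return h / math.log2(len(counts))

# Uniform token frequencies use the channel maximally:
uniform = renyi_efficiency([10, 10, 10, 10])   # -> 1.0
# A skewed distribution with one dominant token is penalized:
skewed = renyi_efficiency([97, 1, 1, 1])
```

Because `p ** alpha` with alpha > 1 amplifies high-probability tokens, the Rényi version penalizes very skewed token distributions more sharply than Shannon entropy does, matching the motivation given in the abstract.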
PWESuite: Phonetic Word Embeddings and Tasks They Facilitate
Word embeddings that map words into a fixed-dimensional vector space are the
backbone of modern NLP. Most word embedding methods encode semantic
information. However, phonetic information, which is important for some tasks,
is often overlooked. In this work, we develop several novel methods which
leverage articulatory features to build phonetically informed word embeddings,
and present a set of phonetic word embeddings to encourage their community
development, evaluation and use. While several methods for learning phonetic
word embeddings already exist, there is a lack of consistency in evaluating
their effectiveness. Thus, we also propose several ways to evaluate both
intrinsic aspects of phonetic word embeddings, such as word retrieval and
correlation with sound similarity, and extrinsic performance, such as rhyme
and cognate detection and sound analogies. We hope that our suite of tasks will
promote reproducibility and provide direction for future research on phonetic
word embeddings.
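A deliberately minimal sketch of the articulatory-feature idea and the kind of intrinsic sanity check the suite formalizes: the toy four-dimensional feature vectors below are invented (real inventories such as panphon's use around two dozen features), and cosine similarity is just one possible choice of distance.

```python
import math

# Toy articulatory feature vectors (consonantal, voiced, coronal, vocalic).
# Values are illustrative placeholders, not a real phonological inventory.
FEATS = {
    "t": [1, 0, 1, 0],
    "d": [1, 1, 1, 0],
    "k": [1, 0, 0, 0],
    "a": [0, 1, 0, 1],
}

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Intrinsic check: /t/ should be closer to its voiced counterpart /d/
# than to the vowel /a/ (which shares no features in this toy inventory).
sim_td = cosine(FEATS["t"], FEATS["d"])   # high
sim_ta = cosine(FEATS["t"], FEATS["a"])   # zero here
```

Pooling such per-phone vectors into word-level embeddings is one simple baseline; the learned methods in the paper would replace this lookup table entirely.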
A Decade of Scholarly Research on Open Knowledge Graphs
The proliferation of open knowledge graphs has led to a surge in scholarly
research on the topic over the past decade. This paper presents a bibliometric
analysis of the scholarly literature on open knowledge graphs published between
2013 and 2023. The study aims to identify the trends, patterns, and impact of
research in this field, as well as the key topics and research questions that
have emerged. The work uses bibliometric techniques to analyze a sample of 4445
scholarly articles retrieved from Scopus. The findings reveal an
ever-increasing number of publications on open knowledge graphs published every
year, particularly in developed countries (+50 per year). These outputs are
published in highly-refereed scholarly journals and conferences. The study
identifies three main research themes: (1) knowledge graph construction and
enrichment, (2) evaluation and reuse, and (3) fusion of knowledge graphs into
NLP systems. Within these themes, the study identifies specific tasks that have
received considerable attention, including entity linking, knowledge graph
embedding, and graph neural networks.
Backtranslation feedback improves user confidence in MT, not quality
This is an accepted manuscript of an article published by ACL in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 21). in June 2021.
The accepted version of the publication may differ from the final published version.
Translating text into a language unknown to the text's author, dubbed outbound translation, is a modern need for which the user experience has significant room for improvement, beyond the basic machine translation facility. We demonstrate this by showing three ways in which user confidence in the outbound translation, as well as its overall final quality, can be affected: backward translation, quality estimation (with alignment) and source paraphrasing. In this paper, we describe an experiment
on outbound translation from English to Czech and Estonian. We examine the effects of each proposed feedback module and further focus on how the quality of machine translation systems influences these findings and the user perception of success. We show that backward translation feedback has a mixed effect on the whole process: it increases user confidence in the produced translation, but not the objective quality.
Enabling Outbound Machine Translation
It is not uncommon for Internet users to have to produce text in a foreign language they have very little knowledge of and are unable to verify the translation quality. We call the task "outbound translation" and explore it by introducing an open-source modular system Ptakopět. Its main purpose is to inspect human interaction with machine translation systems enhanced by additional subsystems, such as backward translation and quality estimation. We follow up with an experiment on (Czech) human annotators tasked to produce questions in a language they do not speak (German), with the help of Ptakopět. We focus on three real-world use cases (communication with IT support, describing administrative issues and asking encyclopedic questions) from which we gain insight into different strategies users take when faced with outbound translation tasks. Round trip translation is known to be unreliable for evaluating MT systems but our experimental evaluation documents that it works very well for users, at least on MT systems of mid-range quality.
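The backward-translation feedback loop described in the two abstracts above can be sketched as follows. The `translate` function is a hard-coded stand-in for a real MT system (names, language pairs, and strings are all invented for illustration); the point is only the shape of the round-trip check shown to the user.

```python
def translate(text, src, tgt):
    """Stand-in for a real MT backend; a tiny hard-coded dictionary keeps
    the sketch runnable. The papers use actual MT systems of varying
    quality here."""
    table = {
        ("en", "cs"): {"where is the printer": "kde je tiskarna"},
        ("cs", "en"): {"kde je tiskarna": "where is the printer"},
    }
    return table[(src, tgt)].get(text.lower(), "<unknown>")

def outbound_with_feedback(text, src="en", tgt="cs"):
    """Produce the outbound translation plus backtranslation feedback,
    mirroring the backward-translation module described above."""
    forward = translate(text, src, tgt)
    back = translate(forward, tgt, src)
    # The user compares `back` against their original input; agreement
    # raises confidence even though it does not guarantee quality.
    return forward, back, back.lower() == text.lower()

fwd, back, agrees = outbound_with_feedback("Where is the printer")
```

As the experiments caution, round-trip agreement is an unreliable proxy for translation quality in general, yet users find it a workable confidence signal on mid-range MT systems.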