38 research outputs found
Linguistically-driven Multi-task Pre-training for Low-resource Neural Machine Translation
In the present study, we propose novel sequence-to-sequence pre-training objectives for low-resource machine translation (NMT): Japanese-specific sequence to sequence (JASS) for language pairs involving Japanese as the source or target language, and English-specific sequence to sequence (ENSS) for language pairs involving English. JASS focuses on masking and reordering Japanese linguistic units known as bunsetsu, whereas ENSS is proposed based on phrase structure masking and reordering tasks. Experiments on ASPEC Japanese–English & Japanese–Chinese, Wikipedia Japanese–Chinese, News English–Korean corpora demonstrate that JASS and ENSS outperform MASS and other existing language-agnostic pre-training methods by up to +2.9 BLEU points for the Japanese–English tasks, up to +7.0 BLEU points for the Japanese–Chinese tasks and up to +1.3 BLEU points for English–Korean tasks. Empirical analysis, which focuses on the relationship between individual parts in JASS and ENSS, reveals the complementary nature of the subtasks of JASS and ENSS. Adequacy evaluation using LASER, human evaluation, and case studies reveals that our proposed methods significantly outperform pre-training methods without injected linguistic knowledge and they have a larger positive impact on the adequacy as compared to the fluency
Controlling Styles in Neural Machine Translation with Activation Prompt
Controlling styles in neural machine translation (NMT) has attracted wide
attention, as it is crucial for enhancing user experience. Earlier studies on
this topic typically concentrate on regulating the level of formality and
achieve some progress in this area. However, they still encounter two major
challenges. The first is the difficulty in style evaluation. The style
comprises various aspects such as lexis, syntax, and others that provide
abundant information. Nevertheless, only formality has been thoroughly
investigated. The second challenge involves excessive dependence on incremental
adjustments, particularly when new styles are necessary. To address both
challenges, this paper presents a new benchmark and approach. A multiway
stylized machine translation (MSMT) benchmark is introduced, incorporating
diverse categories of styles across four linguistic domains. Then, we propose a
method named style activation prompt (StyleAP) by retrieving prompts from
stylized monolingual corpus, which does not require extra fine-tuning.
Experiments show that StyleAP could effectively control the style of
translation and achieve remarkable performance.Comment: Accepted by Findings of ACL 2023; The code is available at
https://github.com/IvanWang0730/StyleA
Uncertainty in Natural Language Generation: From Theory to Applications
Recent advances of powerful Language Models have allowed Natural Language
Generation (NLG) to emerge as an important technology that can not only perform
traditional tasks like summarisation or translation, but also serve as a
natural language interface to a variety of applications. As such, it is crucial
that NLG systems are trustworthy and reliable, for example by indicating when
they are likely to be wrong; and supporting multiple views, backgrounds and
writing styles -- reflecting diverse human sub-populations. In this paper, we
argue that a principled treatment of uncertainty can assist in creating systems
and evaluation protocols better aligned with these goals. We first present the
fundamental theory, frameworks and vocabulary required to represent
uncertainty. We then characterise the main sources of uncertainty in NLG from a
linguistic perspective, and propose a two-dimensional taxonomy that is more
informative and faithful than the popular aleatoric/epistemic dichotomy.
Finally, we move from theory to applications and highlight exciting research
directions that exploit uncertainty to power decoding, controllable generation,
self-assessment, selective answering, active learning and more
Empirical Analysis of Multilingual NLP Models : Context Window for Word Embeddings and Linguistic Typology for Machine Translation
学位の種別: 修士University of Tokyo(東京大学
Revisiting Low Resource Status of Indian Languages in Machine Translation
Indian language machine translation performance is hampered due to the lack
of large scale multi-lingual sentence aligned corpora and robust benchmarks.
Through this paper, we provide and analyse an automated framework to obtain
such a corpus for Indian language neural machine translation (NMT) systems. Our
pipeline consists of a baseline NMT system, a retrieval module, and an
alignment module that is used to work with publicly available websites such as
press releases by the government. The main contribution towards this effort is
to obtain an incremental method that uses the above pipeline to iteratively
improve the size of the corpus as well as improve each of the components of our
system. Through our work, we also evaluate the design choices such as the
choice of pivoting language and the effect of iterative incremental increase in
corpus size. Our work in addition to providing an automated framework also
results in generating a relatively larger corpus as compared to existing
corpora that are available for Indian languages. This corpus helps us obtain
substantially improved results on the publicly available WAT evaluation
benchmark and other standard evaluation benchmarks.Comment: 10 pages, few figures, Preprint under revie
Entity centric neural models for natural language processing
This thesis explores how to enhance natural language understanding by incorporating entity information into neural network models. It tackles three key questions:1. Leveraging entities for understanding tasks: This work introduces Entity-GCN, a model that performs multi-step reasoning on a graph where nodes represent entity mentions and edges represent relationships. This method achieved state-of-the-art results on a multi-document question-answering dataset.2. Identifying and disambiguating entities using large language models: This research proposes a novel system that retrieves entities by generating their names token-by-token, overcoming limitations of traditional methods and significantly reducing memory footprint. This approach is also extended to a multilingual setting and further optimized for efficiency.3. Interpreting and controlling entity knowledge within models: This thesis presents a post-hoc interpretation technique to analyze how decisions are made across layers in neural models, allowing for visualization and analysis of knowledge representation. Additionally, a method for editing factual knowledge about entities is proposed, enabling correction of model predictions without costly retraining