Neural Machine Translation for Malayalam Paraphrase Generation
This study explores four methods of generating paraphrases in Malayalam,
utilizing resources available for English paraphrasing and pre-trained Neural
Machine Translation (NMT) models. We evaluate the resulting paraphrases using
both automated metrics, such as BLEU, METEOR, and cosine similarity, and
human annotation. Our findings suggest that automated evaluation measures may
not be fully appropriate for Malayalam, as they do not consistently align with
human judgment. This discrepancy underscores the need for more nuanced
paraphrase evaluation approaches, especially for highly agglutinative languages.
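As a concrete illustration of why surface-overlap metrics can disagree with human judgment, here is a minimal bag-of-words cosine similarity (a simplified stand-in for the similarity measures such studies use): morphologically rich rewordings share few surface tokens, so the score collapses even when meaning is preserved.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    va, vb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For an agglutinative language, a paraphrase that fuses or splits morphemes yields almost no shared tokens, driving this score toward zero regardless of semantic equivalence.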
Automatic Answerability Evaluation for Question Generation
Conventional automatic evaluation metrics, such as BLEU and ROUGE, developed
for natural language generation (NLG) tasks, are based on measuring the n-gram
overlap between the generated and reference text. These simple metrics may be
insufficient for more complex tasks, such as question generation (QG), which
requires generating questions that are answerable by the reference answers.
Developing a more sophisticated automatic evaluation metric thus remains an
urgent problem in QG research. This work proposes a Prompting-based Metric
on ANswerability (PMAN), a novel automatic evaluation metric to assess whether
the generated questions are answerable by the reference answers for the QG
tasks. Extensive experiments demonstrate that its evaluation results are
reliable and align with human evaluations. We further apply our metric to
evaluate the performance of QG models, which shows our metric complements
conventional metrics. Our implementation of a ChatGPT-based QG model achieves
state-of-the-art (SOTA) performance in generating answerable questions.
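A minimal sketch of how a prompting-based answerability metric like PMAN could be wired up; the prompt template and the `ask_llm` callable are illustrative assumptions, not the paper's exact design:

```python
def build_answerability_prompt(question: str, reference_answer: str) -> str:
    # Hypothetical prompt template; the paper's actual wording may differ.
    return (
        "Given the reference answer below, decide whether the question "
        "can be answered by it. Reply with exactly 'Yes' or 'No'.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference_answer}\n"
        "Answerable:"
    )

def pman_score(questions, answers, ask_llm) -> float:
    """Fraction of generated questions judged answerable by an LLM.

    `ask_llm` is any callable mapping a prompt string to a model reply.
    """
    verdicts = [
        ask_llm(build_answerability_prompt(q, a)).strip().lower().startswith("yes")
        for q, a in zip(questions, answers)
    ]
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```

Because the judge is any prompt-to-reply callable, the metric can be backed by ChatGPT or a local model without changing the scoring logic.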
Towards Robust Text Retrieval with Progressive Learning
Retrieval augmentation has become an effective solution to empower large
language models (LLMs) with external and verified knowledge sources from the
database, which overcomes the limitations and hallucinations of LLMs in
handling up-to-date and domain-specific information. However, existing
embedding models for text retrieval usually have three non-negligible
limitations. First, the number and diversity of samples in a batch are too
restricted to supervise the modeling of textual nuances at scale. Second, a
high proportion of noise is detrimental to the semantic correctness and
consistency of embeddings. Third, treating easy and difficult samples equally
leads to sub-optimal convergence of embeddings with poorer generalization. In
this paper, we propose PEG, a progressively learned embedding model for robust
text retrieval. Specifically, we increase the number of in-batch negative
samples during training to 80,000, and for each query, we extract five hard
negatives. Concurrently, we incorporate a progressive learning mechanism,
enabling the model to dynamically modulate its attention to the samples
throughout the entire training process. Additionally, PEG is trained on more
than 100 million samples, encompassing a wide range of domains (e.g., finance,
medicine, and tourism) and covering various tasks (e.g., question-answering,
machine reading comprehension, and similarity matching). Extensive experiments
conducted on C-MTEB and DuReader demonstrate that PEG surpasses
state-of-the-art embeddings in retrieving true positives, highlighting its
significant potential for applications in LLMs. Our model is publicly available
at https://huggingface.co/TownsWu/PEG
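The abstract does not spell out PEG's training objective, but large in-batch negative sets are typically consumed by a contrastive, InfoNCE-style loss; a minimal sketch, assuming query-passage similarities are precomputed and the positive for query i sits at column i:

```python
import math

def info_nce_loss(sim_rows, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    sim_rows[i][j] is the similarity of query i to passage j; column i is
    the positive, every other column is an in-batch negative.
    """
    total = 0.0
    for i, row in enumerate(sim_rows):
        scaled = [s / temperature for s in row]
        m = max(scaled)  # subtract the max to keep the softmax stable
        log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
        total += log_z - scaled[i]  # -log softmax of the positive
    return total / len(sim_rows)
```

Growing the batch adds columns to each row, which is exactly how more in-batch negatives sharpen the supervision signal; hard negatives would appear as extra high-similarity columns.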
Neural Conversation Generation with Auxiliary Emotional Supervised Models
An important aspect of developing dialogue agents involves endowing a conversation system with emotion perception and interaction. Most existing emotion dialogue models lack adaptability and extensibility across different scenes because of their limitation of requiring a specified emotion category or their reliance on a fixed emotional dictionary. To overcome these limitations, we propose neural conversation generation with an auxiliary emotional supervised model (nCG-ESM), comprising a sequence-to-sequence (Seq2Seq) generation model and an emotional classifier used as an auxiliary model. The emotional classifier was trained to predict the emotion distributions of the dialogues, which were then used as emotion supervision signals to guide the generation model to generate diverse emotional responses. The proposed nCG-ESM is flexible enough to generate responses with emotional diversity, including specified or unspecified emotions, and can be adapted and extended to different scenarios. We conducted extensive experiments on the popular dataset of Weibo post-response pairs. Experimental results showed that the proposed model was capable of producing more diverse, appropriate, and emotionally rich responses, yielding substantial gains in diversity scores and human evaluations.
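One plausible way to use the classifier's emotion distribution as a supervision signal is to add a weighted cross-entropy term to the generation loss; a minimal sketch, where the weighting and the exact loss form are assumptions rather than the paper's formulation:

```python
import math

def emotion_supervised_loss(gen_loss, predicted_dist, target_dist, weight=0.5):
    """Seq2Seq generation loss plus a weighted emotion cross-entropy term.

    `predicted_dist` comes from the auxiliary emotion classifier applied to
    the generated response; `target_dist` encodes the desired emotion.
    """
    eps = 1e-12  # avoid log(0)
    emo_ce = -sum(t * math.log(p + eps)
                  for t, p in zip(target_dist, predicted_dist) if t > 0)
    return gen_loss + weight * emo_ce
```

Supplying a one-hot `target_dist` steers generation toward a specified emotion, while a soft distribution leaves the emotion unspecified, which matches the flexibility the abstract claims.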
Task-Oriented Conversation Generation Using Heterogeneous Memory Networks
How to incorporate external knowledge into a neural dialogue model is
critically important for dialogue systems to behave like real humans. To handle
this problem, memory networks are a natural and promising choice.
However, existing memory networks do not perform well when leveraging
heterogeneous information from different sources. In this paper, we propose
novel and versatile external memory networks called Heterogeneous Memory
Networks (HMNs), to simultaneously utilize user utterances, dialogue history
and background knowledge tuples. In our method, historical sequential dialogues
are encoded and stored in the context-aware memory, enhanced by a gating
mechanism, while grounding knowledge tuples are encoded and stored in the
context-free memory. During decoding, the decoder augmented with HMNs
recurrently selects each word in one response utterance from these two memories
and a general vocabulary. Experimental results on multiple real-world datasets
show that HMNs significantly outperform the state-of-the-art data-driven
task-oriented dialogue models in most domains.
Comment: Accepted as a long paper at EMNLP-IJCNLP 2019.
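The decoding step, selecting each word from two memories and a general vocabulary, can be sketched as a gated mixture of distributions; this is a simplified stand-in for HMNs' actual attention-based mechanism:

```python
def mix_word_distributions(vocab_dist, context_mem_dist, kb_mem_dist, gates):
    """Mix vocabulary and memory distributions with normalized gates.

    Each *_dist maps word -> probability; gates = (g_vocab, g_ctx, g_kb)
    should sum to 1 so the mixture stays a valid distribution.
    """
    g_v, g_c, g_k = gates
    words = set(vocab_dist) | set(context_mem_dist) | set(kb_mem_dist)
    return {
        w: g_v * vocab_dist.get(w, 0.0)
           + g_c * context_mem_dist.get(w, 0.0)
           + g_k * kb_mem_dist.get(w, 0.0)
        for w in words
    }
```

In the full model the gates would be predicted by the decoder at each step, letting it copy entities from dialogue history or knowledge tuples when the vocabulary distribution cannot supply them.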
Zero-Shot Fact-Checking with Semantic Triples and Knowledge Graphs
Despite progress in automated fact-checking, most systems require a
significant amount of labeled training data, which is expensive. In this paper,
we propose a novel zero-shot method, which instead of operating directly on the
claim and evidence sentences, decomposes them into semantic triples augmented
using external knowledge graphs, and uses large language models trained for
natural language inference. This allows it to generalize to adversarial
datasets and domains that supervised models require specific training data for.
Our empirical results show that our approach outperforms previous zero-shot
approaches on FEVER, FEVER-Symmetric, FEVER 2.0, and Climate-FEVER, while being
comparable or better than supervised models on the adversarial and the
out-of-domain datasets.
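The pipeline, decomposing a claim into triples, verbalizing each, and checking it against the evidence with an NLI model, can be sketched as follows; the min-score aggregation and the `nli_entailment_prob` wrapper are illustrative assumptions:

```python
def verbalize_triple(triple):
    """Turn a (subject, relation, object) triple into a short hypothesis."""
    subj, rel, obj = triple
    return f"{subj} {rel} {obj}."

def zero_shot_verdict(claim_triples, evidence, nli_entailment_prob,
                      threshold=0.5):
    """Label a claim SUPPORTED only if every triple is entailed by evidence.

    `nli_entailment_prob(premise, hypothesis)` is any NLI-model wrapper
    returning an entailment probability in [0, 1].
    """
    scores = [nli_entailment_prob(evidence, verbalize_triple(t))
              for t in claim_triples]
    return "SUPPORTED" if scores and min(scores) >= threshold else "REFUTED"
```

Because nothing here is trained on fact-checking data, swapping in an off-the-shelf NLI model is all the supervision required, which is what lets the approach transfer to adversarial and out-of-domain datasets.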
Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory
Memory-augmented Large Language Models (LLMs) have demonstrated remarkable
performance in long-term human-machine interactions, which basically relies on
iterative recalling and reasoning of history to generate high-quality
responses. However, such repeated recall-reason steps easily produce biased
thoughts, i.e., inconsistent reasoning results when recalling the same
history for different questions. On the contrary, humans can keep thoughts in
the memory and recall them without repeated reasoning. Motivated by this human
capability, we propose a novel memory mechanism called TiM (Think-in-Memory)
that enables LLMs to maintain an evolved memory for storing historical thoughts
along the conversation stream. The TiM framework consists of two crucial
stages: (1) before generating a response, an LLM agent recalls relevant thoughts
from memory, and (2) after generating a response, the LLM agent post-thinks and
incorporates both historical and new thoughts to update the memory. Thus, TiM
can eliminate the issue of repeated reasoning by saving the post-thinking
thoughts as the history. Besides, we formulate the basic principles to organize
the thoughts in memory based on well-established operations (i.e., insert,
forget, and merge), allowing for dynamic
updates and evolution of the thoughts. Furthermore, we introduce
Locality-Sensitive Hashing into TiM to achieve efficient retrieval for
long-term conversations. We conduct qualitative and quantitative experiments on
real-world and simulated dialogues covering a wide range of topics,
demonstrating that equipping existing LLMs with TiM significantly enhances
their performance in generating responses for long-term interactions.
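A minimal sketch of a TiM-style thought store with random-hyperplane LSH and the insert/forget/merge operations; the hashing scheme and merge policy are illustrative assumptions, not the paper's implementation:

```python
import random

class ThoughtMemory:
    """Thoughts are (vector, text) pairs bucketed by a binary LSH code,
    so recall scans only the bucket sharing the query's hash."""

    def __init__(self, dim, n_planes=8, seed=0):
        rng = random.Random(seed)
        # Random hyperplanes: nearby vectors tend to land on the same sides.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_planes)]
        self.buckets = {}

    def _hash(self, vec):
        return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                     for plane in self.planes)

    def insert(self, vec, thought):
        self.buckets.setdefault(self._hash(vec), []).append((vec, thought))

    def forget(self, thought):
        for bucket in self.buckets.values():
            bucket[:] = [(v, t) for v, t in bucket if t != thought]

    def merge(self, vec, new_thought):
        # Collapse the bucket's thoughts plus the new one into one entry.
        key = self._hash(vec)
        merged = "; ".join(t for _, t in
                           self.buckets.get(key, []) + [(vec, new_thought)])
        self.buckets[key] = [(vec, merged)]

    def recall(self, vec):
        return [t for _, t in self.buckets.get(self._hash(vec), [])]
```

Storing post-thinking thoughts rather than raw history means recall returns already-reasoned conclusions, so the agent never re-derives (and possibly contradicts) an earlier inference.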