Automatic Neural Question Generation using Community-based Question Answering Systems
In this thesis, we address the problem of opinion question generation. The motivation behind this task is to provide users with additional question samples related to their query when using search engines. In our view, among the datasets closest to people's informal and casual speech are Community Question Answering (CQA) forums, where users can post questions and others can answer them. Specifically, we perform experiments on the Amazon question/answer dataset.
Unlike conventional approaches that tackle the question generation problem with hand-crafted rules, our approach is entirely data-driven. We model the problem as sequence-to-sequence learning with an encoder-decoder architecture, which has driven significant improvements across natural language processing research areas in recent years. Our model benefits from the attention mechanism, which lets the decoder focus on specific parts of the input sentence. Furthermore, we provide solutions to two common failure modes: repetition of words and generation of out-of-vocabulary tokens. We provide a detailed analysis of the system's performance. Experimental results show an improvement in automatic evaluation metrics such as the BLEU score over the state-of-the-art question generation system.
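The attention mechanism mentioned in the abstract can be sketched with a minimal dot-product variant. This is a simplified stand-in for illustration only: the thesis's exact scoring function and dimensions are not specified here, and the toy encoder states below are invented.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Dot-product attention: score each encoder hidden state against the
    current decoder state, normalize the scores into a distribution, and
    return that distribution plus the weighted context vector."""
    scores = encoder_states @ decoder_state      # one score per source token
    weights = softmax(scores)                    # attention distribution
    context = weights @ encoder_states           # context vector fed to the decoder
    return weights, context

# Toy example: 3 source tokens, hidden size 4 (hypothetical values).
enc = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]])
dec = np.array([2.0, 0.0, 0.0, 0.0])
w, ctx = attend(dec, enc)
```

At each decoding step the context vector summarizes the source tokens most relevant to the token being generated, which is what allows the model to "focus on a specific part of the input sentence."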
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers
Powerful large language models have facilitated the development of writing assistants that promise to significantly improve the quality and efficiency of composition and communication. However, a barrier to effective assistance is the lack of personalization in LLM outputs to the author's communication style and specialized knowledge. In this paper, we address this challenge by proposing PEARL, a retrieval-augmented LLM writing assistant personalized with a generation-calibrated retriever. Our retriever is trained to select historical user-authored documents for prompt augmentation, such that they are likely to best personalize LLM generations for a user request. We propose two key novelties for training our retriever: 1) a training data selection method that identifies user requests likely to benefit from personalization and the documents that provide that benefit; and 2) a scale-calibrating KL-divergence objective that ensures our retriever closely tracks the benefit of a document for personalized generation. We demonstrate the effectiveness of PEARL in generating personalized workplace social media posts and Reddit comments. Finally, we showcase the potential of a generation-calibrated retriever to double as a performance predictor and further improve low-quality generations via LLM chaining.

Comment: Pre-print, work in progress
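The general idea behind a KL-divergence calibration objective can be sketched as follows. This is a rough illustration under assumptions, not the paper's exact objective: the benefit scores, temperature, and retriever scores below are all hypothetical.

```python
import numpy as np

def softmax(x, temperature=1.0):
    z = np.asarray(x, dtype=float) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon for numerical stability."""
    p, q = np.asarray(p), np.asarray(q)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical benefit scores: how much each user-authored document
# improved personalized generation quality for one request.
benefit = [0.9, 0.2, 0.5]
# The retriever's relevance scores for the same documents (hypothetical).
retriever_scores = [2.0, -1.0, 0.5]

target = softmax(benefit, temperature=0.1)   # sharpened target distribution
pred = softmax(retriever_scores)             # retriever's distribution
loss = kl_divergence(target, pred)           # minimized during training
```

Training the retriever to minimize this loss pushes its score distribution toward the distribution implied by downstream generation benefit, which is also why the same scores can double as a performance predictor.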