473,634 research outputs found
Generating varied narrative probability exercises
This paper presents Genpex, a system for automatic generation of narrative probability exercises. Generation of exercises in Genpex is done in two steps. First, the system creates a specification of a solvable probability problem, based on input from the user (a researcher or test developer) who selects a specific question type and a narrative context for the problem. Then, a text expressing the probability problem is generated. The user can tune the generated text by setting the values of some linguistic variation parameters. By varying the mathematical content of the exercise, its narrative context and the linguistic parameter settings, many different exercises can be produced. Here we focus on the natural language generation part of Genpex. After describing how the system works, we briefly present our first evaluation results, and discuss some aspects requiring further investigation.
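The two-step pipeline the abstract describes can be sketched as follows. This is a hypothetical toy, not Genpex itself: the function names, the `formal` variation parameter, and the urn scenario are all invented to illustrate spec-then-text generation.

```python
import random

def make_spec(question_type: str, context: str, seed: int = 0) -> dict:
    """Step 1: create a specification of a solvable probability problem."""
    rng = random.Random(seed)
    total = rng.randint(5, 10)
    favourable = rng.randint(1, total - 1)  # guarantees 0 < p < 1
    return {"type": question_type, "context": context,
            "favourable": favourable, "total": total}

def render(spec: dict, formal: bool = False) -> str:
    """Step 2: express the specification as text; `formal` stands in for a
    linguistic variation parameter the user could tune."""
    ask = ("Determine the probability that" if formal
           else "What is the chance that")
    return (f"An {spec['context']} holds {spec['total']} balls, "
            f"{spec['favourable']} of which are red. "
            f"{ask} a randomly drawn ball is red?")

spec = make_spec("single-draw", "urn", seed=1)
print(render(spec))
print(render(spec, formal=True))
```

Varying the seed changes the mathematical content while the specification stays solvable; varying `formal` changes only the surface text.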
Supporting Stylized Language Models Using Multi-Modality Features
As AI and machine learning systems become more common in our everyday lives, there is an increased desire to construct systems that are able to seamlessly interact and communicate with humans. This typically means creating systems that are able to communicate with humans via natural language. Given the variance of natural language, this can be a very challenging task. In this thesis, I explored the topic of human-like language generation in the context of stylized language generation. Stylized language generation involves producing some text that exhibits a specific, desired style. In this dissertation, I specifically explored the use of multi-modality features as a means to provide sufficient information to produce high-quality stylized text output. I also explored how these multi-modality features can be used to identify and explain errors in the generated output. Finally, I constructed an automated language evaluation metric that can evaluate stylized language models.
Multi-Dimensional Evaluation of Text Summarization with In-Context Learning
Evaluation of natural language generation (NLG) is complex and multi-dimensional. Generated text can be evaluated for fluency, coherence, factuality, or any other dimension of interest. Most frameworks that perform such multi-dimensional evaluation require training on large manually or synthetically generated datasets. In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning, obviating the need for large training datasets. Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization, establishing the state of the art on dimensions such as relevance and factual consistency. We then analyze the effects of factors such as the selection and number of in-context examples on performance. Finally, we study the efficacy of in-context learning-based evaluators in evaluating zero-shot summaries written by large language models such as GPT-3.
Comment: ACL Findings '2
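An in-context-learning evaluator of this kind boils down to assembling a few-shot scoring prompt for an LLM. The sketch below is an illustration only: the example summaries, scores, and prompt wording are invented, and no model is actually called.

```python
# Invented few-shot examples for one evaluation dimension (1-5 scale).
EXAMPLES = [
    {"source": "The mayor opened a new library on Monday.",
     "summary": "A new library opened Monday.", "score": 5},
    {"source": "Rain delayed the match by two hours.",
     "summary": "The match was cancelled.", "score": 1},
]

def build_prompt(source: str, summary: str, dimension: str,
                 examples=EXAMPLES) -> str:
    """Assemble a few-shot prompt asking an LLM to rate one dimension.
    The returned string would be sent to a model such as GPT-3; the model's
    next token is then parsed as the score."""
    parts = [f"Rate the {dimension} of each summary from 1 to 5.\n"]
    for ex in examples:
        parts.append(f"Source: {ex['source']}\n"
                     f"Summary: {ex['summary']}\n"
                     f"Score: {ex['score']}\n")
    parts.append(f"Source: {source}\nSummary: {summary}\nScore:")
    return "\n".join(parts)

prompt = build_prompt("Stocks rose 2% after the report.",
                      "Stocks fell after the report.",
                      "factual consistency")
print(prompt)
```

Because the prompt ends at "Score:", the evaluator needs no training: swapping the in-context examples or the dimension name changes what is being evaluated, which is what the paper's analysis of example selection and count probes.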
Spatial Natural Language Generation for Location Description in Photo Captions
We present a spatial natural language generation system to create captions that describe the geographical context of geo-referenced photos. An analysis of existing photo captions was used to design templates representing typical caption language patterns, while the results of human subject experiments were used to create field-based spatial models of the applicability of some commonly used spatial prepositions. The language templates are instantiated with geo-data retrieved from the vicinity of the photo locations. A human subject evaluation was used to validate and to improve the spatial language generation procedure, examples of the results of which are presented in the paper.
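Template instantiation of the kind described might look like the toy sketch below. The place names, the distance threshold, and the preposition choice rule are all invented stand-ins; the paper derives its applicability models from human subject experiments, not a hard cutoff.

```python
# Invented nearby landmarks retrieved for a photo location: (name, metres).
PLACES = [("St. Mary's Church", 120.0), ("River Dee", 450.0)]

def choose_preposition(distance_m: float) -> str:
    """Placeholder for a field-based applicability model: nearer
    landmarks get 'near', farther ones a vaguer phrase."""
    return "near" if distance_m < 200 else "in the vicinity of"

def caption(photo_subject: str, places=PLACES) -> str:
    """Instantiate a caption template with the closest landmark."""
    name, dist = min(places, key=lambda p: p[1])
    return f"{photo_subject}, {choose_preposition(dist)} {name}"

print(caption("Old stone bridge"))
```

The real system would replace the threshold with the experimentally fitted applicability field for each preposition and fill the template from geo-data queried around the photo's coordinates.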
Context-dependent Instruction Tuning for Dialogue Response Generation
Recent language models have achieved impressive performance on natural language tasks by incorporating instructions with the task input during fine-tuning. Since all samples in the same natural language task can be explained by the same task instructions, many instruction datasets provide only a few instructions for the entire task, without considering the input of each example. However, this approach becomes ineffective in complex multi-turn dialogue generation tasks, where the input varies highly from turn to turn as the dialogue context changes, so simple task instructions cannot improve generation performance. To address this limitation, we introduce a context-based instruction fine-tuning framework for multi-turn dialogue, which generates both responses and instructions based on the previous context as input. During evaluation, the model generates instructions based on the previous context to self-guide its response. In quantitative evaluations on dialogue benchmark datasets, the proposed framework produces results comparable to or better than the baselines, with a reduced computation budget, by aligning instructions to the input during fine-tuning.
Comment: Work in Progress
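The data-construction side of such a framework can be sketched as pairing each dialogue turn with an instruction derived from its own context. Everything below is hypothetical: the instruction wording and the rule for deriving it are invented to illustrate per-turn, context-dependent instructions rather than one instruction per task.

```python
def make_instruction(context: list) -> str:
    """Derive a turn-specific instruction from the preceding dialogue
    context (a trivial stand-in for the model-generated instructions)."""
    if context[-1].endswith("?"):
        return "Answer the user's question using the dialogue so far."
    return "Continue the conversation consistently with the context."

def build_sample(context: list, response: str) -> dict:
    """One fine-tuning sample: (context-derived instruction + context)
    mapped to the gold response."""
    return {"instruction": make_instruction(context),
            "input": "\n".join(context),
            "output": response}

sample = build_sample(["Hi!", "Where is the station?"],
                      "It's two blocks north.")
print(sample["instruction"])
```

In the paper's framework the instruction is generated by the model itself rather than by a rule, so at evaluation time the same model can self-guide each response from the current context.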
Stochastic Language Generation in Dialogue using Recurrent Neural Networks with Convolutional Sentence Reranking
The natural language generation (NLG) component of a spoken dialogue system (SDS) usually needs a substantial amount of handcrafting or a well-labeled dataset to be trained on. These limitations add significantly to development costs and make cross-domain, multi-lingual dialogue systems intractable. Moreover, human languages are context-aware: the most natural response should be learned directly from data rather than depend on predefined syntax or rules. This paper presents a statistical language generator based on a joint recurrent and convolutional neural network structure which can be trained on dialogue act-utterance pairs without any semantic alignments or predefined grammar trees. Objective metrics suggest that this new model outperforms previous methods under the same experimental conditions. Results of an evaluation by human judges indicate that it produces not only high-quality but also linguistically varied utterances, which are preferred over n-gram and rule-based systems.
Comment: To appear in SigDial 201
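The overall generate-then-rerank shape of such a system can be illustrated with a toy sketch. Here hand-written candidate utterances stand in for samples from the recurrent generator, and a simple slot-coverage score stands in for the convolutional sentence reranker; both substitutions are inventions for illustration, not the paper's models.

```python
def coverage(candidate: str, slots: dict) -> float:
    """Fraction of dialogue-act slot values the candidate mentions
    (a crude stand-in for a learned reranking score)."""
    values = list(slots.values())
    hits = sum(1 for v in values if v.lower() in candidate.lower())
    return hits / len(values)

def rerank(candidates: list, slots: dict) -> str:
    """Pick the candidate utterance that realises the most slot values."""
    return max(candidates, key=lambda c: coverage(c, slots))

slots = {"name": "Kymira", "food": "Thai"}          # invented dialogue act
candidates = ["Kymira is a nice place.",
              "Kymira serves Thai food.",
              "There is a restaurant nearby."]
print(rerank(candidates, slots))
```

The point of the reranking step is exactly this kind of filtering: the generator over-produces varied utterances, and the reranker selects one that faithfully realises the input dialogue act.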