6 research outputs found

    Investigating perception of spoken dialogue acceptability through surprisal

    Quantifying the perceptual value of lexical and non-lexical channels in speech

    Speech is a fundamental means of communication that can be seen to provide two channels for transmitting information: the lexical channel of which words are said, and the non-lexical channel of how they are spoken. Both channels shape listener expectations of upcoming communication; however, directly quantifying their relative effect on expectations is challenging. Previous attempts require spoken variations of lexically-equivalent dialogue turns or conspicuous acoustic manipulations. This paper introduces a generalised paradigm to study the value of non-lexical information in dialogue across unconstrained lexical content. By quantifying the perceptual value of the non-lexical channel with both accuracy and entropy reduction, we show that non-lexical information produces a consistent effect on expectations of upcoming dialogue: even when it leads to poorer discriminative turn judgements than lexical content alone, it yields higher consensus among participants. Comment: To be published in Interspeech 2023, 5 pages, 1 figure

    AltGen: 1.3M Plausible Alternatives From Neural Text Generators

    <h2>AltGen: 1.3M Plausible Alternatives From Neural Text Generators</h2><p>The AltGen dataset contains 1.3 million English texts generated by neural language generators conditioned on contexts from three corpora of acceptability judgements and two corpora of reading times.</p><p>For each corpus, each text generator, and each sampling algorithm, 100 generations are sampled, for a total of 1,257,300 generations. Details about the language generators and the corpora are presented in a paper published at EMNLP 2023 (in particular, Section 4). Please cite this paper if you use any version of the dataset in your work:</p><blockquote><p>Mario Giulianelli, Sarenne Wallbridge, and Raquel Fernández. 2023. <strong>Information Value: Measuring Utterance Predictability as Distance from Plausible Alternatives</strong>. In <i>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</i>. Association for Computational Linguistics.</p></blockquote><p>The files are in jsonl format and include a <i>context_id</i> field, which allows retrieving the relevant entry from the original corpus, and the <i>alternatives</i> field, which contains the language model generations. Please note that the alternatives are not post-processed (see code and footnote 2 in the paper for further details). Filenames are built as follows: <i>DecodingAlgorithm</i>_<i>DecodingParameter</i>-n<i>NumAlternatives</i>-maxlen_<i>MaxGenerationLength</i>-sep_<i>Separator</i>.jsonl.</p>
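
    The file layout described above could be read along the following lines. This is only a sketch under stated assumptions, not the authors' code: the field names context_id and alternatives and the filename pattern come from the dataset description, while the function names parse_filename and read_alternatives, the regular expression, and the filename used in the usage note are hypothetical.

```python
import json
import re

# Pattern from the dataset description:
# DecodingAlgorithm_DecodingParameter-nNumAlternatives-maxlen_MaxGenerationLength-sep_Separator.jsonl
# Assumption: the algorithm name contains no underscore and the two counts are integers.
FILENAME_RE = re.compile(
    r"^(?P<algorithm>[^_]+)_(?P<parameter>.+?)"
    r"-n(?P<num_alternatives>\d+)"
    r"-maxlen_(?P<max_length>\d+)"
    r"-sep_(?P<separator>.+)\.jsonl$"
)


def parse_filename(name):
    """Split a dataset filename into its decoding settings (hypothetical helper)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    info = match.groupdict()
    info["num_alternatives"] = int(info["num_alternatives"])
    info["max_length"] = int(info["max_length"])
    return info


def read_alternatives(path):
    """Yield (context_id, alternatives) pairs from one jsonl file (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # The alternatives are not post-processed, so they are returned untouched.
            yield record["context_id"], record["alternatives"]
```

    For a made-up filename such as nucleus_0.9-n100-maxlen_50-sep_eos.jsonl, parse_filename would report the algorithm as "nucleus" and the number of alternatives as 100. The actual decoding names and separators used in the dataset are not stated here and should be checked against the files themselves.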