Speech-plans: Generating evaluative responses in spoken dialogue
Recent work on evaluation of spoken dialogue systems indicates that better algorithms are needed for the presentation of complex information in speech. Current dialogue systems often rely on presenting sets of options and their attributes sequentially. This places a large memory burden on users, who have to remember complex trade-offs between multiple options and their attributes. To address these problems, we build on previous work using multi-attribute decision theory to devise speech-planning algorithms that present user-tailored summaries, comparisons and recommendations that allow users to focus on critical differences between options and their attributes. We discuss the differences between speech and text planning that result from the particular demands of the speech situation.
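Multi-attribute decision theory typically scores each option with a weighted additive utility over its attributes. The sketch below illustrates that idea in Python; the attribute names, user weights, and scores are invented for illustration and are not taken from the paper.

    # Illustrative sketch only: additive multi-attribute utility for ranking
    # options before generating a recommendation. Attribute names, weights,
    # and scores below are invented, not the paper's actual model.

    def utility(option, weights):
        """Weighted sum of normalized attribute scores (each in [0, 1])."""
        return sum(weights[attr] * value for attr, value in option["scores"].items())

    user_weights = {"food_quality": 0.5, "cost": 0.3, "distance": 0.2}

    options = [
        {"name": "Option A", "scores": {"food_quality": 0.9, "cost": 0.4, "distance": 0.7}},
        {"name": "Option B", "scores": {"food_quality": 0.6, "cost": 0.9, "distance": 0.8}},
    ]

    ranked = sorted(options, key=lambda o: utility(o, user_weights), reverse=True)
    print(f"Recommend {ranked[0]['name']} (utility {utility(ranked[0], user_weights):.2f})")

A user-tailored comparison could then mention only those attributes whose weighted contributions differ most between the top-ranked options.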
Individual and Domain Adaptation in Sentence Planning for Dialogue
One of the biggest challenges in the development and deployment of spoken
dialogue systems is the design of the spoken language generation module. This
challenge arises from the need for the generator to adapt to many features of
the dialogue domain, user population, and dialogue context. A promising
approach is trainable generation, which uses general-purpose linguistic
knowledge that is automatically adapted to the features of interest, such as
the application domain, individual user, or user group. In this paper we
present and evaluate a trainable sentence planner for providing restaurant
information in the MATCH dialogue system. We show that trainable sentence
planning can produce complex information presentations whose quality is
comparable to the output of a template-based generator tuned to this domain. We
also show that our method easily supports adapting the sentence planner to
individuals, and that the individualized sentence planners generally perform
better than models trained and tested on a population of individuals. Previous
work has documented and utilized individual preferences for content selection,
but to our knowledge, these results provide the first demonstration of
individual preferences for sentence planning operations, affecting the content
order, discourse structure and sentence structure of system responses. Finally,
we evaluate the contribution of different feature sets, and show that, in our
application, n-gram features often do as well as features based on higher-level
linguistic representations.
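As a hedged illustration of the ranking step such a trainable sentence planner might use, the sketch below scores candidate realizations with a linear model over n-gram features. In an actual system the weights would be learned from human ratings of sentence plans; the feature weights and candidate sentences here are invented.

    # Illustrative sketch: rank candidate sentence plans with a linear model
    # over bigram features. Weights here are invented; a trained planner
    # would learn them from rated examples.
    from collections import Counter

    def ngram_features(text, n=2):
        tokens = text.lower().split()
        return Counter(zip(*[tokens[i:] for i in range(n)]))

    # Hypothetical learned weights over bigrams.
    weights = {("it", "has"): 0.4, ("with", "good"): 0.3, ("decor", "and"): -0.2}

    def score(candidate):
        feats = ngram_features(candidate)
        return sum(weights.get(ng, 0.0) * count for ng, count in feats.items())

    candidates = [
        "Babbo has excellent food. It has good decor and good service.",
        "Babbo, with good decor and good service, has excellent food.",
    ]
    print(max(candidates, key=score))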
Towards responsive Sensitive Artificial Listeners
This paper describes work in the recently started project SEMAINE, which aims to build a set of Sensitive Artificial Listeners: conversational agents designed to sustain an interaction with a human user despite limited verbal skills, through robust recognition and generation of non-verbal behaviour in real-time, both while the agent is speaking and while it is listening. We report on data collection and on the design of a system architecture in view of real-time responsiveness.
Fish or Fowl: A Wizard of Oz Evaluation of Dialogue Strategies in the Restaurant Domain
Recent work on evaluation of spoken dialogue systems suggests that the information presentation phase of complex dialogues is often the primary contributor to dialogue duration. This indicates that better algorithms are needed for the presentation of complex information in speech. Currently, however, we lack data about the tasks and dialogue strategies on which to base such algorithms. In this paper, we describe a Wizard of Oz tool and a study that applies user models based on multi-attribute decision theory to the problem of generating tailored and concise system responses for a spoken dialogue system. The resulting Wizard corpus will be distributed by the LDC as part of our work on the ISLE project.
Stone Needle: A General Multimodal Large-scale Model Framework towards Healthcare
In healthcare, multimodal data such as medical images and clinical reports is
prevalent and must be comprehensively analyzed before diagnostic decisions are
made. However, current large-scale artificial intelligence
models predominantly focus on single-modal cognitive abilities and neglect the
integration of multiple modalities. Therefore, we propose Stone Needle, a
general multimodal large-scale model framework tailored explicitly for
healthcare applications. Stone Needle serves as a comprehensive medical
multimodal model foundation, integrating various modalities such as text,
images, videos, and audio to surpass the limitations of single-modal systems.
Through the framework components of intent analysis, medical foundation models,
prompt manager, and medical language module, our architecture can perform
multimodal interaction across multiple rounds of dialogue. Our method is a
general multimodal large-scale model framework that integrates diverse
modalities and can be tailored to specific tasks. The experimental results demonstrate
the superior performance of our method compared to single-modal systems. The
fusion of different modalities and the ability to process complex medical
information allow Stone Needle to support accurate diagnosis, treatment
recommendations, and patient care.
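The abstract names four framework components (intent analysis, medical foundation models, prompt manager, medical language module) without publishing an API. The sketch below is one hypothetical way such a pipeline could be wired together; every class, function, and signature here is invented for illustration and is not the paper's code.

    # Hedged sketch of the multi-stage flow the abstract describes:
    # intent analysis -> medical foundation models -> prompt manager ->
    # medical language module. All names below are invented.
    from dataclasses import dataclass, field

    @dataclass
    class Turn:
        text: str
        images: list = field(default_factory=list)  # e.g. file paths
        audio: list = field(default_factory=list)

    def analyze_intent(turn: Turn) -> str:
        # Placeholder intent classifier: route image-bearing turns to imaging.
        return "imaging_query" if turn.images else "text_query"

    def run_foundation_model(intent: str, turn: Turn) -> str:
        # Placeholder for modality-specific medical foundation models.
        return f"[{intent}] findings for {len(turn.images)} image(s)"

    def build_prompt(history: list, findings: str) -> str:
        # Prompt manager: fold multi-round dialogue history plus model
        # findings into one prompt for the medical language module.
        context = "\n".join(f"User: {t.text}" for t in history)
        return f"{context}\nFindings: {findings}\nAnswer:"

    def respond(history: list) -> str:
        turn = history[-1]
        findings = run_foundation_model(analyze_intent(turn), turn)
        return build_prompt(history, findings)  # a real system would decode
                                                # a response from an LLM here

    history = [Turn("Please review this chest X-ray.", images=["xray_001.png"])]
    print(respond(history))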
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Large language models exhibit enhanced zero-shot performance on various tasks
when fine-tuned with instruction-following data. Multimodal
instruction-following models extend these capabilities by integrating both text
and images. However, existing models such as MiniGPT-4 face challenges in
maintaining dialogue coherence in scenarios involving multiple images. A
primary reason is the lack of a specialized dataset for this critical
application. To bridge these gaps, we present SparklesChat, a multimodal
instruction-following model for open-ended dialogues across multiple images. To
support the training, we introduce SparklesDialogue, the first
machine-generated dialogue dataset tailored for word-level interleaved
multi-image and text interactions. Furthermore, we construct SparklesEval, a
GPT-assisted benchmark for quantitatively assessing a model's conversational
competence across multiple images and dialogue turns. Our experiments validate
the effectiveness of SparklesChat in understanding and reasoning across
multiple images and dialogue turns. Specifically, SparklesChat outperformed
MiniGPT-4 on established vision-and-language benchmarks, including the BISON
binary image selection task and the NLVR2 visual reasoning task. Moreover,
SparklesChat scored 8.56 out of 10 on SparklesEval, substantially exceeding
MiniGPT-4's score of 3.91 and nearing GPT-4's score of 9.26. Qualitative
evaluations further demonstrate SparklesChat's generality in handling
real-world applications. All resources will be available at
https://github.com/HYPJUDY/Sparkles
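The abstract describes SparklesDialogue as word-level interleaved multi-image and text interactions. As a purely hypothetical illustration of what such a record might look like (the released dataset's actual schema may differ), consider:

    # Hypothetical sketch of an interleaved multi-image dialogue record;
    # the IMAGE#<id> tagging convention and field names are assumptions,
    # not the published SparklesDialogue schema.
    import json

    record = {
        "dialogue": [
            {
                "role": "user",
                # IMAGE#<id> tags mark where each image interleaves with text.
                "content": "Compare the bridge in IMAGE#1 with the one in IMAGE#2.",
                "images": {"IMAGE#1": "bridge_a.jpg", "IMAGE#2": "bridge_b.jpg"},
            },
            {
                "role": "assistant",
                "content": "IMAGE#1 shows a suspension bridge, while IMAGE#2 ...",
            },
        ]
    }
    print(json.dumps(record, indent=2))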
…