Who's afraid of job interviews? Definitely a question for user modelling
We define job interviews as a domain of interaction that can be modelled automatically in a serious game for job interview skills training. We present four types of studies: (1) field-based human-to-human job interviews, (2) field-based computer-mediated human-to-human interviews, (3) lab-based Wizard-of-Oz studies, and (4) field-based human-to-agent studies. Together, these highlight pertinent questions for the user modelling field as it expands its scope to applications for social inclusion. The results show that interviewees suppress their emotional behaviours and that, although our system automatically recognises a subset of those behaviours, modelling complex mental states in real-world contexts remains a challenge for state-of-the-art user modelling technologies. This calls for re-examining both how such models are implemented and how they are used in the target contexts.
Data-Driven Policy Optimisation for Multi-Domain Task-Oriented Dialogue
Recent developments in machine learning, along with a general shift in public attitudes towards digital personal assistants, have opened new frontiers for conversational systems. Nevertheless, building data-driven multi-domain conversational agents that act optimally given a dialogue context remains an open challenge. The first step towards that goal is an efficient way of learning a dialogue policy in new domains. The second is the ability to collect and utilise human-human conversational data to bootstrap an agent's knowledge. The work presented in this thesis demonstrates that a neural dialogue manager fine-tuned with reinforcement learning is a viable approach for learning a dialogue policy efficiently and across many domains.
The thesis starts by introducing a dialogue management module that learns through interaction to act optimally given the current context of a conversation. The current shift towards neural, parameter-rich systems does not fully address the problem of error noise coming from the speech recognition and natural language understanding components. A Bayesian approach is therefore proposed to learn a more robust and effective policy directly from interactions, without any prior data. By putting a distribution over the model weights, the learning agent is less prone to overfitting to particular dialogue realisations, and a more efficient exploration policy can therefore be employed. The results show that deep reinforcement learning performs on par with non-parametric models even in a low-data regime, while significantly reducing the computational complexity compared with the previous state of the art.
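To illustrate the idea of a distribution over policy weights driving exploration, here is a minimal sketch of a Thompson-sampling dialogue policy with a Bayesian linear Q-function. The belief-state size, action set, and update rule are illustrative assumptions, not the architecture described in the thesis.

```python
# Thompson-sampling dialogue policy with a Bayesian linear Q-function.
# A minimal sketch: the belief-state dimension, action inventory and the
# regression target are assumptions, not the thesis's actual model.
import numpy as np

class BayesianLinearQ:
    def __init__(self, state_dim, n_actions, noise_var=1.0, prior_var=10.0):
        self.n_actions = n_actions
        self.noise_var = noise_var
        # One Gaussian posterior over weights per system action.
        self.mean = [np.zeros(state_dim) for _ in range(n_actions)]
        self.cov = [prior_var * np.eye(state_dim) for _ in range(n_actions)]

    def sample_weights(self):
        # Draw one weight vector per action from the current posterior
        # (typically once per dialogue, i.e. Thompson sampling).
        return [np.random.multivariate_normal(m, c)
                for m, c in zip(self.mean, self.cov)]

    def act(self, belief_state, sampled_w):
        # Greedy action under the sampled Q-function.
        q_values = [w @ belief_state for w in sampled_w]
        return int(np.argmax(q_values))

    def update(self, belief_state, action, q_target):
        # Bayesian linear-regression update towards an observed return.
        prec = np.linalg.inv(self.cov[action])
        new_prec = prec + np.outer(belief_state, belief_state) / self.noise_var
        new_cov = np.linalg.inv(new_prec)
        new_mean = new_cov @ (prec @ self.mean[action]
                              + belief_state * q_target / self.noise_var)
        self.mean[action], self.cov[action] = new_mean, new_cov
```

In this sketch, uncertainty in the posterior naturally drives exploration early on and shrinks as more interactions are observed, which is the intuition behind the reduced overfitting to particular dialogue realisations.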
The deployment of a dialogue manager without any pre-training on human conversations is not a viable option from an industry perspective. However, progress in building statistical systems, and dialogue managers in particular, is hindered by the scale of data available. To address this fundamental obstacle, a novel data-collection pipeline based entirely on crowdsourcing, without the need to hire professional annotators, is introduced. The validation of this approach results in the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully labelled collection of human-human written conversations spanning multiple domains and topics. The dataset establishes a set of new benchmarks (belief tracking, policy optimisation, and response generation), significantly raising the complexity of the analysed dialogues.
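The three benchmarks all rely on turn-level annotations. As a rough illustration, a single annotated turn in a MultiWOZ-style record might look like the sketch below; the field names are hypothetical and simplified, not the exact schema of the released corpus.

```python
# A hypothetical, simplified annotated turn in a MultiWOZ-style dialogue.
# Field names and values are illustrative, not the released schema.
turn = {
    "user": "I need a cheap italian restaurant in the centre.",
    "belief_state": {                      # target for belief tracking
        "restaurant": {"pricerange": "cheap",
                       "food": "italian",
                       "area": "centre"},
    },
    "system_acts": [                       # target for policy optimisation
        ("restaurant-inform", "name", "Pizza Hut City Centre"),
        ("restaurant-request", "people", "?"),
    ],
    # target for response generation
    "system": "Pizza Hut City Centre matches. For how many people?",
}
```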
The collected dataset serves as a foundation for a novel reinforcement learning (RL)-based approach to training a multi-domain dialogue manager. A Multi-Action and Slot Dialogue Agent (MASDA) is proposed to address two limitations of prior work: (1) handling complex multi-domain dialogues with multiple concurrent actions in a single turn; and (2) lack of interpretability, which impedes the use of intermediate signals (e.g., dialogue turn annotations) where such signals are available. MASDA explicitly models system acts and slots using intermediate signals, resulting in an improved task-based end-to-end framework. The model can also select concurrent actions in a single turn, enriching the representation of the generated responses. The proposed framework allows RL training on dialogue task-completion metrics when dealing with concurrent actions. The results demonstrate the advantages of both handling concurrent actions and exploiting intermediate signals: MASDA outperforms previous end-to-end frameworks while also offering improved scalability.
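Selecting several concurrent actions in one turn can be framed as multi-label prediction over act-slot pairs. The sketch below shows that general idea only; the layer sizes, act-slot inventory, and decision threshold are made-up assumptions and do not reflect the actual MASDA architecture.

```python
# Multi-label selection of concurrent act-slot pairs for one dialogue turn.
# Sketch of the general idea; sizes and inventory are assumptions, not MASDA.
import torch
import torch.nn as nn

ACT_SLOT_PAIRS = ["restaurant-inform_name", "restaurant-request_people",
                  "hotel-inform_price", "general-bye_none"]   # hypothetical

class ConcurrentActHead(nn.Module):
    def __init__(self, context_dim, n_pairs):
        super().__init__()
        self.scorer = nn.Linear(context_dim, n_pairs)

    def forward(self, dialogue_context):
        # Independent sigmoid per act-slot pair, so several can fire per turn.
        return torch.sigmoid(self.scorer(dialogue_context))

head = ConcurrentActHead(context_dim=128, n_pairs=len(ACT_SLOT_PAIRS))
context = torch.randn(1, 128)              # encoded dialogue context (dummy)
probs = head(context)
chosen = [p for p, prob in zip(ACT_SLOT_PAIRS, probs[0]) if prob > 0.5]
print(chosen)
```

When turn-level act annotations are available, such a head can first be supervised with a binary cross-entropy loss (the "intermediate signal") before fine-tuning the whole policy with RL on task-completion metrics.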
Speech analysis for Ambient Assisted Living: technical and user design of a vocal order system
The evolution of ICT has led to the emergence of the smart home: a home equipped with data-processing technology that anticipates the needs of its inhabitants while trying to maintain their comfort and safety, by acting on the house and by implementing connections with the outside world. Smart homes equipped with ambient intelligence technology therefore constitute a promising direction for enabling the growing number of elderly people to continue living in their own homes as long as possible. However, the technological solutions requested by this part of the population have to suit their specific needs and capabilities. These smart homes tend to be equipped with devices whose interfaces are increasingly complex and difficult for the user to control. The people most likely to benefit from these new technologies are those losing autonomy, such as people with disabilities or elderly people with cognitive deficiencies (e.g., Alzheimer's disease), yet they are also the least able to use complex interfaces because of their handicap or their limited understanding of ICT. It therefore becomes essential to make daily life, and access to the whole home-automation system, easier through the smart home. The usual tactile interfaces should be supplemented by accessible ones, in particular a system that reacts to the voice; such interfaces are also useful when the person cannot move easily. Vocal orders will allow the following functionality:
- to provide assistance through traditional or vocal orders;
- to set up indirect order regulation for better energy management;
- to reinforce the link with relatives by integrating interfaces dedicated and adapted to the person losing autonomy;
- to improve safety by detecting distress situations and break-ins.
This chapter describes the different steps needed for the conception of an audio ambient system. The first step concerns acceptability and the objections of the end users; we report a user evaluation assessing the acceptance of, and the fears raised by, this new technology. The experiment aimed at testing three important aspects of speech interaction: voice command, communication with the outside world, and the home-automation system interrupting a person's activity. It was conducted in a smart home with a voice command simulated using a Wizard-of-Oz technique and yielded information of great interest. The second step is a general presentation of audio sensing technology for ambient assisted living; different aspects of sound and speech processing are developed, and the applications and challenges are presented. The third step concerns speech recognition in the home environment. Automatic Speech Recognition (ASR) systems have reached good performance with close-talking microphones (e.g., head-sets), but performance decreases significantly as soon as the microphone is moved away from the speaker's mouth (e.g., when the microphone is set in the ceiling). This deterioration is due to a broad variety of effects, including reverberation and the presence of undetermined background noise such as TV, radio, and other devices. This part presents a system for vocal order recognition in a distant-speech context, evaluated through experiments in a dedicated flat. The chapter concludes with a discussion of the interest of the speech modality for Ambient Assisted Living.
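To make the vocal order recognition step concrete, here is a minimal sketch of matching a noisy distant-speech ASR hypothesis against a small grammar of home-automation orders using string similarity. The activation keyword, command list, and threshold are illustrative assumptions, not the system evaluated in the chapter.

```python
# Match a (possibly noisy) distant-speech ASR hypothesis against a small
# grammar of vocal orders. Purely illustrative: keyword, commands and
# threshold are assumptions, not the system described in the chapter.
from difflib import SequenceMatcher

KEYWORD = "nestor"                       # hypothetical activation keyword
ORDERS = {
    "allume la lumiere": "LIGHT_ON",
    "eteins la lumiere": "LIGHT_OFF",
    "ferme les volets": "CLOSE_SHUTTERS",
    "appelle du secours": "CALL_FOR_HELP",
}

def recognise_order(asr_hypothesis, threshold=0.7):
    text = asr_hypothesis.lower()
    if KEYWORD not in text:
        return None                      # ignore everyday conversation
    command_part = text.split(KEYWORD, 1)[1].strip()
    best_order, best_score = None, 0.0
    for phrase, order in ORDERS.items():
        score = SequenceMatcher(None, command_part, phrase).ratio()
        if score > best_score:
            best_order, best_score = order, score
    return best_order if best_score >= threshold else None

# Tolerates a small ASR error in the last word:
print(recognise_order("nestor allume la lumier"))   # -> LIGHT_ON
```

A fuzzy match against a closed command grammar is one simple way to keep a vocal order system usable when distant-microphone ASR output is degraded by reverberation and background noise.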
Leveraging study of robustness and portability of spoken language understanding systems across languages and domains: the PORTMEDIA corpora
The PORTMEDIA project is intended to develop new corpora for the evaluation of spoken language understanding systems. The newly collected data are in the field of human-machine dialogue systems for tourist information, in French, in line with the MEDIA corpus. Transcriptions and semantic annotations, obtained through low-cost procedures, are provided to allow a thorough evaluation of systems' capabilities in terms of robustness and portability across languages and domains. A new test set with some adaptation data is prepared for each case: in Italian as an example of a new language, and for ticket reservation as an example of a new domain. Finally, the work is complemented by the proposal of a new high-level semantic annotation scheme well suited to dialogue data.
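As a rough illustration of what a concept-level semantic annotation of a single user turn could look like in such a corpus, consider the sketch below; the field names, concept labels, and segmentation are hypothetical and do not reproduce the actual MEDIA/PORTMEDIA annotation scheme.

```python
# Hypothetical concept-value annotation of one user turn, in the spirit of a
# MEDIA/PORTMEDIA-style SLU corpus; labels and segmentation are illustrative.
annotated_turn = {
    "lang": "it",
    "domain": "ticket_reservation",
    "transcription": "vorrei due biglietti per milano domani sera",
    "concepts": [
        {"span": "due",         "concept": "number_tickets", "value": "2"},
        {"span": "milano",      "concept": "destination",    "value": "Milano"},
        {"span": "domani sera", "concept": "departure_time", "value": "tomorrow_evening"},
    ],
}
```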
Dialogue Management and Language Generation for a Robust Conversational Virtual Coach: Validation and User Study
Designing human–machine interactive systems requires cooperation between different disciplines. In this work, we present a Dialogue Manager and a Language Generator that are the core modules of a voice-based Spoken Dialogue System (SDS) capable of carrying out challenging, long and complex coaching conversations. We also develop an efficient integration procedure for the whole system, which acts as an intelligent and robust Virtual Coach. The coaching task differs significantly from the classical applications of SDSs, resulting in a much higher degree of complexity and difficulty. The Virtual Coach has been successfully tested and validated in a user study with independent elderly participants in three different countries, with three different languages and cultures: Spain, France and Norway. The research presented in this paper has been conducted as part of the EMPATHIC project, which has received funding from the European Union's Horizon 2020 research and innovation programme under Grant No. 769872. Additionally, this work has been partially funded by the BEWORD and AMIC-PC projects of the Ministry of Science and Technology, under Grant Nos. PID2021-126061OB-C42 and PDC2021-120846-C43, respectively. Vázquez and López Zorrilla received PhD scholarships from the Basque Government, with Grant Nos. PRE 2020 1 0274 and PRE 2017 1 0357, respectively.
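A minimal sketch of how a Dialogue Manager and a Language Generator can be chained inside a turn-level spoken dialogue loop is shown below; the module interfaces, acts, and templates are assumptions made for illustration and are not the EMPATHIC Virtual Coach implementation.

```python
# Illustrative turn-level loop wiring a Dialogue Manager and a Language
# Generator. Interfaces, acts and templates are assumptions, not the
# EMPATHIC Virtual Coach implementation.
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    turn: int = 0
    history: list = field(default_factory=list)

class DialogueManager:
    def next_act(self, state, user_utterance):
        # Decide the next coaching act from the state and the user's turn.
        state.history.append(("user", user_utterance))
        return "ask_open_question" if state.turn < 3 else "summarise_and_close"

class LanguageGenerator:
    TEMPLATES = {
        "ask_open_question": "How did that make you feel?",
        "summarise_and_close": "Thank you for sharing. Shall we set a small goal?",
    }
    def realise(self, act):
        return self.TEMPLATES[act]

def coaching_turn(dm, nlg, state, user_utterance):
    act = dm.next_act(state, user_utterance)
    system_utterance = nlg.realise(act)
    state.history.append(("system", system_utterance))
    state.turn += 1
    return system_utterance

state = DialogueState()
print(coaching_turn(DialogueManager(), LanguageGenerator(), state,
                    "I have been feeling a bit lonely lately."))
```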