
Improving generalisation to new speakers in spoken dialogue state tracking

Abstract

Users with disabilities can greatly benefit from personalised voice-enabled environmental-control interfaces, but for users with speech impairments (e.g. dysarthria), poor ASR performance poses a challenge to successful dialogue. Statistical dialogue management has shown resilience against high ASR error rates, making it useful for improving the performance of these interfaces. However, little research has so far been devoted to personalising dialogue management for specific users. Recently, data-driven discriminative models have been shown to yield the best performance in dialogue state tracking (the inference of the user goal from the dialogue history). However, because of the unique characteristics of each speaker, training a system for a new user when user-specific data is not available is challenging due to the mismatch between training and operating conditions. This work investigates two methods to improve the performance of an LSTM-based personalised state tracker with new speakers: the use of speaker-specific acoustic and ASR-related features, and dropout regularisation. It is shown that, in an environmental-control system for dysarthric speakers, the combination of both techniques yields an improvement of 3.5% absolute in state tracking accuracy. Further analysis explores the effect of using different amounts of speaker-specific data to train the tracking system.
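To make the two techniques concrete, the following is a minimal illustrative sketch (not the authors' implementation) of an LSTM-based state tracker in PyTorch that concatenates speaker-specific features to each turn's input and applies dropout regularisation. All names and dimensions (TURN_DIM, SPK_DIM, etc.) are assumptions for illustration only.

```python
# Hypothetical sketch of a speaker-aware LSTM dialogue state tracker:
# (a) speaker-specific acoustic/ASR-derived features are appended to each
#     turn's input vector, and (b) dropout regularises the model against
#     the mismatch between training speakers and unseen speakers.
import torch
import torch.nn as nn

TURN_DIM = 64       # per-turn dialogue features (e.g. derived from ASR n-best lists)
SPK_DIM = 8         # speaker-specific features (e.g. acoustic/ASR error statistics)
HIDDEN = 128        # LSTM hidden size
N_SLOT_VALUES = 10  # candidate values for the tracked slot

class SpeakerAwareTracker(nn.Module):
    def __init__(self, dropout: float = 0.5):
        super().__init__()
        self.lstm = nn.LSTM(TURN_DIM + SPK_DIM, HIDDEN, batch_first=True)
        self.dropout = nn.Dropout(dropout)          # regularisation for new speakers
        self.out = nn.Linear(HIDDEN, N_SLOT_VALUES)

    def forward(self, turns: torch.Tensor, spk_feats: torch.Tensor) -> torch.Tensor:
        # turns: (batch, n_turns, TURN_DIM); spk_feats: (batch, SPK_DIM)
        spk = spk_feats.unsqueeze(1).expand(-1, turns.size(1), -1)
        x = torch.cat([turns, spk], dim=-1)         # append speaker features per turn
        h, _ = self.lstm(x)                         # track state across the dialogue
        h = self.dropout(h)                         # dropout before the output layer
        return self.out(h)                          # per-turn slot-value scores

# Usage: scores for a batch of 2 dialogues, 5 turns each.
tracker = SpeakerAwareTracker()
scores = tracker(torch.randn(2, 5, TURN_DIM), torch.randn(2, SPK_DIM))
print(scores.shape)  # torch.Size([2, 5, 10])
```

Under this sketch, a softmax over the final dimension would give a per-turn belief over slot values; the design choice of feeding speaker features at every turn (rather than only initialising the hidden state) keeps the speaker conditioning available throughout the dialogue.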
