1,389 research outputs found
Gated Convolutional Bidirectional Attention-based Model for Off-topic Spoken Response Detection
Off-topic spoken response detection, the task aiming at predicting whether a
response is off-topic for the corresponding prompt, is important for an
automated speaking assessment system. In many real-world educational
applications, off-topic spoken response detectors are required to achieve high
recall for off-topic responses not only on seen prompts but also on prompts
that are unseen during training. In this paper, we propose a novel approach for
off-topic spoken response detection with high off-topic recall on both seen and
unseen prompts. We introduce a new model, Gated Convolutional Bidirectional
Attention-based Model (GCBiA), which applies bi-attention mechanism and
convolutions to extract topic words of prompts and key-phrases of responses,
and introduces gated unit and residual connections between major layers to
better represent the relevance of responses and prompts. Moreover, a new
negative sampling method is proposed to augment training data. Experiment
results demonstrate that our novel approach can achieve significant
improvements in detecting off-topic responses with extremely high on-topic
recall, for both seen and unseen prompts.Comment: ACL2020 long pape
Recommended from our members
Complementary systems for Off-Topic spoken response detection
Increased demand to learn English for business and education has led to growing interest in automatic spoken language assessment and teaching systems. With this shift to automated approaches it is important that systems reliably assess all aspects of a candidate's responses. This paper examines one form of spoken language assessment; whether the response from the candidate is relevant to the prompt provided. This will be referred to as off-topic spoken response detection. Two forms of previously proposed approaches are examined in this work: the hierarchical attention-based topic model (HATM); and the similarity grid model (SGM). The work focuses on the scenario when the prompt, and associated responses, have not been seen in the training data, enabling the system to be applied to new test scripts without the need to collect data or retrain the model. To improve the performance of the systems for unseen prompts, data augmentation based on easy data augmentation (EDA) and translation based approaches are applied. Additionally for the HATM, a form of prompt dropout is described. The systems were evaluated on both seen and unseen prompts from Linguaskill Business and General English tests. For unseen data the performance of the HATM was improved using data augmentation, in contrast to the SGM where no gains were obtained. The two approaches were found to be complementary to one another, yielding a combined F(0.5) score of 0.814
for off-topic response detection where the prompts have not been seen in training.ALT
A retrieval-based dialogue system utilizing utterance and context embeddings
Finding semantically rich and computer-understandable representations for
textual dialogues, utterances and words is crucial for dialogue systems (or
conversational agents), as their performance mostly depends on understanding
the context of conversations. Recent research aims at finding distributed
vector representations (embeddings) for words, such that semantically similar
words are relatively close within the vector-space. Encoding the "meaning" of
text into vectors is a current trend, and text can range from words, phrases
and documents to actual human-to-human conversations. In recent research
approaches, responses have been generated utilizing a decoder architecture,
given the vector representation of the current conversation. In this paper, the
utilization of embeddings for answer retrieval is explored by using
Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor
(ANN) model, to find similar conversations in a corpus and rank possible
candidates. Experimental results on the well-known Ubuntu Corpus (in English)
and a customer service chat dataset (in Dutch) show that, in combination with a
candidate selection method, retrieval-based approaches outperform generative
ones and reveal promising future research directions towards the usability of
such a system.Comment: A shorter version is accepted at ICMLA2017 conference;
acknowledgement added; typos correcte
- …