Off-topic response detection for spontaneous spoken English assessment
Automatic spoken language assessment systems are becoming increasingly
important to meet the demand for English second language learning. This is a challenging task due to the high error rates of even state-of-the-art non-native speech recognition. Consequently, current systems primarily assess fluency and pronunciation. However, content assessment is essential for full automation. As a first stage, it is important to judge whether the speaker responds on topic to test questions designed to elicit spontaneous speech. Standard approaches to off-topic response detection assess similarity between the response and question based on bag-of-words representations. An alternative framework based on Recurrent Neural Network Language Models (RNNLMs) is proposed in this paper. The RNNLM is adapted to the topic of each test question. It learns to associate example responses to questions with points in a topic space constructed using these example responses. Classification is done by ranking the topic-conditional posterior probabilities of a response. Unlike standard methods, the RNNLMs associate a broad range of responses with each topic, incorporate sequence information and scale better with additional training data. In experiments conducted on data from the Business Language Testing Service (BULATS), this approach outperforms standard approaches.
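The classification step described above (ranking topic-conditional posterior probabilities of a response) can be sketched as follows. This is a minimal illustration, not the paper's system: toy unigram word-probability tables stand in for the adapted RNNLMs, and the function name and inputs are invented for the example.

```python
import math

def rank_topic_posteriors(response_tokens, topic_lms, prompted_topic, top_k=1):
    """Score a response under each topic's (toy) language model and declare
    it on-topic if the prompted topic ranks within the top_k topics."""
    scores = {}
    for topic, lm in topic_lms.items():
        # lm maps a token to its probability under that topic's model;
        # unseen tokens get a small floor probability
        scores[topic] = sum(math.log(lm.get(tok, 1e-8)) for tok in response_tokens)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked.index(prompted_topic) < top_k
```

With real RNNLMs the per-topic score would be the sequence log-probability under the topic-adapted model rather than a unigram sum, but the ranking logic is the same.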
An attention based model for off-topic spontaneous spoken response detection: An Initial Study
Automatic spoken language assessment systems are gaining popularity due to the rising demand for English second language learning. Current systems primarily assess fluency and pronunciation, rather than the semantic content and relevance of a candidate's response to a prompt. However, to increase reliability and robustness, relevance assessment and off-topic response detection are desirable, particularly for spontaneous spoken responses to open-ended prompts. Previously proposed approaches usually require prompt-response pairs for all prompts. This limits flexibility, as example responses are required whenever a new test prompt is introduced.
This paper presents an initial study of an attention-based neural model which assesses the relevance of prompt-response pairs without the need to see them in training. This model uses a bidirectional Recurrent Neural Network (BiRNN) embedding of the prompt to compute attention over the hidden states of a BiRNN embedding of the response. The resulting fixed-length embedding is fed into a binary classifier to predict relevance of the response. Due to a lack of off-topic responses, negative examples for both training and evaluation are created by randomly shuffling prompts and responses. On spontaneous spoken data, this system is able to assess relevance to both seen and unseen prompts.
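The negative-example construction mentioned above (randomly shuffling prompts and responses) can be sketched as below. The function name and tuple layout are illustrative, not from the paper; each response is re-paired with a prompt from a different pair to create a labelled off-topic example.

```python
import random

def shuffle_negatives(pairs, seed=0):
    """Given on-topic (prompt, response) pairs, create off-topic negatives
    by pairing each response with a randomly chosen *other* prompt.
    Returns (prompt, response, label) triples; label 1 = on-topic."""
    rng = random.Random(seed)
    prompts = [p for p, _ in pairs]
    negatives = []
    for i, (_, response) in enumerate(pairs):
        j = rng.choice([k for k in range(len(pairs)) if k != i])
        negatives.append((prompts[j], response, 0))
    positives = [(p, r, 1) for p, r in pairs]
    return positives + negatives
```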
Gated Convolutional Bidirectional Attention-based Model for Off-topic Spoken Response Detection
Off-topic spoken response detection, the task aiming at predicting whether a
response is off-topic for the corresponding prompt, is important for an
automated speaking assessment system. In many real-world educational
applications, off-topic spoken response detectors are required to achieve high
recall for off-topic responses not only on seen prompts but also on prompts
that are unseen during training. In this paper, we propose a novel approach for
off-topic spoken response detection with high off-topic recall on both seen and
unseen prompts. We introduce a new model, Gated Convolutional Bidirectional
Attention-based Model (GCBiA), which applies bi-attention mechanism and
convolutions to extract topic words of prompts and key-phrases of responses,
and introduces gated unit and residual connections between major layers to
better represent the relevance of responses and prompts. Moreover, a new
negative sampling method is proposed to augment training data. Experiment
results demonstrate that our novel approach can achieve significant
improvements in detecting off-topic responses with extremely high on-topic
recall, for both seen and unseen prompts. Comment: ACL 2020 long paper.
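The gated unit with residual connections between major layers, described above as part of GCBiA, follows a common pattern: a transformed input is multiplied by a sigmoid gate and added back to the input. The sketch below is a toy dense-layer version (the real model uses convolutions and bi-attention); all shapes and weights are illustrative.

```python
import numpy as np

def gated_residual_block(x, w_main, w_gate):
    """One gated unit with a residual connection (toy form):
    output = x + transform(x) * sigmoid(gate(x))."""
    h = x @ w_main                               # stand-in for the convolution
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate)))   # sigmoid gating values in (0, 1)
    return x + h * gate                          # gated output plus residual
```

The residual path means that with zero weights the block is an identity, which tends to ease optimisation in deep stacks.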
Impact of ASR performance on free speaking language assessment
In free speaking tests, candidates respond in spontaneous speech to prompts. This form of test allows the spoken language proficiency of a non-native speaker of English to be assessed more fully than read-aloud tests. As the candidate's responses are unscripted, transcription by automatic speech recognition (ASR) is essential for automated assessment. ASR will never be 100% accurate, so any assessment system must seek to minimise and mitigate ASR errors. This paper considers the impact of ASR errors on the performance of free speaking test auto-marking systems. Firstly, rich linguistically related features, based on part-of-speech tags from statistical parse trees, are investigated for assessment. Then, the impact of ASR errors on how well the system can detect whether a learner's answer is relevant to the question asked is evaluated. Finally, the impact that these errors may have on the ability of the system to provide detailed feedback to the learner is analysed. In particular, pronunciation and grammatical errors are considered, as these are important in helping a learner to make progress. As feedback resulting from an ASR error would be highly confusing, an approach to mitigate this problem using confidence scores is also analysed.
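The confidence-score mitigation idea above amounts to suppressing feedback on words the recogniser is unsure about, so that ASR errors are not reported to the learner as pronunciation or grammar mistakes. A minimal sketch, with an invented input layout (the paper does not specify this interface):

```python
def filter_feedback(word_hyps, threshold=0.8):
    """Keep feedback only for detected errors on words whose ASR confidence
    meets the threshold. `word_hyps` is a list of
    (word, asr_confidence, error_detected) tuples (illustrative format)."""
    return [word for word, conf, err in word_hyps
            if err and conf >= threshold]
```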
Universal adversarial attacks on spoken language assessment systems
There is an increasing demand for automated spoken language assessment (SLA) systems, partly driven by the performance improvements that have come from deep learning based approaches. One aspect of deep learning systems is that they do not require expert-derived features, operating directly on the original input, such as a speech recognition (ASR) transcript. This, however, increases their potential susceptibility to adversarial attacks as a form of candidate malpractice. In this paper, the sensitivity of SLA systems to a universal black-box attack on the ASR text output is explored. The aim is to obtain a single, universal phrase that maximally increases any candidate's score. Four approaches to detect such adversarial attacks are also described. All the systems, and associated detection approaches, are evaluated on a free (spontaneous) speaking section from a Business English test. It is shown that on deep learning based SLA systems the average candidate score can be increased by almost one grade level using a single six-word phrase appended to the end of the response hypothesis. Although these large gains can be obtained, they can be easily detected based on shifts from the scores of a “traditional” Gaussian Process based grader.
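The detection idea described above, flagging candidates whose deep-grader score shifts away from a "traditional" Gaussian Process grader's score, can be sketched as a simple threshold check. The function name, threshold, and score values are invented for illustration.

```python
def flag_adversarial(deep_scores, gp_scores, max_shift=1.0):
    """Return indices of candidates whose deep-grader score deviates from
    the reference (e.g. Gaussian Process) grader's score by more than
    max_shift grade points, a toy form of detection by score shift."""
    return [i for i, (d, g) in enumerate(zip(deep_scores, gp_scores))
            if abs(d - g) > max_shift]
```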
Complementary systems for Off-Topic spoken response detection
Increased demand to learn English for business and education has led to growing interest in automatic spoken language assessment and teaching systems. With this shift to automated approaches it is important that systems reliably assess all aspects of a candidate's responses. This paper examines one form of spoken language assessment; whether the response from the candidate is relevant to the prompt provided. This will be referred to as off-topic spoken response detection. Two forms of previously proposed approaches are examined in this work: the hierarchical attention-based topic model (HATM); and the similarity grid model (SGM). The work focuses on the scenario when the prompt, and associated responses, have not been seen in the training data, enabling the system to be applied to new test scripts without the need to collect data or retrain the model. To improve the performance of the systems for unseen prompts, data augmentation based on easy data augmentation (EDA) and translation based approaches are applied. Additionally for the HATM, a form of prompt dropout is described. The systems were evaluated on both seen and unseen prompts from Linguaskill Business and General English tests. For unseen data the performance of the HATM was improved using data augmentation, in contrast to the SGM where no gains were obtained. The two approaches were found to be complementary to one another, yielding a combined F(0.5) score of 0.814
for off-topic response detection where the prompts have not been seen in training.
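The F(0.5) score used above to evaluate the combined system is the general F-beta measure with beta = 0.5, which weights precision more heavily than recall. A small sketch of the standard formula:

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score: beta < 1 favours precision, beta > 1 favours recall.
    F(0.5) is the setting reported for off-topic response detection."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```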
A hierarchical attention based model for off-topic spontaneous spoken response detection
Automatic spoken language assessment and training systems are becoming increasingly popular to handle the growing demand to learn languages. However, current systems often assess only fluency and pronunciation, with limited content-based features being used. This paper examines one particular aspect of content assessment, off-topic response detection. This is important for deployed systems as it ensures that candidates understood the prompt and are able to generate an appropriate answer. Previously proposed approaches typically require a set of prompt-response training pairs, which limits flexibility as example responses are required whenever a new test prompt is introduced. Recently, the attention based neural topic model (ATM) was presented, which can assess the relevance of prompt-response pairs regardless of whether the prompt was seen in training. This model uses a bidirectional Recurrent Neural Network (BiRNN) embedding of the prompt combined with an attention mechanism to attend over the hidden states of a BiRNN embedding of the response, computing a fixed-length embedding used to predict relevance. Unfortunately, performance on prompts not seen in the training data is lower than on seen prompts.
Thus, this paper adds the following contributions: several improvements to the ATM are examined; a hierarchical variant of the ATM (HATM) is proposed, which explicitly uses prompt similarity to further improve performance on unseen prompts by interpolating over prompts seen in training data, given a prompt of interest, via a second attention mechanism; and an in-depth analysis of both models is conducted, with the main failure mode identified. On spontaneous spoken data, taken from BULATS tests, these systems are able to assess relevance to both seen and unseen prompts.
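The second attention mechanism described above, interpolating over seen prompts given an unseen prompt of interest, can be sketched as a softmax over prompt-embedding similarities. This toy version uses dot-product similarity on fixed vectors; the embedding dimensions and values are invented, and the real HATM learns these representations.

```python
import numpy as np

def prompt_attention(query_emb, seen_prompt_embs):
    """Attend over seen-prompt embeddings with softmax dot-product
    similarity and return the interpolated prompt representation."""
    scores = seen_prompt_embs @ query_emb          # similarity to each seen prompt
    weights = np.exp(scores - scores.max())        # numerically stable softmax
    weights /= weights.sum()
    return weights @ seen_prompt_embs              # convex combination of seen prompts
```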