
    Gated Convolutional Bidirectional Attention-based Model for Off-topic Spoken Response Detection

    Off-topic spoken response detection, the task of predicting whether a response is off-topic for the corresponding prompt, is important for an automated speaking assessment system. In many real-world educational applications, off-topic spoken response detectors are required to achieve high recall for off-topic responses not only on seen prompts but also on prompts that are unseen during training. In this paper, we propose a novel approach for off-topic spoken response detection with high off-topic recall on both seen and unseen prompts. We introduce a new model, the Gated Convolutional Bidirectional Attention-based Model (GCBiA), which applies a bi-attention mechanism and convolutions to extract topic words of prompts and key phrases of responses, and introduces gated units and residual connections between major layers to better represent the relevance of responses and prompts. Moreover, a new negative sampling method is proposed to augment the training data. Experimental results demonstrate that our novel approach achieves significant improvements in detecting off-topic responses with extremely high on-topic recall, for both seen and unseen prompts. (Comment: ACL 2020 long paper)
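    The abstract only names the components, so the following is a minimal PyTorch sketch of a bi-attention relevance scorer in the spirit of GCBiA, not the authors' implementation: the layer sizes, convolution kernel, gating formula, and pooling are all illustrative assumptions.

```python
# Minimal sketch of a bi-attention relevance scorer in the spirit of GCBiA.
# All layer sizes, the gating formula, and the pooling choice are assumptions
# for illustration; they are not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiAttentionRelevance(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 128, hid_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Convolution over the response to pick up key phrases (assumed kernel size).
        self.conv = nn.Conv1d(emb_dim, hid_dim, kernel_size=3, padding=1)
        self.proj = nn.Linear(emb_dim, hid_dim)
        # Gate mixing response features with attended prompt context.
        self.gate = nn.Linear(2 * hid_dim, hid_dim)
        self.out = nn.Linear(hid_dim, 1)

    def forward(self, prompt_ids: torch.Tensor, response_ids: torch.Tensor) -> torch.Tensor:
        # prompt_ids: (B, Lp), response_ids: (B, Lr)
        p = self.proj(self.embed(prompt_ids))                      # (B, Lp, H)
        r = self.conv(self.embed(response_ids).transpose(1, 2))    # (B, H, Lr)
        r = r.transpose(1, 2)                                       # (B, Lr, H)

        # Bi-attention: similarity between every response and prompt position.
        sim = torch.bmm(r, p.transpose(1, 2))                       # (B, Lr, Lp)
        r2p = torch.bmm(F.softmax(sim, dim=-1), p)                  # attended prompt context
        # Gated, residual-style fusion of response features and prompt context.
        g = torch.sigmoid(self.gate(torch.cat([r, r2p], dim=-1)))   # (B, Lr, H)
        fused = g * r2p + (1.0 - g) * r

        pooled = fused.max(dim=1).values                             # (B, H)
        return self.out(pooled).squeeze(-1)                          # relevance logit per pair


if __name__ == "__main__":
    model = BiAttentionRelevance(vocab_size=1000)
    prompts = torch.randint(1, 1000, (4, 12))
    responses = torch.randint(1, 1000, (4, 40))
    print(model(prompts, responses).shape)  # torch.Size([4])
```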

    Incorporating uncertainty into deep learning for spoken language assessment

    There is a growing demand for automatic assessment of spoken English proficiency. These systems need to handle large variations in input data owing to the wide range of candidate skill levels and L1s, and errors from ASR. Some candidates will be a poor match to the training data set, undermining the validity of the predicted grade. For high-stakes tests it is essential for such systems not only to grade well, but also to provide a measure of their uncertainty in their predictions, enabling rejection to human graders. Previous work examined Gaussian Process (GP) graders which, though successful, do not scale well with large data sets. Deep Neural Networks (DNN) may also be used to provide uncertainty using Monte-Carlo Dropout (MCD). This paper proposes a novel method to yield uncertainty and compares it to GPs and DNNs with MCD. The proposed approach explicitly teaches a DNN to have low uncertainty on training data and high uncertainty on generated artificial data. In experiments conducted on data from the Business Language Testing Service (BULATS), the proposed approach is found to outperform GPs and DNNs with MCD in uncertainty-based rejection whilst achieving comparable grading performance.
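    The paper's baseline, Monte-Carlo Dropout, lends itself to a short sketch: keep dropout active at prediction time, run several stochastic forward passes, and treat the spread of the predicted grades as the uncertainty used for rejection. The grader architecture, feature dimension, and rejection threshold below are placeholders, not the BULATS system.

```python
# Sketch of Monte-Carlo Dropout uncertainty for a neural grader (the baseline
# approach referred to in the abstract). Network, sizes, and threshold are
# illustrative placeholders.
import torch
import torch.nn as nn


class Grader(nn.Module):
    def __init__(self, in_dim: int = 33, hid: int = 180):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid), nn.ReLU(), nn.Dropout(p=0.5),
            nn.Linear(hid, hid), nn.ReLU(), nn.Dropout(p=0.5),
            nn.Linear(hid, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def mc_dropout_grade(model: Grader, features: torch.Tensor, n_samples: int = 50):
    """Run several stochastic forward passes with dropout active and use the
    spread of the predictions as an uncertainty estimate."""
    model.train()  # keep dropout active at prediction time
    with torch.no_grad():
        preds = torch.stack([model(features) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)


if __name__ == "__main__":
    grader = Grader()
    feats = torch.randn(8, 33)  # one feature vector per candidate
    grades, uncertainties = mc_dropout_grade(grader, feats)
    # Back off to a human grader when the model is unsure (threshold is arbitrary).
    for g, u in zip(grades.tolist(), uncertainties.tolist()):
        print(f"grade={g:.2f}  std={u:.2f}  {'REJECT to human' if u > 0.5 else 'auto-grade'}")
```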

    Impact of ASR performance on free speaking language assessment

    In free speaking tests, candidates respond in spontaneous speech to prompts. This form of test allows the spoken language proficiency of a non-native speaker of English to be assessed more fully than read-aloud tests. As the candidate's responses are unscripted, transcription by automatic speech recognition (ASR) is essential for automated assessment. ASR will never be 100% accurate, so any assessment system must seek to minimise and mitigate ASR errors. This paper considers the impact of ASR errors on the performance of free speaking test auto-marking systems. Firstly, rich linguistically related features, based on part-of-speech tags from statistical parse trees, are investigated for assessment. Then, the impact of ASR errors on how well the system can detect whether a learner's answer is relevant to the question asked is evaluated. Finally, the impact that these errors may have on the ability of the system to provide detailed feedback to the learner is analysed. In particular, pronunciation and grammatical errors are considered, as these are important in helping a learner to make progress. As feedback resulting from an ASR error would be highly confusing, an approach to mitigating this problem using confidence scores is also analysed.
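    The confidence-score mitigation mentioned at the end can be illustrated with a small sketch: feedback tied to low-confidence ASR words is simply suppressed, so recognition errors are less likely to surface as confusing corrections. The data structures and the threshold below are assumptions for illustration, not the paper's system.

```python
# Illustrative sketch of confidence-based feedback filtering: only surface
# feedback attached to words the recogniser is confident about.
from dataclasses import dataclass


@dataclass
class FeedbackItem:
    word: str
    asr_confidence: float  # per-word confidence from the recogniser, in [0, 1]
    message: str           # e.g. a pronunciation or grammar correction


def filter_feedback(items: list[FeedbackItem], min_confidence: float = 0.8) -> list[FeedbackItem]:
    """Keep only feedback attached to confidently recognised words."""
    return [item for item in items if item.asr_confidence >= min_confidence]


if __name__ == "__main__":
    hypotheses = [
        FeedbackItem("ship", 0.95, "vowel in 'ship' pronounced as in 'sheep'"),
        FeedbackItem("goed", 0.45, "irregular past tense: use 'went'"),  # likely an ASR error
    ]
    for item in filter_feedback(hypotheses):
        print(item.word, "->", item.message)
```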

    An attention based model for off-topic spontaneous spoken response detection: An Initial Study

    Automatic spoken language assessment systems are gaining popularity due to the rising demand for English second language learning. Current systems primarily assess fluency and pronunciation, rather than the semantic content and relevance of a candidate's response to a prompt. However, to increase reliability and robustness, relevance assessment and off-topic response detection are desirable, particularly for spontaneous spoken responses to open-ended prompts. Previously proposed approaches usually require prompt-response pairs for all prompts. This limits flexibility, as example responses are required whenever a new test prompt is introduced. This paper presents an initial study of an attention-based neural model which assesses the relevance of prompt-response pairs without the need to see them in training. This model uses a bidirectional Recurrent Neural Network (BiRNN) embedding of the prompt to compute attention over the hidden states of a BiRNN embedding of the response. The resulting fixed-length embedding is fed into a binary classifier to predict the relevance of the response. Due to a lack of off-topic responses, negative examples for both training and evaluation are created by randomly shuffling prompts and responses. On spontaneous spoken data, this system is able to assess relevance to both seen and unseen prompts.
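    A minimal PyTorch sketch of the model described above follows: a BiRNN summary of the prompt attends over the hidden states of a BiRNN encoding of the response, and the attended vector feeds a binary relevance classifier; negative pairs are made by shuffling prompts. The GRU cells, dimensions, and attention form are assumptions rather than the paper's exact configuration.

```python
# Sketch of a prompt-conditioned attention model for off-topic response
# detection. Layer choices are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptResponseRelevance(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 128, hid_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.prompt_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.response_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid_dim, 2 * hid_dim)
        self.classifier = nn.Linear(2 * hid_dim, 1)

    def forward(self, prompt_ids: torch.Tensor, response_ids: torch.Tensor) -> torch.Tensor:
        _, p_state = self.prompt_rnn(self.embed(prompt_ids))         # p_state: (2, B, H)
        query = torch.cat([p_state[0], p_state[1]], dim=-1)          # (B, 2H) prompt summary
        r_states, _ = self.response_rnn(self.embed(response_ids))    # (B, Lr, 2H)

        # Attention of the prompt summary over the response hidden states.
        scores = torch.bmm(r_states, self.attn(query).unsqueeze(-1))  # (B, Lr, 1)
        weights = F.softmax(scores, dim=1)
        context = (weights * r_states).sum(dim=1)                     # fixed-length embedding

        return self.classifier(context).squeeze(-1)                   # relevance logit


def shuffle_negatives(prompt_ids: torch.Tensor, response_ids: torch.Tensor):
    """Create off-topic (negative) pairs by pairing responses with shuffled prompts,
    as done in the abstract in the absence of genuine off-topic responses."""
    perm = torch.randperm(prompt_ids.size(0))
    return prompt_ids[perm], response_ids


if __name__ == "__main__":
    model = PromptResponseRelevance(vocab_size=1000)
    prompts = torch.randint(1, 1000, (4, 10))
    responses = torch.randint(1, 1000, (4, 50))
    print(torch.sigmoid(model(prompts, responses)))  # probability the response is on-topic
```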

    Automatic Essay Scoring Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses

    Deep-learning based Automatic Essay Scoring (AES) systems are being actively used in various high-stakes applications in education and testing. However, little research has been done to understand and interpret the black-box nature of deep-learning-based scoring algorithms. While previous studies indicate that scoring models can be easily fooled, in this paper we explore the reason behind their surprising adversarial brittleness. We utilize recent advances in interpretability to find the extent to which features such as coherence, content, vocabulary, and relevance are important for automated scoring mechanisms. We use this to investigate the oversensitivity (i.e., large change in output score with a small change in input essay content) and overstability (i.e., little change in output scores with large changes in input essay content) of AES. Our results indicate that autoscoring models, despite being trained as “end-to-end” models with rich contextual embeddings such as BERT, behave like bag-of-words models. A few words determine the essay score without the need for any context, making the model largely overstable. This is in stark contrast to recent probing studies on pre-trained representation learning models, which show that rich linguistic features such as parts of speech and morphology are encoded by them. Further, we also find that the models have learnt dataset biases, making them oversensitive. The presence of a few words with high co-occurrence with a certain score class makes the model associate the essay sample with that score. This causes score changes in ∼95% of samples with the addition of only a few words. To deal with these issues, we propose detection-based protection models that can detect oversensitivity and samples causing overstability with high accuracy. We find that our proposed models are able to detect unusual attribution patterns and flag adversarial samples successfully.
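    The oversensitivity finding suggests a simple probe, sketched below: append a handful of words that co-occur with a high score class and measure how far the predicted score moves. The scoring function here is a stand-in for any trained AES model, and the trigger words and threshold are illustrative assumptions, not values from the paper.

```python
# Illustrative oversensitivity probe: measure the score shift caused by
# appending a few high-co-occurrence words to an essay. `score_essay` is a
# placeholder for any trained AES model.
from typing import Callable, Iterable


def oversensitivity_probe(
    score_essay: Callable[[str], float],
    essay: str,
    trigger_words: Iterable[str] = ("moreover", "consequently", "fundamental"),
    flag_threshold: float = 0.5,
) -> dict:
    """Return the score shift caused by appending a few trigger words."""
    base = score_essay(essay)
    perturbed = score_essay(essay + " " + " ".join(trigger_words))
    shift = perturbed - base
    return {
        "base_score": base,
        "perturbed_score": perturbed,
        "shift": shift,
        "flag_oversensitive": abs(shift) > flag_threshold,
    }


if __name__ == "__main__":
    def toy_scorer(text: str) -> float:
        # Toy length-based "scorer" so the sketch runs end to end.
        return min(6.0, len(text.split()) / 50.0)

    print(oversensitivity_probe(toy_scorer, "This essay argues that " + "word " * 200))
```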