Search CORE

48,504 research outputs found

Incorporating uncertainty into deep learning for spoken language assessment

Author: Gales MJF
Knill KM
Malinin A
Ragni A
Publication venue: ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Publication date: 01/01/2017
Field of study

There is a growing demand for automatic assessment of spoken English proficiency. These systems need to handle large vari- ations in input data owing to the wide range of candidate skill levels and L1s, and errors from ASR. Some candidates will be a poor match to the training data set, undermining the validity of the predicted grade. For high stakes tests it is essen- tial for such systems not only to grade well, but also to provide a measure of their uncertainty in their predictions, en- abling rejection to human graders. Pre- vious work examined Gaussian Process (GP) graders which, though successful, do not scale well with large data sets. Deep Neural Networks (DNN) may also be used to provide uncertainty using Monte-Carlo Dropout (MCD). This paper proposes a novel method to yield uncertainty and compares it to GPs and DNNs with MCD. The proposed approach explicitly teaches a DNN to have low uncertainty on train- ing data and high uncertainty on generated artificial data. On experiments conducted on data from the Business Language Test- ing Service (BULATS), the proposed ap- proach is found to outperform GPs and DNNs with MCD in uncertainty-based re- jection whilst achieving comparable grad- ing performance

Crossref

Apollo (Cambridge)

White Rose Research Online

Ensemble approaches for uncertainty in spoken language assessment

Author: Gales MJF
Knill KM
Malinin A
Wu X
Publication venue: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication date: 01/01/2020
Field of study

Deep learning has dramatically improved the performance of automated systems on a range of tasks including spoken language assessment. One of the issues with these deep learning approaches is that they tend to be overconfident in the decisions that they make, with potentially serious implications for deployment of systems for high-stakes examinations. This paper examines the use of ensemble approaches to improve both the reliability of the scores that are generated, and the ability to detect where the system has made predictions beyond acceptable errors. In this work assessment is treated as a regression problem. Deep density networks, and ensembles of these models, are used as the predictive models. Given an ensemble of models measures of uncertainty, for example the variance of the predicted distributions, can be obtained and used for detecting outlier predictions. However, these ensemble approaches increase the computational and memory requirements of the system. To address this problem the ensemble is distilled into a single mixture density network. The performance of the systems is evaluated on a free speaking prompt-response style spoken language assessment test. Experiments show that the ensembles and the distilled model yield performance gains over a single model, and have the ability to detect outliers

Crossref

Apollo (Cambridge)

Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation

Author: Sharma Shikhar
Asri Layla El
Schulz Hannes
Zumer Jeremie
Publication venue
Publication date: 01/01/1915
Field of study

Automated metrics such as BLEU are widely used in the machine translation literature. They have also been used recently in the dialogue community for evaluating dialogue response generation. However, previous work in dialogue response generation has shown that these metrics do not correlate strongly with human judgment in the non task-oriented dialogue setting. Task-oriented dialogue responses are expressed on narrower domains and exhibit lower diversity. It is thus reasonable to think that these automated metrics would correlate well with human judgment in the task-oriented setting where the generation task consists of translating dialogue acts into a sentence. We conduct an empirical study to confirm whether this is the case. Our findings indicate that these automated metrics have stronger correlation with human judgments in the task-oriented setting compared to what has been observed in the non task-oriented setting. We also observe that these metrics correlate even better for datasets which provide multiple ground truth reference sentences. In addition, we show that some of the currently available corpora for task-oriented language generation can be solved with simple models and advocate for more challenging datasets

arXiv.org e-Print Archive

Biblioteca Virtual del Patrimonio Bibliográfico (Virtual Library of Bibliographical Heritage)

Connected Learning Journeys in Music Production Education

Author: Hepworth-Sawyer R.
Hepworth-Sawyer R.
Toulson R.
Toulson R.
Publication venue: 'American Association on Intellectual and Developmental Disabilities (AAIDD)'
Publication date: 01/01/2019
Field of study

The field of music production education is a challenging one, exploring multiple creative, technical and entrepreneurial disciplines, including music composition, performance electronics, acoustics, musicology, project management and psychology. As a result, students take multiple ‘learning journeys’ on their pathway towards becoming autonomous learners. This paper uniquely evaluates the journey of climbing Bloom’s cognitive domain in the field of music production and gives specific examples that validate teaching music production in higher education through multiple, connected ascents of the framework. Owing to the practical nature of music production, Kolb’s Experiential Learning Model is also considered as a recurring function that is necessary for climbing Bloom’s domain, in order to ensure that learners are equipped for employability and entrepreneurship on graduation. The authors’ own experiences of higher education course delivery, design and development are also reflected upon with reference to Music Production pathways at both the University of Westminster (London, UK) and York St John University (York, UK)

WestminsterResearch

Universal adversarial attacks on spoken language assessment systems

Author: Gales MJF
Knill K
Raina V
Publication venue: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication date: 01/01/2020
Field of study

There is an increasing demand for automated spoken language assessment (SLA) systems, partly driven by the performance improvements that have come from deep learning based approaches. One aspect of deep learning systems is that they do not require expert derived features, operating directly on the original signal such as a speech recognition (ASR) transcript. This, however, increases their potential susceptibility to adversarial attacks as a form of candidate malpractice. In this paper the sensitivity of SLA systems to a universal black-box attack on the ASR text output is explored. The aim is to obtain a single, universal phrase to maximally increase any candidate's score. Four approaches to detect such adversarial attacks are also described. All the systems, and associated detection approaches, are evaluated on a free (spontaneous) speaking section from a Business English test. It is shown that on deep learning based SLA systems the average candidate score can be increased by almost one grade level using a single six word phrase appended to the end of the response hypothesis. Although these large gains can be obtained, they can be easily detected based on detection shifts from the scores of a “traditional” Gaussian Process based grader

Crossref

Apollo (Cambridge)

Recommended from our members

Towards automatic assessment of spontaneous spoken English

Author: Gales MJF
Knill KM
Kyriakopoulos K
Malinin A
Rashid M
van Dalen RC
Wang Y
Publication venue: Speech Communication
Publication date: 01/01/2018
Field of study

With increasing global demand for learning English as a second language, there has been considerable interest in methods of automatic assessment of spoken language proficiency for use in interactive electronic learning tools as well as for grading candidates for formal qualifications. This paper presents an automatic system to address the assessment of spontaneous spoken language. Prompts or questions requiring spontaneous speech responses elicit more natural speech which better reflects a learner’s proficiency level than read speech. In addition to the challenges of highly variable non-native, learner, speech and noisy real-world recording conditions, this requires any automatic system to handle disfluent, non-grammatical, spontaneous speech with the underlying text unknown. To handle these, a strong deep learning based speech recognition system is applied in combination with a Gaussian Process (GP) grader. A range of features derived from the audio using the recognition hypothesis are investigated for their efficacy in the automatic grader. The proposed system is shown to predict grades at a similar level to the original examiner graders on real candidate entries. Interpolation with the examiner grades further boosts performance. The ability to reject poorly estimated grades is also important and measures are proposed to evaluate the performance of rejection schemes. The GP variance is used to decide which automatic grades should be rejected. Back-off to an expert grader for the least confident grades gives gains.Cambridge Assessment Englis

Apollo (Cambridge)

Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

Author: Gatt Albert
Krahmer Emiel
Publication venue
Publication date: 01/01/2017
Field of study

This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures adopted in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them.Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 tabl

arXiv.org e-Print Archive

OAR@UM

Tilburg University Repository