
    A novel non-intrusive objective method to predict voice quality of service in LTE networks.

    This research aimed to introduce a novel approach for non-intrusive objective measurement of voice Quality of Service (QoS) in LTE networks. While achieving this aim, the thesis established a thorough knowledge of how voice traffic is handled in LTE networks, of the LTE network architecture and its similarities to and differences from its predecessors and traditional ground IP networks, and, most importantly, of those QoS-affecting parameters which are exclusive to LTE environments. Mean Opinion Score (MOS) is the scoring system used to measure the QoS of voice traffic, and it can be measured subjectively (as originally intended). Subjective QoS measurement methods are costly and time-consuming; therefore, objective methods such as Perceptual Evaluation of Speech Quality (PESQ) were developed to address these limitations. These objective methods have a high correlation with subjective MOS scores. However, they either require individual calculation of many network parameters or have an intrusive nature that requires access to both the reference signal and the degraded signal for comparison by software. Therefore, the current objective methods are not suitable for application in real-time measurement and prediction scenarios. A major contribution of the research was identifying LTE-specific QoS-affecting parameters. There is no previous work that combines these parameters to assess their impacts on QoS. The experiment was configured in a hardware-in-the-loop environment. This configuration could serve as a platform for future research which requires simulation of voice traffic in LTE environments. The key contribution of this research is a novel non-intrusive objective method for QoS measurement and prediction using neural networks. A comparative analysis is presented that examines the performance of four neural network algorithms for non-intrusive measurement and prediction of voice quality over LTE networks.
In conclusion, the Bayesian Regularization algorithm with 4 neurons in the hidden layer and a sigmoid symmetric transfer function was identified as the best solution, with a Mean Square Error (MSE) rate of 0.001 and a regression value of 0.998 measured for the testing data set.
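
The two figures of merit quoted in this abstract can be illustrated with a minimal sketch. It assumes that the "regression value" is the Pearson correlation coefficient R between target and predicted MOS, which the abstract does not state explicitly. The function names and any inputs are hypothetical and are not taken from the thesis.

```python
import math


def mean_squared_error(targets, predictions):
    """Average of squared differences between target and predicted scores."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def pearson_r(targets, predictions):
    """Pearson correlation coefficient between targets and predictions.

    Assumption: this is what the abstract calls the 'regression value'.
    """
    if len(targets) != len(predictions) or len(targets) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_t * var_p)
```

An MSE close to 0 together with an R close to 1 on held-out data would indicate that predicted scores track the targets closely, which is how the reported 0.001 and 0.998 are most naturally read.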

    Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction

    Non-intrusive intelligibility prediction is important for its application in realistic scenarios, where a clean reference signal is difficult to access. The construction of many non-intrusive predictors requires either ground truth intelligibility labels or clean reference signals for supervised learning. In this work, we leverage an unsupervised uncertainty estimation method for predicting speech intelligibility, which does not require intelligibility labels or reference signals to train the predictor. Our experiments demonstrate that the uncertainty from state-of-the-art end-to-end automatic speech recognition (ASR) models is highly correlated with speech intelligibility. The proposed method is evaluated on two databases, and the results show that the unsupervised uncertainty measures of ASR models are more correlated with speech intelligibility from listening results than the predictions made by widely used intrusive methods. Comment: Submitted to INTERSPEECH202
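
One way to picture an unsupervised uncertainty measure of this kind is the sketch below. The abstract does not say which measure the authors use, so mean normalized entropy over per-token posterior distributions is an assumption chosen for illustration. The function names and the `token_posteriors` input format are hypothetical and do not correspond to any particular ASR toolkit.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_normalized_entropy(token_posteriors):
    """Average per-token entropy, scaled to [0, 1] by the maximum entropy.

    token_posteriors: hypothetical list of per-token probability
    distributions over the vocabulary, one list of floats per decoded token.
    """
    if not token_posteriors:
        raise ValueError("at least one token distribution is required")
    total = 0.0
    for probs in token_posteriors:
        if len(probs) < 2:
            raise ValueError("each distribution needs at least two classes")
        # log(len(probs)) is the entropy of a uniform distribution
        total += entropy(probs) / math.log(len(probs))
    return total / len(token_posteriors)
```

Under this assumption, a value near 0 means the recognizer was confident about every token, and a value near 1 means its posteriors were close to uniform. Such a score needs no intelligibility labels or reference signal, which matches the unsupervised setting the abstract describes.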

    Affective learning: improving engagement and enhancing learning with affect-aware feedback

    This paper describes the design and ecologically valid evaluation of a learner model that lies at the heart of an intelligent learning environment called iTalk2Learn. A core objective of the learner model is to adapt formative feedback based on students’ affective states. Types of adaptation include what type of formative feedback should be provided and how it should be presented. Two Bayesian networks trained with data gathered in a series of Wizard-of-Oz studies are used for the adaptation process. This paper reports results from a quasi-experimental evaluation, in authentic classroom settings, which compared a version of iTalk2Learn that adapted feedback based on students’ affective states as they were talking aloud with the system (the affect condition) with one that provided feedback based only on the students’ performance (the non-affect condition). Our results suggest that affect-aware support contributes to reducing boredom and off-task behavior, and may have an effect on learning. We discuss the internal and ecological validity of the study in light of pedagogical considerations that informed the design of the two conditions. Overall, the results of the study have implications both for the design of educational technology and for classroom approaches to teaching, because they highlight the important role that affect-aware modelling plays in the adaptive delivery of formative feedback to support learning.

    AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks

    Automatic speech recognition (ASR) systems are of vital importance nowadays in commonplace tasks such as speech-to-text processing and language translation. This has created the need for ASR systems that can operate in realistic crowded environments. Thus, speech enhancement is a valuable building block in ASR systems and in other applications such as hearing aids, smartphones and teleconferencing systems. In this paper, a generative adversarial network (GAN) based framework is investigated for the task of speech enhancement, more specifically speech denoising of audio tracks. A new architecture based on a CasNet generator and an additional feature-based loss are incorporated to get realistically denoised speech phonetics. Finally, the proposed framework is shown to outperform other learning and traditional model-based speech enhancement approaches. Comment: 5 pages, 4 figures and 2 tables. Accepted in EUSIPCO 202