3,007 research outputs found

    A novel non-intrusive objective method to predict voice quality of service in LTE networks.

    This research aimed to introduce a novel approach for non-intrusive objective measurement of voice Quality of Service (QoS) in LTE networks. While achieving this aim, the thesis established a thorough knowledge of how voice traffic is handled in LTE networks, of the LTE network architecture and its similarities and differences to its predecessors and to traditional ground IP networks, and, most importantly, of those QoS-affecting parameters which are exclusive to LTE environments. Mean Opinion Score (MOS) is the scoring system used to measure the QoS of voice traffic, and it can be measured subjectively (as originally intended). Subjective QoS measurement methods are costly and time-consuming; therefore, objective methods such as Perceptual Evaluation of Speech Quality (PESQ) were developed to address these limitations. These objective methods have a high correlation with subjective MOS scores. However, they either require individual calculation of many network parameters or have an intrusive nature that requires access to both the reference signal and the degraded signal for comparison by software. Therefore, the current objective methods are not suitable for application in real-time measurement and prediction scenarios. A major contribution of the research was identifying LTE-specific QoS-affecting parameters. There is no previous work that combines these parameters to assess their impacts on QoS. The experiment was configured in a hardware-in-the-loop environment. This configuration could serve as a platform for future research which requires simulation of voice traffic in LTE environments. The key contribution of this research is a novel non-intrusive objective method for QoS measurement and prediction using neural networks. A comparative analysis is presented that examines the performance of four neural network algorithms for non-intrusive measurement and prediction of voice quality over LTE networks. In conclusion, the Bayesian Regularization algorithm with 4 neurons in the hidden layer and a sigmoid symmetric transfer function was identified as the best solution, with a Mean Square Error (MSE) rate of 0.001 and a regression value of 0.998 measured for the testing data set.

    Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction

    Non-intrusive intelligibility prediction is important for its application in realistic scenarios, where a clean reference signal is difficult to access. The construction of many non-intrusive predictors requires either ground truth intelligibility labels or clean reference signals for supervised learning. In this work, we leverage an unsupervised uncertainty estimation method for predicting speech intelligibility, which does not require intelligibility labels or reference signals to train the predictor. Our experiments demonstrate that the uncertainty from state-of-the-art end-to-end automatic speech recognition (ASR) models is highly correlated with speech intelligibility. The proposed method is evaluated on two databases and the results show that the unsupervised uncertainty measures of ASR models are more correlated with speech intelligibility from listening results than the predictions made by widely used intrusive methods. Comment: Submitted to INTERSPEECH202
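The abstract above does not say which uncertainty measure is used. As a minimal sketch, assuming the uncertainty is taken to be the mean Shannon entropy of an ASR model's per-step output distributions, the idea could look like this in Python. The function names and the toy posteriors are hypothetical and are not taken from the paper.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mean_entropy(posteriors):
    """Average entropy over all decoding steps of an utterance.

    `posteriors` is a list of per-step distributions over output tokens.
    A higher value means the recogniser is less certain, which this sketch
    assumes goes together with lower intelligibility.
    """
    return sum(entropy(p) for p in posteriors) / len(posteriors)

# Toy (made-up) posteriors over a 3-token vocabulary.
clear_speech = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01]]
noisy_speech = [[0.40, 0.35, 0.25], [0.30, 0.30, 0.40]]

clear_score = mean_entropy(clear_speech)
noisy_score = mean_entropy(noisy_speech)
```

No intelligibility labels or reference signals are needed to compute such a score, which is the property the abstract emphasises.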

    AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks

    Automatic speech recognition (ASR) systems are of vital importance nowadays in commonplace tasks such as speech-to-text processing and language translation. This has created the need for ASR systems that can operate in realistic crowded environments. Thus, speech enhancement is a valuable building block in ASR systems and other applications such as hearing aids, smartphones and teleconferencing systems. In this paper, a generative adversarial network (GAN) based framework is investigated for the task of speech enhancement, more specifically speech denoising of audio tracks. A new architecture based on a CasNet generator and an additional feature-based loss are incorporated to get realistically denoised speech phonetics. Finally, the proposed framework is shown to outperform other learning and traditional model-based speech enhancement approaches. Comment: 5 pages, 4 figures and 2 Tables. Accepted in EUSIPCO 202

    Affective learning: improving engagement and enhancing learning with affect-aware feedback

    This paper describes the design and ecologically valid evaluation of a learner model that lies at the heart of an intelligent learning environment called iTalk2Learn. A core objective of the learner model is to adapt formative feedback based on students’ affective states. Types of adaptation include what type of formative feedback should be provided and how it should be presented. Two Bayesian networks trained with data gathered in a series of Wizard-of-Oz studies are used for the adaptation process. This paper reports results from a quasi-experimental evaluation, in authentic classroom settings, which compared a version of iTalk2Learn that adapted feedback based on students’ affective states as they were talking aloud with the system (the affect condition) with one that provided feedback based only on the students’ performance (the non-affect condition). Our results suggest that affect-aware support contributes to reducing boredom and off-task behavior, and may have an effect on learning. We discuss the internal and ecological validity of the study, in light of pedagogical considerations that informed the design of the two conditions. Overall, the results of the study have implications both for the design of educational technology and for classroom approaches to teaching, because they highlight the important role that affect-aware modelling plays in the adaptive delivery of formative feedback to support learning.

    CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment

    Speech quality assessment has been a critical component in many voice communication related applications such as telephony and online conferencing. Traditional intrusive speech quality assessment requires the clean reference of the degraded utterance to provide an accurate quality measurement. This requirement limits the usability of these methods in real-world scenarios. On the other hand, non-intrusive subjective measurement is the "golden standard" in evaluating speech quality, as human listeners can intrinsically evaluate the quality of any degraded speech with ease. In this paper, we propose a novel end-to-end model structure called Convolutional Context-Aware Transformer (CCAT) network to predict the mean opinion score (MOS) of human raters. We evaluate our model on three MOS-annotated datasets spanning multiple languages and distortion types and submit our results to the ConferencingSpeech 2022 Challenge. Our experiments show that CCAT provides promising MOS predictions compared to current state-of-the-art non-intrusive speech assessment models, with the average Pearson correlation coefficient (PCC) increasing from 0.530 to 0.697 and the average RMSE decreasing from 0.768 to 0.570 relative to the baseline model on the challenge evaluation test set.
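The two figures of merit quoted in the abstract above, PCC and RMSE, both compare predicted MOS values against subjective ratings. The following Python sketch shows how they are commonly computed, using only the standard library. The function names and the sample scores are illustrative assumptions and are not data from the paper.

```python
import math

def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("PCC is undefined when a sequence is constant")
    return cov / (norm_x * norm_y)

def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Made-up example: subjective MOS ratings and a model's predictions.
subjective_mos = [1.5, 2.0, 3.0, 4.0, 4.5]
predicted_mos = [1.8, 2.1, 2.7, 3.9, 4.4]

pcc = pearson_cc(subjective_mos, predicted_mos)
err = rmse(subjective_mos, predicted_mos)
```

A higher PCC means the predictions rank and scale utterances more like human raters do, while a lower RMSE means the predicted scores are closer in absolute terms, so an improvement shows up as PCC rising and RMSE falling.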

    Anticipatory Mobile Computing: A Survey of the State of the Art and Research Challenges

    Today's mobile phones are far from the mere communication devices they were ten years ago. Equipped with sophisticated sensors and advanced computing hardware, phones can be used to infer users' location, activity, social setting and more. As devices become increasingly intelligent, their capabilities evolve beyond inferring context to predicting it, and then reasoning and acting upon the predicted context. This article provides an overview of the current state of the art in mobile sensing and context prediction, paving the way for full-fledged anticipatory mobile computing. We present a survey of phenomena that mobile phones can infer and predict, and offer a description of machine learning techniques used for such predictions. We then discuss proactive decision making and decision delivery via the user-device feedback loop. Finally, we discuss the challenges and opportunities of anticipatory mobile computing. Comment: 29 pages, 5 figure

    An Effective Machine Learning (ML) Approach to Quality Assessment of Voice over IP (VoIP) Calls

    This letter puts forward a supervised ML technique to determine the Quality of Experience (QoE) of VoIP calls. It starts from an investigation of VQmon, an enhanced version of the E-model that estimates the quality of IP-based voice calls using an objective approach. The current study demonstrates VQmon's shortcomings via a comparison between the Mean Opinion Score (MOS) values this technique predicts and the actual average ratings collected from a subjective listening quality campaign. It proposes to deploy Ordinal Logistic Regression (OLR) for speech quality assessment, and the results show that OLR outperforms popular ML algorithms in terms of accuracy and confusion matrices.
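The abstract above does not give the form of the OLR model. As a rough sketch, assuming the standard proportional-odds formulation, in which P(rating <= k) = sigmoid(threshold_k - score), the probabilities of the five ordered MOS categories could be derived as follows. The thresholds, the score value and the function names are hypothetical illustrations and are not fitted values from the letter.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def olr_category_probs(score, thresholds):
    """Category probabilities under a proportional-odds model.

    `score` is the linear predictor (weights times features), and
    `thresholds` are increasing cut points; K thresholds give K + 1
    ordered categories.
    """
    # Cumulative probabilities P(rating <= k); the last one is always 1.
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(rating == k)
        previous = c
    return probs

# Hypothetical cut points separating ratings 1|2, 2|3, 3|4 and 4|5.
cut_points = [-2.0, -1.0, 1.0, 2.0]
probs = olr_category_probs(0.5, cut_points)

# Predicted rating on the 1..5 scale is the most probable category.
predicted_rating = 1 + max(range(len(probs)), key=probs.__getitem__)
```

Unlike a plain multi-class classifier, this formulation uses a single score with ordered cut points, so it respects the fact that a rating of 4 is closer to 5 than to 1.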