A novel non-intrusive objective method to predict voice quality of service in LTE networks
This research aimed to introduce a novel approach for non-intrusive objective
measurement of voice Quality of Service (QoS) in LTE networks. In pursuing this aim, the thesis established a thorough understanding of how voice traffic is
handled in LTE networks, the LTE network architecture and its similarities to and
differences from its predecessors and traditional ground IP networks, and, most
importantly, the QoS-affecting parameters that are exclusive to LTE environments. Mean Opinion Score (MOS) is the scoring system used to rate
the QoS of voice traffic; as originally intended, it is measured subjectively. Subjective QoS measurement methods are costly and time-consuming;
therefore, objective methods such as Perceptual Evaluation of Speech Quality
(PESQ) were developed to address these limitations. These objective methods
correlate highly with subjective MOS scores. However, they either require individual calculation of many network parameters or are intrusive,
requiring access to both the reference signal and the degraded signal
for comparison by software. As a result, current objective methods are not
suitable for real-time measurement and prediction scenarios.
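MOS itself is simple arithmetic. Under the Absolute Category Rating procedure of ITU-T P.800, each listener rates a sample on a 1 to 5 scale and the MOS is the mean of those ratings. The listing does not say which subjective procedure the thesis relied on, so the following is only a minimal sketch of the scoring step, with hypothetical listener ratings.

```python
# Absolute Category Rating (ACR) scale used in subjective listening tests:
# 5 = Excellent, 4 = Good, 3 = Fair, 2 = Poor, 1 = Bad.
ACR_MIN, ACR_MAX = 1, 5


def mean_opinion_score(ratings: list[int]) -> float:
    """Return the MOS of one speech sample: the arithmetic mean of listener ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not ACR_MIN <= r <= ACR_MAX:
            raise ValueError(f"rating {r} is outside the 1-5 ACR scale")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from eight listeners for a single call recording.
ratings = [4, 5, 3, 4, 4, 3, 5, 4]
print(f"MOS = {mean_opinion_score(ratings):.2f}")
```

Collecting many such ratings per sample is what makes subjective testing costly, which is the motivation for the objective and non-intrusive predictors discussed here.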
A major contribution of the research was identifying LTE-specific QoS-affecting parameters. No previous work has combined these parameters to
assess their impact on QoS.
The experiment was configured in a hardware-in-the-loop environment. This
configuration could serve as a platform for future research that requires simulation of voice traffic in LTE environments.
The key contribution of this research is a novel non-intrusive objective method
for QoS measurement and prediction using neural networks. A comparative
analysis is presented that examines the performance of four neural network
algorithms for non-intrusive measurement and prediction of voice quality over
LTE networks. In conclusion, the Bayesian Regularization algorithm with 4 neurons in the hidden layer and a sigmoid symmetric transfer function was identified as the best solution, with a Mean Square Error (MSE) of 0.001 and a
regression value of 0.998 measured on the testing data set.
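To make the reported figures concrete, here is a sketch that trains a network of the same shape (one hidden layer of 4 tanh units; MATLAB's "sigmoid symmetric" tansig function is mathematically equal to tanh) and reports the same two metrics: MSE and the regression value R, the Pearson correlation between targets and outputs. It is a simplification under stated assumptions. Bayesian Regularization (MATLAB's trainbr) uses Levenberg-Marquardt optimization and re-estimates the penalty weights during training, whereas here plain gradient descent with a fixed, hypothetical weight-decay term stands in for it. The two input features and the synthetic MOS target are invented for illustration and are not the thesis's data.

```python
import math
import random


def mse(y_true, y_pred):
    """Mean squared error between target and predicted MOS values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def regression_r(y_true, y_pred):
    """Pearson correlation R between targets and predictions."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)


class TinyMLP:
    """One hidden layer of tanh ("sigmoid symmetric") units and a linear output."""

    def __init__(self, n_in, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.w2, self._hidden(x))) + self.b2

    def train(self, xs, ys, epochs=2000, lr=0.05, weight_decay=1e-4):
        """Full-batch gradient descent on MSE + weight_decay * (sum of squared weights)."""
        n = len(xs)
        for _ in range(epochs):
            g_w1 = [[0.0] * len(row) for row in self.w1]
            g_b1 = [0.0] * len(self.b1)
            g_w2 = [0.0] * len(self.w2)
            g_b2 = 0.0
            for x, y in zip(xs, ys):
                h = self._hidden(x)
                d_out = 2.0 * (sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - y) / n
                g_b2 += d_out
                for j, hj in enumerate(h):
                    g_w2[j] += d_out * hj
                    d_h = d_out * self.w2[j] * (1.0 - hj * hj)  # tanh'(a) = 1 - tanh(a)^2
                    g_b1[j] += d_h
                    for i, xi in enumerate(x):
                        g_w1[j][i] += d_h * xi
            # Apply updates; the penalty gradient 2 * weight_decay * w touches weights only.
            for j in range(len(self.w2)):
                self.w2[j] -= lr * (g_w2[j] + 2.0 * weight_decay * self.w2[j])
                self.b1[j] -= lr * g_b1[j]
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= lr * (g_w1[j][i] + 2.0 * weight_decay * self.w1[j][i])
            self.b2 -= lr * g_b2


# Hypothetical data: two normalized network features in [-1, 1] (e.g. packet loss, jitter)
# and a synthetic MOS target; these are not the thesis's measurements.
rng = random.Random(1)
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
ys = [3.0 - 1.0 * a - 0.5 * b for a, b in xs]

net = TinyMLP(n_in=2)
before = mse(ys, [net.predict(x) for x in xs])
net.train(xs, ys)
preds = [net.predict(x) for x in xs]
print(f"MSE {before:.3f} -> {mse(ys, preds):.4f}, R = {regression_r(ys, preds):.3f}")
```

The printed R is read the same way as the 0.998 reported above: a value close to 1 means the predictions track the targets almost linearly.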
Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction
Non-intrusive intelligibility prediction is important for its application in
realistic scenarios, where a clean reference signal is difficult to access. The
construction of many non-intrusive predictors requires either ground-truth
intelligibility labels or clean reference signals for supervised learning. In
this work, we leverage an unsupervised uncertainty estimation method for
predicting speech intelligibility, which does not require intelligibility
labels or reference signals to train the predictor. Our experiments demonstrate
that the uncertainty from state-of-the-art end-to-end automatic speech
recognition (ASR) models is highly correlated with speech intelligibility. The
proposed method is evaluated on two databases and the results show that the
unsupervised uncertainty measures of ASR models are more correlated with speech
intelligibility from listening results than the predictions made by widely used
intrusive methods.
Comment: Submitted to INTERSPEECH202
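The abstract does not name the specific uncertainty measures. One common unsupervised measure, assumed here purely for illustration, is the entropy of the ASR model's per-frame output distribution averaged over an utterance: it needs neither a clean reference signal nor intelligibility labels, only the model's own posteriors. Under the abstract's claim, higher uncertainty would indicate lower intelligibility. The posteriors below are hypothetical stand-ins for real ASR outputs.

```python
import math


def frame_entropy(probs):
    """Shannon entropy (nats) of one frame's posterior over output tokens."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def utterance_uncertainty(posteriors):
    """Average per-frame entropy over an utterance; higher means a less confident ASR model."""
    if not posteriors:
        raise ValueError("empty utterance")
    return sum(frame_entropy(frame) for frame in posteriors) / len(posteriors)


# Hypothetical 3-token posteriors for a clean and a noisy utterance.
clean = [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02], [0.98, 0.01, 0.01]]
noisy = [[0.50, 0.30, 0.20], [0.40, 0.35, 0.25], [0.60, 0.25, 0.15]]
print(utterance_uncertainty(clean), utterance_uncertainty(noisy))
```

The per-utterance scores would then be correlated with listening-test intelligibility, which is how the abstract compares them with intrusive predictors.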
Affective learning: improving engagement and enhancing learning with affect-aware feedback
This paper describes the design and ecologically valid evaluation of a learner model that lies at the heart of an intelligent learning environment called iTalk2Learn. A core objective of the learner model is to adapt formative feedback based on students’ affective states. The adaptation determines both what type of formative feedback should be provided and how it should be presented. Two Bayesian networks trained with data gathered in a series of Wizard-of-Oz studies are used for the adaptation process. This paper reports results from a quasi-experimental evaluation in authentic classroom settings, which compared a version of iTalk2Learn that adapted feedback based on students’ affective states as they talked aloud with the system (the affect condition) with one that provided feedback based only on the students’ performance (the non-affect condition). Our results suggest that affect-aware support contributes to reducing boredom and off-task behavior, and may have an effect on learning. We discuss the internal and ecological validity of the study in light of the pedagogical considerations that informed the design of the two conditions. Overall, the results of the study have implications both for the design of educational technology and for classroom approaches to teaching, because they highlight the important role that affect-aware modelling plays in the adaptive delivery of formative feedback to support learning.
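The abstract does not describe the structure of the two Bayesian networks. As a rough illustration of the kind of inference involved, here is a two-node network (affective state to observed speech cue) queried with Bayes' rule, followed by a lookup that picks a feedback type for the most probable state. Every state name, cue, probability, and feedback label is hypothetical and is not taken from iTalk2Learn.

```python
# Hypothetical prior over a student's affective state.
PRIOR = {"flow": 0.5, "confused": 0.3, "bored": 0.2}

# Hypothetical P(observed speech cue | affective state).
LIKELIHOOD = {
    "long_pause": {"flow": 0.1, "confused": 0.5, "bored": 0.6},
    "fluent_talk": {"flow": 0.8, "confused": 0.3, "bored": 0.2},
}

# Hypothetical mapping from the most probable state to a formative-feedback type.
FEEDBACK = {"flow": "affirmation", "confused": "instructive hint", "bored": "reflective prompt"}


def posterior(cue):
    """Bayes' rule: P(state | cue) is proportional to P(cue | state) * P(state)."""
    unnormalized = {s: LIKELIHOOD[cue][s] * p for s, p in PRIOR.items()}
    z = sum(unnormalized.values())
    return {s: v / z for s, v in unnormalized.items()}


def choose_feedback(cue):
    """Return the most probable affective state and the feedback type mapped to it."""
    post = posterior(cue)
    state = max(post, key=post.get)
    return state, FEEDBACK[state]


print(choose_feedback("long_pause"))
print(choose_feedback("fluent_talk"))
```

A trained network would learn these tables from data (here, the Wizard-of-Oz studies) rather than having them written by hand, and could also condition on performance, as in the non-affect condition.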
AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks
Automatic speech recognition (ASR) systems are of vital importance nowadays
in commonplace tasks such as speech-to-text processing and language
translation. This has created a need for ASR systems that can operate in
realistic, crowded environments. Thus, speech enhancement is a valuable building
block in ASR systems and other applications such as hearing aids, smartphones
and teleconferencing systems. In this paper, a generative adversarial network
(GAN)-based framework is investigated for the task of speech enhancement, more
specifically speech denoising of audio tracks. A new architecture based on a
CasNet generator, together with an additional feature-based loss, is incorporated to obtain
realistically denoised speech phonetics. Finally, the proposed framework is
shown to outperform other learning-based and traditional model-based speech
enhancement approaches.
Comment: 5 pages, 4 figures and 2 tables. Accepted at EUSIPCO 202
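The abstract pairs the adversarial objective with an additional feature-based loss but gives no formula. A common way to combine the two, assumed here, is a weighted sum: a least-squares GAN term that rewards the generator when the discriminator scores its denoised spectrograms as real, plus an L1 distance between features of the clean and denoised speech. The LSGAN choice, the weight `feat_weight`, and the feature maps are hypothetical, and the CasNet generator itself is not modelled.

```python
def lsgan_generator_loss(d_scores_on_enhanced):
    """Least-squares adversarial term: push D's scores for enhanced speech toward 1 ("real")."""
    n = len(d_scores_on_enhanced)
    return sum((s - 1.0) ** 2 for s in d_scores_on_enhanced) / (2.0 * n)


def feature_l1_loss(clean_feats, enhanced_feats):
    """Mean absolute difference between feature maps of clean and enhanced spectrograms."""
    diffs = [abs(c - e) for row_c, row_e in zip(clean_feats, enhanced_feats)
             for c, e in zip(row_c, row_e)]
    return sum(diffs) / len(diffs)


def generator_loss(d_scores, clean_feats, enhanced_feats, feat_weight=10.0):
    """Total generator objective: adversarial term plus a weighted feature-based term."""
    return lsgan_generator_loss(d_scores) + feat_weight * feature_l1_loss(clean_feats, enhanced_feats)


# Hypothetical discriminator scores and 2x2 feature maps (e.g. from a fixed feature extractor).
d_scores = [0.5, 1.0]
clean = [[1.0, 2.0], [3.0, 4.0]]
enhanced = [[1.0, 2.0], [3.0, 5.0]]
print(generator_loss(d_scores, clean, enhanced))
```

The feature term keeps the denoised output close to the clean target in a perceptually motivated space, while the adversarial term discourages over-smoothed spectrograms; `feat_weight` trades the two off.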