3,007 research outputs found
A novel non-intrusive objective method to predict voice quality of service in LTE networks.
This research aimed to introduce a novel approach for non-intrusive objective
measurement of voice Quality of Service (QoS) in LTE networks. In pursuing this aim, the thesis established a thorough understanding of how voice traffic is
handled in LTE networks, of the LTE network architecture and its similarities and
differences to its predecessors and to traditional ground IP networks, and, most
importantly, of the QoS-affecting parameters that are exclusive to LTE environments. Mean Opinion Score (MOS) is the scoring system used to measure
the QoS of voice traffic, and it can be measured subjectively (as originally intended). Subjective QoS measurement methods are costly and time-consuming;
therefore, objective methods such as Perceptual Evaluation of Speech Quality
(PESQ) were developed to address these limitations. These objective methods
have a high correlation with subjective MOS scores. However, they either require individual calculation of many network parameters or have an intrusive
nature that requires access to both the reference signal and the degraded signal
for comparison by software. Therefore, the current objective methods are not
suitable for application in real-time measurement and prediction scenarios.
A major contribution of the research was identifying LTE-specific QoS-affecting parameters. No previous work combines these parameters to
assess their impacts on QoS.
The experiment was configured in a hardware-in-the-loop environment. This
configuration could serve as a platform for future research that requires simulation of voice traffic in LTE environments.
The key contribution of this research is a novel non-intrusive objective method
for QoS measurement and prediction using neural networks. A comparative
analysis is presented that examines the performance of four neural network
algorithms for non-intrusive measurement and prediction of voice quality over
LTE networks. In conclusion, the Bayesian Regularization algorithm with 4 neurons in the hidden layer and a sigmoid symmetric transfer function was identified as the best solution, with a Mean Square Error (MSE) of 0.001 and a
regression value of 0.998 measured on the testing data set.
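As a rough illustration of the model size described above, the following sketch evaluates a network with one hidden layer of 4 neurons and a tanh activation (assumed here to correspond to the "sigmoid symmetric" transfer function), and computes the MSE. The three inputs, all weights, and the fixed alpha and beta values are hypothetical; the regularized objective is the standard form beta * E_D + alpha * E_W used in Bayesian regularization, whereas a real training routine re-estimates alpha and beta during training. This is not the thesis implementation.

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Single hidden layer with tanh units and a linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def mean_squared_error(predicted, target):
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

def regularized_objective(sum_squared_error, weights, alpha, beta):
    """beta * E_D + alpha * E_W, with alpha and beta held fixed (assumption)."""
    return beta * sum_squared_error + alpha * sum(w * w for w in weights)

if __name__ == "__main__":
    # Hypothetical inputs, e.g. three normalized network measurements.
    x = [0.2, -0.1, 0.4]
    w_hidden = [[0.5, -0.3, 0.1], [0.2, 0.2, -0.4], [-0.1, 0.6, 0.3], [0.0, -0.2, 0.5]]
    b_hidden = [0.0, 0.1, -0.1, 0.05]
    w_out = [0.7, -0.5, 0.3, 0.9]
    print(forward(x, w_hidden, b_hidden, w_out, b_out=3.0))
```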
Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction
Non-intrusive intelligibility prediction is important for its application in
realistic scenarios, where a clean reference signal is difficult to access. The
construction of many non-intrusive predictors requires either ground truth
intelligibility labels or clean reference signals for supervised learning. In
this work, we leverage an unsupervised uncertainty estimation method for
predicting speech intelligibility, which does not require intelligibility
labels or reference signals to train the predictor. Our experiments demonstrate
that the uncertainty from state-of-the-art end-to-end automatic speech
recognition (ASR) models is highly correlated with speech intelligibility. The
proposed method is evaluated on two databases and the results show that the
unsupervised uncertainty measures of ASR models are more correlated with speech
intelligibility from listening results than the predictions made by widely used
intrusive methods.
Comment: Submitted to INTERSPEECH202
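To make the idea of an unsupervised uncertainty measure concrete, here is a minimal sketch that averages the Shannon entropy of per-token posterior distributions over an utterance. Entropy is only one possible uncertainty measure and is an assumption here; the abstract does not state which measure the authors use. The function names and the toy posteriors are hypothetical.

```python
import math

def entropy(probabilities):
    """Shannon entropy (in nats) of one posterior distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)

def utterance_uncertainty(token_posteriors):
    """Mean per-token entropy; higher values suggest a less certain recognizer."""
    if not token_posteriors:
        raise ValueError("need at least one token posterior")
    return sum(entropy(p) for p in token_posteriors) / len(token_posteriors)

if __name__ == "__main__":
    # Toy posteriors over a 3-symbol vocabulary for two decoded tokens.
    confident = [[0.98, 0.01, 0.01], [0.95, 0.03, 0.02]]
    uncertain = [[0.4, 0.3, 0.3], [0.34, 0.33, 0.33]]
    print(utterance_uncertainty(confident), utterance_uncertainty(uncertain))
```

Under this assumption, a higher mean entropy would be read as lower predicted intelligibility, with no intelligibility labels or reference signals needed.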
AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks
Automatic speech recognition (ASR) systems are of vital importance nowadays
in commonplace tasks such as speech-to-text processing and language
translation. This created the need for an ASR system that can operate in
realistic crowded environments. Thus, speech enhancement is a valuable building
block in ASR systems and other applications such as hearing aids, smartphones
and teleconferencing systems. In this paper, a generative adversarial network
(GAN) based framework is investigated for the task of speech enhancement, more
specifically speech denoising of audio tracks. A new architecture based on a
CasNet generator and an additional feature-based loss are incorporated to get
realistically denoised speech phonetics. Finally, the proposed framework is
shown to outperform other learning and traditional model-based speech
enhancement approaches.
Comment: 5 pages, 4 figures and 2 Tables. Accepted in EUSIPCO 202
Affective learning: improving engagement and enhancing learning with affect-aware feedback
This paper describes the design and ecologically valid evaluation of a learner model that lies at the heart of an intelligent learning environment called iTalk2Learn. A core objective of the learner model is to adapt formative feedback based on students’ affective states. Types of adaptation include what type of formative feedback should be provided and how it should be presented. Two Bayesian networks trained with data gathered in a series of Wizard-of-Oz studies are used for the adaptation process. This paper reports results from a quasi-experimental evaluation, in authentic classroom settings, which compared a version of iTalk2Learn that adapted feedback based on students’ affective states as they were talking aloud with the system (the affect condition) with one that provided feedback based only on the students’ performance (the non-affect condition). Our results suggest that affect-aware support contributes to reducing boredom and off-task behavior, and may have an effect on learning. We discuss the internal and ecological validity of the study, in light of pedagogical considerations that informed the design of the two conditions. Overall, the results of the study have implications both for the design of educational technology and for classroom approaches to teaching, because they highlight the important role that affect-aware modelling plays in the adaptive delivery of formative feedback to support learning.
CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment
Speech quality assessment has been a critical component in many voice
communication related applications such as telephony and online conferencing.
Traditional intrusive speech quality assessment requires the clean reference of
the degraded utterance to provide an accurate quality measurement. This
requirement limits the usability of these methods in real-world scenarios. On
the other hand, non-intrusive subjective measurement is the "golden standard"
in evaluating speech quality as human listeners can intrinsically evaluate the
quality of any degraded speech with ease. In this paper, we propose a novel
end-to-end model structure called Convolutional Context-Aware Transformer
(CCAT) network to predict the mean opinion score (MOS) of human raters. We
evaluate our model on three MOS-annotated datasets spanning multiple languages
and distortion types and submit our results to the ConferencingSpeech 2022
Challenge. Our experiments show that CCAT provides promising MOS predictions
relative to current state-of-the-art non-intrusive speech assessment models:
compared with the baseline model, the average Pearson correlation coefficient
(PCC) increases from 0.530 to 0.697 and the average RMSE decreases from 0.768
to 0.570 on the challenge evaluation test set.
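The two metrics quoted above can be computed from paired predicted and subjective MOS values. The sketch below uses the textbook definitions of PCC and RMSE; the function names and the example scores are hypothetical and are not taken from the challenge data.

```python
import math

def pearson_correlation(predicted, reference):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0.0 or var_r == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_r)

def rmse(predicted, reference):
    """Root mean squared error between predictions and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))

if __name__ == "__main__":
    # Hypothetical MOS values on the usual 1 to 5 scale.
    predicted_mos = [3.1, 4.2, 2.0, 3.8]
    subjective_mos = [3.0, 4.5, 1.8, 3.5]
    print(pearson_correlation(predicted_mos, subjective_mos))
    print(rmse(predicted_mos, subjective_mos))
```

A higher PCC indicates that predictions track the ordering and spread of listener ratings, while a lower RMSE indicates smaller absolute errors, which is why the two are reported together.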
Anticipatory Mobile Computing: A Survey of the State of the Art and Research Challenges
Today's mobile phones are far from the mere communication devices they were ten
years ago. Equipped with sophisticated sensors and advanced computing hardware,
phones can be used to infer users' location, activity, social setting and more.
As devices become increasingly intelligent, their capabilities evolve beyond
inferring context to predicting it, and then reasoning and acting upon the
predicted context. This article provides an overview of the current state of
the art in mobile sensing and context prediction paving the way for
full-fledged anticipatory mobile computing. We present a survey of phenomena
that mobile phones can infer and predict, and offer a description of machine
learning techniques used for such predictions. We then discuss proactive
decision making and decision delivery via the user-device feedback loop.
Finally, we discuss the challenges and opportunities of anticipatory mobile
computing.
Comment: 29 pages, 5 figure
An Effective Machine Learning (ML) Approach to Quality Assessment of Voice over IP (VoIP) Calls
This letter puts forward a supervised ML technique
to determine the Quality of Experience (QoE) of VoIP calls. It starts from an investigation of VQmon, an
enhanced E-model version that estimates the quality of IP-based
voice calls adopting an objective approach. The current study
demonstrates VQmon shortcomings via a comparison between
the Mean Opinion Score (MOS) values this technique predicts
and the actual average ratings collected from a subjective
listening quality campaign. It proposes to deploy Ordinal Logistic
Regression (OLR) for speech quality assessment, and the results show that OLR outperforms popular ML algorithms in terms of accuracy and confusion matrices.
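For readers unfamiliar with ordinal logistic regression, the sketch below shows how a proportional-odds model turns a feature vector into probabilities over ordered rating categories, using P(Y <= k) = sigmoid(theta_k - w.x). The features, weights, and thresholds are hypothetical, and the letter may use a different OLR variant or fitting procedure.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probabilities(x, weights, thresholds):
    """Class probabilities under a proportional-odds (cumulative logit) model.

    thresholds must be strictly increasing; K - 1 thresholds give K classes.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(w * xi for w, xi in zip(weights, x))
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k-1)
        previous = c
    return probabilities

if __name__ == "__main__":
    # Hypothetical features (e.g. normalized loss and delay) and a 5-point scale.
    probs = ordinal_probabilities(
        x=[0.3, 0.1], weights=[-2.0, -1.0], thresholds=[-3.0, -1.5, 0.0, 1.5]
    )
    predicted_rating = 1 + max(range(len(probs)), key=probs.__getitem__)
    print(probs, predicted_rating)
```

Because the categories share one weight vector and differ only in their thresholds, the model respects the ordering of the rating scale, which a plain multi-class classifier does not.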