85 research outputs found
Speech quality prediction for voice over Internet protocol networks
Merged with duplicate record 10026.1/878 on 03.01.2017 by CS (TIS). Merged with duplicate record 10026.1/1657 on 15.03.2017 by CS (TIS). This is a digitised version of a thesis that was deposited in the University Library. If you are the author please contact PEARL Admin to discuss options.
IP networks are on a steep slope of innovation that will make them the long-term carrier
of all types of traffic, including voice. However, such networks are not designed to support
real-time voice communication because their variable characteristics (e.g. due to delay, delay
variation and packet loss) lead to a deterioration in voice quality. A major challenge in such networks
is how to measure or predict voice quality accurately and efficiently for QoS monitoring
and/or control purposes to ensure that technical and commercial requirements are met.
Voice quality can be measured using either subjective or objective methods. Subjective
measurement (e.g. MOS) is the benchmark for objective methods, but it is slow, time consuming
and expensive. Objective measurement can be intrusive or non-intrusive. Intrusive methods
(e.g. ITU PESQ) are more accurate, but normally are unsuitable for monitoring live traffic
because they require reference data and must utilise the network. This makes non-intrusive
methods (e.g. ITU E-model) more attractive for monitoring voice quality from IP network impairments.
However, current non-intrusive methods rely on subjective tests to derive model
parameters and as a result are limited and do not meet new and emerging applications.
The main goal of the project is to develop novel and efficient models for non-intrusive
speech quality prediction to overcome the disadvantages of current subjective-based methods
and to demonstrate their usefulness in new and emerging VoIP applications. The main contributions
of the thesis are fourfold:
(1) a detailed understanding of the relationships between voice quality, IP network impairments
(e.g. packet loss, jitter and delay) and relevant parameters associated with speech (e.g.
codec type, gender and language) is provided. An understanding of the perceptual effects of
these key parameters on voice quality is important as it provides a basis for the development
of non-intrusive voice quality prediction models. A fundamental investigation of the impact of
the parameters on perceived voice quality was carried out using the latest ITU algorithm for
perceptual evaluation of speech quality, PESQ, and by exploiting the ITU E-model to obtain an
objective measure of voice quality.
(2) a new methodology to predict voice quality non-intrusively was developed. The method
exploits the intrusive algorithm, PESQ, and a combined PESQ/E-model structure to provide a
perceptually accurate prediction of both listening and conversational voice quality non-intrusively.
This avoids time-consuming subjective tests and so removes one of the major obstacles in the
development of models for voice quality prediction. The method is generic and as such has
wide applicability in multimedia applications. Efficient regression-based models and robust
artificial neural network-based learning models were developed for predicting voice quality
non-intrusively for VoIP applications.
(3) three applications of the new models were investigated: voice quality monitoring/prediction
for real Internet VoIP traces, perceived quality driven playout buffer optimization and
perceived quality driven QoS control. The neural network and regression models were both
used to predict voice quality for real Internet VoIP traces based on international links. A new
adaptive playout buffer algorithm and a perceptual optimization playout buffer algorithm are presented.
A QoS control scheme that combines the strengths of rate-adaptive and priority marking control
schemes to provide a superior QoS control in terms of measured perceived voice quality is
also provided.
(4) a new methodology for Internet-based subjective speech quality measurement which
allows rapid assessment of voice quality for VoIP applications is proposed and assessed using
both objective and traditional MOS test methods
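As a small illustration of the E-model side of this abstract: the E-model produces a rating factor R, which ITU-T G.107 maps to an estimated MOS with a fixed cubic formula. The sketch below implements only that standard mapping. The function name `r_to_mos` is chosen here for illustration, and nothing in it reproduces the thesis's own regression or neural network models.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0   # lower bound of the estimated MOS scale
    if r >= 100:
        return 4.5   # upper bound of the estimated MOS scale
    # Cubic mapping defined for 0 < R < 100.
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    # Illustrative R values only; they are not results from the thesis.
    for r in (50.0, 70.0, 93.2):
        print(f"R = {r:5.1f} -> MOS = {r_to_mos(r):.2f}")
```

In a non-intrusive monitor, R would first be reduced by delay and equipment (codec and loss) impairment terms; the mapping above is only the final conversion step.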
Enhancement of perceived quality of service for voice over internet protocol systems
Voice over Internet Protocol (VoIP) applications are becoming more and more popular in
the telecommunication market. Packet switched VoIP systems have many technical advantages
over conventional Public Switched Telephone Network (PSTN), including efficient and flexible
use of the bandwidth, lower cost and enhanced security.
However, due to the IP network's "Best Effort" nature, voice quality is not naturally guaranteed
in the VoIP services. In fact, most current VoIP services cannot provide as good a voice
quality as PSTN. IP Network impairments such as packet loss, delay and jitter affect perceived
speech quality as do application layer impairment factors, such as codec rate and audio features.
Current perceived Quality of Service (QoS) methods are mainly designed to be used
in a PSTN/TDM environment and their performance in a VoIP environment is unknown. It is a
challenge to measure perceived speech quality correctly in VoIP systems and to enhance user
perceived speech quality for VoIP systems.
The main goal of this project is to evaluate the accuracy of the existing ITU-T speech quality
measurement method (Perceptual Evaluation of Speech Quality - PESQ) in mobile wireless
systems in the context of VoIP, and to develop novel and efficient methods to enhance the user
perceived speech quality for emerging VoIP services, especially in mobile VoIP environments.
The main contributions of the thesis are threefold:
(1) A new discovery of PESQ errors in mobile VoIP environment. A detailed investigation
of PESQ performance in mobile VoIP environment was undertaken and included setting up a
PESQ performance evaluation platform and testing over 1800 mobile-to-mobile and mobile-to-PSTN
calls over a period of three months. The accuracy issues of the PESQ algorithm were
investigated and the main problems causing inaccurate PESQ scores (improper time-alignment in
the PESQ algorithm) were discovered. Calibration issues for safe and proper PESQ testing
in mobile environment were also discussed in the thesis.
(2) A new, simple-to-use VoIP jitter buffer algorithm. This was developed and implemented
in a commercial mobile handset. The algorithm, called "Play Late Algorithm", adaptively alters
the playout delay inside a speech talkspurt without introducing unnecessary extra end-to-end
delay. It can be used as a front-end to conventional static or adaptive jitter buffer algorithms
to provide improved performance. Results show that the proposed algorithm can increase user
perceived quality without consuming too much processing power when tested in live wireless
VoIP networks.
(3) A new QoS enhancement scheme. The new scheme combines the strengths of adaptive
codec bit rate (i.e. AMR 8-modes bit rate) and speech priority marking (i.e. giving high priority
for the beginning of a voiced segment). The results gathered on a simulation and emulation test
platform show that the combined method provides a better user perceived speech quality than
separate adaptive sender bit rate or packet priority marking methods
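For readers unfamiliar with how the jitter impairment named in this abstract is usually quantified, the following is a minimal sketch of the running interarrival jitter estimator defined in RFC 3550 (RTP). It is not the thesis's "Play Late Algorithm", whose details the abstract does not give. The function name and the example timestamps are hypothetical.

```python
def interarrival_jitter(send_times, recv_times):
    """Return the running RFC 3550 jitter estimate after each packet.

    Both sequences must use the same time unit (e.g. milliseconds) and be
    ordered by packet sequence number.
    """
    jitter = 0.0
    history = []
    for i in range(1, len(send_times)):
        # D(i-1, i): change in one-way transit time between consecutive packets.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Exponential smoothing with gain 1/16, as specified in RFC 3550.
        jitter += (abs(d) - jitter) / 16.0
        history.append(jitter)
    return history


if __name__ == "__main__":
    # Hypothetical 20 ms packetisation; the third packet arrives 10 ms late.
    sent = [0.0, 20.0, 40.0, 60.0]
    received = [50.0, 70.0, 100.0, 110.0]
    print(interarrival_jitter(sent, received))
```

An adaptive jitter buffer would typically use an estimate like this, together with a delay estimate, to choose its playout delay; how the thesis's algorithm does so is not described here.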
Considering Bluetooth's Subband Codec (SBC) for Wideband Speech and Audio on the Internet
The Bluetooth Special Interest Group (SIG) has standardized the subband coding (SBC) audio codec to connect headphones via wireless Bluetooth links. SBC compresses audio at high fidelity while having an ultra-low algorithm delay. To make SBC suitable for the Internet, we extend it by using a time and packet loss concealment (PLC) algorithm that is based on ITU's G.711 Appendix I. The design is novel in the aspect of the interface between codec and speech receiver. We developed a new approach on how to distribute the functionality of a speech receiver between codec and application. Our approach leads to easier implementations of high quality VoIP applications.
We conducted subjective and objective listening tests of the audio quality of SBC and PLC in order to determine an optimal coding mode and the trade-off between coding mode and packet loss rate. More precisely, we conducted MUSHRA listening tests for selected sample items. These test results are then compared with the results of multiple objective assessment algorithms (ITU P.862 PESQ, ITU BS.1387-1 PEAQ, Creusere's algorithm). We found out that a combination of the PEAQ basic and advanced values best matches, after third order linear regression, the subjective MUSHRA results. The linear regression has a coefficient of determination of R²=0.907². By comparison, our individual human ratings show a correlation of about R=0.9 compared to our averaged human rating results.
Using the combination of both PEAQ algorithms, we calculate hundreds of thousands of objective audio quality ratings, varying audio content and algorithmic parameters of SBC and PLC. The results show which sets of parameter values are best suited for a bandwidth and delay constrained link. The transmission quality of SBC is enhanced significantly by selecting optimal encoding parameters as compared to the default parameter sets given in the standard.
Finally, we present preliminary objective test results on the comparison of the audio codecs SBC, CELT, APT-X and ULD coding speech and audio transmission. They all allow a mono and stereo transmission of music at ultra-low coding delays (<10ms), which is especially useful for distributed ensemble performances over the Internet
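The regression step in this abstract can be sketched as follows. The sketch assumes that "third order linear regression" means a least-squares fit of a cubic polynomial (linear in its coefficients) from an objective score to a subjective score. All function names and data points are hypothetical; none are measurements from the paper.

```python
def fit_polynomial(xs, ys, degree=3):
    """Least-squares coefficients c[0] + c[1]*x + ... + c[degree]*x**degree."""
    n = degree + 1
    # Normal equations A c = b, with A[j][k] = sum(x^(j+k)) and b[j] = sum(y * x^j).
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    """Coefficient of determination of the fit on the given points."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical objective scores (x) and averaged listener scores (y).
    objective = [-3.5, -3.0, -2.2, -1.6, -1.0, -0.4, 0.0]
    subjective = [12.0, 20.0, 38.0, 55.0, 71.0, 88.0, 95.0]
    c = fit_polynomial(objective, subjective)
    print("coefficients:", c)
    print("R^2:", r_squared(objective, subjective, c))
```

Solving the normal equations directly is adequate for a handful of points and a low degree; for larger or poorly scaled data a QR-based least-squares routine would be the safer choice.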
Operating System Based Perceptual Evaluation of Call Quality in Radio Telecommunications Networks. Development of call quality assessment at mobile terminals using the Symbian operating system, comparison with traditional approaches and proposals for a tariff regime relating call charging to perceived speech quality.
Call quality has been crucial from the inception of telecommunication networks.
Operators need to monitor call quality from the end-user's perspective, in order to retain
subscribers and reduce subscriber "churn". Operators worry not only about call quality and
interconnect revenue loss, but also about network connectivity issues in areas where mobile
network gateways are prevalent. Bandwidth quality as experienced by the end-user is equally
important in helping operators to reduce churn.
The parameters that network operators use to improve call quality are mainly from the
end-user's perspective. These parameters are usually ASR (answer seizure ratio), PDD (post-dial
delay), NER (network efficiency ratio), the number of calls for which these parameters
have been analyzed and successful calls. Operators use these parameters to evaluate and
optimize the network to meet their quality requirements.
Analysis of speech quality is a major arena for research. Traditionally, users' perception
of speech quality has been measured offline using subjective listening tests. Such tests are,
however, slow, tedious and costly. An alternative method is therefore needed; one that can be
automatically computed on the subscriber's handset, be available to the operator as well as to
subscribers and, at the same time, provide results that are comparable with conventional
subjective scores. QMeter®, a set of tools for signal and bandwidth measurement that have
been developed bearing in mind all the parameters that influence call and bandwidth quality
experienced by the end-user, addresses these issues and, additionally, facilitates dynamic tariff
propositions which enhance the credibility of the operator.
This research focuses on call quality parameters from the end-user's perspective. The
call parameters used in the research are signal strength, successful call rate, normal drop call
rate, and hand-over drop rate. Signal strength is measured for every five milliseconds of an
active call and average signal strength is calculated for each successful call. The successful call
rate, normal drop rate and hand-over drop rate are used to achieve a measurement of the overall
call quality. Call quality with respect to bundles of 10 calls is proposed.
An attempt is made to visualize these parameters for better understanding of where the
quality is bad, good and excellent. This will help operators, as well as user groups, to measure
quality and coverage.
Operators boast about their bandwidth but in reality, to know the locations where speed
has to be improved, they need a tool that can effectively measure speed from the end-user's
perspective. BM (bandwidth meter), a tool developed as a part of this research, measures the
average speed of data sessions and stores the information for analysis at different locations.
To address issues of quality in the subscriber segment, this research proposes the
varying of tariffs based on call and bandwidth quality. Call charging based on call quality as
perceived by the end-user is proposed, both to satisfy subscribers and help operators to improve
customer satisfaction and increase average revenue per user. Tariff redemption procedures are
put forward for bundles of 10 calls and 10 data sessions. In addition to the varying of tariffs,
quality escalation processes are proposed. Deploying such tools on selected or random samples
of users will result in substantial improvement in user loyalty which, in turn, will bring
operational and economic advantages
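The per-bundle rates described in this abstract can be illustrated with a short sketch. The abstract does not say how the thesis combines the three rates into one quality figure or tariff, so the code below only computes the rates for one bundle. The outcome labels and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical labels for the three call outcomes named in the abstract.
OUTCOMES = ("success", "normal_drop", "handover_drop")


def bundle_rates(outcomes):
    """Return the share of each outcome within one bundle of calls."""
    if not outcomes:
        raise ValueError("bundle must contain at least one call")
    unknown = set(outcomes) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    counts = Counter(outcomes)
    total = len(outcomes)
    return {name: counts[name] / total for name in OUTCOMES}


if __name__ == "__main__":
    # One hypothetical bundle of 10 calls.
    bundle = ["success"] * 8 + ["normal_drop", "handover_drop"]
    for name, rate in bundle_rates(bundle).items():
        print(f"{name}: {rate:.0%}")
```

A tariff rule would then map these rates to a charge adjustment; that mapping is specific to the thesis and is left out here.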
Quality aspects of Internet telephony
Internet telephony has had a tremendous impact on how people communicate.
Many now maintain contact using some form of Internet telephony.
Therefore the motivation for this work has been to address the quality aspects
of real-world Internet telephony for both fixed and wireless telecommunication.
The focus has been on the quality aspects of voice communication,
since poor quality often leads to user dissatisfaction. The scope of the work
has been broad in order to address the main factors within IP-based voice
communication.
The first four chapters of this dissertation constitute the background
material. The first chapter outlines where Internet telephony is deployed
today. It also motivates the topics and techniques used in this research.
The second chapter provides the background on Internet telephony including
signalling, speech coding and voice Internetworking. The third chapter
focuses solely on quality measures for packetised voice systems and finally
the fourth chapter is devoted to the history of voice research.
The appendix of this dissertation constitutes the research contributions.
It includes an examination of the access network, focusing on how calls are
multiplexed in wired and wireless systems. Subsequently in the wireless
case, we consider how to handover calls from 802.11 networks to the cellular
infrastructure. We then consider the Internet backbone where most of our
work is devoted to measurements specifically for Internet telephony. The
applications of these measurements have been estimating telephony arrival
processes, measuring call quality, and quantifying the trend in Internet telephony
quality over several years. We also consider the end systems, since
they are responsible for reconstructing a voice stream given loss and delay
constraints. Finally we estimate voice quality using the ITU proposal PESQ
and the packet loss process.
The main contribution of this work is a systematic examination of Internet
telephony. We describe several methods to enable adaptable solutions
for maintaining consistent voice quality. We have also found that relatively
small technical changes can lead to substantial user quality improvements.
A second contribution of this work is a suite of software tools designed to
ascertain voice quality in IP networks. Some of these tools are in use within
commercial systems today
Structure-Constrained Basis Pursuit for Compressively Sensing Speech
Compressed Sensing (CS) exploits the sparsity of many signals to enable sampling below the Nyquist rate. If the original signal is sufficiently sparse, the Basis Pursuit (BP) algorithm will perfectly reconstruct the original signal. Unfortunately many signals that intuitively appear sparse do not meet the threshold for sufficient sparsity. These signals require so many CS samples for accurate reconstruction that the advantages of CS disappear. This is because Basis Pursuit/Basis Pursuit Denoising only models sparsity. We developed a Structure-Constrained Basis Pursuit that models the structure of somewhat sparse signals as upper and lower bound constraints on the Basis Pursuit Denoising solution. We applied it to speech, which seems sparse but does not compress well with CS, and gained improved quality over Basis Pursuit Denoising. When a single parameter (i.e. the phone) is encoded, Normalized Mean Squared Error (NMSE) decreases by between 16.2% and 1.00% when sampling with CS between 1/10 and 1/2 the Nyquist rate, respectively. When bounds are coded as a sum of Gaussians, NMSE decreases between 28.5% and 21.6% in the same range. SCBP can be applied to any somewhat sparse signal with a predictable structure to enable improved reconstruction quality with the same number of samples
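To make the reported figures concrete, the sketch below computes an NMSE and the percentage decrease between two reconstructions. It assumes the common definition in which squared error is normalised by the energy of the original signal; the abstract does not state which normalisation the authors used. The function names and sample values are hypothetical.

```python
def nmse(original, reconstructed):
    """Normalised mean squared error: sum((x - x_hat)^2) / sum(x^2)."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    error = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    energy = sum(x * x for x in original)
    return error / energy


def relative_decrease_percent(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical signal and two hypothetical reconstructions of it.
    signal = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0]
    baseline_recon = [0.1, 0.8, 0.6, -0.3, -0.9, 0.1]
    improved_recon = [0.0, 0.9, 0.5, -0.4, -1.0, 0.1]
    base = nmse(signal, baseline_recon)
    better = nmse(signal, improved_recon)
    print(f"baseline NMSE: {base:.4f}")
    print(f"improved NMSE: {better:.4f}")
    print(f"decrease: {relative_decrease_percent(base, better):.1f}%")
```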