181 research outputs found
Bilateral Waveform Similarity Overlap-and-Add Based Packet Loss Concealment for Voice over IP
This paper invested a bilateral waveform similarity overlap-and-add algorithm for voice packet lost. Since Packet lost will cause the semantic misunderstanding, it has become one of the most essential problems in speech communication. This investment is based on waveform similarity measure using overlap-and-Add algorithm and provides the bilateral information to enhance the speech signal reconstruction. Traditionally, it has been improved that waveform similarity overlap-and-add (WSOLA) technique is an effective algorithm to deal with packet loss concealment (PLC) for real-time time communication. WSOLA algorithm is widely applied to deal with the length adaptation and packet loss concealment of speech signal. Time scale modification of audio signal is one of the most essential research topics in data communication, especially in voice of IP (VoIP). Herein, the proposed the bilateral WSOLA (BWSOLA) that is derived from WSOLA. Instead of only exploitation one direction speech data, the proposed method will reconstruct the lost voice data according to the preceding and cascading data. The related algorithms have been developed to achieve the optimal reconstructing estimation. The experimental results show that the quality of the reconstructed speech signal of the bilateral WSOLA is much better compared to the standard WSOLA and GWSOLA on different packet loss rate and length using the metrics PESQ and MOS. The significant improvement is obtained by bilateral information and proposed method. The proposed bilateral waveform similarity overlap-and-add (BWSOLA) outperforms the traditional approaches especially in the long duration data loss
Speech quality prediction for voice over Internet protocol networks
Merged with duplicate record 10026.1/878 on 03.01.2017 by CS (TIS). Merged with duplicate record 10026.1/1657 on 15.03.2017 by CS (TIS)This is a digitised version of a thesis that was deposited in the University Library. If you are the author please contact PEARL Admin ([email protected]) to discuss options.IP networks are on a steep slope of innovation that will make them the long-term carrier
of all types of traffic, including voice. However, such networks are not designed to support
real-time voice communication because their variable characteristics (e.g. due to delay, delay
variation and packet loss) lead to a deterioration in voice quality. A major challenge in such networks
is how to measure or predict voice quality accurately and efficiently for QoS monitoring
and/or control purposes to ensure that technical and commercial requirements are met.
Voice quality can be measured using either subjective or objective methods. Subjective
measurement (e.g. MOS) is the benchmark for objective methods, but it is slow, time consuming
and expensive. Objective measurement can be intrusive or non-intrusive. Intrusive methods
(e.g. ITU PESQ) are more accurate, but normally are unsuitable for monitoring live traffic
because of the need for a reference data and to utilise the network. This makes non-intrusive
methods(e.g. ITU E-model) more attractive for monitoring voice quality from IP network impairments.
However, current non-intrusive methods rely on subjective tests to derive model
parameters and as a result are limited and do not meet new and emerging applications.
The main goal of the project is to develop novel and efficient models for non-intrusive
speech quality prediction to overcome the disadvantages of current subjective-based methods
and to demonstrate their usefulness in new and emerging VoIP applications. The main contributions
of the thesis are fourfold:
(1) a detailed understanding of the relationships between voice quality, IP network impairments
(e.g. packet loss, jitter and delay) and relevant parameters associated with speech (e.g.
codec type, gender and language) is provided. An understanding of the perceptual effects of
these key parameters on voice quality is important as it provides a basis for the development
of non-intrusive voice quality prediction models. A fundamental investigation of the impact of
the parameters on perceived voice quality was carried out using the latest ITU algorithm for
perceptual evaluation of speech quality, PESQ, and by exploiting the ITU E-model to obtain an
objective measure of voice quality.
(2) a new methodology to predict voice quality non-intrusively was developed. The method
exploits the intrusive algorithm, PESQ, and a combined PESQ/E-model structure to provide a
perceptually accurate prediction of both listening and conversational voice quality non-intrusively.
This avoids time-consuming subjective tests and so removes one of the major obstacles in the
development of models for voice quality prediction. The method is generic and as such has
wide applicability in multimedia applications. Efficient regression-based models and robust
artificial neural network-based learning models were developed for predicting voice quality
non-intrusively for VoIP applications.
(3) three applications of the new models were investigated: voice quality monitoring/prediction
for real Internet VoIP traces, perceived quality driven playout buffer optimization and
perceived quality driven QoS control. The neural network and regression models were both
used to predict voice quality for real Internet VoIP traces based on international links. A new
adaptive playout buffer and a perceptual optimization playout buffer algorithms are presented.
A QoS control scheme that combines the strengths of rate-adaptive and priority marking control
schemes to provide a superior QoS control in terms of measured perceived voice quality is
also provided.
(4) a new methodology for Internet-based subjective speech quality measurement which
allows rapid assessment of voice quality for VoIP applications is proposed and assessed using
both objective and traditional MOS test methods
Quality aspects of Internet telephony
Internet telephony has had a tremendous impact on how people communicate.
Many now maintain contact using some form of Internet telephony.
Therefore the motivation for this work has been to address the quality aspects
of real-world Internet telephony for both fixed and wireless telecommunication.
The focus has been on the quality aspects of voice communication,
since poor quality leads often to user dissatisfaction. The scope of the work
has been broad in order to address the main factors within IP-based voice
communication.
The first four chapters of this dissertation constitute the background
material. The first chapter outlines where Internet telephony is deployed
today. It also motivates the topics and techniques used in this research.
The second chapter provides the background on Internet telephony including
signalling, speech coding and voice Internetworking. The third chapter
focuses solely on quality measures for packetised voice systems and finally
the fourth chapter is devoted to the history of voice research.
The appendix of this dissertation constitutes the research contributions.
It includes an examination of the access network, focusing on how calls are
multiplexed in wired and wireless systems. Subsequently in the wireless
case, we consider how to handover calls from 802.11 networks to the cellular
infrastructure. We then consider the Internet backbone where most of our
work is devoted to measurements specifically for Internet telephony. The
applications of these measurements have been estimating telephony arrival
processes, measuring call quality, and quantifying the trend in Internet telephony
quality over several years. We also consider the end systems, since
they are responsible for reconstructing a voice stream given loss and delay
constraints. Finally we estimate voice quality using the ITU proposal PESQ
and the packet loss process.
The main contribution of this work is a systematic examination of Internet
telephony. We describe several methods to enable adaptable solutions
for maintaining consistent voice quality. We have also found that relatively
small technical changes can lead to substantial user quality improvements.
A second contribution of this work is a suite of software tools designed to
ascertain voice quality in IP networks. Some of these tools are in use within
commercial systems today
Scalable Speech Coding for IP Networks
The emergence of Voice over Internet Protocol (VoIP) has posed new challenges to the development of speech codecs. The key issue of transporting real-time voice packet over IP networks is the lack of guarantee for reasonable speech quality due to packet delay or loss.
Most of the widely used narrowband codecs depend on the Code Excited Linear Prediction (CELP) coding technique. The CELP technique utilizes the long-term prediction across the frame boundaries and therefore causes error propagation in the case of packet loss and need to transmit redundant information in order to mitigate the problem. The internet Low Bit-rate Codec (iLBC) employs the frame-independent coding and therefore inherently possesses high robustness to packet loss. However, the original iLBC lacks in some of the key features of speech codecs for IP networks: Rate flexibility, Scalability, and Wideband support.
This dissertation presents novel scalable narrowband and wideband speech codecs for IP networks using the frame independent coding scheme based on the iLBC. The rate flexibility is added to the iLBC by employing the discrete cosine transform (DCT) and iii the scalable algebraic vector quantization (AVQ) and by allocating different number of bits to the AVQ. The bit-rate scalability is obtained by adding the enhancement layer to the core layer of the multi-rate iLBC. The enhancement layer encodes the weighted iLBC coding error in the modified DCT (MDCT) domain. The proposed wideband codec employs the bandwidth extension technique to extend the capabilities of existing narrowband codecs to provide wideband coding functionality. The wavelet transform is also used to further enhance the performance of the proposed codec.
The performance evaluation results show that the proposed codec provides high robustness to packet loss and achieves equivalent or higher speech quality than state-of-the-art codecs under the clean channel condition
Finding perceptually optimal operating points of a real time interactive video-conferencing system
This research aims to address issues faced by real time video-conferencing systems in locating a perceptually optimal operating point under various network and conversational conditions.
In order to determine the perceptually optimal operating point of a video-conferencing system, we must first be able to conduct a fair assessment of the quality of the current operating point in the system and compare it with another operating point to determine if one is better than the other in terms of perceptual quality. However at this point in time, there does not exist one objective quality metric that can accurately and fully describe the perceptual quality of a real time video conversation. Hence there is a need for a controlled environment to allow tests to be conducted in and in which we can study different metrics and identify the best trade-offs between them.
We begin by studying the components of a typical setup of a real time video-conferencing system and the impacts that various network and conversation conditions can have on the overall perceptual quality. We also look into different metrics available to measure those impacts.
We then created a platform to perform black box testing on current video conferencing systems and observe how they handle the changes in operating conditions. The platform is then used to conduct a brief evaluation of the performance of Skype, a popular commercial video-conferencing system. However, we are not able to modify the system parameters of Skype.
The main contribution of this thesis is the design of a new testbed that provides a controlled environment to allow tests to be conducted to determine the perceptual optimum operating point of a video conversation under specified network and conversation conditions. This testbed will allow us to modify certain parameters, such as frame rate and frame size, which were not previously possible.
The testbed takes as input, two recorded videos of the two speakers of a face-to-face conversation and desired output video parameters, such as frame rate, frame size and delay. A video generation algorithm is designed as part of the testbed to handle modifications to frame rate and frame size of the videos as well as delays inserted into the recorded video conversation to simulate the effects of network delays. The most important issue addressed is the generation of new frames to fill up the gaps created due to a change in frame rate or delay inserted, unlike as in the case of voice, where a period of silence can simply be used to handle these situations.
The testbed uses a packetization strategy designed on the basis of an uneven packet transmission rate (UPTR) and that handles the packetization of interleaved video and audio data; it also uses piggybacking to provide redundancy if required. Losses can be injected either randomly or based on packet traces collected via PlanetLab. The processed videos will then be pieced together side-by-side to give the viewpoint of a third-party observing the video conversation from the site of the first speaker. Hence the first speaker will be observed to have a faster reaction time without network delays than that of the second speaker who is simulated to be located at the remote end. The video of the second speaker will also reflect the degradations in perceptual quality induced by the network conditions, whereas the first speaker will be of perfect quality. Hence with the testbed, we are able to generate output videos for different operating points under the same network and conversational conditions and thus able to make comparisons between two operating points.
With the testbed in place, we demonstrate how it can be used to evaluate the effects of various parameters on the overall perceptual quality.
Lastly, we demonstrate the results of applying an existing efficient search algorithm used for estimating the perceptually optimal mouth-to-ear delay (MED) of a Voice-over-IP(VoIP) conversation to a Video Conversation. This is achieved by using the network simulator designed to conduct a series of subjective and objective tests to identify the perceptual optimum MED under specific network and conversational conditions
- …