1,635 research outputs found
QoE Modelling, Measurement and Prediction: A Review
In mobile computing systems, users can access network services anywhere and
anytime using mobile devices such as tablets and smart phones. These devices
connect to the Internet via network or telecommunications operators. Users
usually have some expectations about the services provided to them by different
operators. Users' expectations along with additional factors such as cognitive
and behavioural states, cost, and network quality of service (QoS) may
determine their quality of experience (QoE). If users are not satisfied with
their QoE, they may switch to different providers or may stop using a
particular application or service. Thus, QoE measurement and prediction
techniques may benefit users in availing personalized services from service
providers. On the other hand, it can help service providers to achieve lower
user-operator switchover. This paper presents a review of the state-the-art
research in the area of QoE modelling, measurement and prediction. In
particular, we investigate and discuss the strengths and shortcomings of
existing techniques. Finally, we present future research directions for
developing novel QoE measurement and prediction technique
Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web
The Internet Protocol (IP) environment poses two relevant sources of distortion to the speech recognition problem: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant benefits. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we are avoiding the influence of other sources of distortion due to the encoding-decoding process. Second, when packet loss occurs, our front-end becomes more effective since it is not constrained to the error handling mechanism of the codec. We have considered the ITU G.723.1 standard codec, which is one of the most preponderant coding algorithms in voice over IP (VoIP) and compared the proposed front-end with the conventional approach in two automatic speech recognition (ASR) tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure, for a variety of simulated packet loss rates. Furthermore, the improvement is higher as network conditions worsen.Publicad
A Comparison of Front-Ends for Bitstream-Based ASR over IP
Automatic speech recognition (ASR) is called to play a relevant role in the provision of spoken interfaces for IP-based applications. However, as a consequence of the transit of the speech signal over these particular networks, ASR systems need to face two new challenges: the impoverishment of the speech quality due to the compression needed to fit the channel capacity and the inevitable occurrence of packet losses.
In this framework, bitstream-based approaches that obtain the ASR feature vectors directly from the coded bitstream, avoiding the speech decoding process, have been proposed ([S.H. Choi, H.K. Kim, H.S. Lee, Speech recognition using quantized LSP parameters and their transformations in digital communications, Speech Commun. 30 (4) (2000) 223â233. A. Gallardo-AntolĂn, C. PelĂ ez-Moreno, F. DĂaz-de-MarĂa, Recognizing GSM digital speech, IEEE Trans. Speech Audio Process., to appear. H.K. Kim, R.V. Cox, R.C. Rose, Performance improvement of a bitstream-based front-end for wireless speech recognition in adverse environments, IEEE Trans. Speech Audio Process. 10 (8) (2002) 591â604. C. PelĂĄez-Moreno, A. Gallardo-AntolĂn, F. DĂaz-de-MarĂa, Recognizing voice over IP networks: a robust front-end for speech recognition on the WWW, IEEE Trans. Multimedia 3(2) (2001) 209â218], among others) to improve the robustness of ASR systems. LSP (Line Spectral Pairs) are the preferred set of parameters for the description of the speech spectral envelope in most of the modern speech coders. Nevertheless, LSP have proved to be unsuitable for ASR, and they must be transformed into cepstrum-type parameters. In this paper we comparatively evaluate the robustness of the most significant LSP to cepstrum transformations in a simulated VoIP (voice over IP) environment which includes two of the most popular codecs used in that network (G.723.1 and G.729) and several network conditions. In particular, we compare âpseudocepstrumâ [H.K. Kim, S.H. Choi, H.S. Lee, On approximating Line Spectral Frequencies to LPC cepstral coefficients, IEEE Trans. Speech Audio Process. 8 (2) (2000) 195â199], an approximated but straightforward transformation of LSP into LP cepstral coefficients, with a more computationally demanding but exact one. Our results show that pseudocepstrum is preferable when network conditions are good or computational resources low, while the exact procedure is recommended when network conditions become more adverse.Publicad
Covert voice over internet protocol communications with packet loss based on fractal interpolation
The last few years have witnessed an explosive growth in the research of information hiding in multimedia objects, but few studies have taken into account packet loss in multimedia networks. As one of the most popular real-time services in the Internet, Voice over Internet Protocol (VoIP) contributes to a large part of network traffic for its advantages of real time, high flow, and low cost. So packet loss is inevitable in multimedia networks and affects the performance of VoIP communications. In this study, a fractal-based VoIP steganographic approach was proposed to realise covert VoIP communications in the presence of packet loss. In the proposed scheme, secret data to be hidden were divided into blocks after being encrypted with the block cipher, and each block of the secret data was then embedded into VoIP streaming packets. The VoIP packets went through a packet loss system based on Gilbert model which simulates a real network situation. And a prediction model based on fractal interpolation was built to decide whether a VoIP packet was suitable for data hiding. The experimental results indicated that the speech quality degradation increased with the escalating packet-loss level. The average variance of speech quality metrics (PESQ score) between the "no-embedding" speech samples and the âwith-embeddingâ stego-speech samples was about 0.717, and the variances narrowed with the increasing packet-loss level. Both the average PESQ scores and the SNR values of stego-speech samples and the data retrieving rates had almost the same varying trends when the packet-loss level increased, indicating that the success rate of the fractal prediction model played an important role in the performance of covert VoIP communications
Speech quality prediction for voice over Internet protocol networks
Merged with duplicate record 10026.1/878 on 03.01.2017 by CS (TIS). Merged with duplicate record 10026.1/1657 on 15.03.2017 by CS (TIS)This is a digitised version of a thesis that was deposited in the University Library. If you are the author please contact PEARL Admin ([email protected]) to discuss options.IP networks are on a steep slope of innovation that will make them the long-term carrier
of all types of traffic, including voice. However, such networks are not designed to support
real-time voice communication because their variable characteristics (e.g. due to delay, delay
variation and packet loss) lead to a deterioration in voice quality. A major challenge in such networks
is how to measure or predict voice quality accurately and efficiently for QoS monitoring
and/or control purposes to ensure that technical and commercial requirements are met.
Voice quality can be measured using either subjective or objective methods. Subjective
measurement (e.g. MOS) is the benchmark for objective methods, but it is slow, time consuming
and expensive. Objective measurement can be intrusive or non-intrusive. Intrusive methods
(e.g. ITU PESQ) are more accurate, but normally are unsuitable for monitoring live traffic
because of the need for a reference data and to utilise the network. This makes non-intrusive
methods(e.g. ITU E-model) more attractive for monitoring voice quality from IP network impairments.
However, current non-intrusive methods rely on subjective tests to derive model
parameters and as a result are limited and do not meet new and emerging applications.
The main goal of the project is to develop novel and efficient models for non-intrusive
speech quality prediction to overcome the disadvantages of current subjective-based methods
and to demonstrate their usefulness in new and emerging VoIP applications. The main contributions
of the thesis are fourfold:
(1) a detailed understanding of the relationships between voice quality, IP network impairments
(e.g. packet loss, jitter and delay) and relevant parameters associated with speech (e.g.
codec type, gender and language) is provided. An understanding of the perceptual effects of
these key parameters on voice quality is important as it provides a basis for the development
of non-intrusive voice quality prediction models. A fundamental investigation of the impact of
the parameters on perceived voice quality was carried out using the latest ITU algorithm for
perceptual evaluation of speech quality, PESQ, and by exploiting the ITU E-model to obtain an
objective measure of voice quality.
(2) a new methodology to predict voice quality non-intrusively was developed. The method
exploits the intrusive algorithm, PESQ, and a combined PESQ/E-model structure to provide a
perceptually accurate prediction of both listening and conversational voice quality non-intrusively.
This avoids time-consuming subjective tests and so removes one of the major obstacles in the
development of models for voice quality prediction. The method is generic and as such has
wide applicability in multimedia applications. Efficient regression-based models and robust
artificial neural network-based learning models were developed for predicting voice quality
non-intrusively for VoIP applications.
(3) three applications of the new models were investigated: voice quality monitoring/prediction
for real Internet VoIP traces, perceived quality driven playout buffer optimization and
perceived quality driven QoS control. The neural network and regression models were both
used to predict voice quality for real Internet VoIP traces based on international links. A new
adaptive playout buffer and a perceptual optimization playout buffer algorithms are presented.
A QoS control scheme that combines the strengths of rate-adaptive and priority marking control
schemes to provide a superior QoS control in terms of measured perceived voice quality is
also provided.
(4) a new methodology for Internet-based subjective speech quality measurement which
allows rapid assessment of voice quality for VoIP applications is proposed and assessed using
both objective and traditional MOS test methods
Measuring and Monitoring Speech Quality for Voice over IP with POLQA, ViSQOL and P.563
There are many types of degradation which can occur in Voice over IP (VoIP) calls. Of interest in this work are degradations which occur independently of the codec, hardware or network in use. Specifically, their effect on the subjective and objec- tive quality of the speech is examined. Since no dataset suit- able for this purpose exists, a new dataset (TCD-VoIP) has been created and has been made publicly available. The dataset con- tains speech clips suffering from a range of common call qual- ity degradations, as well as a set of subjective opinion scores on the clips from 24 listeners. The performances of three ob- jective quality metrics: POLQA, ViSQOL and P.563, have been evaluated using the dataset. The results show that full reference metrics are capable of accurately predicting a variety of com- mon VoIP degradations. They also highlight the outstanding need for a wideband, single-ended, no-reference metric to mon- itor accurately speech quality for degradations common in VoIP scenarios
- âŚ