    Enhancement of perceived quality of service for voice over internet protocol systems

    Voice over Internet Protocol (VoIP) applications are becoming increasingly popular in the telecommunication market. Packet-switched VoIP systems have many technical advantages over the conventional Public Switched Telephone Network (PSTN), including efficient and flexible use of bandwidth, lower cost and enhanced security. However, due to the "Best Effort" nature of IP networks, voice quality is not inherently guaranteed in VoIP services. In fact, most current VoIP services cannot provide voice quality as good as that of the PSTN. IP network impairments such as packet loss, delay and jitter affect perceived speech quality, as do application-layer impairment factors such as codec rate and audio features. Current perceived Quality of Service (QoS) methods are mainly designed for use in a PSTN/TDM environment, and their performance in a VoIP environment is unknown. It is a challenge to measure perceived speech quality correctly in VoIP systems and to enhance user-perceived speech quality for them. The main goal of this project is to evaluate the accuracy of the existing ITU-T speech quality measurement method (Perceptual Evaluation of Speech Quality, PESQ) in mobile wireless systems in the context of VoIP, and to develop novel and efficient methods to enhance user-perceived speech quality for emerging VoIP services, especially in mobile VoIP environments. The main contributions of the thesis are threefold: (1) A new discovery of PESQ errors in the mobile VoIP environment. A detailed investigation of PESQ performance in the mobile VoIP environment was undertaken, which included setting up a PESQ performance evaluation platform and testing over 1800 mobile-to-mobile and mobile-to-PSTN calls over a period of three months. The accuracy of the PESQ algorithm was investigated, and the main problems causing inaccurate PESQ scores (improper time alignment in the PESQ algorithm) were discovered.
Calibration issues for safe and proper PESQ testing in the mobile environment are also discussed in the thesis. (2) A new, simple-to-use VoIP jitter buffer algorithm. This was developed and implemented in a commercial mobile handset. The algorithm, called the "Play Late Algorithm", adaptively alters the playout delay inside a speech talkspurt without introducing unnecessary extra end-to-end delay. It can be used as a front end to conventional static or adaptive jitter buffer algorithms to provide improved performance. Results show that, when tested in live wireless VoIP networks, the proposed algorithm can increase user-perceived quality without consuming too much processing power. (3) A new QoS enhancement scheme. The new scheme combines the strengths of adaptive codec bit rate (i.e. the eight AMR bit-rate modes) and speech priority marking (i.e. giving high priority to the beginning of a voiced segment). The results gathered on a simulation and emulation test platform show that the combined method provides better user-perceived speech quality than adaptive sender bit rate or packet priority marking used separately.
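
The abstract does not specify how the Play Late Algorithm works, so the following is not that algorithm. It is a minimal sketch of the kind of conventional adaptive jitter buffer estimator that such a front end could sit in front of: the well-known exponentially weighted moving-average playout estimator, which tracks the mean network delay and its variation and sets the playout delay to the mean plus a safety multiple of the variation. The class name, the field names and the choice to seed the estimate with the first packet are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class PlayoutEstimator:
    """EWMA playout-delay estimator (a conventional adaptive jitter buffer).

    Assumption: this illustrates a generic adaptive scheme, not the
    thesis's Play Late Algorithm, whose details the abstract omits.
    """
    alpha: float = 0.998002   # smoothing weight commonly quoted for this estimator
    safety: float = 4.0       # multiple of the delay variation added as margin
    delay_est: float = 0.0    # smoothed network delay (ms)
    var_est: float = 0.0      # smoothed delay variation (ms)
    initialised: bool = False

    def update(self, network_delay_ms: float) -> None:
        """Fold the measured delay of one received packet into the estimates."""
        if not self.initialised:
            # Assumption: seed the mean with the first observed delay.
            self.delay_est = network_delay_ms
            self.initialised = True
            return
        a = self.alpha
        self.delay_est = a * self.delay_est + (1 - a) * network_delay_ms
        self.var_est = a * self.var_est + (1 - a) * abs(self.delay_est - network_delay_ms)

    def playout_delay(self) -> float:
        """Playout delay to apply at the start of the next talkspurt (ms)."""
        return self.delay_est + self.safety * self.var_est
```

In this conventional scheme the playout delay is normally changed only between talkspurts; the abstract states that the Play Late Algorithm differs by adjusting the delay inside a talkspurt.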

    Speech quality prediction for voice over Internet protocol networks

    This is a digitised version of a thesis that was deposited in the University Library. IP networks are on a steep slope of innovation that will make them the long-term carrier of all types of traffic, including voice. However, such networks are not designed to support real-time voice communication because their variable characteristics (e.g. delay, delay variation and packet loss) lead to a deterioration in voice quality. A major challenge in such networks is how to measure or predict voice quality accurately and efficiently for QoS monitoring and/or control purposes, to ensure that technical and commercial requirements are met. Voice quality can be measured using either subjective or objective methods. Subjective measurement (e.g. MOS) is the benchmark for objective methods, but it is slow and expensive. Objective measurement can be intrusive or non-intrusive. Intrusive methods (e.g. ITU PESQ) are more accurate, but are normally unsuitable for monitoring live traffic because they need reference data and consume network resources. This makes non-intrusive methods (e.g. the ITU E-model) more attractive for monitoring voice quality under IP network impairments. However, current non-intrusive methods rely on subjective tests to derive model parameters, and as a result are limited and do not meet the needs of new and emerging applications. The main goal of the project is to develop novel and efficient models for non-intrusive speech quality prediction to overcome the disadvantages of current subjective-based methods and to demonstrate their usefulness in new and emerging VoIP applications. The main contributions of the thesis are fourfold: (1) a detailed understanding of the relationships between voice quality, IP network impairments (e.g.
packet loss, jitter and delay) and relevant parameters associated with speech (e.g. codec type, gender and language) is provided. An understanding of the perceptual effects of these key parameters on voice quality is important as it provides a basis for the development of non-intrusive voice quality prediction models. A fundamental investigation of the impact of the parameters on perceived voice quality was carried out using the latest ITU algorithm for perceptual evaluation of speech quality, PESQ, and by exploiting the ITU E-model to obtain an objective measure of voice quality. (2) a new methodology to predict voice quality non-intrusively was developed. The method exploits the intrusive algorithm, PESQ, and a combined PESQ/E-model structure to provide a perceptually accurate prediction of both listening and conversational voice quality non-intrusively. This avoids time-consuming subjective tests and so removes one of the major obstacles in the development of models for voice quality prediction. The method is generic and as such has wide applicability in multimedia applications. Efficient regression-based models and robust artificial neural network-based learning models were developed for predicting voice quality non-intrusively for VoIP applications. (3) three applications of the new models were investigated: voice quality monitoring/prediction for real Internet VoIP traces, perceived-quality-driven playout buffer optimization and perceived-quality-driven QoS control. The neural network and regression models were both used to predict voice quality for real Internet VoIP traces based on international links. A new adaptive playout buffer algorithm and a perceptual-optimization playout buffer algorithm are presented. A QoS control scheme that combines the strengths of rate-adaptive and priority-marking control schemes to provide superior QoS control in terms of measured perceived voice quality is also provided.
(4) a new methodology for Internet-based subjective speech quality measurement, which allows rapid assessment of voice quality for VoIP applications, is proposed and assessed using both objective and traditional MOS test methods.


    The quality of VoIP communication relies significantly on the network that transports the voice packets, because this network does not usually guarantee the available bandwidth, delay, and loss that are critical for real-time voice traffic. The solution proposed here is to manage the voice-over-IP stream dynamically, changing parameters as needed to assure quality. The main objective of this dissertation is to develop an adaptive speech encoding system that can be applied to conventional (telephony-grade) and wideband voice communications. This comprehensive study includes the investigation and development of three key components of the system. First, to manage VoIP quality dynamically, a tool is needed to measure real-time changes in quality. The E-model, which exists for narrowband communication, is extended to a single computational technique that measures speech quality for narrowband and wideband VoIP codecs. This part of the dissertation also develops important theoretical work in the area of wideband telephony. The second system component is a variable speech-encoding algorithm. Although VoIP performance is affected by multiple codec- and network-based factors, only three factors can be managed dynamically: voice payload size, speech compression and jitter buffer management. Using an existing adaptive jitter-buffer algorithm, voice packet-size and compression variation are studied as they affect speech quality under different network conditions. This study explains the relationships among multiple parameters as they affect speech transmission and its resulting quality. Then, based on these two components, the third system component is a novel adaptive-rate control algorithm that establishes the interaction between a VoIP sender and receiver, and manages voice quality in real time. Simulations demonstrate that the system provides better average voice quality than traditional VoIP.
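
For context, the narrowband E-model that this dissertation extends produces a transmission rating factor R, which is conventionally converted to an estimated MOS with the mapping published in ITU-T G.107. The sketch below shows only that standard narrowband mapping; it does not reproduce the wideband extension developed in the dissertation, and the function name is an illustrative choice.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS.

    Standard narrowband mapping from ITU-T G.107; the dissertation's
    wideband extension is not shown here.
    """
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    # Cubic mapping between the two saturation points.
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    for rating in (50.0, 70.0, 93.2):
        print(f"R = {rating:5.1f}  ->  MOS = {r_to_mos(rating):.2f}")
```

An adaptive-rate controller of the kind described could, for example, compare the MOS estimated for each candidate codec setting under the currently measured delay and loss and pick the highest one; how the dissertation actually does this is not stated in the abstract.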

    Speaker Recognition in the VoIP Environment

    This work describes the use of speaker recognition systems in the VoIP environment, their performance, and approaches to improving it. It describes the architecture of these systems, the metrics used to evaluate their performance, and the key components of VoIP technology from the point of view of speaker recognition. The creation of a simulated VoIP environment is described, the performance of the speaker recognition system is evaluated on data sets from various kinds of VoIP environments, and the results are presented. System adaptation and calibration are performed and their benefits are assessed.

    The Bits of Silence : Redundant Traffic in VoIP

    Human conversation is characterized by brief pauses and so-called turn-taking behavior between the speakers. In the context of VoIP, this means that there are frequent periods in which the microphone captures only background noise, or even silence whenever the microphone is muted. The bits transmitted during such silence periods introduce overhead in terms of data usage, energy consumption, and network infrastructure costs. In this paper, we shed light on these costs for VoIP applications. We systematically measure the performance of six popular mobile VoIP applications with controlled human conversation and acoustic setup. Our analysis demonstrates that significant savings are achievable: the best-performing silence suppression technique is effective on 75% of silent pauses in a conversation in a quiet place. This results in 2-5 times data savings and 50-90% lower energy consumption compared to the next-best alternative. Even then, the effectiveness of silence suppression can be sensitive to the amount of background noise, the underlying speech codec, and the device being used. The codec characteristics and performance do not depend on the network type. However, silence suppression makes VoIP traffic as network-friendly as VoLTE traffic. Our results provide new insights into VoIP performance and motivate further enhancements, such as performance-aware codec selection, that can significantly benefit a wide variety of voice-assisted applications, such as intelligent home assistants and other speech-codec-enabled IoT devices.


    Voice over Internet Protocol (VoIP) is a popular communication technology that plays a vital role in terms of cost reduction and flexibility. However, like any emerging technology, VoIP still has some open issues, namely providing good Quality of Service (QoS), capacity considerations and security. This study focuses on the QoS issue of VoIP, specifically in Wireless Local Area Networks (WLANs). IEEE 802.11 is the most popular wireless LAN standard, and it offers different transmission rates for wireless channels. Different transmission rates correspond to different available bandwidths, which influence the transmission of VoIP traffic.

    Analysis of Connectivity Model and Encoding Standards on IP Interconnection Implementation in Indonesia (Study Case: Low Data Rate up to 72 Mbps)

    Currently, Indonesia faces a situation in which data traffic, including OTT traffic, dominates telecommunications services, causing interconnection revenue to decline. At the same time, the cost of network maintenance tends to increase.
The emergence of IP technology may benefit operators, both in handling the scissor effect and in improving customer loyalty. However, the current interconnection regulations in Indonesia still use TDM. Therefore, a recommendation on the standardization of encoding and of the IP interconnection model is required. In this research, the technical aspects of the IP interconnection model are analyzed by comparing two models, Peering and Hubbing, using a no-transcoding method on six codecs (G.711a, G.711u, GSM, G.723, G.729, G.722) under various traffic loads (0 Mbps, 15 Mbps, 40 Mbps, 72 Mbps). The QoS performance results (delay, Mean Opinion Score, packet loss, throughput) obtained from the simulation of each model and codec combination are analyzed using the VoIP server Asterisk 11, MicroSIP 3.17.3 as the SIP phone, and Wireshark 2.2.4. The one-way delay QoS values are assessed against the standard in ITU-T G.1010. The simulation results show that, overall, for traffic loads up to 72 Mbps, the Peering model is the best alternative IP interconnection model. The G.729 codec gave the best performance, with the lowest delay and the highest MOS, and is therefore the most recommended for use in IP interconnection implementation.
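
As background to the codec comparison, the per-call IP bandwidth of a constant-bit-rate codec can be worked out from its bit rate and packetization interval. The sketch below assumes a 20 ms packetization interval and 40 bytes of IPv4, UDP and RTP headers per packet (20 + 8 + 12), and ignores layer-2 framing; the study's actual packetization settings are not stated in the abstract, so these defaults and the function name are illustrative only.

```python
def ip_bandwidth_kbps(codec_kbps: float,
                      ptime_ms: float = 20.0,
                      header_bytes: int = 40) -> float:
    """One-direction IP-layer bandwidth of a constant-bit-rate voice stream.

    Assumptions: fixed packetization interval (ptime_ms), IPv4 + UDP + RTP
    headers of 40 bytes per packet, no silence suppression, no layer-2 overhead.
    """
    payload_bytes = codec_kbps * ptime_ms / 8.0   # kbit/s * ms = bits per packet
    packets_per_second = 1000.0 / ptime_ms
    return (payload_bytes + header_bytes) * 8.0 * packets_per_second / 1000.0


if __name__ == "__main__":
    # Nominal codec bit rates in kbit/s.
    for name, rate in (("G.711", 64.0), ("G.722", 64.0), ("G.729", 8.0)):
        print(f"{name}: {ip_bandwidth_kbps(rate):.1f} kbit/s per direction")
```

Under these assumptions G.711 needs 80 kbit/s per direction and G.729 needs 24 kbit/s, which illustrates why a low-bit-rate codec places less load on a shared interconnection link.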

