181 research outputs found

    Bilateral Waveform Similarity Overlap-and-Add Based Packet Loss Concealment for Voice over IP

    This paper investigates a bilateral waveform similarity overlap-and-add algorithm for concealing voice packet loss. Because packet loss can cause semantic misunderstanding, it has become one of the most important problems in speech communication. The investigation is based on a waveform similarity measure using the overlap-and-add algorithm and uses bilateral information to enhance reconstruction of the speech signal. The waveform similarity overlap-and-add (WSOLA) technique has been shown to be an effective algorithm for packet loss concealment (PLC) in real-time communication, and it is widely applied to length adaptation and packet loss concealment of speech signals. Time-scale modification of audio signals is one of the most important research topics in data communication, especially in voice over IP (VoIP). The proposed bilateral WSOLA (BWSOLA) is derived from WSOLA. Instead of exploiting speech data from only one direction, the proposed method reconstructs the lost voice data from both the preceding and the following data. Related algorithms have been developed to achieve the optimal reconstruction estimate. Experimental results show that the quality of speech reconstructed by bilateral WSOLA is much better than that of standard WSOLA and GWSOLA across different packet loss rates and lengths, as measured by PESQ and MOS. The significant improvement comes from the bilateral information and the proposed method. BWSOLA outperforms the traditional approaches, especially for long-duration data loss.
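
    The bilateral idea described in this abstract can be illustrated with a minimal sketch. It is not the paper's BWSOLA algorithm: it assumes a single known gap, a normalized cross-correlation similarity search, and a linear cross-fade between a forward estimate (from the preceding data) and a backward estimate (from the following data). The names `best_match`, `extrapolate`, `conceal`, `template_len` and `min_lag` are hypothetical.

```python
import numpy as np

def best_match(history, template):
    """Start index of the window in `history` most similar to `template`
    (maximum normalized cross-correlation)."""
    n = len(template)
    best_idx, best_score = 0, -np.inf
    for i in range(len(history) - n + 1):
        window = history[i:i + n]
        denom = np.linalg.norm(window) * np.linalg.norm(template) + 1e-12
        score = float(np.dot(window, template)) / denom
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx

def extrapolate(history, gap_len, template_len, min_lag):
    """Predict `gap_len` samples that follow `history` by finding an earlier
    segment that resembles its tail and repeating what came after it."""
    template = history[-template_len:]
    # Exclude the last `min_lag` samples so the template cannot match itself.
    search = history[:len(history) - min_lag]
    i = best_match(search, template)
    continuation = history[i + template_len:]
    # np.resize repeats the continuation cyclically if the gap is longer.
    return np.resize(continuation, gap_len)

def conceal(signal, start, end, template_len=80, min_lag=20):
    """Fill signal[start:end] using both the preceding and the following data
    (assumed sketch, not the published BWSOLA method)."""
    out = np.asarray(signal, dtype=float).copy()
    gap_len = end - start
    past, future = out[:start], out[end:]
    forward = extrapolate(past, gap_len, template_len, min_lag)
    # Run the same search on the time-reversed future, then flip the result.
    backward = extrapolate(future[::-1], gap_len, template_len, min_lag)[::-1]
    # Cross-fade: trust the forward estimate early in the gap, backward late.
    w = np.linspace(1.0, 0.0, gap_len)
    out[start:end] = w * forward + (1.0 - w) * backward
    return out
```

    On a strictly periodic test tone both estimates coincide with the missing samples; on real speech the cross-fade only smooths the disagreement between the two directions, which is where a method such as the one in the abstract would need more care.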

    Speech quality prediction for voice over Internet protocol networks

    This is a digitised version of a thesis that was deposited in the University Library. IP networks are on a steep slope of innovation that will make them the long-term carrier of all types of traffic, including voice. However, such networks are not designed to support real-time voice communication because their variable characteristics (e.g. delay, delay variation and packet loss) lead to a deterioration in voice quality. A major challenge in such networks is how to measure or predict voice quality accurately and efficiently for QoS monitoring and/or control purposes to ensure that technical and commercial requirements are met. Voice quality can be measured using either subjective or objective methods. Subjective measurement (e.g. MOS) is the benchmark for objective methods, but it is slow, time-consuming and expensive. Objective measurement can be intrusive or non-intrusive. Intrusive methods (e.g. ITU PESQ) are more accurate, but are normally unsuitable for monitoring live traffic because they need reference data and consume network capacity. This makes non-intrusive methods (e.g. ITU E-model) more attractive for monitoring voice quality under IP network impairments. However, current non-intrusive methods rely on subjective tests to derive model parameters and as a result are limited and do not meet the needs of new and emerging applications. The main goal of the project is to develop novel and efficient models for non-intrusive speech quality prediction to overcome the disadvantages of current subjective-based methods and to demonstrate their usefulness in new and emerging VoIP applications. The main contributions of the thesis are fourfold: (1) a detailed understanding of the relationships between voice quality, IP network impairments (e.g. 
packet loss, jitter and delay) and relevant parameters associated with speech (e.g. codec type, gender and language) is provided. An understanding of the perceptual effects of these key parameters on voice quality is important as it provides a basis for the development of non-intrusive voice quality prediction models. A fundamental investigation of the impact of the parameters on perceived voice quality was carried out using the latest ITU algorithm for perceptual evaluation of speech quality, PESQ, and by exploiting the ITU E-model to obtain an objective measure of voice quality. (2) a new methodology to predict voice quality non-intrusively was developed. The method exploits the intrusive algorithm, PESQ, and a combined PESQ/E-model structure to provide a perceptually accurate prediction of both listening and conversational voice quality non-intrusively. This avoids time-consuming subjective tests and so removes one of the major obstacles in the development of models for voice quality prediction. The method is generic and as such has wide applicability in multimedia applications. Efficient regression-based models and robust artificial neural network-based learning models were developed for predicting voice quality non-intrusively for VoIP applications. (3) three applications of the new models were investigated: voice quality monitoring/prediction for real Internet VoIP traces, perceived quality driven playout buffer optimization and perceived quality driven QoS control. The neural network and regression models were both used to predict voice quality for real Internet VoIP traces based on international links. A new adaptive playout buffer and a perceptual optimization playout buffer algorithms are presented. A QoS control scheme that combines the strengths of rate-adaptive and priority marking control schemes to provide a superior QoS control in terms of measured perceived voice quality is also provided. 
(4) a new methodology for Internet-based subjective speech quality measurement which allows rapid assessment of voice quality for VoIP applications is proposed and assessed using both objective and traditional MOS test methods
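
    For readers unfamiliar with the E-model mentioned in this abstract, the sketch below shows the standard conversion from an E-model rating factor R to an estimated MOS, as given in ITU-T G.107. It does not reproduce the thesis's regression or neural network models; the function name `r_to_mos` is an assumption made for illustration.

```python
def r_to_mos(r):
    """Map an E-model rating factor R to an estimated MOS
    using the ITU-T G.107 conversion formula."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

# Example: print the estimated MOS for a few rating factors.
for r in (50, 70, 80, 93.2):
    print(r, round(r_to_mos(r), 2))
```

    A non-intrusive monitor of the kind described would derive R from measured impairments such as delay and packet loss before applying this mapping; how those impairments are turned into R is model specific and is not shown here.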

    Quality aspects of Internet telephony

    Internet telephony has had a tremendous impact on how people communicate. Many now maintain contact using some form of Internet telephony. Therefore the motivation for this work has been to address the quality aspects of real-world Internet telephony for both fixed and wireless telecommunication. The focus has been on the quality aspects of voice communication, since poor quality leads often to user dissatisfaction. The scope of the work has been broad in order to address the main factors within IP-based voice communication. The first four chapters of this dissertation constitute the background material. The first chapter outlines where Internet telephony is deployed today. It also motivates the topics and techniques used in this research. The second chapter provides the background on Internet telephony including signalling, speech coding and voice Internetworking. The third chapter focuses solely on quality measures for packetised voice systems and finally the fourth chapter is devoted to the history of voice research. The appendix of this dissertation constitutes the research contributions. It includes an examination of the access network, focusing on how calls are multiplexed in wired and wireless systems. Subsequently in the wireless case, we consider how to handover calls from 802.11 networks to the cellular infrastructure. We then consider the Internet backbone where most of our work is devoted to measurements specifically for Internet telephony. The applications of these measurements have been estimating telephony arrival processes, measuring call quality, and quantifying the trend in Internet telephony quality over several years. We also consider the end systems, since they are responsible for reconstructing a voice stream given loss and delay constraints. Finally we estimate voice quality using the ITU proposal PESQ and the packet loss process. The main contribution of this work is a systematic examination of Internet telephony. 
We describe several methods to enable adaptable solutions for maintaining consistent voice quality. We have also found that relatively small technical changes can lead to substantial user quality improvements. A second contribution of this work is a suite of software tools designed to ascertain voice quality in IP networks. Some of these tools are in use within commercial systems today

    Scalable Speech Coding for IP Networks

    The emergence of Voice over Internet Protocol (VoIP) has posed new challenges to the development of speech codecs. The key issue in transporting real-time voice packets over IP networks is the lack of any guarantee of reasonable speech quality in the presence of packet delay or loss. Most of the widely used narrowband codecs depend on the Code Excited Linear Prediction (CELP) coding technique. CELP uses long-term prediction across frame boundaries and therefore causes error propagation when packets are lost, so redundant information must be transmitted to mitigate the problem. The internet Low Bit-rate Codec (iLBC) employs frame-independent coding and is therefore inherently robust to packet loss. However, the original iLBC lacks some of the key features of speech codecs for IP networks: rate flexibility, scalability, and wideband support. This dissertation presents novel scalable narrowband and wideband speech codecs for IP networks using a frame-independent coding scheme based on the iLBC. Rate flexibility is added to the iLBC by employing the discrete cosine transform (DCT) and scalable algebraic vector quantization (AVQ), and by allocating different numbers of bits to the AVQ. Bit-rate scalability is obtained by adding an enhancement layer to the core layer of the multi-rate iLBC. The enhancement layer encodes the weighted iLBC coding error in the modified DCT (MDCT) domain. The proposed wideband codec employs a bandwidth extension technique to extend the capabilities of existing narrowband codecs to provide wideband coding functionality. The wavelet transform is also used to further enhance the performance of the proposed codec. The performance evaluation results show that the proposed codec is highly robust to packet loss and achieves equivalent or higher speech quality than state-of-the-art codecs under clean channel conditions

    Finding perceptually optimal operating points of a real time interactive video-conferencing system

    This research aims to address issues faced by real time video-conferencing systems in locating a perceptually optimal operating point under various network and conversational conditions. In order to determine the perceptually optimal operating point of a video-conferencing system, we must first be able to conduct a fair assessment of the quality of the current operating point in the system and compare it with another operating point to determine if one is better than the other in terms of perceptual quality. However at this point in time, there does not exist one objective quality metric that can accurately and fully describe the perceptual quality of a real time video conversation. Hence there is a need for a controlled environment to allow tests to be conducted in and in which we can study different metrics and identify the best trade-offs between them. We begin by studying the components of a typical setup of a real time video-conferencing system and the impacts that various network and conversation conditions can have on the overall perceptual quality. We also look into different metrics available to measure those impacts. We then created a platform to perform black box testing on current video conferencing systems and observe how they handle the changes in operating conditions. The platform is then used to conduct a brief evaluation of the performance of Skype, a popular commercial video-conferencing system. However, we are not able to modify the system parameters of Skype. The main contribution of this thesis is the design of a new testbed that provides a controlled environment to allow tests to be conducted to determine the perceptual optimum operating point of a video conversation under specified network and conversation conditions. This testbed will allow us to modify certain parameters, such as frame rate and frame size, which were not previously possible. 
The testbed takes as input, two recorded videos of the two speakers of a face-to-face conversation and desired output video parameters, such as frame rate, frame size and delay. A video generation algorithm is designed as part of the testbed to handle modifications to frame rate and frame size of the videos as well as delays inserted into the recorded video conversation to simulate the effects of network delays. The most important issue addressed is the generation of new frames to fill up the gaps created due to a change in frame rate or delay inserted, unlike as in the case of voice, where a period of silence can simply be used to handle these situations. The testbed uses a packetization strategy designed on the basis of an uneven packet transmission rate (UPTR) and that handles the packetization of interleaved video and audio data; it also uses piggybacking to provide redundancy if required. Losses can be injected either randomly or based on packet traces collected via PlanetLab. The processed videos will then be pieced together side-by-side to give the viewpoint of a third-party observing the video conversation from the site of the first speaker. Hence the first speaker will be observed to have a faster reaction time without network delays than that of the second speaker who is simulated to be located at the remote end. The video of the second speaker will also reflect the degradations in perceptual quality induced by the network conditions, whereas the first speaker will be of perfect quality. Hence with the testbed, we are able to generate output videos for different operating points under the same network and conversational conditions and thus able to make comparisons between two operating points. With the testbed in place, we demonstrate how it can be used to evaluate the effects of various parameters on the overall perceptual quality. 
Lastly, we demonstrate the results of applying to a video conversation an existing efficient search algorithm used for estimating the perceptually optimal mouth-to-ear delay (MED) of a Voice-over-IP (VoIP) conversation. This is achieved by using the network simulator designed to conduct a series of subjective and objective tests to identify the perceptually optimal MED under specific network and conversational conditions
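
    The piggybacking redundancy mentioned in the testbed description above can be sketched as follows. This is a simplified illustration, not the thesis's packetization strategy: it assumes one frame per packet, with each packet also carrying a copy of the previous frame, and it ignores the uneven packet transmission rate and the interleaving of audio and video. The function name `recoverable_frames` is hypothetical.

```python
def recoverable_frames(num_frames, lost_packets):
    """Return the frame indices a receiver can reconstruct when packet k
    carries frame k plus a piggybacked copy of frame k - 1 (assumed scheme)."""
    lost = set(lost_packets)
    recovered = []
    for k in range(num_frames):
        primary_ok = k not in lost
        # The redundant copy of frame k travels in packet k + 1, if it exists.
        piggyback_ok = (k + 1 < num_frames) and (k + 1 not in lost)
        if primary_ok or piggyback_ok:
            recovered.append(k)
    return recovered

# Packets 2, 5 and 6 are lost out of 8: frame 2 is recovered from packet 3
# and frame 6 from packet 7, but frame 5 is gone because packet 6 is lost too.
print(recoverable_frames(8, {2, 5, 6}))
```

    With one level of piggybacking, isolated losses are fully repaired while bursts of two or more consecutive packets still lose frames, which is one reason loss traces rather than purely random losses matter when evaluating such a scheme.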