256 research outputs found

    VoIP Quality Assessment Technologies


    Quality of media traffic over Lossy internet protocol networks: Measurement and improvement.

    Voice over Internet Protocol (VoIP) is an active area of research in the world of communication. The high revenue made by telecommunication companies is a motivation to develop solutions that transmit voice over media other than the traditional circuit-switched network. However, while IP networks carry data traffic very well due to their best-effort nature, they are not designed to carry real-time applications such as voice, so several degradations can affect the speech signal before it reaches its destination. It is therefore important for legal, commercial, and technical reasons to measure the quality of VoIP applications accurately and non-intrusively. Several methods have been proposed to measure speech quality: some are subjective, some are intrusive, and others are non-intrusive. One of the non-intrusive methods is the E-model, standardised by the International Telecommunication Union-Telecommunication Standardisation Sector (ITU-T). Although the E-model is a non-intrusive method, it depends on time-consuming, expensive and hard-to-conduct subjective tests to calibrate its parameters; consequently, it is applicable to a limited number of conditions and speech coders. It is also less accurate than intrusive methods such as Perceptual Evaluation of Speech Quality (PESQ) because it does not consider the contents of the received signal. In this thesis an approach to extend the E-model based on PESQ is proposed. Using this method the E-model can be extended to new network conditions and applied to new speech coders without the need for subjective tests. The modified E-model calibrated using PESQ is compared with the E-model calibrated using subjective tests to prove its effectiveness. During this extension the relation between quality estimation using the E-model and PESQ is investigated, and a correction formula is proposed to correct the deviation in speech quality estimation. Another extension to the E-model, intended to improve its accuracy in comparison with PESQ, looks into the content of the degraded signal and classifies each lost packet as either voiced or unvoiced based on the received surrounding packets. The accuracy of the proposed method is evaluated by comparing the estimate of the new method, which takes packet class into consideration, with the measurement provided by PESQ as a more accurate, intrusive method for measuring speech quality. The above two extensions are combined to offer a method for estimating the quality of VoIP applications accurately and non-intrusively, without the need for time-consuming, expensive, and hard-to-conduct subjective tests. Finally, the applicability of the E-model or the modified E-model to measuring the quality of services in Service Oriented Computing (SOC) is illustrated.
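    A minimal illustration of the kind of non-intrusive, parametric estimate the E-model produces is sketched below in Python. It applies the G.107 relation for the effective equipment impairment Ie,eff under packet loss and the standard mapping from the rating factor R to an estimated MOS; the codec constants (Ie, Bpl) and the default R of 93.2 are generic illustrative values, not parameters taken from this thesis.

```python
def effective_equipment_impairment(ie, bpl, ppl, burst_r=1.0):
    """Ie,eff as in ITU-T G.107: codec impairment Ie inflated by packet loss.

    ie      -- equipment impairment factor of the codec at zero loss
    bpl     -- packet-loss robustness factor of the codec
    ppl     -- packet-loss probability in percent
    burst_r -- burst ratio (1.0 for random, independent loss)
    """
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


def r_to_mos(r):
    """Map the R rating factor to an estimated MOS (G.107 Annex B mapping)."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1.0 + 0.035 * r + 7.0e-6 * r * (r - 60.0) * (100.0 - r)


def estimate_mos(ppl, ie=0.0, bpl=4.3):
    """Loss-only simplified E-model: R = 93.2 - Ie,eff (other impairments ignored)."""
    r = 93.2 - effective_equipment_impairment(ie, bpl, ppl)
    return r_to_mos(r)


if __name__ == "__main__":
    # Illustrative values resembling G.711 (Ie = 0, Bpl = 4.3) under random loss.
    for loss in (0.0, 1.0, 3.0, 5.0):
        print(f"packet loss {loss:3.1f} %  ->  estimated MOS {estimate_mos(loss):.2f}")
```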

    Voice over IP

    The area that this thesis covers is Voice over IP (or IP Telephony, as it is sometimes called) over private networks and not over the Internet. There is a distinction to be made between the two, even though the term is loosely applied to both. IP Telephony over private networks involves calls made over private WANs using IP telephony protocols, while IP Telephony over the Internet involves calls made over the public Internet using IP telephony protocols. Since the network is private, service is reliable because the network owner can control how resources are allocated to various applications, such as telephony services. The public Internet, on the other hand, is a public, largely unmanaged network that offers no reliable service guarantee. Calls placed over the Internet can be low in quality, but given the low price, some find this solution attractive. What started off as an Internet revolution, with free phone calls being offered to the general public using their multimedia computers, has turned into a telecommunication revolution where enterprises are beginning to converge their data and voice networks into one network. In retrospect, an enterprise's data networks are being leveraged for telephony. The communication industry has come full circle. Earlier in the decade data was being transmitted over the public voice networks, and now voice is just another application that is or will be run over the enterprise's existing data networks. We shall see in this thesis the problems that are encountered while sending voice over data networks using the underlying IP protocol and the corrective steps taken by the industry to resolve this multitude of issues. Paul M. Zam, who is collaborating in this joint thesis/project on VoIP, will substantiate this theoretical research with his practical findings. On reading this paper the reader will gain insight into the issues surrounding the implementation of VoIP in an enterprise's private network, as well as the technical data that shed more light on the same. Thus the premise of this joint thesis/project is to analyze the current status of the technology and present a business case scenario where an organization will be able to use this information.

    Perceptual techniques in audio quality assessment


    Apprentissage automatique pour le codage cognitif de la parole

    Since the 1980s, speech codecs have relied on short-term coding strategies that operate at the subframe or frame level (typically 5 to 20 ms). Researchers have essentially adjusted and combined a limited number of available technologies (transforms, linear prediction, quantization) and strategies (waveform matching, noise shaping) to build increasingly complex coding architectures. In this thesis, rather than relying on short-term coding strategies, we develop an alternative framework for speech compression by encoding speech attributes, which are perceptually important characteristics of speech signals. In order to achieve this objective, we solve three problems of increasing complexity, namely classification, prediction and representation learning. Classification is a common element in modern codec designs. In a first step, we design a classifier to identify emotions, which are among the most complex long-term speech attributes. In a second step, we design a speech sample predictor, which is another common element in modern codec designs, to highlight the benefits of long-term and non-linear speech signal processing. Then, we explore latent variables, a space of speech representations, to encode both short-term and long-term speech attributes. Lastly, we propose a decoder network to synthesize speech signals from these representations, which constitutes our final step towards building a complete, end-to-end machine-learning-based speech compression method. Although each development step proposed in this thesis could form part of a codec on its own, each step also provides insights and a foundation for the next step, until a fully machine-learning-based codec is reached. The first two steps, classification and prediction, provide new tools that could replace and improve elements of existing codecs. In the first step, we use a combination of a source-filter model and a liquid state machine (LSM) to demonstrate that features related to emotions can be easily extracted and classified using a simple classifier. In the second step, a single end-to-end network using long short-term memory (LSTM) is shown to produce speech frames with high subjective quality for packet loss concealment (PLC) applications. In the last steps, we build upon the results of the previous steps to design a fully machine-learning-based codec. An encoder network, formulated using a deep neural network (DNN) and trained on multiple public databases, extracts and encodes speech representations using prediction in a latent space. An unsupervised learning approach based on several principles of cognition is proposed to extract representations from both short and long frames of data using mutual information and a contrastive loss. The ability of these learned representations to capture various short- and long-term speech attributes is demonstrated. Finally, a decoder structure is proposed to synthesize speech signals from these representations. Adversarial training is used as an approximation to subjective speech quality measures in order to synthesize natural-sounding speech samples. The high perceptual quality of the synthesized speech thus achieved proves that the extracted representations are efficient at preserving all sorts of speech attributes, and therefore that a complete compression method is demonstrated with the proposed approach.
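    As a rough, standalone illustration of the contrastive, mutual-information-based representation learning mentioned above, the NumPy sketch below computes an InfoNCE-style loss between two batches of frame embeddings. The encoder itself is not shown, and the shapes and temperature value are illustrative assumptions rather than details from the thesis.

```python
import numpy as np


def info_nce_loss(z_context, z_future, temperature=0.1):
    """InfoNCE-style contrastive loss.

    z_context -- (N, D) embeddings of current speech frames
    z_future  -- (N, D) embeddings of the matching future frames
    Row i of z_context is pulled toward row i of z_future (positive pair) and
    pushed away from all other rows (negatives); minimising this loss
    maximises a lower bound on the mutual information between the two views.
    """
    # L2-normalise so the dot product behaves like a cosine similarity.
    z_context = z_context / np.linalg.norm(z_context, axis=1, keepdims=True)
    z_future = z_future / np.linalg.norm(z_future, axis=1, keepdims=True)

    logits = z_context @ z_future.T / temperature       # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))               # positives lie on the diagonal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ctx = rng.standard_normal((8, 16))
    fut = ctx + 0.1 * rng.standard_normal((8, 16))      # correlated "future" view
    print(f"InfoNCE loss: {info_nce_loss(ctx, fut):.3f}")
```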

    Měření Triple play služeb v hybridní síti

    The master's thesis deals with the design, implementation and quality of IPTV, VoIP and data services within Triple Play services in a hybrid network composed of GEPON and xDSL technologies. Different lengths of the optical and metallic paths were used for the measurements. The first part of the thesis theoretically analyses the development and trends of optical and metallic networks. The second part deals with the measurement of typical optical and metallic parameters on the constructed experimental network, whose integrity was then tested. The next part of the thesis evaluates the Triple Play results from tests in which the network was loaded with varying amounts of data traffic and assessed against the defined standards. The last part is concerned with the Optiwave software simulation environment.

    E-model implementation for VoIP QoS across a hybrid UMTS network

    Voice over Internet Protocol (VoIP) provides a new telephony approach in which the voice traffic passes over Internet Protocol shared traffic networks. VoIP is a significant application of the converged network principle. The research aim is to model VoIP over a hybrid Universal Mobile Telecommunications System (UMTS) network and to identify an improved approach to applying ITU-T Recommendation G.107 (the E-Model) to understand possible Quality of Service (QoS) outcomes for the hybrid UMTS network. This research included modelling the hybrid UMTS network and carrying out simulations of different traffic types transmitted over the network. The traffic characteristics were analysed and compared with results from the literature. VoIP traffic was modelled over the hybrid UMTS network, and the VoIP traffic was generated to represent different loads on the network, from light to medium and heavy VoIP traffic. The VoIP over hybrid UMTS network traffic results were characterised and used in conjunction with the E-Model to identify VoIP QoS outcomes. The E-Model technique was implemented and the results achieved were compared with results for other network types highlighted in the literature. The research identified an approach that permits accurate modelling of VoIP QoS over a hybrid UMTS network. Accurate results should allow network design to facilitate new approaches to achieving an optimal network implementation for VoIP.
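    To make the link between network load and the E-model output concrete, the sketch below uses the widely cited simplified approximation of the G.107 delay impairment Id (due to Cole and Rosenbluth) to turn a one-way mouth-to-ear delay into a rating factor R. The delay figures for light, medium and heavy load are hypothetical placeholders, not measurements from this work.

```python
def delay_impairment(one_way_delay_ms):
    """Simplified G.107 delay impairment Id (Cole-Rosenbluth approximation)."""
    d = one_way_delay_ms
    id_value = 0.024 * d
    if d > 177.3:
        id_value += 0.11 * (d - 177.3)   # extra penalty once delay disturbs conversation
    return id_value


def rating_factor(one_way_delay_ms, ie_eff=0.0, r0=93.2):
    """R = R0 - Id - Ie,eff, ignoring the advantage factor A."""
    return r0 - delay_impairment(one_way_delay_ms) - ie_eff


if __name__ == "__main__":
    # Hypothetical one-way delays for light, medium and heavy VoIP load.
    for label, delay in (("light", 80.0), ("medium", 150.0), ("heavy", 250.0)):
        print(f"{label:6s} load: delay {delay:5.1f} ms -> R = {rating_factor(delay):.1f}")
```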

    Quality aspects of Internet telephony

    Internet telephony has had a tremendous impact on how people communicate. Many now maintain contact using some form of Internet telephony. The motivation for this work has therefore been to address the quality aspects of real-world Internet telephony for both fixed and wireless telecommunication. The focus has been on the quality aspects of voice communication, since poor quality often leads to user dissatisfaction. The scope of the work has been broad in order to address the main factors within IP-based voice communication. The first four chapters of this dissertation constitute the background material. The first chapter outlines where Internet telephony is deployed today; it also motivates the topics and techniques used in this research. The second chapter provides the background on Internet telephony, including signalling, speech coding and voice internetworking. The third chapter focuses solely on quality measures for packetised voice systems, and the fourth chapter is devoted to the history of voice research. The appendix of this dissertation constitutes the research contributions. It includes an examination of the access network, focusing on how calls are multiplexed in wired and wireless systems. Subsequently, in the wireless case, we consider how to hand over calls from 802.11 networks to the cellular infrastructure. We then consider the Internet backbone, where most of our work is devoted to measurements specifically for Internet telephony. The applications of these measurements have been estimating telephony arrival processes, measuring call quality, and quantifying the trend in Internet telephony quality over several years. We also consider the end systems, since they are responsible for reconstructing a voice stream given loss and delay constraints. Finally, we estimate voice quality using the ITU-T PESQ measure and the packet loss process. The main contribution of this work is a systematic examination of Internet telephony. We describe several methods to enable adaptable solutions for maintaining consistent voice quality. We have also found that relatively small technical changes can lead to substantial user quality improvements. A second contribution of this work is a suite of software tools designed to ascertain voice quality in IP networks. Some of these tools are in use within commercial systems today.
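    Because several of the measurement contributions above estimate voice quality from the packet loss process, a common way to characterise that process is a two-state Gilbert-Elliott model, simulated in the short sketch below; the transition probabilities are arbitrary example values, not figures from the dissertation.

```python
import random


def gilbert_elliott_losses(n_packets, p_good_to_bad=0.02, p_bad_to_good=0.30,
                           loss_in_bad=1.0, loss_in_good=0.0, seed=42):
    """Simulate a two-state Gilbert-Elliott packet-loss process.

    Returns a list of booleans (True = packet lost). Loss bursts arise because
    the chain tends to linger in the 'bad' state once it enters it.
    """
    rng = random.Random(seed)
    bad = False
    losses = []
    for _ in range(n_packets):
        # State transition for this packet slot.
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        else:
            if rng.random() < p_good_to_bad:
                bad = True
        # Loss probability depends on the current state.
        p_loss = loss_in_bad if bad else loss_in_good
        losses.append(rng.random() < p_loss)
    return losses


if __name__ == "__main__":
    trace = gilbert_elliott_losses(10_000)
    rate = sum(trace) / len(trace)
    print(f"simulated packet loss rate: {100 * rate:.2f} %")
```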