Systems And Methods For Detecting Call Provenance From Call Audio
Various embodiments of the invention are detection systems and methods for detecting call provenance based on call audio. An exemplary embodiment of the detection system can comprise a characterization unit, a labeling unit, and an identification unit. The characterization unit can extract various characteristics of networks through which a call traversed, based on call audio. The labeling unit can be trained on prior call data and can identify one or more codecs used to encode the call, based on the call audio. The identification unit can utilize the characteristics of traversed networks and the identified codecs, and based on this information, the identification unit can provide a provenance fingerprint for the call. Based on the call provenance fingerprint, the detection system can identify, verify, or provide forensic information about a call audio source. Georgia Tech Research Corporatio
Speech quality prediction for voice over Internet protocol networks
IP networks are on a steep slope of innovation that will make them the long-term carrier
of all types of traffic, including voice. However, such networks are not designed to support
real-time voice communication because their variable characteristics (e.g. due to delay, delay
variation and packet loss) lead to a deterioration in voice quality. A major challenge in such networks
is how to measure or predict voice quality accurately and efficiently for QoS monitoring
and/or control purposes to ensure that technical and commercial requirements are met.
Voice quality can be measured using either subjective or objective methods. Subjective
measurement (e.g. MOS) is the benchmark for objective methods, but it is slow, time consuming
and expensive. Objective measurement can be intrusive or non-intrusive. Intrusive methods
(e.g. ITU PESQ) are more accurate, but normally are unsuitable for monitoring live traffic
because they need reference data and must utilise the network. This makes non-intrusive
methods (e.g. ITU E-model) more attractive for monitoring voice quality from IP network impairments.
However, current non-intrusive methods rely on subjective tests to derive model
parameters and as a result are limited and do not meet the needs of new and emerging applications.
The main goal of the project is to develop novel and efficient models for non-intrusive
speech quality prediction to overcome the disadvantages of current subjective-based methods
and to demonstrate their usefulness in new and emerging VoIP applications. The main contributions
of the thesis are fourfold:
(1) a detailed understanding of the relationships between voice quality, IP network impairments
(e.g. packet loss, jitter and delay) and relevant parameters associated with speech (e.g.
codec type, gender and language) is provided. An understanding of the perceptual effects of
these key parameters on voice quality is important as it provides a basis for the development
of non-intrusive voice quality prediction models. A fundamental investigation of the impact of
the parameters on perceived voice quality was carried out using the latest ITU algorithm for
perceptual evaluation of speech quality, PESQ, and by exploiting the ITU E-model to obtain an
objective measure of voice quality.
(2) a new methodology to predict voice quality non-intrusively was developed. The method
exploits the intrusive algorithm, PESQ, and a combined PESQ/E-model structure to provide a
perceptually accurate prediction of both listening and conversational voice quality non-intrusively.
This avoids time-consuming subjective tests and so removes one of the major obstacles in the
development of models for voice quality prediction. The method is generic and as such has
wide applicability in multimedia applications. Efficient regression-based models and robust
artificial neural network-based learning models were developed for predicting voice quality
non-intrusively for VoIP applications.
(3) three applications of the new models were investigated: voice quality monitoring/prediction
for real Internet VoIP traces, perceived quality driven playout buffer optimization and
perceived quality driven QoS control. The neural network and regression models were both
used to predict voice quality for real Internet VoIP traces based on international links. A new
adaptive playout buffer algorithm and a perceptual optimization playout buffer algorithm are presented.
A QoS control scheme that combines the strengths of rate-adaptive and priority marking control
schemes to provide a superior QoS control in terms of measured perceived voice quality is
also provided.
(4) a new methodology for Internet-based subjective speech quality measurement which
allows rapid assessment of voice quality for VoIP applications is proposed and assessed using
both objective and traditional MOS test methods.
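For context, the ITU E-model mentioned in this abstract expresses quality as a rating factor R, which ITU-T G.107 maps to an estimated MOS. A minimal sketch of that standard mapping is given below; the function name `r_to_mos` is illustrative and is not taken from the thesis, and the thesis's own regression and neural network models are not reproduced here.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107 mapping).

    The function name is hypothetical; only the formula comes from G.107.
    """
    if r <= 0:
        return 1.0   # lower bound of the estimated MOS scale
    if r >= 100:
        return 4.5   # upper bound of the estimated MOS scale
    # Cubic mapping used for 0 < R < 100
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    for r in (50.0, 70.0, 93.2):
        print(f"R = {r:5.1f} -> estimated MOS = {r_to_mos(r):.2f}")
```

The mapping is bounded at both ends, so impairments that push R below 0 or above 100 do not change the estimated MOS further.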
Comparative Analysis of Voice Quality on iLBC and Speex Codecs Server PBX with Voice Comparison Method
VoIP servers provide several types of codecs, including iLBC and Speex, and each codec has a different capacity and quality. The choice of codec is therefore one of the factors that affect the quality of VoIP communication. The iLBC and Speex codecs are both intended to deliver high quality at a low bitrate. This study compares VoIP communication over iLBC and Speex to determine which codec performs better, based on Quality of Service (QoS) parameters and on an analysis of the recorded VoIP audio using a voice comparison method in Matlab. For the QoS parameters, the delay of the Speex codec is smaller than that of the iLBC codec, at 31.12 ms. The jitter of iLBC and Speex is the same, namely 0.01. The packet loss of the iLBC codec is smaller than that of the Speex codec, at 6.18%. The sound quality test in Matlab using the sound comparison method showed that the delta amplitude of the Speex codec was smaller than that of the iLBC codec, at 0.00004551 volts.
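The three QoS parameters reported above (delay, jitter, packet loss) can be derived from per-packet send and receive timestamps. The sketch below is an illustration under stated assumptions: the abstract does not give the authors' exact definitions or tooling, so the function name `qos_summary`, the use of `None` to mark a lost packet, and the definition of jitter as the mean absolute difference between consecutive delays are all hypothetical choices.

```python
from typing import Dict, Optional, Sequence


def qos_summary(sent: Sequence[float],
                received: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summarise delay, jitter and packet loss for one call.

    sent[i] and received[i] are timestamps in milliseconds for packet i;
    received[i] is None when the packet was lost (an assumed convention).
    """
    if not sent or len(sent) != len(received):
        raise ValueError("need equally long, non-empty timestamp lists")

    # One-way delay of every packet that actually arrived
    delays = [r - s for s, r in zip(sent, received) if r is not None]
    lost = sum(1 for r in received if r is None)

    mean_delay = sum(delays) / len(delays) if delays else 0.0
    # Jitter here: mean absolute change in delay between consecutive arrivals
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else 0.0

    return {
        "mean_delay_ms": mean_delay,
        "jitter_ms": jitter,
        "packet_loss_pct": 100.0 * lost / len(sent),
    }


if __name__ == "__main__":
    print(qos_summary([0.0, 20.0, 40.0, 60.0], [30.0, 52.0, None, 91.0]))
```

Running the same summary on captures from an iLBC call and a Speex call would give the kind of side-by-side comparison the abstract describes.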
Non-intrusive identification of speech codecs in digital audio signals
Speech compression has become an integral component in all modern telecommunications networks. Numerous codecs have been developed and deployed for efficiently transmitting voice signals while maintaining high perceptual quality. Because of the diversity of speech codecs used by different carriers and networks, the ability to distinguish between different codecs lends itself to a wide variety of practical applications, including determining call provenance, enhancing network diagnostic metrics, and improving automated speaker recognition. However, few research efforts have attempted to provide a methodology for distinguishing among speech codecs in an audio signal. In this research, we demonstrate a novel approach for accurately determining the presence of several contemporary speech codecs in a non-intrusive manner. The methodology developed in this research demonstrates techniques for analyzing an audio signal such that the subtle noise components introduced by the codec processing are accentuated while most of the original speech content is eliminated. Using these techniques, an audio signal may be profiled to gather a set of values that effectively characterize the codec present in the signal. This procedure is first applied to a large data set of audio signals from known codecs to develop a set of trained profiles. Thereafter, signals from unknown codecs may be similarly profiled, and the profiles compared to each of the known training profiles in order to decide which codec is the best match with the unknown signal. Overall, the proposed strategy generates extremely favorable results, with codecs being identified correctly in nearly 95% of all test signals. In addition, the profiling process is shown to require a very short analysis length of less than 4 seconds of audio to achieve these results. Both the identification rate and the small analysis window represent dramatic improvements over previous efforts in speech codec identification.
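The train-then-match procedure described in this abstract can be sketched as a nearest-profile classifier. This is only an illustration under assumptions: the abstract does not say which values make up a profile or how profiles are compared, so the mean-vector profiles, the Euclidean distance, the codec labels, and the function names `train_profiles` and `best_match` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def train_profiles(labeled: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Build one profile per known codec as the mean of its feature vectors.

    `labeled` maps a codec label to feature vectors extracted from signals
    known to use that codec (feature extraction itself is not shown).
    """
    profiles = {}
    for codec, vectors in labeled.items():
        n = len(vectors)
        # zip(*vectors) groups the values of each feature across all signals
        profiles[codec] = [sum(column) / n for column in zip(*vectors)]
    return profiles


def best_match(profiles: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the codec whose trained profile is closest to `features`."""
    return min(profiles, key=lambda codec: math.dist(profiles[codec], features))


if __name__ == "__main__":
    # Toy two-dimensional features with placeholder codec labels
    training = {
        "codec_a": [[0.0, 0.0], [2.0, 0.0]],
        "codec_b": [[10.0, 10.0], [12.0, 10.0]],
    }
    profiles = train_profiles(training)
    print(best_match(profiles, [0.5, 0.2]))
```

Any distance or similarity measure could stand in for the Euclidean distance here; the point is only that an unknown signal is profiled the same way as the training signals and assigned to the closest known profile.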