
    Perceptual models in speech quality assessment and coding

    The ever-increasing demand for good communications/toll-quality speech has created renewed interest in the perceptual impact of rate compression. Two general areas are investigated in this work, namely speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed which simulates the processing stages of the peripheral auditory system. At the output of the model a "running" auditory spectrum is obtained. This represents the auditory (spectral) equivalent of any acoustic sound such as speech. Auditory spectra from coded speech segments serve as inputs to a second model. This model simulates the information centre in the brain which performs the speech quality assessment. [Continues.]
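
    As a rough, hypothetical illustration of the kind of peripheral processing such a model performs (not the model developed in this work), the sketch below pools a short-time power spectrum into critical-band energies using Traunmüller's Hz-to-Bark approximation; the window, band count, and dB normalisation are illustrative assumptions.

        import numpy as np

        def bark_spectrum(frame, fs, n_fft=512, n_bands=24):
            """Pool a short-time power spectrum into critical (Bark) bands.

            A minimal sketch of a peripheral auditory front end: window,
            power spectrum, Hz -> Bark mapping, per-band energy in dB.
            """
            windowed = frame * np.hanning(len(frame))
            power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
            freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
            # Traunmueller's approximation of the Hz -> Bark mapping
            bark = 26.81 * freqs / (1960.0 + freqs) - 0.53
            bands = np.array([power[(bark >= b) & (bark < b + 1)].sum()
                              for b in range(n_bands)])
            return 10.0 * np.log10(bands + 1e-12)  # dB per critical band

    Running this frame by frame over a speech signal yields a "running" spectral representation in the spirit of the auditory spectrum described above.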

    Multirate digital filters, filter banks, polyphase networks, and applications: a tutorial

    Multirate digital filters and filter banks find application in communications, speech processing, image compression, antenna systems, analog voice privacy systems, and in the digital audio industry. During the last several years there has been substantial progress in multirate system research. This includes design of decimation and interpolation filters, analysis/synthesis filter banks (also called quadrature mirror filters, or QMF), and the development of new sampling theorems. First, the basic concepts and building blocks in multirate digital signal processing (DSP), including the digital polyphase representation, are reviewed. Next, recent progress as reported by several authors in this area is discussed. Several applications are described, including the following: subband coding of waveforms, voice privacy systems, integral and fractional sampling rate conversion (such as in digital audio), digital crossover networks, and multirate coding of narrow-band filter coefficients. The M-band QMF bank is discussed in considerable detail, including an analysis of various errors and imperfections. Recent techniques for perfect signal reconstruction in such systems are reviewed. The connection between QMF banks and other related topics, such as block digital filtering and periodically time-varying systems, based on a pseudo-circulant matrix framework, is covered. Unconventional applications of the polyphase concept are discussed.
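
    To make the efficiency of the polyphase representation concrete, here is a minimal NumPy sketch of an M-fold decimator in the type-1 polyphase form H(z) = sum_k z^{-k} E_k(z^M); the function name and end-effect handling are illustrative assumptions, not code from the tutorial.

        import numpy as np

        def polyphase_decimate(x, h, M):
            """M-fold filtered decimation via the polyphase split of h.

            Each of the M branches filters at the low (output) rate,
            which is where the polyphase structure saves computation.
            Equivalent to np.convolve(x, h)[::M] up to end effects;
            assumes len(h) >= M.
            """
            n_out = (len(x) + M - 1) // M
            y = np.zeros(n_out)
            for k in range(M):
                ek = h[k::M]  # k-th polyphase component E_k of h
                # branch input: x delayed by k samples, then kept every M-th
                xk = np.concatenate((np.zeros(k), x))[::M][:n_out]
                y += np.convolve(xk, ek)[:n_out]
            return y

    Because every branch runs at the output rate, the polyphase form avoids computing the M-1 out of every M filter outputs that downsampling would discard.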

    Discrete Wavelet Transforms

    The discrete wavelet transform (DWT) algorithms have a firm position in signal processing across several areas of research and industry. Because the DWT provides both octave-scale frequency information and the spatial timing of the analyzed signal, it is increasingly used to solve more and more advanced problems. The present book, Discrete Wavelet Transforms: Algorithms and Applications, reviews recent progress in discrete wavelet transform algorithms and applications. The book covers a wide range of methods (e.g. lifting, shift invariance, multi-scale analysis) for constructing DWTs. The book chapters are organized into four major parts. Part I describes the progress in hardware implementations of the DWT algorithms. Applications include multitone modulation for ADSL and equalization techniques, a scalable architecture for FPGA implementation, a lifting-based algorithm for VLSI implementation, a comparison between DWT- and FFT-based OFDM, and a modified SPIHT codec. Part II addresses image processing algorithms such as a multiresolution approach for edge detection, low-bit-rate image compression, a low-complexity implementation of CQF wavelets, and compression of multi-component images. Part III focuses on watermarking DWT algorithms. Finally, Part IV describes shift-invariant DWTs, the DC lossless property, DWT-based analysis and estimation of colored noise, and an application of the wavelet Galerkin method. The chapters of the present book consist of both tutorial and highly advanced material. Therefore, the book is intended to be a reference text for graduate students and researchers seeking state-of-the-art knowledge on specific applications.
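
    As one small example of the lifting constructions the book surveys, the following sketch computes a single level of the Haar DWT as a predict step followed by an update step (assuming an even-length input); it is an illustration, not code from the book.

        import numpy as np

        def haar_lifting_forward(x):
            """One level of the Haar DWT via lifting (even-length input).

            Split into even/odd samples, PREDICT the odd samples from
            the even ones (detail d), then UPDATE the even samples so
            the approximation s keeps the running average.
            """
            even, odd = x[0::2].astype(float), x[1::2].astype(float)
            d = odd - even       # predict: detail coefficients
            s = even + d / 2.0   # update: s = (even + odd) / 2
            return s, d

        def haar_lifting_inverse(s, d):
            """Invert by undoing the lifting steps in reverse order."""
            even = s - d / 2.0
            odd = d + even
            x = np.empty(s.size + d.size)
            x[0::2], x[1::2] = even, odd
            return x

    Each lifting step is trivially invertible by flipping its sign, which is what makes lifting attractive for in-place and VLSI-friendly integer implementations.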

    Perceptual techniques in audio quality assessment


    Finding perceptually optimal operating points of a real time interactive video-conferencing system

    This research aims to address issues faced by real-time video-conferencing systems in locating a perceptually optimal operating point under various network and conversational conditions. To determine the perceptually optimal operating point of a video-conferencing system, we must first be able to conduct a fair assessment of the quality of the current operating point and compare it with another to determine which is better in terms of perceptual quality. However, no single objective quality metric currently exists that can accurately and fully describe the perceptual quality of a real-time video conversation. Hence there is a need for a controlled environment in which tests can be conducted, different metrics studied, and the best trade-offs between them identified. We begin by studying the components of a typical real-time video-conferencing system and the impact that various network and conversation conditions can have on the overall perceptual quality. We also look into the different metrics available to measure those impacts. We then create a platform to perform black-box testing on current video-conferencing systems and observe how they handle changes in operating conditions. The platform is used to conduct a brief evaluation of the performance of Skype, a popular commercial video-conferencing system; however, we are not able to modify Skype's system parameters. The main contribution of this thesis is the design of a new testbed that provides a controlled environment for determining the perceptually optimal operating point of a video conversation under specified network and conversation conditions. This testbed allows us to modify parameters, such as frame rate and frame size, that could not be modified before. The testbed takes as input two recorded videos of the two speakers of a face-to-face conversation, together with the desired output video parameters, such as frame rate, frame size, and delay. A video generation algorithm is designed as part of the testbed to handle modifications to the frame rate and frame size of the videos, as well as delays inserted into the recorded conversation to simulate the effects of network delays. The most important issue addressed is the generation of new frames to fill the gaps created by a change in frame rate or by an inserted delay; unlike voice, where a period of silence can simply be used, video requires new frames in these situations (see the sketch after this abstract). The testbed uses a packetization strategy based on an uneven packet transmission rate (UPTR) that handles the packetization of interleaved video and audio data; it also uses piggybacking to provide redundancy where required. Losses can be injected either randomly or based on packet traces collected via PlanetLab. The processed videos are then pieced together side by side to give the viewpoint of a third party observing the video conversation from the site of the first speaker. Hence the first speaker is observed to have a faster reaction time, unaffected by network delays, than the second speaker, who is simulated to be located at the remote end. The video of the second speaker also reflects the degradations in perceptual quality induced by the network conditions, whereas the video of the first speaker is of perfect quality.
    With this testbed, we are able to generate output videos for different operating points under the same network and conversational conditions, and thus to compare two operating points. With the testbed in place, we demonstrate how it can be used to evaluate the effects of various parameters on the overall perceptual quality. Lastly, we present the results of applying an existing efficient search algorithm, originally used to estimate the perceptually optimal mouth-to-ear delay (MED) of a Voice-over-IP (VoIP) conversation, to a video conversation. This is achieved by using the network simulator to conduct a series of subjective and objective tests to identify the perceptually optimal MED under specific network and conversational conditions.
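
    A minimal sketch of the frame-gap-filling idea described above, assuming a simple frame-repetition policy: output frames are generated at the target rate, and each output instant shows the most recent source frame whose delayed timestamp has already passed. The function name and the freeze-on-first-frame behaviour are hypothetical, not the testbed's actual algorithm.

        def regenerate_frames(timestamps, frames, out_fps, extra_delay=0.0):
            """Re-time a recorded frame sequence at a new frame rate with
            an inserted delay, repeating the latest available frame to
            fill the gaps (video cannot use 'silence' the way voice can).
            """
            out, i, t = [], -1, 0.0
            end = timestamps[-1] + extra_delay
            while t <= end:
                # advance to the newest source frame "arrived" by time t
                while i + 1 < len(timestamps) and timestamps[i + 1] + extra_delay <= t:
                    i += 1
                out.append(frames[max(i, 0)])  # before any arrival: freeze on frame 0
                t += 1.0 / out_fps
            return out

    Frame repetition is only one possible policy; interpolated frames would trade extra computation for smoother motion at the same operating point.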

    Adaptation de contexte basée sur la qualité d'expérience dans les réseaux internet du futur

    To gauge network quality, most of the actors concerned (network operators, service providers) rely on Quality of Service (QoS) measures. This measure has shown its limits, and much effort has gone into establishing a new metric that reflects the quality of the delivered service more precisely: Quality of Experience (QoE), which reflects the user's satisfaction with the service being used. Estimating QoE has become essential for service and content providers, and the future Internet is expected to be strongly media-oriented, creating a profound need for an efficient QoE measure. In this thesis, we provide several methods to estimate the QoE of different media services: voice and video over IP. We study the performance and quality of the iLBC, Speex, and Silk codecs for VoIP and of the MPEG-2 and H.264/SVC codecs for video over IP, and we study the impact that the main network parameters, source parameters (at encoding), and destination parameters (at decoding) can have on the final quality. Based on this study, we propose two methods to estimate QoE in a real-time context, without any need for information about the original voice sequence. The first method is based on polynomial regression; the second is based on a hybrid (objective and subjective) methodology called Pseudo-Subjective Quality Assessment (PSQA), which is built on the artificial-neural-network model. As for video, we also propose a tool to estimate the quality of video encoded with MPEG-2 and with H.264/SVC, study the impact of several network and encoding parameters on SVC video quality, test the performance of several SVC encoders, and propose some SVC encoding recommendations.
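
    As a sketch of the polynomial-regression approach to VoIP QoE estimation (the thesis's actual features, polynomial degree, and training data are not reproduced here), the following fits MOS as a bivariate polynomial in packet-loss rate and delay by least squares; the data in the usage comment are invented.

        import numpy as np

        def fit_qoe_polynomial(loss, delay, mos, degree=2):
            """Least-squares fit of MOS = f(loss, delay) using all
            monomials loss**i * delay**j of total degree <= degree."""
            terms = [(i, j) for i in range(degree + 1)
                            for j in range(degree + 1 - i)]
            A = np.column_stack([loss**i * delay**j for i, j in terms])
            coeffs, *_ = np.linalg.lstsq(A, mos, rcond=None)
            return lambda l, d: sum(c * l**i * d**j
                                    for c, (i, j) in zip(coeffs, terms))

        # hypothetical usage with made-up measurements:
        # loss  = np.array([0.00, 0.01, 0.02, 0.05, 0.10])
        # delay = np.array([20.0, 50.0, 80.0, 150.0, 300.0])
        # mos   = np.array([4.3, 4.0, 3.7, 3.0, 2.1])
        # qoe = fit_qoe_polynomial(loss, delay, mos, degree=1)
        # print(qoe(0.03, 100.0))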

    The Fifth NASA Symposium on VLSI Design

    The fifth annual NASA Symposium on VLSI Design had 13 sessions, including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems that can be used to increase data-system performance. The presentations share insights into next-generation advances that will serve as a basis for future VLSI design.

    Analysis of speech and other sounds

    This thesis comprises a study of various types of signal processing techniques, applied to the tasks of extracting information from speech, cough, and dolphin sounds. Established approaches to analysing speech sounds for the purposes of low-data-rate speech encoding, and more generally to determine the characteristics of the speech signal, are reviewed. Two new speech processing techniques, shift-and-add and CLEAN (which have previously been applied in the field of astronomical image processing), are developed and described in detail. Shift-and-add is shown to produce a representation of the long-term "average" characteristics of the speech signal. Under certain simplifying assumptions, this can be equated to the average glottal excitation. The iterative deconvolution technique called CLEAN is employed to deconvolve the shift-and-add signal from the speech signal. Because the resulting "CLEAN" signal has relatively few non-zero samples, it can be directly encoded at a low data rate. The performance of a low-data-rate speech encoding scheme that takes advantage of this attribute of CLEAN is examined in detail. Comparison with the multi-pulse LPC approach to speech coding shows that the new method provides similar levels of performance at medium data rates of about 16 kbit/s. The changes that occur in the character of a person's cough sounds when that person is afflicted with asthma are outlined. The development and implementation of a microcomputer-based cough sound analysis system, designed to facilitate the ongoing study of these sounds, is described. The system performs spectrographic analysis on the cough sounds. A graphical user interface allows the sound waveforms and spectra to be displayed and examined in detail. Preliminary results are presented which indicate that the spectral content of cough sounds is changed by asthma. An automated digital approach to studying the characteristics of Hector's dolphin vocalisations is described. This scheme characterises the sounds by extracting descriptive parameters from their time and frequency domain envelopes. The set of parameters so obtained from a sample of click sequences collected from free-ranging dolphins is analysed by principal component analysis. Results are presented which indicate that Hector's dolphins produce only a small number of different vocal sounds. In addition to the statistical analysis, several of the clicks, which are assumed to be used for echo-location, are analysed in terms of their range-velocity ambiguity functions. The results suggest that Hector's dolphins can distinguish targets separated in range by about 2 cm, but are unable to separate targets that differ only in their velocity.
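
    For illustration, a minimal Hogbom-style CLEAN loop for 1-D signals follows; in the scheme described above, the shift-and-add signal would play the role of the point-spread function psf. The gain, iteration limit, and stopping threshold are illustrative choices, not the thesis's settings.

        import numpy as np

        def clean_1d(signal, psf, gain=0.2, n_iter=500, threshold=1e-3):
            """Iteratively locate the largest residual sample, record a
            scaled impulse there, and subtract a shifted, scaled copy of
            psf. Returns the sparse component train and the residual."""
            residual = np.asarray(signal, dtype=float).copy()
            components = np.zeros_like(residual)
            centre = int(np.argmax(np.abs(psf)))       # psf alignment point
            stop = threshold * np.max(np.abs(residual))
            for _ in range(n_iter):
                peak = int(np.argmax(np.abs(residual)))
                if np.abs(residual[peak]) < stop:
                    break
                amp = gain * residual[peak] / psf[centre]
                lo = peak - centre                     # start index of shifted psf
                a, b = max(lo, 0), min(lo + len(psf), len(residual))
                residual[a:b] -= amp * psf[a - lo:b - lo]
                components[peak] += amp
            return components, residual

    Because the recovered component train is sparse, its non-zero positions and amplitudes can be encoded compactly, which is the property the low-data-rate encoding scheme described above exploits.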