43,688 research outputs found

    Self-Authentication of Audio Signals by Chirp Coding

    Get PDF
    This paper discusses a new approach to ‘watermarking’ digital signals using linear frequency modulated or ‘chirp’ coding. The principles underlying this approach are based on the use of a matched filter to provide a reconstruction of a chirped code that is uniquely robust in the case of signals with very low signal-to-noise ratios. Chirp coding for authenticating data is generic in the sense that it can be used for a range of data types and applications (the authentication of speech and audio signals, for example). The theoretical and computational aspects of the matched filter and the properties of a chirp are revisited to provide the essential background to the method. Signal code generating schemes are then addressed and details of the coding and decoding techniques considered. Finally, the paper briefly describes an example application which is available on-line for readers who are interested in using the approach for audio data authentication working with either WAV or MP3 files

    Recognition of Isolated Marathi words from Side Pose for multi-pose Audio Visual Speech Recognition

    Get PDF
    Abstract: This paper presents a new multi pose audio visual speech recognition system based on fusion of side pose visual features and acoustic signals. The proposed method improved robustness and circumvention of conventional multimodal speech recognition system. The work was implemented on ñ€˜vVISWAñ€ℱ (Visual Vocabulary of Independent Standard Words) dataset comprised of full frontal, 45degree and side pose visual streams.The feature sets originating from the visual feature for Side pose are extracted using 2D Stationary Wavelet Transform (2D-SWT) and acoustic features extracted using (Linear Predictive Coding) LPC were fused and classified using KNN algorithm resulted in 90 % accuracy. This work facilitates approach of automatic recognition of isolated words from side pose in Multipose audio visual speech recognition domainwhere partial visual features of face were exists.Keywords: Side pose face detection, stationary wavelet transform, linear predictive analysis, Feature level fusion, KNN classifier

    Optimisation techniques for low bit rate speech coding

    Get PDF
    This thesis extends the background theory of speech and major speech coding schemes used in existing networks to an implementation of GSM full-rate speech compression on a RISC DSP and a multirate application for speech coding. Speech coding is the field concerned with obtaining compact digital representations of speech signals for the purpose of efficient transmission. In this thesis, the background of speech compression, characteristics of speech signals and the DSP algorithms used have been examined. The current speech coding schemes and requirements have been studied. The Global System for Mobile communication (GSM) is a digital mobile radio system which is extensively used throughout Europe, and also in many other parts of the world. The algorithm is standardised by the European Telecommunications Standardisation histitute (ETSI). The full-rate and half-rate speech compression of GSM have been analysed. A real time implementation of the full-rate algorithm has been carried out on a RISC processor GEPARD by Austria Mikro Systeme International (AMS). The GEPARD code has been tested with all of the test sequences provided by ETSI and the results are bit-exact. The transcoding delay is lower than the ETSI requirement. A comparison of the half-rate and full-rate compression algorithms is discussed. Both algorithms offer near toll speech quality comparable or better than analogue cellular networks. The half-rate compression requires more computationally intensive operations and therefore a more powerful processor will be needed due to the complexity of the code. Hence the cost of the implementation of half-rate codec will be considerably higher than full-rate. A description of multirate signal processing and its application on speech (SBC) and speech/audio (MPEG) has been given. An investigation into the possibility of combining multirate filtering and GSM fill-rate speech algorithm. The results showed that multirate signal processing cannot be directly applied GSM full-rate speech compression since this method requires more processing power, causing longer coding delay but did not appreciably improve the bit rate. In order to achieve a lower bit rate, the GSM full-rate mathematical algorithm can be used instead of the standardised ETSI recommendation. Some changes including the number of quantisation bits has to be made before the application of multirate signal processing and a new standard will be required

    Étude de transformĂ©es temps-frĂ©quence pour le codage audio faible retard en haute qualitĂ©

    Get PDF
    In recent years there has been a phenomenal increase in the number of products and applications which make use of audio coding formats. Amongthe most successful audio coding schemes, the MPEG-1 Layer III (mp3), the MPEG-2 Advanced Audio Coding (AAC) or its evolution MPEG-4High Efficiency-Advanced Audio Coding (HE-AAC) can be cited. More recently, perceptual audio coding has been adapted to achieve codingat low-delay such to become suitable for conversational applications. Traditionally, the use of filter bank such as the Modified Discrete CosineTransform (MDCT) is a central component of perceptual audio coding and its adaptation to low delay audio coding has become an important researchtopic. Low delay transforms have been developed in order to retain the performance of standard audio coding while reducing dramatically the associated algorithmic delay.This work presents some elements allowing to better accommodate the delay reduction constraint. Among the contributions, a low delay blockswitching tool which allows the direct transition between long transform and short transform without the insertion of transition window. The sameprinciple has been extended to define new perfect reconstruction conditions for the MDCT with relaxed constraints compared to the original definition.As a consequence, a seamless reconstruction method has been derived to increase the flexibility of transform coding schemes with the possibility toselect a transform for a frame independently from its neighbouring frames. Finally, based on this new approach, a new low delay window design procedure has been derived to obtain an analytic definition for a new family of transforms, permitting high quality with a substantial coding delay reduction. The performance of the proposed transforms has been thoroughly evaluated, an evaluation framework involving an objective measurement of the optimal transform sequence is proposed. It confirms the relevance of the proposed transforms used for audio coding. In addition, the new approaches have been successfully applied to the recent standardisation work items, such as the low delay audio coding developed at MPEG (LD-AAC and ELD-AAC) and they have been evaluated with numerous subjective testing, showing a significant improvement of the quality for transient signals. The new low delay window design has been adopted in G.718, a scalable speech and audio codec standardized in ITU-T and has demonstrated its benefit in terms of delay reduction while maintaining the audio quality of a traditional MDCT.Codage audio Ă  faible retard Ă  l'aide de la dĂ©finition de nouvelles fenĂȘtres pour la transformĂ©e MDCT et l'introduction d'un nouveau schĂ©ma de commutation de fenĂȘtre

    A Voice is Worth a Thousand Words: The Implications of the Micro-Coding of Social Signals in Speech for Trust Research

    Get PDF
    While self-report measures are often highly reliable for field research on trust (Mayer and Davis, 1999), subjects often cannot complete surveys during real time interactions. In contrast, the social signals that are embedded in the non-linguistic elements of conversations can be captured in real time and extracted with the assistance of computer coding. This chapter seeks to understand how computer-coded social signals are related to interpersonal trust

    A Novel Combined System of Direction Estimation and Sound Zooming of Multiple Speakers

    Get PDF
    This article presents a new system for estimation the direction of multiple speakers and zooming the sound of one of them at a time. The proposed system is a combination of two levels; namely, sound source direction estimation, and acoustic zooming. The sound source direction estimation uses so-called the energetic analysis method for estimation the direction of multiple speakers, whereas the acoustic zooming is based on modifying the parameters of the directional audio coding (DirAC) in order to zoom the sound of a selected speaker among the others. Both listening tests and objective assessments are performed to evaluate this system using different time-frequency transforms
    • 

    corecore