1,294 research outputs found

    A Bandpass Transform for Speaker Normalization

    Get PDF
    One of the major challenges for Automatic Speech Recognition is to handle speech variability. Inter-speaker variability is partly due to differences in speakers' anatomy and especially in their Vocal Tract geometry. Dissimilarities in Vocal Tract Length (VTL) are a known source of speech variation. Vocal Tract Length Normalization is a popular Speaker Normalization technique that can be implemented as a transformation of a spectrum frequency axis. We introduce in this document a new spectral transformation for Speaker Normalization. We use the Bilinear Transformation to introduce a new frequency warping resulting from a mapping of a prototype Band-Pass (BP) filter into a general BP filter. This new transformation called the Bandpass Transformation (BPT) offers two degrees of freedom enabling complex warpings of the frequency axis that are different from previous works with the Bilinear Transform. We then define a procedure to use BPT for Speaker Normalization based on the Nelder-Mead algorithm for the estimation of the BPT parameters. We present a detailed study of the performance of our new approach on two test sets with gender dependent and independent systems. Our results demonstrate clear improvements compared to standard methods used in VTL Normalization. A score compensation procedure is presented and results in further improvements of our results by refining our BPT parameter estimation

    Wavelet-based techniques for speech recognition

    Get PDF
    In this thesis, new wavelet-based techniques have been developed for the extraction of features from speech signals for the purpose of automatic speech recognition (ASR). One of the advantages of the wavelet transform over the short time Fourier transform (STFT) is its capability to process non-stationary signals. Since speech signals are not strictly stationary the wavelet transform is a better choice for time-frequency transformation of these signals. In addition it has compactly supported basis functions, thereby reducing the amount of computation as opposed to STFT where an overlapping window is needed. [Continues.

    Analysis and resynthesis of polyphonic music

    Get PDF
    This thesis examines applications of Digital Signal Processing to the analysis, transformation, and resynthesis of musical audio. First I give an overview of the human perception of music. I then examine in detail the requirements for a system that can analyse, transcribe, process, and resynthesise monaural polyphonic music. I then describe and compare the possible hardware and software platforms. After this I describe a prototype hybrid system that attempts to carry out these tasks using a method based on additive synthesis. Next I present results from its application to a variety of musical examples, and critically assess its performance and limitations. I then address these issues in the design of a second system based on Gabor wavelets. I conclude by summarising the research and outlining suggestions for future developments

    Étude de transformées temps-fréquence pour le codage audio faible retard en haute qualité

    Get PDF
    In recent years there has been a phenomenal increase in the number of products and applications which make use of audio coding formats. Amongthe most successful audio coding schemes, the MPEG-1 Layer III (mp3), the MPEG-2 Advanced Audio Coding (AAC) or its evolution MPEG-4High Efficiency-Advanced Audio Coding (HE-AAC) can be cited. More recently, perceptual audio coding has been adapted to achieve codingat low-delay such to become suitable for conversational applications. Traditionally, the use of filter bank such as the Modified Discrete CosineTransform (MDCT) is a central component of perceptual audio coding and its adaptation to low delay audio coding has become an important researchtopic. Low delay transforms have been developed in order to retain the performance of standard audio coding while reducing dramatically the associated algorithmic delay.This work presents some elements allowing to better accommodate the delay reduction constraint. Among the contributions, a low delay blockswitching tool which allows the direct transition between long transform and short transform without the insertion of transition window. The sameprinciple has been extended to define new perfect reconstruction conditions for the MDCT with relaxed constraints compared to the original definition.As a consequence, a seamless reconstruction method has been derived to increase the flexibility of transform coding schemes with the possibility toselect a transform for a frame independently from its neighbouring frames. Finally, based on this new approach, a new low delay window design procedure has been derived to obtain an analytic definition for a new family of transforms, permitting high quality with a substantial coding delay reduction. The performance of the proposed transforms has been thoroughly evaluated, an evaluation framework involving an objective measurement of the optimal transform sequence is proposed. It confirms the relevance of the proposed transforms used for audio coding. In addition, the new approaches have been successfully applied to the recent standardisation work items, such as the low delay audio coding developed at MPEG (LD-AAC and ELD-AAC) and they have been evaluated with numerous subjective testing, showing a significant improvement of the quality for transient signals. The new low delay window design has been adopted in G.718, a scalable speech and audio codec standardized in ITU-T and has demonstrated its benefit in terms of delay reduction while maintaining the audio quality of a traditional MDCT.Codage audio à faible retard à l'aide de la définition de nouvelles fenêtres pour la transformée MDCT et l'introduction d'un nouveau schéma de commutation de fenêtre

    A field compensated Michelson spectrometer for the visible region

    Get PDF
    Imperial Users onl

    Investigation into digital audio equaliser systems and the effects of arithmetic and transform errors on performance

    Get PDF
    Merged with duplicate record 10026.1/2685 on 07.20.2017 by CS (TIS)Discrete-time audio equalisers introduce a variety of undesirable artefacts into audio mixing systems, namely, distortions caused by finite wordlength constraints, frequency response distortion due to coefficient calculation and signal disturbances that arise from real-time coefficient update. An understanding of these artefacts is important in the design of computationally affordable, good quality equalisers. A detailed investigation into these artefacts using various forms of arithmetic, filter frequency response, input excitation and sampling frequencies is described in this thesis. Novel coefficient calculation techniques, based on the matched z-transform (MZT) were developed to minimise filter response distortion and computation for on-line implementation. It was found that MZT-based filter responses can approximate more closely to s-plane filters, than BZTbased filters, with an affordable increase in computation load. Frequency response distortions and prewarping/correction schemes at higher sampling frequencies (96 and 192 kHz) were also assessed. An environment for emulating fractional quantisation in fixed and floating point arithmetic was developed. Various key filter topologies were emulated in fixed and floating point arithmetic using various input stimuli and frequency responses. The work provides detailed objective information and an understanding of the behaviour of key topologies in fixed and floating point arithmetic and the effects of input excitation and sampling frequency. Signal disturbance behaviour in key filter topologies during coefficient update was investigated through the implementation of various coefficient update scenarios. Input stimuli and specific frequency response changes that produce worst-case disturbances were identified, providing an analytical understanding of disturbance behaviour in various topologies. Existing parameter and coefficient interpolation algorithms were implemented and assessed under fihite wordlength arithmetic. The disturbance behaviour of various topologies at higher sampling frequencies was examined. The work contributes to the understanding of artefacts in audio equaliser implementation. The study of artefacts at the sampling frequencies of 48,96 and 192 kHz has implications in the assessment of equaliser performance at higher sampling frequencies.Allen & Heath Limite

    When the Differences in Frequency Domain are Compensated: Understanding and Defeating Modulated Replay Attacks on Automatic Speech Recognition

    Full text link
    Automatic speech recognition (ASR) systems have been widely deployed in modern smart devices to provide convenient and diverse voice-controlled services. Since ASR systems are vulnerable to audio replay attacks that can spoof and mislead ASR systems, a number of defense systems have been proposed to identify replayed audio signals based on the speakers' unique acoustic features in the frequency domain. In this paper, we uncover a new type of replay attack called modulated replay attack, which can bypass the existing frequency domain based defense systems. The basic idea is to compensate for the frequency distortion of a given electronic speaker using an inverse filter that is customized to the speaker's transform characteristics. Our experiments on real smart devices confirm the modulated replay attacks can successfully escape the existing detection mechanisms that rely on identifying suspicious features in the frequency domain. To defeat modulated replay attacks, we design and implement a countermeasure named DualGuard. We discover and formally prove that no matter how the replay audio signals could be modulated, the replay attacks will either leave ringing artifacts in the time domain or cause spectrum distortion in the frequency domain. Therefore, by jointly checking suspicious features in both frequency and time domains, DualGuard can successfully detect various replay attacks including the modulated replay attacks. We implement a prototype of DualGuard on a popular voice interactive platform, ReSpeaker Core v2. The experimental results show DualGuard can achieve 98% accuracy on detecting modulated replay attacks.Comment: 17 pages, 24 figures, In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS' 20

    Low Power Adaptive Circuits: An Adaptive Log Domain Filter and A Low Power Temperature Insensitive Oscillator Applied in Smart Dust Radio

    Get PDF
    This dissertation focuses on exploring two low power adaptive circuits. One is an adaptive filter at audio frequency for system identification. The other is a temperature insensitive oscillator for low power radio frequency communication. The adaptive filter is presented with integrated learning rules for model reference estimation. The system is a first order low pass filter with two parameters: gain and cut-off frequency. It is implemented using multiple input floating gate transistors to realize online learning of system parameters. Adaptive dynamical system theory is used to derive robust control laws in a system identification task. Simulation results show that convergence is slower using simplified control laws but still occurs within milliseconds. Experimental results confirm that the estimated gain and cut-off frequency track the corresponding parameters of the reference filter. During operation, deterministic errors are introduced by mismatch within the analog circuit implementation. An analysis is presented which attributes the errors to current mirror mismatch. The harmonic distortion of the filter operating in different inversion is analyzed using EKV model numerically. The temperature insensitive oscillator is designed for a low power wireless network. The system is based on a current starved ring oscillator implemented using CMOS transistors instead of LC tank for less chip area and power consumption. The frequency variance with temperature is compensated by the temperature adaptive circuits. Experimental results show that the frequency stability from 5°C to 65°C has been improved 10 times with automatic compensation and at least 1 order less power is consumed than published competitors. This oscillator is applied in a 2.2GHz OOK transmitter and a 2.2GHz phase locked loop based FM receiver. With the increasing needs of compact antenna, possible high data rate and wide unused frequency range of short distance communication, a higher frequency phase locked loop used for BFSK receiver is explored using an LC oscillator for its capability at 20GHz. The success of frequency demodulation is demonstrated in the simulation results that the PLL can lock in 0.5μs with 35MHz lock-in range and 2MHz detection resolution. The model of a phase locked loop used for BFSK receiver is analyzed using Matlab

    Development of the Carbon Nanotube Thermoacoustic Loudspeaker

    Get PDF
    Traditional speakers make sound by attaching a coil to a cone and moving that coil back and forth in a magnetic field (aka moving coil loudspeakers). The physics behind how to generate sound via this velocity boundary condition has largely been unchanged for over a hundred years. Interestingly, around the time moving coil loudspeakers were first investigated the idea of using heat to generate sound was also known. These thermoacoustic speakers heat and cool a thin material at acoustic frequencies to generate the pressure wave (i.e. they use a thermal boundary condition). Unfortunately, when the thermoacoustic principle was initially discovered there was no material with the right properties to heat and cool fast enough. Carbon nanotube (CNT) loudspeakers first generated sound early in the 21st century. At that time there were many questions unanswered about their place in the sound generation toolbox of an engineer. The main goal of this dissertation was to continue the development of the CNT loudspeaker with focus on practical usage for an acoustic engineer. Prior to 2014, when this effort began, most of the published development work was from material scientists with objective acoustic performance data presented that was not useful beyond the scope of that particular publication. For example, low sound pressure levels in the nearfield at low power inputs was a common metric. Therefore, this effort had three main objectives with emphasis placed on acquiring data at levels and in nomenclature that would be useful to acoustic engineers so they could bring the technology to market, if adequate. Investigation into the true power efficiency of CNT loudspeakers Investigation into alternative methods to linearize the pressure response of CNT loudspeakers Investigation into the sound quality of CNT loudspeakers Overall, it was found that CNT loudspeakers are approximately four orders of magnitude less power efficient than traditional moving coil loudspeakers. The non-linear pressure output of the CNT loudspeakers can be linearized with a variety of drive signal processing methods, but the selection of which method to use depends on a variety of factors (e.g. amplification architecture available). In general, all methods studied are on the same order of magnitude power efficiency, but the direct current offset and amplitude modulation drive signal processing methods are superior in terms of sound quality
    corecore