
    Frame Theory for Signal Processing in Psychoacoustics

    This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one hand, the basic concepts of frame theory are presented, and some proofs are provided to explain those concepts in detail. The goal is to show hearing scientists how this mathematical theory can be relevant to their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant viewpoint for audio signal processing. On the other hand, basic psychoacoustic concepts are presented to encourage mathematicians to apply their knowledge in this field.
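    To make the filter bank view of frames concrete, here is a minimal sketch (not taken from the chapter) of a discrete Gabor, i.e. uniform STFT, filter bank on signals of length L. It builds the frame operator S explicitly and reads off the frame bounds A and B. The parameters (L = 16, hop a = 4, M = 8 channels), the Hann window, and the function name gabor_frame_operator are illustrative assumptions. Because the window is no longer than M (the so-called painless case), S comes out diagonal with entries M * sum_n |g[t - n*a]|^2, so A and B are simply its smallest and largest diagonal entries.

```python
import cmath
import math


def gabor_frame_operator(g, L, a, M):
    """Frame operator S = sum_{m,n} g_mn g_mn^H of a discrete Gabor system on C^L.

    g : window samples (len(g) <= L), zero-padded to length L
    a : time shift (hop size); L must be a multiple of a
    M : number of frequency channels; L must be a multiple of M
    """
    gp = list(g) + [0.0] * (L - len(g))
    S = [[0j] * L for _ in range(L)]
    for n in range(L // a):
        for m in range(M):
            # Atom g_mn[t] = g[t - n*a] * exp(2*pi*i*m*t / M), indices taken mod L.
            atom = [gp[(t - n * a) % L] * cmath.exp(2j * math.pi * m * t / M)
                    for t in range(L)]
            # Accumulate the rank-one term g_mn g_mn^H.
            for t in range(L):
                for s in range(L):
                    S[t][s] += atom[t] * atom[s].conjugate()
    return S


# Illustrative (assumed) parameters: painless case, window length == M.
L, a, M = 16, 4, 8
window = [math.sin(math.pi * k / M) ** 2 for k in range(M)]  # periodic Hann, length M
S = gabor_frame_operator(window, L, a, M)

off_diag = max(abs(S[t][s]) for t in range(L) for s in range(L) if t != s)
diag = [S[t][t].real for t in range(L)]
# Painless-case prediction: S[t][t] = M * sum_n |g[t - n*a]|^2.
padded = window + [0.0] * (L - M)
predicted = [M * sum(padded[(t - n * a) % L] ** 2 for n in range(L // a))
             for t in range(L)]
# S is diagonal, so its extreme diagonal entries are the frame bounds.
A, B = min(diag), max(diag)
```

    With these settings A = 4 and B = 8. Since A > 0 the system is a frame, and because S is diagonal the canonical dual window is obtained by dividing g pointwise by the diagonal of S.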

    A Phase Vocoder based on Nonstationary Gabor Frames

    We propose a new algorithm for time stretching music signals based on the theory of nonstationary Gabor frames (NSGFs). The algorithm extends the classical phase vocoder (PV) by incorporating adaptive time-frequency (TF) representations and adaptive phase locking. The adaptive TF representations provide good time resolution at the onsets of attack transients and good frequency resolution for the sinusoidal components. We estimate the phase values only at peak channels; the remaining phases are then locked to the values of the peaks in an adaptive manner. During attack transients we keep the stretch factor equal to one, and we propose a new strategy for determining which channels are relevant for reinitializing the corresponding phase values. In contrast to previously published algorithms, we use a nonuniform NSGF to obtain a low redundancy of the corresponding TF representation. We show that with just three times as many TF coefficients as signal samples, artifacts such as phasiness and transient smearing can be greatly reduced compared to the classical PV. The proposed algorithm is tested on both synthetic and real-world signals and compared with state-of-the-art algorithms in a reproducible manner.

    Comment: 10 pages, 6 figures
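    The peak-based phase locking can be illustrated on a uniform filter bank. The sketch below is the classical phase vocoder with identity phase locking, in the spirit of Laroche and Dolson, and not the authors' NSGF algorithm: for one frame it propagates phases only at spectral peaks, using the instantaneous frequency estimated from the analysis phase increment, and locks every other channel to its nearest peak by keeping the phase difference measured in the analysis frame. The function names, the simple local-maximum peak picker, and the argument layout are hypothetical. The stretch factor is rs / ra, so passing rs = ra during an attack transient keeps it equal to one.

```python
import math


def princarg(phase):
    """Wrap a phase value to the interval [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def find_peaks(mag):
    """Indices of local maxima of a magnitude spectrum (simple hypothetical rule)."""
    n = len(mag)
    return [k for k in range(n)
            if (k == 0 or mag[k] > mag[k - 1]) and (k == n - 1 or mag[k] >= mag[k + 1])]


def pv_locked_phases(mag, phi_prev, phi_cur, syn_prev, omega, ra, rs):
    """One frame of a classical phase vocoder with identity phase locking.

    mag       magnitudes of the current analysis frame
    phi_prev  analysis phases of the previous frame
    phi_cur   analysis phases of the current frame
    syn_prev  synthesis phases of the previous frame
    omega     channel centre frequencies in radians per sample
    ra, rs    analysis and synthesis hop sizes (stretch factor = rs / ra)
    """
    peaks = find_peaks(mag)  # contains at least the first global maximum
    syn = [0.0] * len(mag)
    # 1) Propagate phases at peak channels using the instantaneous frequency.
    for p in peaks:
        deviation = princarg(phi_cur[p] - phi_prev[p] - omega[p] * ra)
        inst_freq = omega[p] + deviation / ra
        syn[p] = syn_prev[p] + rs * inst_freq
    # 2) Lock every other channel to its nearest peak, preserving the
    #    phase difference measured in the analysis frame.
    peak_set = set(peaks)
    for k in range(len(mag)):
        if k not in peak_set:
            p = min(peaks, key=lambda q: abs(q - k))
            syn[k] = syn[p] + phi_cur[k] - phi_cur[p]
    return syn


# Toy frame: 4 channels, stretch factor rs / ra = 2.
omega = [2 * math.pi * k / 8 for k in range(4)]
mag = [0.1, 1.0, 0.2, 0.1]
syn = pv_locked_phases(mag, [0.0] * 4, [0.3, omega[1] * 2, 0.5, 0.7],
                       [0.0] * 4, omega, ra=2, rs=4)
```

    In this toy frame channel 1 is the only peak; its phase advances by rs * omega[1] = pi, and channels 0, 2 and 3 inherit that advance while keeping their analysis offsets relative to channel 1.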

    A Binaural Neuromorphic Auditory Sensor for FPGA: A Spike Signal Processing Approach

    This paper presents a new architecture, design flow, and field-programmable gate array (FPGA) implementation analysis of a neuromorphic binaural auditory sensor, designed completely in the spike domain. Unlike digital cochleae, which decompose audio signals using classical digital signal processing techniques, the model presented in this paper processes information directly encoded as spikes using pulse frequency modulation and provides a set of frequency-decomposed audio information through an address-event representation interface. A systematic design approach led to a generic process for building, tuning, and implementing audio frequency decomposers with different features, facilitating synthesis with custom features. This allows researchers to implement their own parameterized neuromorphic auditory systems in a low-cost FPGA in order to study the audio processing and learning activity that takes place in the brain. We present a 64-channel binaural neuromorphic auditory system implemented in a Virtex-5 FPGA using a commercial development board. The system was excited with a diverse set of audio signals in order to analyze its response and characterize its features, and its response times and frequencies are reported. The experimental results of the proposed 64-channel stereo implementation are: a frequency range between 9.6 Hz and 14.6 kHz (adjustable), a maximum output event rate of 2.19 Mevents/s, a power consumption of 29.7 mW, a requirement of 11 141 slices, and a system clock frequency of 27 MHz.

    Funding: Ministerio de Economía y Competitividad TEC2012-37868-C04-02; Junta de Andalucía P12-TIC-130
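    As a rough illustration of what "encoded as spikes using pulse frequency modulation" means, the sketch below uses a generic signed integrate-and-fire encoder: the input is accumulated, and a positive or negative spike is emitted each time the accumulator crosses plus or minus the threshold, so the average spike rate follows the signal amplitude. Each spike is then packed into an address-event representation (AER) word. This is not the sensor's actual spike generator, and the function names and the 8-bit address layout (ear bit, 6-bit channel, polarity bit) are hypothetical choices sized for a 64-channel binaural system.

```python
def pfm_encode(samples, threshold):
    """Signed integrate-and-fire pulse frequency modulation (illustrative).

    Returns (time_index, polarity) events; for a constant input x the
    average spike rate is |x| / threshold spikes per sample.
    """
    acc = 0.0
    events = []
    for t, x in enumerate(samples):
        acc += x
        # Emit as many spikes as threshold crossings, keeping the remainder.
        while acc >= threshold:
            events.append((t, +1))
            acc -= threshold
        while acc <= -threshold:
            events.append((t, -1))
            acc += threshold
    return events


def aer_address(ear, channel, polarity):
    """Pack one spike into a hypothetical 8-bit AER word: [ear:1][channel:6][polarity:1]."""
    if ear not in (0, 1) or not 0 <= channel < 64:
        raise ValueError("ear must be 0 or 1 and channel must be in 0..63")
    return (ear << 7) | (channel << 1) | (1 if polarity > 0 else 0)


# A constant input of 0.25 with threshold 1.0 fires once every four samples.
events = pfm_encode([0.25] * 8, threshold=1.0)
words = [aer_address(ear=0, channel=5, polarity=p) for _, p in events]
```

    Keeping the remainder in the accumulator after each spike means no input energy is discarded, which is why the long-run spike count tracks the integral of the input.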

    Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function

    This paper addresses the problems of blind channel identification and multichannel equalization for speech dereverberation and noise reduction. The time-domain cross-relation method is not suitable for blind room impulse response identification, due to the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse responses are approximately represented by convolutive transfer functions (CTFs) with far fewer coefficients. The CTFs, however, suffer from common zeros caused by the oversampled STFT. We propose to identify the CTFs based on an STFT with oversampled signals and critically sampled CTFs, which is a good compromise between the frequency aliasing of the signals and the common-zeros problem of the CTFs. In addition, a normalization of the CTFs is proposed to remove the gain ambiguity across subbands. In the STFT domain, the identified CTFs are used for multichannel equalization, in which the sparsity of speech signals is exploited. We propose to perform inverse filtering by minimizing the $\ell_1$-norm of the source signal, using as a constraint the relaxed $\ell_2$-norm fitting error between the microphone signals and the convolution of the estimated source signal with the CTFs. This method is advantageous in that noise can be reduced by relaxing the $\ell_2$-norm fitting error to a tolerance corresponding to the noise power, and this tolerance can be set automatically. The experiments confirm the efficiency of the proposed method even under conditions with high reverberation levels and intense noise.

    Comment: 13 pages, 5 figures, 5 tables
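    The cross-relation idea that the paper extends can be checked on toy sequences. For two channels x1 = s * h1 and x2 = s * h2 driven by the same source s, commutativity of convolution gives x1 * h2 = x2 * h1, so the true channels make the residual x1 * h2 - x2 * h1 vanish. The sketch below uses plain time-domain convolution on made-up integer sequences (not the paper's STFT-domain CTF estimator; the helper names are hypothetical) and also shows the common-zeros problem: if h1 and h2 share a factor c, the reduced pair g1, g2 satisfies the same relation, so the channels cannot be identified uniquely.

```python
def conv(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cross_relation_residual(x1, x2, h1, h2):
    """Residual x1 * h2 - x2 * h1; zero for the true channels without noise."""
    return [u - v for u, v in zip(conv(x1, h2), conv(x2, h1))]


# Toy source and two channels sharing the common factor c = 1 + z^-1 (zero at z = -1).
s = [1, -2, 3, 0, 1]
c = [1, 1]
g1, g2 = [1, 2], [1, -3]
h1, h2 = conv(c, g1), conv(c, g2)
x1, x2 = conv(s, h1), conv(s, h2)

true_res = cross_relation_residual(x1, x2, h1, h2)       # all zeros
reduced_res = cross_relation_residual(x1, x2, g1, g2)    # also all zeros: ambiguity
wrong_res = cross_relation_residual(x1, x2, g1, [1, 0])  # nonzero
```

    Integer sequences keep the arithmetic exact, so the vanishing residuals are exact zeros rather than rounding noise; with long, noisy room responses the near-common zeros make this ambiguity a practical estimation problem rather than an exact one.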