960 research outputs found

    Matrix of Polynomials Model based Polynomial Dictionary Learning Method for Acoustic Impulse Response Modeling

    We study the problem of dictionary learning for signals that can be represented as polynomials or polynomial matrices, such as convolutive signals with time delays or acoustic impulse responses. Recently, we developed a method for polynomial dictionary learning based on the fact that a polynomial matrix can be expressed as a polynomial with matrix coefficients, where the coefficient of the polynomial at each time lag is a scalar matrix. However, a polynomial matrix can equally be represented as a matrix with polynomial elements. In this paper, we develop an alternative method for learning a polynomial dictionary and a sparse representation method for polynomial signal reconstruction based on this model. The proposed methods can operate directly on the polynomial matrix without having to access its coefficient matrices. We demonstrate the performance of the proposed method for acoustic impulse response modeling. Comment: 5 pages, 2 figures
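The two representations mentioned in this abstract can be illustrated with a small sketch. It is not the authors' learning algorithm; it only shows, under assumed names (`coeffs_to_elements`, `elements_to_coeffs`, `polymat_times_polyvec` are all hypothetical), that a list of coefficient matrices and a matrix of polynomial entries hold the same data, and that a polynomial dictionary applied to a polynomial coefficient vector amounts to sums of polynomial products (convolutions).

```python
# Polynomials are coefficient lists [a_0, a_1, ...] in powers of z^-1.

def coeffs_to_elements(coeff_mats):
    """[A_0, A_1, ...] (one P x Q matrix per lag) -> P x Q matrix of polynomials."""
    rows, cols = len(coeff_mats[0]), len(coeff_mats[0][0])
    return [[[A[i][j] for A in coeff_mats] for j in range(cols)]
            for i in range(rows)]

def elements_to_coeffs(elem_mat):
    """P x Q matrix of equal-length polynomials -> [A_0, A_1, ...]."""
    rows, cols = len(elem_mat), len(elem_mat[0])
    lags = len(elem_mat[0][0])
    return [[[elem_mat[i][j][l] for j in range(cols)] for i in range(rows)]
            for l in range(lags)]

def poly_mul(p, q):
    """Polynomial product, i.e. discrete convolution of coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Polynomial sum, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    return [(p[k] if k < len(p) else 0.0) + (q[k] if k < len(q) else 0.0)
            for k in range(n)]

def polymat_times_polyvec(elem_mat, vec):
    """Multiply a matrix of polynomials by a vector of polynomials."""
    result = []
    for row in elem_mat:
        acc = [0.0]
        for entry, x in zip(row, vec):
            acc = poly_add(acc, poly_mul(entry, x))
        result.append(acc)
    return result

# Illustrative data: A(z) = A_0 + A_1 z^-1
A0 = [[1, 0], [0, 1]]
A1 = [[0, 2], [3, 0]]
E = coeffs_to_elements([A0, A1])
x = [[1.0], [1.0, 1.0]]          # x_1(z) = 1, x_2(z) = 1 + z^-1
y = polymat_times_polyvec(E, x)
```

Working on `E` directly, as in `polymat_times_polyvec`, never touches the per-lag matrices `A0` and `A1`, which is the practical difference between the two models.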

    Spatial features of reverberant speech: estimation and application to recognition and diarization

    Distant-talking scenarios, such as hands-free calling or teleconference meetings, are essential for natural and comfortable human-machine interaction, and they are increasingly used in multiple contexts. The speech signal acquired in such scenarios is reverberant and affected by additive noise. This signal distortion degrades the performance of speech recognition and diarization systems, creating troublesome human-machine interactions. This thesis proposes a method to non-intrusively estimate room acoustic parameters, paying special attention to a room acoustic parameter highly correlated with speech recognition degradation: the clarity index. In addition, a method to provide information regarding the estimation accuracy is proposed. An analysis of the phoneme recognition performance for multiple reverberant environments is presented, from which a confusability metric for each phoneme is derived. This confusability metric is then employed to improve reverberant speech recognition performance. Room acoustic parameters can also be used in speech recognition to provide robustness against reverberation. A method to exploit clarity index estimates in order to perform reverberant speech recognition is introduced. Finally, room acoustic parameters can also be used to diarize reverberant speech. A room acoustic parameter is proposed as an additional source of information for single-channel diarization in reverberant environments. In multi-channel environments, the time delay of arrival is a feature commonly used to diarize the input speech; however, the computation of this feature is affected by reverberation. A method is presented to model the time delay of arrival in a robust manner so that speaker diarization is performed more accurately. Open Access
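For orientation, the clarity index is conventionally defined as the ratio, in dB, of early to late energy in a room impulse response. The sketch below computes it intrusively from a known impulse response, which is not the non-intrusive estimator the thesis proposes. The 50 ms boundary, the function name `clarity_index`, and the assumption that the response starts at the direct-path arrival are illustrative choices, not details from the abstract.

```python
import math

def clarity_index(h, fs, early_ms=50.0):
    """Early-to-late energy ratio in dB for an impulse response h sampled at fs Hz.

    Assumes h[0] is the direct-path arrival (no leading silence).
    """
    split = int(round(fs * early_ms / 1000.0))
    early = sum(s * s for s in h[:split])
    late = sum(s * s for s in h[split:])
    if late == 0.0:
        return math.inf       # no late energy at all
    if early == 0.0:
        return -math.inf      # no early energy at all
    return 10.0 * math.log10(early / late)

# Toy responses at fs = 1000 Hz, so the 50 ms boundary is sample 50
balanced = [1.0] * 50 + [1.0] * 50    # equal early and late energy
early_heavy = [1.0] * 50 + [1.0] * 5  # ten times more early energy
```

A larger value means more of the energy arrives early, which is the sense in which the parameter tracks how strongly reverberation smears speech.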

    An Iterative Receiver for OFDM With Sparsity-Based Parametric Channel Estimation

    In this work we design a receiver that iteratively passes soft information between the channel estimation and data decoding stages. The receiver incorporates sparsity-based parametric channel estimation. State-of-the-art sparsity-based iterative receivers simplify the channel estimation problem by restricting the multipath delays to a grid. Our receiver does not impose such a restriction. As a result, it does not suffer from the leakage effect, which destroys sparsity. Communication at near-capacity rates in high SNR requires a large modulation order. Due to the close proximity of modulation symbols in such systems, the grid-based approximation is insufficiently accurate. We show numerically that a state-of-the-art iterative receiver with grid-based sparse channel estimation exhibits a bit-error-rate floor in the high SNR regime. In contrast, our receiver performs very close to the perfect channel state information bound for all SNR values. We also demonstrate both theoretically and numerically that parametric channel estimation works well in dense channels, i.e., when the number of multipath components is large and each individual component cannot be resolved. Comment: Major revision, accepted for IEEE Transactions on Signal Processing
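The leakage effect named in this abstract can be seen in a toy setting: a single path with unit gain, observed on `n` subcarriers and mapped back onto a sample-spaced delay grid with an inverse DFT. This is a simplified illustration under assumed conditions (one path, no noise, no pulse shaping), not the receiver described in the paper; `grid_taps` and `count_significant` are hypothetical names.

```python
import cmath
import math

def grid_taps(delay, n):
    """Inverse DFT of a one-path frequency response H[k] = exp(-j*2*pi*k*delay/n).

    `delay` is measured in grid (sample) units and may be fractional.
    """
    H = [cmath.exp(-2j * math.pi * k * delay / n) for k in range(n)]
    return [sum(H[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def count_significant(taps, thresh=0.01):
    """Number of grid taps whose magnitude exceeds thresh."""
    return sum(1 for t in taps if abs(t) > thresh)

on_grid = count_significant(grid_taps(3.0, 32))    # delay falls on a grid point
off_grid = count_significant(grid_taps(3.5, 32))   # delay falls between grid points
```

With the delay on the grid, the channel is described by a single tap; half a sample off the grid, the same one-path channel needs many taps, so a grid-restricted estimator no longer sees a sparse vector.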

    A Perceptual Comparison of “Black Box” Modeling Algorithms for Nonlinear Audio Systems

    Nonlinear system identification is a widespread topic of interest, particularly within the audio industry, where these techniques are employed to synthesize black box models of nonlinear audio effects. Given the myriad approaches to black box modeling, questions arise as to whether an “optimal” approach exists, that is, one that achieves valid subjective results as a model with minimal computational expense. This thesis uses ABX listening tests to compare black box models of three hardware audio effects using two popular nonlinear implementations, along with two proposed modified implementations. Models were constructed in the Hammerstein form using sine sweeps and a novel measurement technique for the filters and nonlinearities, respectively. Testing revolved around null hypotheses assuming no change in model identification regardless of the device modeled, implementation used, or program material of the model stimulus. Results provide clear evidence of an effect on all of these accounts, and support a full rejection of the null hypotheses. Outcomes demonstrate a preferable implementation out of the algorithms tested, and suggest the removal of certain implementations as valid approaches altogether.
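A model in Hammerstein form is a memoryless nonlinearity followed by a linear filter. The sketch below shows that structure only; the polynomial waveshaper, the FIR filter, and the name `hammerstein` are illustrative assumptions and do not reproduce the implementations or measurement technique compared in the thesis.

```python
def hammerstein(x, poly_coeffs, fir):
    """Static polynomial nonlinearity followed by an FIR filter.

    poly_coeffs[p] weights x**(p + 1); fir holds the filter taps.
    """
    # Stage 1: memoryless nonlinearity applied sample by sample
    v = [sum(c * s ** (p + 1) for p, c in enumerate(poly_coeffs)) for s in x]
    # Stage 2: linear FIR filtering (zero initial state)
    y = []
    for n in range(len(v)):
        y.append(sum(fir[k] * v[n - k] for k in range(len(fir)) if n - k >= 0))
    return y

# Illustrative values: mild second-order distortion, two-tap filter
out = hammerstein([1.0, 2.0], poly_coeffs=[1.0, 0.5], fir=[1.0, 0.5])
```

Swapping the order of the two stages would give a Wiener model instead, which is why the form matters when identifying the filter and the nonlinearity separately.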

    Focus Your Attention (with Adaptive IIR Filters)

    We present a new layer in which dynamic (i.e., input-dependent) Infinite Impulse Response (IIR) filters of order two are used to process the input sequence prior to applying conventional attention. The input is split into chunks, and the coefficients of these filters are determined based on previous chunks to maintain causality. Despite their relatively low order, the causal adaptive filters are shown to focus attention on the relevant sequence elements. The layer performs on par with state-of-the-art networks, with a fraction of the parameters and with time complexity that is sub-quadratic in the input size. The obtained layer compares favorably with layers such as Hyena, GPT2, and Mega, both with respect to the number of parameters and the obtained level of performance on multiple long-range sequence problems. Comment: 11 pages, 4 figures
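The chunking idea in this abstract can be sketched as a second-order IIR filter whose coefficients are recomputed at each chunk boundary from earlier samples only. In the paper the coefficients come from a learned network; here `toy_coeffs` is a made-up stand-in rule, and `biquad_chunked` is a hypothetical name, so this shows the causality mechanism rather than the proposed layer.

```python
def toy_coeffs(history):
    """Hypothetical rule: a double real pole whose radius follows the mean
    absolute level of past samples. Returns (b0, b1, b2, a1, a2)."""
    if not history:
        return (1.0, 0.0, 0.0, 0.0, 0.0)   # pass-through for the first chunk
    level = sum(abs(s) for s in history) / len(history)
    r = min(0.9, level)                     # keep the poles inside the unit circle
    return ((1.0 - r) ** 2, 0.0, 0.0, -2.0 * r, r * r)

def biquad_chunked(x, chunk_size, coeff_fn):
    """y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2],
    with coefficients chosen per chunk from previous chunks only."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for start in range(0, len(x), chunk_size):
        # Only samples before this chunk are visible to the coefficient rule
        b0, b1, b2, a1, a2 = coeff_fn(x[:start])
        for s in x[start:start + chunk_size]:
            out = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1 = x1, s
            y2, y1 = y1, out
            y.append(out)
    return y

seq_a = [1.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5]
seq_b = [1.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 9.0]   # differs only in chunk 2
y_a = biquad_chunked(seq_a, 4, toy_coeffs)
y_b = biquad_chunked(seq_b, 4, toy_coeffs)
```

Because the rule for a chunk never reads that chunk's own samples, `y_a` and `y_b` agree on the first chunk even though the second chunk differs.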