
    Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function

    This paper addresses the problems of blind channel identification and multichannel equalization for speech dereverberation and noise reduction. The time-domain cross-relation method is not suitable for blind room impulse response identification because of the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse responses are approximately represented by convolutive transfer functions (CTFs) with far fewer coefficients. The CTFs suffer from common zeros caused by the oversampled STFT. We propose to identify the CTFs using the STFT of the oversampled signals together with critically sampled CTFs, which is a good compromise between the frequency aliasing of the signals and the common-zeros problem of the CTFs. In addition, a normalization of the CTFs is proposed to remove the gain ambiguity across sub-bands. In the STFT domain, the identified CTFs are used for multichannel equalization, in which the sparsity of speech signals is exploited. We propose to perform inverse filtering by minimizing the $\ell_1$-norm of the source signal, with the relaxed $\ell_2$-norm fitting error between the microphone signals and the convolution of the estimated source signal with the CTFs used as a constraint. This method is advantageous in that noise can be reduced by relaxing the $\ell_2$-norm to a tolerance corresponding to the noise power, and this tolerance can be set automatically. The experiments confirm the efficiency of the proposed method even under conditions with high reverberation levels and intense noise. Comment: 13 pages, 5 figures, 5 tables.
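    The sparsity-promoting inverse filtering described above is, in its unconstrained Lagrangian form, solvable by iterative soft-thresholding (ISTA). A minimal sketch, assuming a generic linear forward model `A` in place of the actual CTF convolution and a hand-picked regularisation weight rather than the paper's automatically set noise tolerance:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1-norm: shrink each entry toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam, n_iter=500):
    """Minimize 0.5*||Ax - b||_2^2 + lam*||x||_1 by iterative
    soft-thresholding (a Lagrangian relaxation of the constrained
    l1 problem in the abstract)."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)         # gradient of the l2 fitting term
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Toy demo: recover a sparse "source" from a noisy linear observation.
rng = np.random.default_rng(0)
n, m = 64, 128
x_true = np.zeros(m)
x_true[rng.choice(m, 5, replace=False)] = rng.standard_normal(5) * 3
A = rng.standard_normal((n, m)) / np.sqrt(n)
b = A @ x_true + 0.01 * rng.standard_normal(n)   # "microphone" observation
x_hat = ista(A, b, lam=0.05)
```

The soft-thresholding step is what produces exact zeros in the estimate, which is the sense in which speech sparsity is exploited.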

    Efficient and compact representations of head related transfer functions

    These days most reproduced sound is consumed on portable devices and headphones, on which spatial binaural audio can be conveniently presented. One way of converting from conventional loudspeaker formats to binaural format is through the use of Head Related Transfer Functions (HRTFs), but head-tracking is also necessary to obtain satisfactory externalisation of the simulated sound field. Typically a large HRTF dataset is required in order to provide enough measurements for a continuous virtual auditory space to be achieved through simple linear interpolation, or similar. This work describes an investigation into the use of alternative compact and efficient representations of an HRTF dataset measured in the azimuthal plane. The two main prongs of investigation are the use of orthogonal transformations in a decompositional approach, and a parametric modelling approach that utilises techniques often associated with speech processing. The latter approach is explored through the application of an all-pole model derived from linear prediction and a pole-zero model design method proposed by Steiglitz and McBride (1965). In computer simulation results, the all-pole model is deemed to offer superior performance in matching the measured data after compression of the HRTF set, whilst a preliminary subjective validation of the pole-zero models, which, contrary to theory-driven expectations, performed considerably worse in the simulation experiments, is conducted as a pilot study. Consideration is also given to a method of secondary compression and interpolation that applies the Discrete Cosine Transform to the angle-dependent components derived from each of the approaches. These techniques may also be useful in developing efficient schemes of custom HRTF capture.
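    The linear-prediction all-pole modelling mentioned above can be sketched with the standard autocorrelation method. The synthetic decaying-sinusoid stand-in for a measured HRIR, the model order of 12, and the omission of a gain term are assumptions for illustration, not details from the thesis:

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(x, order):
    """All-pole coefficients by the autocorrelation (linear prediction)
    method: solve the Toeplitz normal equations R a = r."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])
    return np.concatenate(([1.0], -a))           # A(z) = 1 - sum_k a_k z^-k

# Toy "HRIR": a decaying resonance standing in for measured data.
fs = 44100
t = np.arange(256) / fs
hrir = np.exp(-t * 3000) * np.sin(2 * np.pi * 4000 * t)

a = lpc(hrir, order=12)                          # 12 poles instead of 256 taps
impulse = np.zeros(256)
impulse[0] = 1.0
model = lfilter([1.0], a, impulse)               # impulse response of 1/A(z)
```

A useful property of the autocorrelation method is that the resulting all-pole filter is minimum phase, so the compact model is guaranteed stable; a per-filter gain term (omitted here) is fitted separately in practice.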

    Evaluating the Perceived Quality of Binaural Technology

    This thesis studies binaural sound reproduction from both a technical and a perceptual perspective, with the aim of improving the headphone listening experience for entertainment media audiences. A detailed review is presented of the relevant binaural technology and of the concepts and methods for evaluating perceived quality. A pilot study assesses the application of state-of-the-art binaural rendering systems to existing broadcast programmes, finding no substantial improvements in quality over conventional stereo signals. A second study gives evidence that realistic binaural simulation can be achieved without personalised acoustic calibration, showing promise for the application of binaural technology. Flexible technical apparatus is presented to allow further investigation of rendering techniques and content production processes. Two web-based studies show that appropriate combination of techniques can lead to improved experience for typical audience members, compared to stereo signals, even without personalised rendering or listener head-tracking. Recent developments in spatial audio applications are then discussed. These have made dynamic client-side binaural rendering with listener head-tracking feasible for mass audiences, but also present technical constraints. To limit distribution bandwidth and computational complexity during rendering, loudspeaker virtualisation is widely used. The effects on perceived quality of these techniques are studied in depth for the first time. A descriptive analysis experiment demonstrates that loudspeaker virtualisation during binaural rendering causes degradations to a range of perceptual characteristics and that these vary across other system conditions. A final experiment makes novel use of the check-all-that-apply method to efficiently characterise the quality of seven spatial audio representations and associated dynamic binaural rendering techniques, using single sound sources and complex dramatic scenes. 
The perceived quality of these different representations varies significantly across a wide range of characteristics and with programme material. These methods and findings can be used to improve the quality of current binaural technology applications.
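    The loudspeaker virtualisation studied above, rendering each virtual loudspeaker feed binaurally through that speaker's pair of head-related impulse responses, reduces to convolve-and-sum. The random stand-in HRIRs below are purely illustrative; real rendering uses measured HRTF sets and listener head-tracking:

```python
import numpy as np
from scipy.signal import fftconvolve

def virtualise(speaker_feeds, hrirs):
    """Render N virtual loudspeakers binaurally: convolve each feed with
    that loudspeaker's left/right HRIR and sum into two ear signals.
    speaker_feeds: (N, samples); hrirs: (N, 2, taps)."""
    left = sum(fftconvolve(feed, h[0]) for feed, h in zip(speaker_feeds, hrirs))
    right = sum(fftconvolve(feed, h[1]) for feed, h in zip(speaker_feeds, hrirs))
    return np.stack([left, right])

# Toy demo: two virtual loudspeakers with made-up decaying-noise HRIRs.
rng = np.random.default_rng(1)
feeds = rng.standard_normal((2, 1000))
hrirs = rng.standard_normal((2, 2, 64)) * np.exp(-np.arange(64) / 8)
binaural = virtualise(feeds, hrirs)
```

The appeal of this scheme for distribution is that only the N loudspeaker feeds need to be transmitted, with the 2N convolutions done client-side; the perceptual cost of that fixed virtual layout is what the experiments above characterise.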

    Large Deformation Diffeomorphic Metric Mapping Provides New Insights into the Link Between Human Ear Morphology and the Head-Related Transfer Functions

    The research presented in this thesis is composed of four sections. The first section shows how LDDMM can be applied to deforming head and ear shapes in the context of a morphoacoustic study. Further, tools are developed to measure differences between 3D shapes using the framework of currents, and to compare and measure the differences between the acoustic responses obtained from BEM simulations for two ear shapes. Finally, this section introduces the multi-scale approach for mapping ear shapes using LDDMM. The second section estimates a template ear, head and torso shape from the shapes available in the SYMARE database, and explains a new procedure for developing the template ear shape. The template ear and head shapes are verified by comparing features in the template shapes to corresponding features in the CIPIC and SYMARE database populations. The third section examines the quality of the deformations from the template ear shape to target ears in SYMARE from both an acoustic and a morphological standpoint. This investigation identified that ear shapes can be studied more accurately using two physical scales, and that the scales at which the ear shapes were studied depend on the parameters chosen when mapping ears in the LDDMM framework. This section concludes by noting how shape distances vary with the acoustic distances, using the developed tools. In the final part of the thesis, variations in ear morphology are examined using Kernel Principal Component Analysis (KPCA), and the changes in the corresponding acoustics are studied using standard Principal Component Analysis (PCA). These examinations involved identifying the number of kernel principal components required to model ear shapes with an acceptable level of accuracy, both morphologically and acoustically.
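    The KPCA step for shape variation can be sketched directly in NumPy with an RBF kernel: build the Gram matrix, double-centre it in feature space, and take the top eigenvectors as nonlinear components. The toy circle data below merely stands in for the ear-shape descriptors in SYMARE, and the kernel width is an arbitrary choice:

```python
import numpy as np

def kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA with an RBF kernel: Gram matrix, double-centring,
    then the leading eigenpairs give the nonlinear components."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                  # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)                 # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]     # take the largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Toy "shapes": points on a noisy circle, a nonlinear 1-D manifold that
# ordinary PCA cannot flatten but KPCA can separate.
rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, 100)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((100, 2))
Z = kernel_pca(X, n_components=2, gamma=2.0)
```

Choosing how many components to keep, as the thesis does for ear shapes, amounts to truncating the eigenvalue spectrum of `Kc` at an acceptable reconstruction accuracy.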

    BIOLOGICALLY-INFORMED COMPUTATIONAL MODELS OF HARMONIC SOUND DETECTION AND IDENTIFICATION

    Harmonic sounds, or harmonic components of sounds, are often fused into a single percept by the auditory system. Although the exact neural mechanisms for harmonic sensitivity remain unclear, it presumably arises in the auditory cortex, because subcortical neurons typically prefer only a single frequency. Pitch-sensitive units and harmonic template units found in awake marmoset auditory cortex are sensitive to temporal and spectral periodicity, respectively. This thesis is a study of possible computational mechanisms underlying cortical harmonic selectivity. To examine whether harmonic selectivity is related to statistical regularities of natural sounds, simulated auditory nerve responses to natural sounds were used in principal component analysis, in comparison with independent component analysis, which yielded harmonic-sensitive model units with a population distribution similar to that of real cortical neurons in terms of harmonic selectivity metrics. This result suggests that the variability of cortical harmonic selectivity may provide an efficient population representation of natural sounds. Several network models of spectral selectivity mechanisms are investigated. As a side study, adding synaptic depletion to an integrate-and-fire model could explain the observed modulation-sensitive units, which are related to pitch-sensitive units but cannot account for precise temporal regularity. When a feed-forward network is trained to detect harmonics, the result is always a sieve, which is excited by integer multiples of the fundamental frequency and inhibited by half-integer multiples. The sieve persists over a wide variety of conditions, including changing evaluation criteria, incorporating Dale's principle, and adding a hidden layer. A recurrent network trained by Hebbian learning produces harmonic selectivity by a novel dynamical mechanism that can be explained by a Lyapunov function favouring inputs that match the learned frequency correlations. These model neurons have sieve-like weights, like the harmonic template units, when probed by random harmonic stimuli, despite there being no sieve pattern anywhere in the network's weights. Online stimulus design has the potential to facilitate future experiments on nonlinear sensory neurons. We accelerated the sound-from-texture algorithm to enable online adaptive experimental design that maximizes the activities of sparsely responding cortical units. We calculated the optimal stimuli for harmonic-selective units and investigated a model-based information-theoretic method for stimulus optimization.
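    The sieve that the trained feed-forward networks converged to, excitation at integer multiples of the fundamental and inhibition at half-integer multiples, can be illustrated numerically. The Gaussian bump widths and the 200 Hz fundamental are arbitrary choices for the sketch, not values from the thesis:

```python
import numpy as np

def sieve_response(spectrum, freqs, f0):
    """Harmonic-sieve unit: weights are positive Gaussian bumps at integer
    multiples of f0 (excitation) and negative bumps at half-integer
    multiples (inhibition); the response is the weighted spectrum sum."""
    weights = np.zeros_like(freqs)
    for k in range(1, int(freqs[-1] // f0) + 1):
        weights += np.exp(-0.5 * ((freqs - k * f0) / 20.0) ** 2)          # excite
        weights -= np.exp(-0.5 * ((freqs - (k + 0.5) * f0) / 20.0) ** 2)  # inhibit
    return float(weights @ spectrum)

freqs = np.arange(0.0, 4000.0, 10.0)
f0 = 200.0
# A harmonic spectrum aligned with f0, and one shifted onto the half-integer
# multiples (maximally inharmonic for this sieve).
harmonic = sum(np.exp(-0.5 * ((freqs - k * f0) / 15.0) ** 2) for k in range(1, 11))
shifted = sum(np.exp(-0.5 * ((freqs - (k + 0.5) * f0) / 15.0) ** 2) for k in range(1, 11))

r_harm = sieve_response(harmonic, freqs, f0)
r_shift = sieve_response(shifted, freqs, f0)
```

The aligned spectrum drives the unit strongly while the half-integer-shifted spectrum suppresses it, which is the selectivity pattern the thesis reports for harmonic template units.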

    Proceedings of the 19th Sound and Music Computing Conference

    Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France). https://smc22.grame.f

    Music Production Behaviour Modelling

    The new millennium has seen an explosion of computational approaches to the study of music production, due in part to the decreasing cost of computation and the rise of digital music production techniques. Digital recording equipment, MIDI, digital audio workstations (DAWs), and software plugins for audio effects led to the digital capture of various processes in music production. This discretization of traditionally analogue methods allowed for the development of intelligent music production, which uses machine learning to numerically characterize and automate portions of the music production process. One algorithm from the field, referred to as "reverse engineering a multitrack mix", can recover the audio effects processing used to transform a multitrack recording into a mixdown in the absence of information about how the mixdown was achieved. This thesis improves on this method of reverse engineering a mix by leveraging recent advancements in machine learning for audio. Using the differentiable digital signal processing paradigm, greybox modules for gain, panning, equalisation, artificial reverberation, memoryless waveshaping distortion, and dynamic range compression are presented. These modules are then connected in a mixing chain and optimized to learn the effects used in a given mixdown. Both objective and perceptual metrics are presented to measure the performance of the various modules in isolation and within a full mixing chain. Ultimately, a fully differentiable mixing chain is presented that outperforms previously proposed methods for reverse engineering a mix. Directions for future work are proposed to improve the characterization of multitrack mixing behaviours.
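    The idea of optimising greybox modules to match a given mixdown can be illustrated with the simplest such module, gain plus constant-power panning. The track count, learning rate, and hand-derived gradients below are illustrative assumptions; a DDSP-style system would use automatic differentiation and the fuller effect chain listed above:

```python
import numpy as np

def render(tracks, gains, pans):
    """Greybox gain + constant-power pan: each mono track gets a scalar
    gain and a stereo position parameterised by an angle in [0, pi/2]."""
    theta = np.pi / 4 * (pans + 1)            # pan in [-1, 1] -> angle
    left = (gains * np.cos(theta)) @ tracks
    right = (gains * np.sin(theta)) @ tracks
    return np.stack([left, right])

# Hidden mix parameters to recover from the mixdown alone.
rng = np.random.default_rng(3)
tracks = rng.standard_normal((3, 2000))
true_g = np.array([0.8, 0.5, 1.2])
true_p = np.array([-0.6, 0.0, 0.7])
target = render(tracks, true_g, true_p)

# Gradient descent on the squared mix error, gradients written by hand.
g, p = np.ones(3), np.zeros(3)
n = tracks.shape[1]
for _ in range(2000):
    theta = np.pi / 4 * (p + 1)
    est = np.stack([(g * np.cos(theta)) @ tracks, (g * np.sin(theta)) @ tracks])
    proj = tracks @ (est - target).T / n      # (3, 2): <track_i, residual_ch>
    grad_g = np.cos(theta) * proj[:, 0] + np.sin(theta) * proj[:, 1]
    grad_p = g * np.pi / 4 * (np.cos(theta) * proj[:, 1] - np.sin(theta) * proj[:, 0])
    g -= 0.1 * grad_g
    p -= 0.1 * grad_p
```

Because every operation in `render` is differentiable in the gain and pan parameters, the same recipe extends to chains of more complex effects, which is the core of the fully differentiable mixing chain described above.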