
    A directionally tunable but frequency-invariant beamformer on an acoustic velocity-sensor triad to enhance speech perception

    Herein investigated are computationally simple microphone-array beamformers that are independent of the frequency spectra of all signals, all interference, and all noise. These beamformers allow the listener to tune the desired azimuth-elevation “look direction,” and no prior information about the interference is needed. The beamformers deploy a physically compact triad of three collocated but orthogonally oriented velocity sensors. The proposed schemes’ efficacy is verified by a jury test using simulated data constructed from Mandarin Chinese (a.k.a. Putonghua) speech samples; for example, a desired speech signal, originally at a very adverse signal-to-interference-and-noise power ratio (SINR) of -30 dB, may be processed to become fully intelligible to the jury.
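The core of such a triad beamformer can be sketched in a few lines. This is a minimal illustration, not the authors' scheme: the function name and the plane-wave/ideal-sensor assumptions are mine. Because the weights form a pure spatial projection with no delays or per-frequency phase terms, the beam pattern is identical at every frequency:

```python
import numpy as np

def triad_beamform(vx, vy, vz, azimuth, elevation):
    """Steer a collocated, orthogonally oriented velocity-sensor triad.

    A plane wave from (azimuth, elevation) excites the three sensors in
    proportion to the unit direction vector below, so a fixed weighted
    sum of the channels forms a beam whose gain pattern is the same at
    every frequency.
    """
    u = np.array([np.cos(azimuth) * np.cos(elevation),
                  np.sin(azimuth) * np.cos(elevation),
                  np.sin(elevation)])
    return u[0] * vx + u[1] * vy + u[2] * vz
```

Steering toward the desired talker passes that signal at unit gain, while an off-axis interferer is attenuated by the cosine of the angular separation between its direction and the look direction.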

    Separation of speech sources using an acoustic vector sensor

    This paper investigates how the directional characteristics of an Acoustic Vector Sensor (AVS) can be used to separate speech sources. The technique takes advantage of frequency-domain direction-of-arrival estimates to identify the location, relative to the AVS array, of each individual speaker in a group and to separate them accordingly into individual speech signals. Results show that the technique can separate speech sources in real time using a single 20 ms frame of speech. Furthermore, the proposed algorithm achieves an average improvement over the unprocessed recordings of 15.1 dB in Signal-to-Interference Ratio (SIR) and 5.4 dB in Signal-to-Distortion Ratio (SDR). In addition, Perceptual Evaluation of Speech Quality (PESQ) and listening tests both show an improvement in perceptual quality of 1 Mean Opinion Score (MOS) point over the unprocessed recordings. © 2011 IEEE
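The per-bin DOA masking idea can be sketched as follows, assuming an ideal 2-D AVS (one pressure channel P and two velocity channels Vx, Vy) and known speaker DOAs; the function name and parameters are illustrative, not the paper's implementation:

```python
import numpy as np

def separate_by_doa(p, vx, vy, doas, width=0.3, nfft=512):
    """Separate one frame of AVS recordings by per-bin DOA estimates.

    For each frequency bin, the intensity-based azimuth estimate is
    atan2(Re{P* Vy}, Re{P* Vx}). Bins whose estimate lies within
    `width` radians of a speaker's DOA are kept for that speaker
    (a binary time-frequency mask) and inverted back to a waveform,
    so a single frame suffices for the separation.
    """
    P, Vx, Vy = (np.fft.rfft(x, nfft) for x in (p, vx, vy))
    theta = np.arctan2(np.real(np.conj(P) * Vy),
                       np.real(np.conj(P) * Vx))
    out = []
    for doa in doas:
        diff = np.angle(np.exp(1j * (theta - doa)))  # wrapped angle difference
        out.append(np.fft.irfft(P * (np.abs(diff) < width), nfft))
    return out
```

The sketch relies on the time-frequency sparsity of speech: each bin is assumed to be dominated by one talker, so its azimuth estimate points at that talker alone.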

    Linear Predictive perceptual filtering for Acoustic Vector Sensors: Exploiting directional recordings for high quality speech enhancement

    This paper investigates the performance of a new speech-enhancement technique that applies Linear Predictive (LP) spectrum-based perceptual filtering to the recordings obtained from an Acoustic Vector Sensor (AVS). The technique takes advantage of the directional polar responses of the AVS to obtain a significantly more accurate representation of the LP spectrum of a target speech signal in the presence of noise than single-channel, omnidirectional recordings allow. The speech quality obtained with the proposed technique is compared with existing beamforming-based speech-enhancement techniques for the AVS through Perceptual Evaluation of Speech Quality (PESQ) tests and Mean Opinion Score (MOS) listening tests. Results show significant improvements in PESQ and MOS scores of 0.2 and 1.6, respectively, for the proposed enhancement technique. Being based on a miniature microphone array, the approach is particularly suitable for hands-free communication applications in mobile telephony.
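The LP analysis that the technique builds on is standard; a generic autocorrelation-method sketch via the Levinson-Durbin recursion (not the authors' code; function names are illustrative) is:

```python
import numpy as np

def lp_coefficients(x, order):
    """Autocorrelation-method linear prediction (Levinson-Durbin).

    Returns the prediction-error filter coefficients a (with a[0] = 1)
    and the final prediction-error energy e.
    """
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / e                       # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        e *= 1.0 - k * k
    return a, e

def lp_envelope(a, e, nfft=512):
    """All-pole spectral envelope |sqrt(e) / A(e^{jw})|."""
    return np.sqrt(e) / np.abs(np.fft.rfft(a, nfft))
```

The claim in the abstract is then that computing this envelope on a directionally filtered AVS output tracks the target's true envelope more closely than the same computation on a noisy omnidirectional channel.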

    Speech dereverberation based on Linear Prediction: An Acoustic Vector Sensor approach

    This paper introduces a dereverberation algorithm based on Linear Prediction (LP) applied to the outputs of an Acoustic Vector Sensor (AVS). The approach uses adaptive beamforming to take advantage of the directional outputs of the AVS array, obtaining a more accurate LP spectrum than can be obtained with a single channel or with a Uniform Linear Array (ULA) of a comparable number of channels. This spectrum is then used within a modified version of the Spatiotemporal Averaging Method for Enhancement of Reverberant Speech (SMERSH) algorithm, derived for the AVS, to enhance the LP residual signal. In a highly reverberant environment, the approach demonstrates a significant improvement over a ULA as measured by both the Signal-to-Reverberant Ratio (SRR) and the Speech-to-Reverberation Modulation Energy Ratio (SRMR) for sources ranging from 1 m to 5 m from the array. © 2013 IEEE
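SMERSH-style methods operate on the LP residual rather than on the waveform directly. A toy sketch of the analysis/synthesis split this relies on, assuming a prediction-error filter with a[0] = 1 (the spatiotemporal averaging of the residual itself is omitted; names are illustrative):

```python
import numpy as np

def lp_residual(x, a):
    """FIR inverse filtering with A(z): e[n] = sum_k a[k] x[n-k].

    `a` is a prediction-error filter with a[0] = 1, as produced by
    standard LP analysis; e is the residual that reverberation smears
    and that SMERSH-style methods enhance before resynthesis.
    """
    e = np.zeros_like(x)
    for k, ak in enumerate(a):
        e[k:] += ak * x[: len(x) - k]
    return e

def lp_synthesis(e, a):
    """All-pole resynthesis: x[n] = e[n] - sum_{k>=1} a[k] x[n-k]."""
    p = len(a) - 1
    x = np.zeros_like(e)
    for n in range(len(e)):
        x[n] = e[n] - sum(a[k] * x[n - k] for k in range(1, min(n, p) + 1))
    return x
```

With zero initial conditions the two functions are exact inverses, so any cleanup applied to the residual between them carries straight through to the resynthesized speech.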

    Three-dimensional printable ultrasound transducer stabilization system

    When using ultrasound imaging of the tongue for speech recording and research, submental transducer stabilization is required to prevent the ultrasound transducer from translating or rotating relative to the tongue. An iterative prototype of a lightweight, three-dimensional-printable, wearable ultrasound transducer stabilization system that allows flexible jaw motion and free head movement is presented. The system is completely non-metallic, eliminating interference with co-recorded signals and thus permitting co-collection and co-registration with articulometry systems. A motion study of the final version demonstrates that transducer rotation is limited to 1.25 degrees and translation to 2.5 mm, well within accepted tolerances.

    Multisource DOA estimation based on time-frequency sparsity and joint inter-sensor data ratio with single acoustic vector sensor

    By exploiting the time-frequency (TF) sparsity of speech, the inter-sensor data ratios (ISDRs) of a single acoustic vector sensor (AVS) are derived and investigated. Under noiseless conditions, ISDRs have favorable properties: they are independent of frequency, they are related to the DOA by a single-valued mapping, and they impose no near-field or far-field constraints. With these observations, the behavior of ISDRs under noisy conditions is further investigated and a so-called ISDR-DOA estimation algorithm is proposed, in which high-local-SNR data extraction and bivariate kernel density estimation are adopted to cluster the ISDRs carrying the DOA information. Compared with traditional DOA estimation methods using a small microphone array, the proposed algorithm has the merits of smaller size, no spatial aliasing, and lower computational cost. Simulation studies show that the proposed method with a single AVS can estimate up to seven sources simultaneously with high accuracy when the SNR is larger than 15 dB. In addition, DOA estimation results based on recorded data further validate the proposed algorithm.
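A 2-D (azimuth-only) sketch of the idea under idealized assumptions, with illustrative names: for an ideal AVS the per-bin ratios Re{Vx/P} and Re{Vy/P} reduce to cos(az) and sin(az) independent of frequency, a crude energy threshold stands in for the paper's high-local-SNR extraction, and a circular kernel density estimate over the pooled azimuths is peak-picked:

```python
import numpy as np

def isdr_azimuths(p, vx, vy, nfft=256, thresh=0.1):
    """Collect per-bin azimuth estimates from high-energy TF bins.

    In any bin dominated by a single source, the azimuth follows from
    the inter-sensor ratios; an energy threshold relative to the peak
    magnitude is a crude proxy for high local SNR.
    """
    az, mag = [], []
    for i in range(len(p) // nfft):
        sl = slice(i * nfft, (i + 1) * nfft)
        P, Vx, Vy = (np.fft.rfft(x[sl]) for x in (p, vx, vy))
        az.append(np.arctan2(np.real(np.conj(P) * Vy),
                             np.real(np.conj(P) * Vx)))
        mag.append(np.abs(P))
    az, mag = np.concatenate(az), np.concatenate(mag)
    return az[mag > thresh * mag.max()]

def kde_doa_peaks(angles, n_sources, bandwidth=0.05, grid_n=720):
    """Circular Gaussian kernel density over pooled azimuths; the
    n_sources largest well-separated peaks are the DOA estimates."""
    grid = np.linspace(-np.pi, np.pi, grid_n, endpoint=False)
    d = np.angle(np.exp(1j * (grid[:, None] - angles[None, :])))
    density = np.exp(-0.5 * (d / bandwidth) ** 2).sum(axis=1)
    peaks = []
    for _ in range(n_sources):
        i = int(np.argmax(density))
        peaks.append(grid[i])
        # suppress neighbouring grid points before picking the next peak
        density[np.abs(np.angle(np.exp(1j * (grid - grid[i])))) < 0.3] = 0.0
    return sorted(peaks)
```

Because the ratios are frequency-independent, bins from every frequency vote in the same azimuth space, which is what lets a single sensor resolve several simultaneous sources.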