An efficient approach to dynamically weighted multizone wideband reproduction of speech soundfields
This paper proposes and evaluates an efficient approach to the practical reproduction of multizone soundfields for speech sources. The reproduction method, based on a previously proposed approach, utilises weighting parameters to control the soundfield reproduced in each zone whilst minimising the number of loudspeakers required. Proposed here is an interpolation scheme for predicting the weighting parameter values of the multizone soundfield model, which otherwise require significant computational effort. It is shown that the initial computation time can be reduced by a factor of 1024 with a reproduction error of only -85 dB relative to reproduction without interpolated weighting parameters. The perceptual impact on the quality of the speech reproduced using the method is also shown to be negligible. By using pre-saved soundfields determined using the proposed approach, practical reproduction of dynamically weighted multizone soundfields of wideband speech could be achieved in real time.
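As a rough illustration of the interpolation idea only (not the paper's actual scheme), the sketch below evaluates a stand-in weighting parameter on a coarse grid and linearly interpolates queries, so the expensive solver would run only at the grid points; the grid sizes and the cosine stand-in are hypothetical:

```python
import numpy as np

# Hypothetical sketch: weighting parameters are costly to compute, so evaluate
# them on a coarse grid and interpolate in between.
coarse_angles = np.linspace(0, 180, 16)              # degrees, coarse grid
coarse_weights = np.cos(np.radians(coarse_angles))   # stand-in for the costly solver

def weight_at(angle_deg):
    """Linearly interpolate the precomputed weighting parameter."""
    return np.interp(angle_deg, coarse_angles, coarse_weights)

fine_angles = np.linspace(0, 180, 16384)             # 1024x denser query grid
w = weight_at(fine_angles)                           # no expensive recomputation
```

The trade-off is interpolation error in the reproduced soundfield versus the large reduction in up-front computation.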
Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network
Deep neural networks (DNNs) are very effective for multichannel speech enhancement with fixed array geometries. However, it is not trivial to use DNNs for ad-hoc arrays with unknown order and placement of microphones. We propose a novel triple-path network for ad-hoc array processing in the time domain. The key idea in the network design is to divide the overall processing into spatial processing and temporal processing, and to use self-attention for the spatial processing. Using self-attention for spatial processing makes the network invariant to the order and number of microphones. The temporal processing is done independently for all channels using a recently proposed dual-path attentive recurrent network. The proposed network is a multiple-input multiple-output architecture that can simultaneously enhance the signals at all microphones. Experimental results demonstrate the excellent performance of the proposed approach. Further, we present an analysis demonstrating the effectiveness of the proposed network in utilizing multichannel information, even from microphones at distant locations.
Comment: Accepted for publication in INTERSPEECH 202
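The permutation-invariance claim can be illustrated with a minimal numpy sketch of self-attention applied across the channel (microphone) axis; identity Q/K/V projections stand in for learned weights, and this is not the authors' network:

```python
import numpy as np

def spatial_self_attention(x):
    """Scaled dot-product self-attention across the microphone axis.

    x: (channels, features). Because attention treats the channels as an
    unordered set, permuting the microphones merely permutes the output rows.
    """
    q, k, v = x, x, x                                 # identity projections
    scores = q @ k.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)           # row-wise softmax
    return attn @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))                       # 5 mics, 8 features
perm = rng.permutation(5)
y = spatial_self_attention(x)
y_perm = spatial_self_attention(x[perm])
# Permutation equivariance: reordering mics reorders the outputs identically.
assert np.allclose(y_perm, y[perm])
```

The same function also accepts any number of channel rows, which is why such a spatial stage needs no fixed array geometry.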
LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders
Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements. This approach has been shown to yield improvements over audio-only speech enhancement, particularly for the removal of interfering speech. Despite recent advances in speech synthesis, most audio-visual approaches continue to use spectral mapping/masking to reproduce the clean audio, often resulting in visual backbones added to existing speech enhancement architectures. In this work, we propose LA-VocE, a new two-stage approach that predicts mel-spectrograms from noisy audio-visual speech via a transformer-based architecture, and then converts them into waveform audio using a neural vocoder (HiFi-GAN). We train and evaluate our framework on thousands of speakers and 11+ different languages, and study our model's ability to adapt to different levels of background noise and speech interference. Our experiments show that LA-VocE outperforms existing methods according to multiple metrics, particularly under very noisy scenarios.
Comment: Submitted to ICASSP 202
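As background to the two-stage design, the sketch below constructs a standard HTK-style triangular mel filterbank in numpy: the kind of representation a stage-1 model would predict and a vocoder such as HiFi-GAN would invert. The parameter values are illustrative, not those of LA-VocE:

```python
import numpy as np

def mel_filterbank(n_mels=80, n_fft=1024, sr=16000):
    """Triangular mel filterbank (HTK-style mel scale) mapping a linear-frequency
    power spectrum onto mel bands."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)   # rising slope
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)   # falling slope
    return fb

fb = mel_filterbank()
spec = np.abs(np.fft.rfft(np.random.default_rng(1).standard_normal(1024))) ** 2
mel = fb @ spec   # one 80-band mel frame; stage 2 would synthesise a waveform
                  # from a sequence of such frames with a neural vocoder
```

Predicting in this compact mel domain and delegating waveform generation to a vocoder is what distinguishes the two-stage design from direct spectral mapping/masking.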
Subspace Hybrid MVDR Beamforming for Augmented Hearing
Signal-dependent beamformers are advantageous over signal-independent beamformers when the acoustic scenario - be it real-world or simulated - is straightforward in terms of the number of sound sources, the ambient sound field and their dynamics. However, in the context of augmented reality audio using head-worn microphone arrays, the acoustic scenarios encountered are often far from straightforward. The design of robust, high-performance, adaptive beamformers for such scenarios is an ongoing challenge, owing to violations of the typically required assumptions on the noise field caused by, for example, rapid variations resulting from complex acoustic environments and/or rotations of the listener's head. This work proposes a multi-channel speech enhancement algorithm which utilises the adaptability of signal-dependent beamformers while still benefiting from the computational efficiency and robust performance of signal-independent super-directive beamformers. The algorithm has two stages. (i) The first stage is a hybrid beamformer based on a dictionary of weights corresponding to a set of noise field models. (ii) The second stage is a wide-band subspace post-filter that removes any artifacts resulting from (i). The algorithm is evaluated using both real-world recordings and simulations of a cocktail-party scenario. Noise suppression, intelligibility and speech quality results show a significant performance improvement by the proposed algorithm compared to the baseline super-directive beamformer. A data-driven implementation of the noise field dictionary is shown to provide more noise suppression, and similar speech intelligibility and quality, compared to a parametric dictionary.
Comment: 14 pages, 10 figures, submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing on 23-Nov-202
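A minimal sketch of the dictionary idea, assuming the classic MVDR solution w = R⁻¹d / (dᴴR⁻¹d); the steering vector, the two-entry noise-model dictionary and the selection rule below are illustrative, not the paper's:

```python
import numpy as np

def mvdr_weights(R, d):
    """MVDR beamformer: minimise output power subject to unit gain toward the
    target, w = R^{-1} d / (d^H R^{-1} d)."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

M = 4                                        # microphones
d = np.ones(M, dtype=complex)                # toy broadside steering vector
# Hypothetical dictionary: a diffuse-like model and an uncorrelated-noise model.
dictionary = [np.eye(M) + 0.5 * np.ones((M, M)), np.eye(M)]

rng = np.random.default_rng(2)
snapshots = rng.standard_normal((M, 200))    # noise-dominated observations
R_obs = snapshots @ snapshots.T / 200

# Hybrid idea (sketch): reuse precomputed dictionary weights, selecting the
# entry with the least output power on the observed data, instead of fully
# adapting the beamformer online.
powers = []
for R in dictionary:
    w_cand = mvdr_weights(R, d)
    powers.append(np.real(w_cand.conj() @ R_obs @ w_cand))
w = mvdr_weights(dictionary[int(np.argmin(powers))], d)
assert np.isclose(w.conj() @ d, 1.0)         # distortionless constraint holds
```

Because each dictionary entry is precomputed, the runtime cost is a handful of output-power evaluations rather than a full covariance inversion per frame.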
Wear Testing of a Mechanized Percussion Well Drilling System for Water Access in West Africa
The Mechanized Percussion Well Drilling (MPWD) Collaboratory project is assisting in the development of a mechanized well drilling system for drilling shallow water wells in West Africa. Our client, Mr. Joseph Longenecker of Open Door Development (ODD), desires to make water wells accessible to all in this region, but has experienced difficulty drilling through hard soil layers. To overcome this problem, the MPWD team has worked closely with Mr. Longenecker to develop a mechanized percussion well drilling rig using a rubber friction wheel drive system that is capable of drilling through these harder layers.
Currently, the MPWD team is working to provide recommendations to improve the useful service life of our client’s new, mechanized rig design. The MPWD team’s most recent work includes the design and fabrication of a testing rig to simulate the operation of our client’s full-size rig. The testing rig will allow our team to conduct fatigue testing on a model of the driveline system to analyze the wear patterns on the rubber friction wheel and to estimate its expected service life. The team has also performed a series of finite element analyses on the mast design of our client’s rig to evaluate working stresses under static loading and buckling, along with fatigue analysis, to confirm safe operation of the rig and to identify any elements that might require upgrades.
Funding for this work provided by The Collaboratory for Strategic Partnerships and Applied Research.
Reproduction of Personal Sound in Shared Environments
The experience and utility of personal sound is a highly sought-after characteristic of shared spaces. Personal sound allows individuals, or small groups of individuals, to listen to separate streams of audio content without external interruption from a third party. Personal acoustic environments may also take the form of areas of minimal sound, where quiet spaces facilitate an effortless mode of communication. These characteristics have become exceedingly difficult to produce in busy environments such as cafes, restaurants, open-plan offices and entertainment venues. The concept of, and the ability to provide, spaces of this nature has been of significant interest to researchers over the past two decades.
This thesis answers open questions in the area of personal sound reproduction using loudspeaker arrays, which is the active reproduction of soundfields over extended spatial regions of interest. We first provide a review of the mathematical foundations of acoustics theory, single zone and multiple zone soundfield reproduction, as well as background on the human perception of sound. We then introduce novel approaches for the integration of psychoacoustic models in multizone soundfield reproductions and describe implementations that facilitate the efficient computation of complex soundfield synthesis. The psychoacoustic-based zone weighting is shown to considerably improve soundfield accuracy, as measured by the soundfield error, and the proposed computational methods are shown capable of providing several orders of magnitude better performance with insignificant effects on synthesis quality.

Consideration is then given to the enhancement of privacy and quality in personal sound zones, and in particular to the effects of unwanted sound leaking between zones. Optimisation algorithms, along with a priori estimations of cascaded zone leakage filters, are then established so as to provide privacy between the sound zones without diminishing quality. Simulations and real-world experiments are performed, using linear and part-circle loudspeaker arrays, to confirm the practical feasibility of the proposed privacy and quality control techniques. The experiments show that good quality and confidential privacy are achievable simultaneously.

The concept of personal sound is then extended to the active suppression of speech across loudspeaker boundaries. Novel suppression techniques are derived for linear and planar loudspeaker boundaries, which are then used to simulate the reduction of speech levels over open spaces and the suppression of acoustic reflections from walls. The suppression is shown to be as effective as passive fibre panel absorbers.
Finally, we propose a novel ultrasonic parametric and electrodynamic loudspeaker hybrid design for acoustic contrast enhancement in multizone reproduction scenarios and show that significant acoustic contrast can be achieved above the fundamental spatial aliasing frequency.
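The acoustic contrast figure of merit used throughout this line of work can be sketched in a few lines of numpy; the zone pressures below are synthetic stand-ins, not measured data:

```python
import numpy as np

def acoustic_contrast_db(p_bright, p_dark):
    """Acoustic contrast: the ratio of mean squared pressure in the bright zone
    to that in the dark (quiet) zone, expressed in dB. Inputs are complex
    pressure amplitudes sampled at points within each zone."""
    return 10.0 * np.log10(np.mean(np.abs(p_bright) ** 2)
                           / np.mean(np.abs(p_dark) ** 2))

rng = np.random.default_rng(3)
p_bright = rng.standard_normal(64) + 1j * rng.standard_normal(64)
p_dark = 0.01 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
contrast = acoustic_contrast_db(p_bright, p_dark)   # ~40 dB for a 100:1 amplitude ratio
```

Higher contrast means less of the bright zone's programme leaks into the quiet zone, which is the quantity the hybrid loudspeaker design aims to improve above the aliasing frequency.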
Just for you: how to create sounds that only you can hear in a venue
Picture your typical busy cafe or restaurant that's full of people. The diners are usually all forced to listen to the same music that's pumped into the venue via the speakers.
Multizone Soundfield Reproduction With Privacy- and Quality-Based Speech Masking Filters
Reproducing zones of personal sound is a challenging signal processing problem which has garnered considerable research interest in recent years. We introduce in this work an extended method for multizone soundfield reproduction which overcomes issues with speech privacy and quality. Measures of Speech Intelligibility Contrast (SIC) and speech quality are used as cost functions in an optimisation of speech privacy and quality. Novel spatial and (temporal) frequency domain speech masker filter designs are proposed to accompany the optimisation process. Spatial masking filters are designed using multizone soundfield algorithms which are dependent on the target speech multizone reproduction. Combinations of estimates of acoustic contrast and long-term average speech spectra are proposed to provide equal masking influence on speech privacy and quality. Spatial aliasing specific to the multizone soundfield reproduction geometry is further considered in analytically derived low-pass filters. Simulated and real-world experiments are conducted to verify the performance of the proposed method using semi-circular and linear loudspeaker arrays. Simulated implementations of the proposed method show that significant speech intelligibility contrast and speech quality are achievable between zones. A range of Perceptual Evaluation of Speech Quality (PESQ) Mean Opinion Scores (MOS) indicating good quality are obtained while at the same time providing confidential privacy as indicated by SIC. The simulations also show that the method is robust to variations in the speech, virtual source location, array geometry and number of loudspeakers. Real-world experiments confirm the practicality of the proposed methods by showing that good quality and confidential privacy are achievable.
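The band-limiting step can be illustrated with a generic windowed-sinc FIR low-pass in numpy; the cutoff here is a placeholder, not the geometry-dependent spatial aliasing frequency the paper derives analytically:

```python
import numpy as np

def lowpass_fir(cutoff_hz, fs, num_taps=129):
    """Windowed-sinc FIR low-pass filter; a generic stand-in for the
    analytically derived anti-aliasing masker filters."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff_hz / fs * np.sinc(2 * cutoff_hz / fs * n)  # ideal low-pass
    h *= np.hamming(num_taps)                                 # taper the sinc
    return h / h.sum()                                        # unit DC gain

fs = 16000
h = lowpass_fir(2000.0, fs)               # placeholder 2 kHz cutoff
H = np.abs(np.fft.rfft(h, 4096))          # near-unity passband, deep stopband
```

Restricting masker energy to below the aliasing frequency avoids exciting the spatial-aliasing artefacts that would otherwise leak the masker into the wrong zone.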