176 research outputs found
A Hybrid SFANC-FxNLMS Algorithm for Active Noise Control based on Deep Learning
The selective fixed-filter active noise control (SFANC) method selecting the
best pre-trained control filters for various types of noise can achieve a fast
response time. However, it may lead to large steady-state errors due to
inaccurate filter selection and the lack of adaptability. In comparison, the
filtered-X normalized least-mean-square (FxNLMS) algorithm can obtain lower
steady-state errors through adaptive optimization. Nonetheless, its slow
convergence has a detrimental effect on dynamic noise attenuation. Therefore,
this paper proposes a hybrid SFANC-FxNLMS approach to overcome the adaptive
algorithm's slow convergence and provide a better noise reduction level than
the SFANC method. A lightweight one-dimensional convolutional neural network
(1D CNN) is designed to automatically select the most suitable pre-trained
control filter for each frame of the primary noise. Meanwhile, the FxNLMS
algorithm continues to update the coefficients of the chosen pre-trained
control filter at the sampling rate. Owing to the effective combination of the
two algorithms, experimental results show that the hybrid SFANC-FxNLMS
algorithm can achieve a rapid response time, a low noise reduction error, and a
high degree of robustness.
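The adaptive half of the hybrid scheme can be illustrated with a minimal single-channel FxNLMS sketch. It is not the authors' implementation: the path coefficients, tap count, step size, and the `initial_weights` argument (a hypothetical stand-in for the pre-trained filter picked by the 1D CNN) are all assumptions chosen for illustration, and the secondary-path estimate is assumed to be exact.

```python
import random


def fir(h, x):
    """Filter signal x with FIR coefficients h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def power(signal):
    """Mean squared value of a signal."""
    return sum(v * v for v in signal) / len(signal)


def fxnlms(reference, disturbance, sec_path, sec_path_est,
           num_taps=16, step_size=0.1, eps=1e-8, initial_weights=None):
    """Single-channel FxNLMS sketch; returns (error_signal, final_weights)."""
    # Start from a supplied filter (hypothetical stand-in for the selected
    # pre-trained control filter) or from zeros.
    w = list(initial_weights) if initial_weights is not None else [0.0] * num_taps
    x_buf = [0.0] * len(w)               # reference history, newest first
    fx_buf = [0.0] * len(w)              # filtered-reference history
    xs_buf = [0.0] * len(sec_path_est)   # reference history for the path estimate
    y_buf = [0.0] * len(sec_path)        # control-signal history
    errors = []
    for x, d in zip(reference, disturbance):
        x_buf = [x] + x_buf[:-1]
        xs_buf = [x] + xs_buf[:-1]
        # Control signal produced by the current control filter.
        y = sum(wi * xi for wi, xi in zip(w, x_buf))
        y_buf = [y] + y_buf[:-1]
        # Residual at the error microphone: disturbance minus anti-noise
        # after it has passed through the true secondary path.
        e = d - sum(si * yi for si, yi in zip(sec_path, y_buf))
        # Reference filtered by the secondary-path estimate.
        fx = sum(si * xi for si, xi in zip(sec_path_est, xs_buf))
        fx_buf = [fx] + fx_buf[:-1]
        # Normalised update: the step is scaled by the filtered-reference energy.
        norm = sum(v * v for v in fx_buf) + eps
        w = [wi + step_size * e * fi / norm for wi, fi in zip(w, fx_buf)]
        errors.append(e)
    return errors, w


# Illustrative run with made-up primary and secondary paths.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]
primary_path = [0.0, 0.0, 0.8, 0.4]
secondary_path = [0.0, 0.9, 0.3]
d = fir(primary_path, noise)
residual, weights = fxnlms(noise, d, secondary_path, secondary_path)
print("residual/disturbance power (last 500 samples):",
      power(residual[-500:]) / power(d[-500:]))
```

Passing a previously converged filter as `initial_weights` mimics the warm start that filter selection is meant to provide: the update then refines an already reasonable filter instead of adapting from zero.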
Partially Randomizing Transformer Weights for Dialogue Response Diversity
Despite recent progress in generative open-domain dialogue, the issue of low
response diversity persists. Prior works have addressed this issue via either
novel objective functions, alternative learning approaches such as variational
frameworks, or architectural extensions such as the Randomized Link (RL)
Transformer. However, these approaches typically entail either additional
difficulties during training/inference, or a significant increase in model size
and complexity. Hence, we propose the Partially Randomized transFormer
(PaRaFormer), a simple extension
of the transformer which involves freezing the weights of selected layers after
random initialization. Experimental results reveal that the performance of the
PaRaFormer is comparable to that of the aforementioned approaches, despite not
entailing any additional training difficulty or increase in model complexity.
Active Noise Control in The New Century: The Role and Prospect of Signal Processing
Although Paul Lueg filed his patent application for a system for the active
control of sound in 1933, the field of active noise control (ANC) did not
flourish until the advent of digital signal processors forty years ago. Early theoretical
advancements in digital signal processing and processors laid the groundwork
for the phenomenal growth of the field, particularly over the past
quarter-century. The widespread commercial success of ANC in aircraft cabins,
automobile cabins, and headsets demonstrates the immeasurable public health and
economic benefits of ANC. This article continues where Elliott and Nelson's
1993 Signal Processing Magazine article and Elliott's 1997 50th anniversary
commentary on ANC left off, tracing the technical
developments and applications in ANC spurred by the seminal texts of Nelson and
Elliott (1991), Kuo and Morgan (1996), Hansen and Snyder (1996), and Elliott
(2001) since the turn of the century. This article focuses on technical
developments pertaining to real-world implementations, such as improving
algorithmic convergence, reducing system latency, and extending control to
non-stationary and/or broadband noise, as well as the commercial transition
challenges from analog to digital ANC systems. Finally, open issues and the
future of ANC in the era of artificial intelligence are discussed.
Comment: Inter-Noise 202
An Empirical Bayes Framework for Open-Domain Dialogue Generation
To engage human users in meaningful conversation, open-domain dialogue agents
are required to generate diverse and contextually coherent dialogue. Despite
recent advancements, which can be attributed to the usage of pretrained
language models, the generation of diverse and coherent dialogue remains an
open research problem. A popular approach to address this issue involves the
adaptation of variational frameworks. However, while these approaches
successfully improve diversity, they tend to compromise on contextual
coherence. Hence, we propose the Bayesian Open-domain Dialogue with Empirical
Bayes (BODEB) framework, an empirical Bayes framework for constructing a
Bayesian open-domain dialogue agent by leveraging pretrained parameters to
inform the prior and posterior parameter distributions. Empirical results show
that BODEB achieves better results in terms of both diversity and coherence
compared to variational frameworks.
A Sequence Matching Network for Polyphonic Sound Event Localization and Detection
Polyphonic sound event detection and direction-of-arrival estimation require
different input features from audio signals. While sound event detection mainly
relies on time-frequency patterns, direction-of-arrival estimation relies on
magnitude or phase differences between microphones. Previous approaches use the
same input features for sound event detection and direction-of-arrival
estimation, and train the two tasks jointly or in a two-stage transfer-learning
manner. We propose a two-step approach that decouples the learning of the sound
event detection and direction-of-arrival estimation systems. In the first
step, we detect the sound events and estimate the directions-of-arrival
separately to optimize the performance of each system. In the second step, we
train a deep neural network to match the two output sequences of the event
detector and the direction-of-arrival estimator. This modular and hierarchical
approach allows flexibility in the system design and increases the
performance of the whole sound event localization and detection system. The
experimental results using the DCASE 2019 sound event localization and
detection dataset show an improved performance compared to the previous
state-of-the-art solutions.
Comment: to be published in 2020 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP)
Deep Generative Fixed-filter Active Noise Control
Owing to their slow convergence and poor tracking ability, conventional LMS-based
adaptive algorithms are less capable of handling dynamic noises. Selective
fixed-filter active noise control (SFANC) can significantly reduce response
time by selecting appropriate pre-trained control filters for different noises.
Nonetheless, the limited number of pre-trained control filters may affect noise
reduction performance, especially when the incoming noise differs substantially
from the noises used during pre-training. Therefore, a generative fixed-filter active
noise control (GFANC) method is proposed in this paper to overcome the
limitation. Based on deep learning and a perfect-reconstruction filter bank,
the GFANC method requires only a small amount of prior data (one pre-trained broadband
control filter) to automatically generate suitable control filters for various
noises. The efficacy of the GFANC method is demonstrated by numerical
simulations on real recorded noises.
Comment: Accepted by ICASSP 2023. Code will be available after publication.
On the preprocessing and postprocessing of HRTF individualization based on sparse representation of anthropometric features
Individualization of head-related transfer functions (HRTFs) can be realized using a person's anthropometry with a pretrained model. This model usually establishes a direct linear or non-linear mapping from anthropometry to HRTFs in the training database. Due to the complex relation between anthropometry and HRTFs, the accuracy of this model depends heavily on the correct selection of the anthropometric features. To alleviate this problem and improve the accuracy of HRTF individualization, an indirect HRTF individualization framework was proposed recently, where HRTFs are synthesized using a sparse representation trained from the anthropometric features. In this paper, we extend the study of this framework by investigating the effects of different preprocessing and postprocessing methods on HRTF individualization. Our experimental results show that preprocessing and postprocessing methods are crucial for achieving accurate HRTF individualization.
Assessment of a cost-effective headphone calibration procedure for soundscape evaluations
To increase the availability and adoption of the soundscape standard, a
low-cost calibration procedure for reproduction of audio stimuli over
headphones was proposed as part of the global "Soundscape Attributes
Translation Project" (SATP) for validating ISO/TS 12913-2:2018 perceived
affective quality (PAQ) attribute translations. A previous preliminary study
revealed significant deviations from the intended equivalent continuous
A-weighted sound pressure levels (LAeq) using the open-circuit
voltage (OCV) calibration procedure. For a more holistic human-centric
perspective, the OCV method is further investigated here in terms of
psychoacoustic parameters, including relevant exceedance levels to account for
temporal effects on the same 27 stimuli from the SATP. Moreover, a
within-subjects experiment with 36 participants was conducted to examine the
effects of OCV calibration on the PAQ attributes in ISO/TS 12913-2:2018.
Bland-Altman analysis of the objective indicators revealed large biases in the
OCV method across all weighted sound level and loudness indicators, and in
roughness indicators at 5% and 10% exceedance levels.
Significant perceptual differences due to the OCV method were observed in about
20% of the stimuli, which did not correspond clearly with the biased
acoustic indicators. Although the small and unpaired samples call for cautious
interpretation, the objective and perceptual differences nevertheless provide
grounds for further investigation.
Comment: For 24th International Congress on Acoustics
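For readers unfamiliar with Bland-Altman analysis, the sketch below shows the usual calculation for paired measurements: the bias is the mean of the pairwise differences, and the limits of agreement are the bias plus or minus 1.96 sample standard deviations. The function name and the level values are hypothetical and are not taken from the study.

```python
import statistics


def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    # Pairwise differences between the two measurement methods.
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    # Sample standard deviation of the differences.
    sd = statistics.stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical levels in dB(A): intended values vs. values after calibration.
intended = [60.0, 60.0, 65.0, 65.0]
measured = [61.0, 59.0, 66.0, 64.0]
bias, lower, upper = bland_altman(measured, intended)
print(f"bias = {bias:.2f} dB, limits of agreement = [{lower:.2f}, {upper:.2f}] dB")
```

A bias far from zero indicates a systematic offset between the two methods, while wide limits of agreement indicate large stimulus-to-stimulus scatter.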
Do uHear? Validation of uHear App for Preliminary Screening of Hearing Ability in Soundscape Studies
Studies involving soundscape perception often exclude participants with
hearing loss to prevent impaired perception from affecting experimental
results. Participants are typically screened with pure tone audiometry, the
"gold standard" for identifying and quantifying hearing loss at specific
frequencies, and excluded if a study-dependent threshold is not met. However,
procuring professional audiometric equipment for soundscape studies may be
cost-ineffective, and manually performing audiometric tests is
labour-intensive. Moreover, soundscape studies may not
require sensitivities and specificities as high as those in a medical diagnosis
setting. Hence, in this study, we investigate the effectiveness of the uHear
app, an iOS application, as an affordable and automatic alternative to a
conventional audiometer in screening participants for hearing loss for the
purpose of soundscape studies or listening tests in general. Based on
comparisons with audiometer measurements of 163 participants, the uHear app
was found to have high precision (98.04%) when using the World Health
Organization (WHO) grading scheme for assessing normal hearing. Precision is
further improved (98.69%) when all frequencies assessed with the uHear app are
considered in the grading, which lends further support to this cost-effective,
automated alternative to screen for normal hearing.
Comment: Full paper submitted to 24th International Congress on Acoustics
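Precision here is the fraction of participants graded as having normal hearing by the screening tool who are also graded normal by the reference audiometer. A minimal sketch of that calculation follows; the function name, labels, and example gradings are hypothetical and do not reproduce the study's data.

```python
def precision(reference, screened, positive="normal"):
    """Fraction of screened-positive cases that the reference also marks positive."""
    pairs = list(zip(reference, screened))
    # True positives: both methods grade the participant as positive.
    true_pos = sum(1 for r, s in pairs if s == positive and r == positive)
    # False positives: screening says positive, reference disagrees.
    false_pos = sum(1 for r, s in pairs if s == positive and r != positive)
    if true_pos + false_pos == 0:
        raise ValueError("no participant was screened as positive")
    return true_pos / (true_pos + false_pos)


# Hypothetical gradings for five participants.
audiometer = ["normal", "normal", "loss", "normal", "loss"]
app = ["normal", "normal", "normal", "loss", "loss"]
print(f"precision = {precision(audiometer, app):.2%}")
```

High precision matters for screening because a false positive admits a participant with hearing loss into the study, whereas a false negative only excludes an eligible participant.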
Crossing the Linguistic Causeway: Ethnonational Differences on Soundscape Attributes in Bahasa Melayu
Despite being neighbouring countries and sharing the language of Bahasa
Melayu (ISO 639-3:ZSM), cultural and language education policy differences
between Singapore and Malaysia led to differences in the translation of the
"annoying" perceived affective quality (PAQ) attribute from English (ISO
639-3:ENG) to ZSM. This study expands upon the translation of the PAQ
attributes from ENG to ZSM in Stage 1 of the Soundscape Attributes Translation
Project (SATP) initiative, and presents the findings of Stage 2 listening tests
that investigated ethnonational differences in the translated ZSM PAQ
attributes and explored their circumplexity. A cross-cultural listening test
was conducted with 100 ZSM speakers from Malaysia and Singapore using the
common SATP protocol. The analysis revealed that Malaysian participants from
non-native ethnicities (MY:O) showed PAQ perceptions more similar to Singapore
(SG) participants than native ethnic Malays (MY:M) in Malaysia. Differences
between Singapore and Malaysian groups were primarily observed in stimuli
related to water features, reflecting cultural and geographical variations.
Besides variations in the perception of water-source-dominant stimuli, disparities
between MY:M and SG could be mainly attributed to vibrant scores. The findings
also suggest that the adoption of region-specific translations, such as
membingitkan in Singapore and menjengkelkan in Malaysia, adequately addressed
differences in the annoying attribute, as significant differences were observed
in one or fewer stimuli across ethnonational groups. The circumplexity analysis
indicated that the quasi-circumplex model fit the data better than the
assumed equal angle quasi-circumplex model in ISO/TS 12913-3, although
deviations were observed, possibly due to respondents' unfamiliarity with the
United Kingdom-centric context of the stimulus dataset...
Comment: Preprint submitted to Elsevier for review