
    A Hybrid SFANC-FxNLMS Algorithm for Active Noise Control based on Deep Learning

    The selective fixed-filter active noise control (SFANC) method, which selects the best pre-trained control filter for each type of noise, can achieve a fast response time. However, it may lead to large steady-state errors due to inaccurate filter selection and a lack of adaptability. In comparison, the filtered-X normalized least-mean-square (FxNLMS) algorithm can obtain lower steady-state errors through adaptive optimization. Nonetheless, its slow convergence is detrimental to the attenuation of dynamic noise. Therefore, this paper proposes a hybrid SFANC-FxNLMS approach that overcomes the adaptive algorithm's slow convergence and provides a better noise reduction level than the SFANC method. A lightweight one-dimensional convolutional neural network (1D CNN) is designed to automatically select the most suitable pre-trained control filter for each frame of the primary noise. Meanwhile, the FxNLMS algorithm continues to update the coefficients of the chosen pre-trained control filter at the sampling rate. Owing to the effective combination of the two algorithms, experimental results show that the hybrid SFANC-FxNLMS algorithm can achieve a rapid response time, a low noise reduction error, and a high degree of robustness.
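    To make the FxNLMS update concrete, here is a minimal single-channel sketch in Python with NumPy; it is not the authors' implementation. It assumes FIR primary and secondary paths, a secondary-path estimate `s_hat` that equals the true path, and an initial filter `w0` standing in for the pre-trained filter chosen by the 1D CNN. The function name `fxnlms`, the step size `mu`, and the regularizer `eps` are illustrative.

```python
import numpy as np


def fxnlms(x, d, s_hat, w0, mu=0.1, eps=1e-8):
    """Single-channel filtered-x NLMS sketch (illustrative, not the paper's code).

    x     : reference (primary noise) signal
    d     : disturbance at the error microphone (primary noise via the primary path)
    s_hat : FIR secondary-path estimate; ASSUMED here to equal the true secondary path
    w0    : initial control filter, e.g. a pre-trained filter picked by a selector
    Returns the final control filter and the error signal e(n).
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    w = np.array(w0, dtype=float)          # copy so the caller's filter is not modified
    L, M = len(w), len(s_hat)
    xf = np.convolve(x, s_hat)[: len(x)]   # filtered reference x'(n) = s_hat * x
    x_buf = np.zeros(L)                    # x(n), x(n-1), ..., x(n-L+1)
    xf_buf = np.zeros(L)                   # x'(n), ..., x'(n-L+1)
    y_buf = np.zeros(M)                    # anti-noise y(n), ..., y(n-M+1)
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        xf_buf = np.roll(xf_buf, 1)
        xf_buf[0] = xf[n]
        y_buf = np.roll(y_buf, 1)
        y_buf[0] = w @ x_buf               # anti-noise sample from the control filter
        e[n] = d[n] - s_hat @ y_buf        # residual after anti-noise passes the secondary path
        # Normalized LMS step driven by the filtered reference
        w += mu * e[n] * xf_buf / (xf_buf @ xf_buf + eps)
    return w, e
```

    For example, with a white-noise reference and a primary path built as `np.convolve(s_hat, w_true)`, the filter should converge toward `w_true` and the residual `e` should decay. Starting from a good `w0` instead of zeros shortens that transient, which is the role the pre-trained filter plays in the hybrid scheme.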

    Partially Randomizing Transformer Weights for Dialogue Response Diversity

    Despite recent progress in generative open-domain dialogue, the issue of low response diversity persists. Prior works have addressed this issue via novel objective functions, alternative learning approaches such as variational frameworks, or architectural extensions such as the Randomized Link (RL) Transformer. However, these approaches typically entail either additional difficulties during training/inference or a significant increase in model size and complexity. Hence, we propose the Partially Randomized transFormer (PaRaFormer), a simple extension of the transformer which involves freezing the weights of selected layers after random initialization. Experimental results reveal that the performance of the PaRaFormer is comparable to that of the aforementioned approaches, despite not entailing any additional training difficulty or increase in model complexity.

    Active Noise Control in The New Century: The Role and Prospect of Signal Processing

    Although Paul Lueg filed a patent application for a system for the active control of sound in 1933, the field of active noise control (ANC) did not flourish until the advent of digital signal processors forty years ago. Early theoretical advancements in digital signal processing and processors laid the groundwork for the phenomenal growth of the field, particularly over the past quarter-century. The widespread commercial success of ANC in aircraft cabins, automobile cabins, and headsets demonstrates the immeasurable public health and economic benefits of ANC. This article continues where Elliott and Nelson's 1993 Signal Processing Magazine article and Elliott's 1997 50th anniversary commentary [kahrs1997past] on ANC left off, tracing the technical developments and applications in ANC spurred by the seminal texts of Nelson and Elliott (1991), Kuo and Morgan (1996), Hansen and Snyder (1996), and Elliott (2001) since the turn of the century. This article focuses on technical developments pertaining to real-world implementations, such as improving algorithmic convergence, reducing system latency, and extending control to non-stationary and/or broadband noise, as well as the commercial transition challenges from analog to digital ANC systems. Finally, open issues and the future of ANC in the era of artificial intelligence are discussed. Comment: Inter-Noise 202

    An Empirical Bayes Framework for Open-Domain Dialogue Generation

    To engage human users in meaningful conversation, open-domain dialogue agents are required to generate diverse and contextually coherent dialogue. Despite recent advancements, which can be attributed to the usage of pretrained language models, the generation of diverse and coherent dialogue remains an open research problem. A popular approach to address this issue involves the adaptation of variational frameworks. However, while these approaches successfully improve diversity, they tend to compromise on contextual coherence. Hence, we propose the Bayesian Open-domain Dialogue with Empirical Bayes (BODEB) framework, an empirical Bayes framework for constructing a Bayesian open-domain dialogue agent by leveraging pretrained parameters to inform the prior and posterior parameter distributions. Empirical results show that BODEB achieves better results in terms of both diversity and coherence compared to variational frameworks.

    A Sequence Matching Network for Polyphonic Sound Event Localization and Detection

    Polyphonic sound event detection and direction-of-arrival estimation require different input features from audio signals. While sound event detection mainly relies on time-frequency patterns, direction-of-arrival estimation relies on magnitude or phase differences between microphones. Previous approaches use the same input features for sound event detection and direction-of-arrival estimation, and train the two tasks jointly or in a two-stage transfer-learning manner. We propose a two-step approach that decouples the learning of the sound event detection and direction-of-arrival estimation systems. In the first step, we detect the sound events and estimate the directions of arrival separately to optimize the performance of each system. In the second step, we train a deep neural network to match the two output sequences of the event detector and the direction-of-arrival estimator. This modular and hierarchical approach allows flexibility in system design and increases the performance of the whole sound event localization and detection system. The experimental results using the DCASE 2019 sound event localization and detection dataset show improved performance compared to the previous state-of-the-art solutions. Comment: to be published in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP

    Deep Generative Fixed-filter Active Noise Control

    Due to their slow convergence and poor tracking ability, conventional LMS-based adaptive algorithms are less capable of handling dynamic noise. Selective fixed-filter active noise control (SFANC) can significantly reduce response time by selecting appropriate pre-trained control filters for different noises. Nonetheless, the limited number of pre-trained control filters may affect noise reduction performance, especially when the incoming noise differs considerably from the noises used during pre-training. Therefore, a generative fixed-filter active noise control (GFANC) method is proposed in this paper to overcome this limitation. Based on deep learning and a perfect-reconstruction filter bank, the GFANC method requires only a small amount of prior data (one pre-trained broadband control filter) to automatically generate suitable control filters for various noises. The efficacy of the GFANC method is demonstrated by numerical simulations on real-recorded noises. Comment: Accepted by ICASSP 2023. Code will be available after publicatio
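    The abstract does not detail the filter bank, so the following Python sketch substitutes an ideal DFT-domain band split, an assumption chosen only because it reconstructs perfectly by construction. The pre-trained broadband filter is split into contiguous sub-band filters that sum back to it, and a new control filter is formed as a weighted sum of those sub-bands. In GFANC the weights would come from the deep network; here `split_into_subbands`, `generate_filter`, and the weight vector are hypothetical.

```python
import numpy as np


def split_into_subbands(h, num_bands):
    """Split FIR filter h into num_bands sub-band filters whose sum equals h.

    Uses an ideal, non-overlapping partition of the rfft bins (an assumption,
    not necessarily the filter bank used in GFANC).
    """
    h = np.asarray(h, dtype=float)
    H = np.fft.rfft(h)
    if not 1 <= num_bands <= len(H):
        raise ValueError("num_bands must be between 1 and the number of rfft bins")
    edges = np.linspace(0, len(H), num_bands + 1).astype(int)  # contiguous bin ranges
    subbands = []
    for k in range(num_bands):
        mask = np.zeros(len(H))
        mask[edges[k]:edges[k + 1]] = 1.0        # keep only this band's bins
        subbands.append(np.fft.irfft(H * mask, n=len(h)))
    return np.array(subbands)                    # shape (num_bands, len(h))


def generate_filter(subbands, weights):
    """Weighted recombination of sub-band filters into one control filter."""
    return np.asarray(weights, dtype=float) @ subbands
```

    Because the masks cover every bin exactly once and the inverse FFT is linear, setting all weights to one reproduces the original broadband filter, while zeroing a weight removes that band from the generated filter.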

    On the preprocessing and postprocessing of HRTF individualization based on sparse representation of anthropometric features

    Individualization of head-related transfer functions (HRTFs) can be realized using the person's anthropometry with a pretrained model. This model usually establishes a direct linear or non-linear mapping from anthropometry to HRTFs in the training database. Due to the complex relation between anthropometry and HRTFs, the accuracy of this model depends heavily on the correct selection of the anthropometric features. To alleviate this problem and improve the accuracy of HRTF individualization, an indirect HRTF individualization framework was proposed recently, in which HRTFs are synthesized using a sparse representation trained from the anthropometric features. In this paper, we extend that study by investigating the effects of different preprocessing and postprocessing methods on HRTF individualization. Our experimental results showed that preprocessing and postprocessing methods are crucial for achieving accurate HRTF individualization.
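    One common way to realize such an indirect framework, assumed here rather than taken from the paper, is to express a new subject's anthropometric feature vector as a sparse linear combination of the training subjects' features and then apply the same weights to their HRTFs. The Python sketch below uses z-score normalization as an example preprocessing step and iterative soft-thresholding (ISTA) for the L1-regularized fit; the function names, `lam`, and the iteration count are illustrative, and no postprocessing is shown.

```python
import numpy as np


def ista(D, y, lam=0.01, n_iter=500):
    """Minimize 0.5*||D b - y||^2 + lam*||b||_1 by iterative soft-thresholding."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2          # 1 / Lipschitz constant of the gradient
    b = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = b - step * (D.T @ (D @ b - y))          # gradient step on the quadratic term
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold (L1 prox)
    return b


def individualize(train_anthro, train_hrtf, new_anthro, lam=0.01, n_iter=500):
    """Hypothetical sparse-representation HRTF synthesis.

    train_anthro : (num_subjects, num_features) anthropometric features
    train_hrtf   : (num_subjects, num_bins) HRTF data, e.g. log-magnitudes for one direction
    new_anthro   : (num_features,) features of the new subject
    """
    train_anthro = np.asarray(train_anthro, dtype=float)
    train_hrtf = np.asarray(train_hrtf, dtype=float)
    # Preprocessing (assumed): z-score each feature using training-set statistics
    mean = train_anthro.mean(axis=0)
    std = train_anthro.std(axis=0) + 1e-12          # guard against constant features
    A = (train_anthro - mean) / std
    a = (np.asarray(new_anthro, dtype=float) - mean) / std
    # Sparse weights over training subjects such that a is approximately A.T @ beta
    beta = ista(A.T, a, lam=lam, n_iter=n_iter)
    # Reuse the same weights on the training HRTFs to synthesize the new subject's HRTF
    return beta @ train_hrtf, beta
```

    Swapping the z-score step for another normalization, or adding a postprocessing step on the synthesized HRTF, only touches `individualize`, which is where the preprocessing and postprocessing choices studied in the paper would plug in.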

    Assessment of a cost-effective headphone calibration procedure for soundscape evaluations

    To increase the availability and adoption of the soundscape standard, a low-cost calibration procedure for reproducing audio stimuli over headphones was proposed as part of the global "Soundscape Attributes Translation Project" (SATP) for validating ISO/TS 12913-2:2018 perceived affective quality (PAQ) attribute translations. A previous preliminary study revealed significant deviations from the intended equivalent continuous A-weighted sound pressure levels ($L_{\text{A,eq}}$) when using the open-circuit voltage (OCV) calibration procedure. For a more holistic, human-centric perspective, the OCV method is further investigated here in terms of psychoacoustic parameters, including relevant exceedance levels to account for temporal effects, on the same 27 stimuli from the SATP. Moreover, a within-subjects experiment with 36 participants was conducted to examine the effects of OCV calibration on the PAQ attributes in ISO/TS 12913-2:2018. Bland-Altman analysis of the objective indicators revealed large biases in the OCV method across all weighted sound level and loudness indicators, as well as in roughness indicators at the 5% and 10% exceedance levels. Significant perceptual differences due to the OCV method were observed in about 20% of the stimuli, and these did not correspond clearly with the biased acoustic indicators. Although the small and unpaired samples call for cautious interpretation, the objective and perceptual differences nevertheless provide grounds for further investigation. Comment: For 24th International Congress on Acoustic
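    For reference, a standard Bland-Altman analysis summarizes agreement between two measurement methods by the mean paired difference (bias) and the 95% limits of agreement, bias ± 1.96 times the standard deviation of the differences, which assumes approximately normal differences. The Python sketch below computes both; pairing OCV-calibrated and reference values per stimulus is an illustrative assumption, and the function name `bland_altman` is hypothetical.

```python
import numpy as np


def bland_altman(method_a, method_b):
    """Return (bias, (lower_loa, upper_loa)) for paired measurements."""
    a = np.asarray(method_a, dtype=float)
    b = np.asarray(method_b, dtype=float)
    if a.shape != b.shape or a.size < 2:
        raise ValueError("need at least two paired measurements of equal shape")
    diff = a - b                      # per-item difference between the two methods
    bias = diff.mean()                # systematic offset of method_a relative to method_b
    sd = diff.std(ddof=1)             # sample standard deviation of the differences
    half_width = 1.96 * sd            # 95% limits under an approximate normality assumption
    return bias, (bias - half_width, bias + half_width)
```

    A bias far from zero relative to the limits of agreement indicates a systematic offset between the two calibration procedures, which is the kind of finding the abstract reports for the OCV method.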

    Do uHear? Validation of uHear App for Preliminary Screening of Hearing Ability in Soundscape Studies

    Studies involving soundscape perception often exclude participants with hearing loss to prevent impaired perception from affecting experimental results. Participants are typically screened with pure-tone audiometry, the "gold standard" for identifying and quantifying hearing loss at specific frequencies, and excluded if a study-dependent threshold is not met. However, procuring professional audiometric equipment for soundscape studies may not be cost-effective, and manually performing audiometric tests is labour-intensive. Moreover, soundscape studies may not require sensitivities and specificities as high as those in a medical diagnostic setting. Hence, in this study, we investigate the effectiveness of the uHear app, an iOS application, as an affordable and automatic alternative to a conventional audiometer for screening participants for hearing loss in soundscape studies or listening tests in general. Based on audiometric comparisons with the audiometer for 163 participants, the uHear app was found to have high precision (98.04%) when using the World Health Organization (WHO) grading scheme for assessing normal hearing. Precision further improves (98.69%) when all frequencies assessed with the uHear app are considered in the grading, which lends further support to this cost-effective, automated alternative for screening for normal hearing. Comment: Full paper submitted to 24th International Congress on Acoustic
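    Assuming "normal hearing" is the positive class, precision here is the fraction of participants the app grades as normal who are also graded normal by the audiometer, TP / (TP + FP). A minimal Python sketch follows; it assumes per-participant boolean grades have already been derived from each device under the same WHO scheme, and the function and argument names are hypothetical.

```python
from typing import Sequence


def precision(app_normal: Sequence[bool], audiometer_normal: Sequence[bool]) -> float:
    """Precision of the app, treating 'normal hearing' as the positive class (assumption)."""
    if len(app_normal) != len(audiometer_normal):
        raise ValueError("grade lists must have the same length")
    pairs = list(zip(app_normal, audiometer_normal))
    tp = sum(1 for app, ref in pairs if app and ref)        # app: normal, audiometer: normal
    fp = sum(1 for app, ref in pairs if app and not ref)    # app: normal, audiometer: not normal
    if tp + fp == 0:
        return float("nan")                                 # app never graded anyone normal
    return tp / (tp + fp)
```

    High precision means few participants with hearing loss would slip through a screen that admits everyone the app grades as normal, which is the property that matters when the goal is excluding impaired listeners.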

    Crossing the Linguistic Causeway: Ethnonational Differences on Soundscape Attributes in Bahasa Melayu

    Despite being neighbouring countries that share the language of Bahasa Melayu (ISO 639-3:ZSM), Singapore and Malaysia differ in culture and language education policy, which led to differences in the translation of the "annoying" perceived affective quality (PAQ) attribute from English (ISO 639-3:ENG) to ZSM. This study expands upon the translation of the PAQ attributes from ENG to ZSM in Stage 1 of the Soundscape Attributes Translation Project (SATP) initiative, and presents the findings of Stage 2 listening tests that investigated ethnonational differences in the translated ZSM PAQ attributes and explored their circumplexity. A cross-cultural listening test was conducted with 100 ZSM speakers from Malaysia and Singapore using the common SATP protocol. The analysis revealed that Malaysian participants from non-native ethnicities (MY:O) showed PAQ perceptions more similar to those of Singapore (SG) participants than to those of native ethnic Malays (MY:M) in Malaysia. Differences between Singapore and Malaysian groups were primarily observed in stimuli related to water features, reflecting cultural and geographical variations. Besides variations in the perception of stimuli dominated by water sources, disparities between MY:M and SG could be mainly attributed to vibrant scores. The findings also suggest that the adoption of region-specific translations, such as membingitkan in Singapore and menjengkelkan in Malaysia, adequately addressed differences in the annoying attribute, as significant differences were observed in one or fewer stimuli across ethnonational groups. The circumplexity analysis indicated that the quasi-circumplex model fit the data better than the equal-angle quasi-circumplex model assumed in ISO/TS 12913-3, although deviations were observed, possibly due to respondents' unfamiliarity with the United Kingdom-centric context of the stimulus dataset... Comment: Preprint submitted to Elsevier for revie