
    Raw Multi-Channel Audio Source Separation using Multi-Resolution Convolutional Auto-Encoders

    Supervised multi-channel audio source separation requires extracting useful spectral, temporal, and spatial features from the mixed signals. The success of many existing systems is therefore largely dependent on the choice of features used for training. In this work, we introduce a novel multi-channel, multi-resolution convolutional auto-encoder neural network that works on raw time-domain signals to determine appropriate multi-resolution features for separating the singing voice from stereo music. Our experimental results show that the proposed method can achieve multi-channel audio source separation without the need for hand-crafted features or any pre- or post-processing.
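    As a rough illustration of this idea, the sketch below builds a tiny multi-resolution convolutional auto-encoder in PyTorch that takes raw stereo waveforms through parallel 1-D convolutions with different kernel widths and decodes the result back to a waveform. The kernel widths, channel counts, and training setup are assumptions for illustration, not the architecture reported in the paper.

```python
# Hypothetical sketch of a multi-resolution convolutional auto-encoder on raw
# stereo audio; layer sizes and kernel widths are illustrative assumptions,
# not the configuration used in the paper.
import torch
import torch.nn as nn

class MultiResolutionConvAE(nn.Module):
    def __init__(self, channels=2, kernel_sizes=(16, 64, 256), hidden=32):
        super().__init__()
        # One encoder branch per resolution: small kernels capture fine
        # temporal detail, large kernels capture longer-range structure.
        self.encoders = nn.ModuleList([
            nn.Conv1d(channels, hidden, k, stride=1, padding=k // 2)
            for k in kernel_sizes
        ])
        # Decoder maps the concatenated multi-resolution features back to a
        # stereo waveform estimate of the target source (e.g. the vocals).
        self.decoder = nn.Conv1d(hidden * len(kernel_sizes), channels,
                                 kernel_size=3, padding=1)

    def forward(self, mix):                       # mix: (batch, 2, samples)
        feats = [torch.relu(enc(mix)) for enc in self.encoders]
        n = min(f.shape[-1] for f in feats)       # align lengths after padding
        feats = torch.cat([f[..., :n] for f in feats], dim=1)
        return self.decoder(feats)                # estimated source waveform

# Usage: train with an L1/L2 loss between the output and the clean target source.
estimate = MultiResolutionConvAE()(torch.randn(1, 2, 44100))
```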

    Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation

    In deep neural networks with convolutional layers, each layer typically has a fixed-size, single-resolution receptive field (RF). Convolutional layers with a large RF capture global information from the input features, while layers with a small RF capture local details at high resolution. In this work, we introduce novel deep multi-resolution fully convolutional neural networks (MR-FCNN), where each layer has a different RF size to extract multi-resolution features that capture both the global structure and the local details of its input. The proposed MR-FCNN is applied to separate a target audio source from a mixture of many audio sources. Experimental results show that MR-FCNN improves performance compared to feedforward deep neural networks (DNNs) and single-resolution deep fully convolutional neural networks (FCNNs) on the audio source separation problem.
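    A minimal sketch of the multi-resolution idea, assuming 2-D convolutions over a magnitude spectrogram: each layer in the stack uses a different kernel size, so the network combines broad time-frequency context with fine local detail. Kernel sizes and channel counts are illustrative, not those of the MR-FCNN in the paper.

```python
# Illustrative multi-resolution fully convolutional stack; each layer has a
# different kernel size (receptive field). Sizes are assumptions.
import torch
import torch.nn as nn

def mr_fcnn(kernel_sizes=((3, 3), (5, 5), (9, 9), (3, 3)), hidden=16):
    layers, in_ch = [], 1
    for k in kernel_sizes[:-1]:
        layers += [nn.Conv2d(in_ch, hidden, k, padding=(k[0] // 2, k[1] // 2)),
                   nn.ReLU()]
        in_ch = hidden
    k = kernel_sizes[-1]
    # Final layer maps back to a single-channel magnitude estimate of the target.
    layers += [nn.Conv2d(in_ch, 1, k, padding=(k[0] // 2, k[1] // 2)),
               nn.ReLU()]
    return nn.Sequential(*layers)

# Input: (batch, 1, freq_bins, time_frames) magnitude spectrogram of the mixture.
net = mr_fcnn()
estimate = net(torch.rand(1, 1, 513, 128))
```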

    Contributions and limitations of using machine learning to predict noise-induced hearing loss

    Purpose: Noise-induced hearing loss (NIHL) is a global issue that affects people's lives and health. The current review aims to clarify the contributions and limitations of applying machine learning (ML) to predict NIHL by analyzing the performance of different ML techniques and the procedure of model construction. Methods: The authors searched PubMed, EMBASE and Scopus on November 26, 2020. Results: Eight studies were included in the current review following defined inclusion and exclusion criteria. Sample sizes in the selected studies ranged between 150 and 10,567. The most popular models were artificial neural networks (n = 4), random forests (n = 3) and support vector machines (n = 3). The features most correlated with NIHL and used in the models were age (n = 6), duration of noise exposure (n = 5) and noise exposure level (n = 4). Five included studies used either split-sample validation (n = 3) or ten-fold cross-validation (n = 2). Reported accuracy ranged from 75.3% to 99%, with a low prediction error/root-mean-square error in 3 studies. Only 2 studies measured discrimination risk using the receiver operating characteristic (ROC) curve and/or the area under the ROC curve. Conclusion: In spite of the high accuracy and low prediction error of machine learning models, improvement can be expected from larger sample sizes, the use of multiple algorithms, complete reporting of model construction and sufficient evaluation of calibration and discrimination risk.
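    As a toy illustration of the kind of model construction the review surveys, the sketch below trains a random forest on the three most commonly used features (age, exposure duration, exposure level) and evaluates it with ten-fold cross-validation. The data are synthetic placeholders, not taken from any of the reviewed studies.

```python
# Synthetic example only: a random forest predicting NIHL from the features
# most often used in the reviewed studies, with ten-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(18, 65, n),    # age (years)
    rng.uniform(0, 30, n),     # duration of noise exposure (years)
    rng.uniform(70, 100, n),   # noise exposure level (dB A)
])
y = rng.integers(0, 2, n)      # 1 = NIHL, 0 = no NIHL (random placeholder labels)

scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                         X, y, cv=10, scoring="accuracy")
print(f"10-fold CV accuracy: {scores.mean():.3f}")
```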

    Machine learning in diagnosing middle ear disorders using tympanic membrane images : a meta-analysis

    Objective: To systematically evaluate the development of Machine Learning (ML) models and compare their diagnostic accuracy for the classification of Middle Ear Disorders (MED) using Tympanic Membrane (TM) images. Methods: PubMed, EMBASE, CINAHL, and CENTRAL were searched up until November 30, 2021. Studies on the development of ML approaches for diagnosing MED using TM images were selected according to the inclusion criteria. PRISMA guidelines were followed, with study design, analysis method, and outcomes extracted. Sensitivity, specificity, and area under the curve (AUC) were used to summarize the performance metrics of the meta-analysis. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool in combination with the Prediction Model Risk of Bias Assessment Tool. Results: Sixteen studies were included, encompassing 20,254 TM images (7,025 normal TM and 13,229 MED). The sample size ranged from 45 to 6,066 per study. The accuracy of the 25 included ML approaches ranged from 76.00% to 98.26%. Eleven studies (68.8%) were rated as having a low risk of bias, with the reference standard as the major domain of high risk of bias (37.5%). Sensitivity and specificity were 93% (95% CI, 90%–95%) and 85% (95% CI, 82%–88%), respectively. The AUC across all TM images was 94% (95% CI, 91%–96%). A greater AUC was found using otoendoscopic images than otoscopic images. Conclusions: ML approaches perform robustly in distinguishing between normal ears and MED; however, a standardized TM image acquisition and annotation protocol should be developed.
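    The sketch below shows how the per-study metrics pooled in this meta-analysis (sensitivity, specificity and AUC) are computed from a classifier's predictions; the labels and scores are made-up placeholders, not data from the included studies.

```python
# Toy computation of sensitivity, specificity and AUC for a binary
# middle-ear-disorder classifier; all values below are placeholders.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])   # 1 = middle ear disorder
y_score = np.array([0.9, 0.8, 0.4, 0.2, 0.3, 0.6, 0.7, 0.1, 0.85, 0.35])
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
auc = roc_auc_score(y_true, y_score)
print(sensitivity, specificity, auc)
```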

    Gaussian mixture gain priors for regularized nonnegative matrix factorization in single-channel source separation

    We propose a new method to incorporate statistical priors on the solution of nonnegative matrix factorization (NMF) for single-channel source separation (SCSS) applications. A Gaussian mixture model (GMM) is used as a log-normalized gain prior model for the NMF solution; the normalization makes the prior models energy independent. In NMF-based SCSS, NMF is used to decompose the spectra of the observed mixed signal as a weighted linear combination of a set of trained basis vectors. In this work, the NMF decomposition weights are constrained to reflect statistical prior information about the weight-combination patterns that the trained basis vectors can jointly receive for each source in the observed mixed signal. The NMF solutions for the weights are encouraged to increase the log-likelihood under the trained gain-prior GMMs while simultaneously reducing the NMF reconstruction error.
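    A hedged sketch of the kind of regularized objective this describes, assuming a squared-error NMF cost: the GMM log-likelihood of the log-normalized gain columns is added as a penalty. The weight `lam`, the matrix shapes, and the random data are illustrative assumptions; the paper's actual update rules are not reproduced here.

```python
# Sketch of a regularized NMF objective: reconstruction error minus a weighted
# GMM log-likelihood of the log-normalized gains. Illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture

def regularized_nmf_objective(V, W, H, gmm, lam=0.1, eps=1e-12):
    """V: mixture magnitude spectrogram, W: trained bases, H: gains."""
    recon_err = np.linalg.norm(V - W @ H) ** 2                   # NMF reconstruction error
    G = np.log(H / (H.sum(axis=0, keepdims=True) + eps) + eps)   # log-normalized gains
    prior_ll = gmm.score_samples(G.T).sum()                      # GMM log-likelihood of gains
    return recon_err - lam * prior_ll                            # lower is better

# Toy example with random data and a GMM fitted on (placeholder) training gains.
rng = np.random.default_rng(0)
V, W, H = rng.random((257, 50)), rng.random((257, 20)), rng.random((20, 50))
gmm = GaussianMixture(n_components=4, random_state=0).fit(rng.random((200, 20)))
print(regularized_nmf_objective(V, W, H, gmm))
```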

    Adaptation of speaker-specific bases in non-negative matrix factorization for single channel speech-music separation

    This paper introduces a speaker adaptation algorithm for nonnegative matrix factorization (NMF) models. The proposed adaptation algorithm is a combination of Bayesian and subspace model adaptation. The adapted model is used to separate a speech signal from a background music signal in a single-channel recording. Training speech data from multiple speakers is used with NMF to train a set of basis vectors as a general model for speech signals. The probabilistic interpretation of NMF is used to achieve Bayesian adaptation, adjusting the general model to the actual properties of the speech signal observed in the mixed signal. The Bayesian-adapted model is then adapted again by a linear transform, which changes the subspace spanned by the Bayesian-adapted model to better match the speech signal in the mixture. The experimental results show that combining Bayesian and linear-transform adaptation improves the separation results.
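    A loose, simplified sketch of the two-stage flow described above: general bases are trained on multi-speaker data, a crude interpolation stands in for the Bayesian (MAP) adaptation, and a rough least-squares solution stands in for the linear-transform adaptation. Everything here (data, interpolation weight, solver) is an illustrative assumption rather than the paper's algorithm.

```python
# Simplified stand-in for Bayesian + linear-transform adaptation of NMF bases.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
train_speech = rng.random((257, 400))      # multi-speaker training spectra (toy)
observed_speech = rng.random((257, 60))    # speech frames estimated from the mixture (toy)

# 1) General speaker-independent bases from multi-speaker training data.
nmf = NMF(n_components=20, init="random", random_state=0, max_iter=200)
nmf.fit(train_speech.T)
W_general = nmf.components_.T              # (freq_bins, components)

# 2) "Bayesian-style" adaptation, crudely approximated here by interpolating
#    the general bases with bases re-estimated from the observed speech.
nmf_obs = NMF(n_components=20, init="random", random_state=0, max_iter=200)
nmf_obs.fit(observed_speech.T)
W_adapted = 0.7 * W_general + 0.3 * nmf_obs.components_.T

# 3) Linear-transform adaptation: find A so that (W_adapted @ A) @ H better
#    matches the observed speech, via a rough least-squares solution.
H = np.linalg.pinv(W_adapted) @ observed_speech
A = np.linalg.pinv(W_adapted) @ observed_speech @ np.linalg.pinv(H)
W_final = np.clip(W_adapted @ A, 0, None)  # keep the adapted bases non-negative
```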

    Combining Fully Convolutional and Recurrent Neural Networks for Single Channel Audio Source Separation

    Combining different models is a common strategy for building a good audio source separation system. In this work, we combine two powerful deep neural networks for single-channel audio source separation (SCSS): fully convolutional neural networks (FCNs) and recurrent neural networks, specifically bidirectional long short-term memory networks (BLSTMs). FCNs are good at extracting useful features from the audio data, and BLSTMs are good at modeling the temporal structure of audio signals. Our experimental results show that combining FCNs and BLSTMs achieves better separation performance than using either model individually.
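    A minimal PyTorch-style sketch of this combination, with assumed layer sizes: convolutional layers extract local spectro-temporal features from the mixture spectrogram, a bidirectional LSTM models their evolution across frames, and a sigmoid output forms a soft mask applied to the mixture.

```python
# Illustrative FCN + BLSTM separator; layer sizes are assumptions.
import torch
import torch.nn as nn

class ConvBLSTMSeparator(nn.Module):
    def __init__(self, freq_bins=513, conv_ch=16, lstm_hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, conv_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(conv_ch, conv_ch, 3, padding=1), nn.ReLU(),
        )
        self.blstm = nn.LSTM(conv_ch * freq_bins, lstm_hidden,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * lstm_hidden, freq_bins)

    def forward(self, spec):                      # spec: (batch, 1, freq, time)
        f = self.conv(spec)                       # local spectro-temporal features
        b, c, fr, t = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, t, c * fr)   # one vector per frame
        h, _ = self.blstm(f)                      # temporal modeling across frames
        mask = torch.sigmoid(self.out(h))         # (batch, time, freq) soft mask
        return mask.transpose(1, 2).unsqueeze(1) * spec   # masked mixture spectrogram

estimate = ConvBLSTMSeparator()(torch.rand(1, 1, 513, 100))
```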

    Single Channel Audio Source Separation using Convolutional Denoising Autoencoders

    Deep learning techniques have recently been used to tackle the audio source separation problem. In this work, we propose to use deep fully convolutional denoising autoencoders (CDAEs) for monaural audio source separation. We use as many CDAEs as there are sources to be separated from the mixed signal. Each CDAE is trained to separate one source and treats the other sources as background noise. The main idea is to allow each CDAE to learn spectral-temporal filters and features suited to its corresponding source. Our experimental results show that CDAEs perform source separation slightly better than deep feedforward neural networks (FNNs), even with fewer parameters.
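    The sketch below illustrates the training setup described here, with assumed filter counts and hypothetical source names: one small convolutional denoising auto-encoder per source, each trained to map the mixture spectrogram to its own source's spectrogram while treating everything else as noise.

```python
# One CDAE per source, trained on the same mixture input with different
# clean targets. Layer sizes, source names, and data are illustrative.
import torch
import torch.nn as nn

def cdae():
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(16, 16, 3, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(16, 1, 3, padding=1), nn.ReLU(),
    )

sources = ["vocals", "accompaniment"]             # hypothetical source names
models = {name: cdae() for name in sources}       # one CDAE per source

mix = torch.rand(1, 1, 513, 64)                   # mixture magnitude spectrogram (toy)
clean = {name: torch.rand(1, 1, 513, 64) for name in sources}  # training targets (toy)

loss_fn = nn.MSELoss()
for name, model in models.items():
    # Each CDAE learns to reconstruct only its own source from the mixture.
    loss = loss_fn(model(mix), clean[name])
    loss.backward()
```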

    Remixing musical audio on the web using source separation

    Presented at the 2nd Web Audio Conference (WAC), April 4-6, 2016, Atlanta, Georgia. Research in audio source separation has progressed a long way, producing systems that are able to approximate the component signals of sound mixtures. In recent years, many efforts have focused on learning time-frequency masks that can be used to filter a monophonic signal in the frequency domain. Using current web audio technologies, time-frequency masking can be implemented in a web browser in real time. This allows source separation techniques to be applied to arbitrary audio streams, such as internet radio, depending on cross-domain security configurations. While producing good-quality separated audio from monophonic music mixtures is still challenging, current methods can be applied to remixing scenarios, where part of the signal is emphasized or de-emphasized. This paper describes a system for remixing musical audio on the web by applying time-frequency masks estimated using deep neural networks. Our example prototype, implemented in client-side JavaScript, provides reasonable-quality results for small modifications.
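    The prototype itself runs in client-side JavaScript with the Web Audio API, but the core remixing step, applying an estimated time-frequency mask and mixing the result back at a chosen gain, can be illustrated in a few lines of Python. The mask below is a random placeholder standing in for a DNN-estimated mask.

```python
# Time-frequency masking and remixing sketch; mask and audio are placeholders.
import numpy as np
from scipy.signal import stft, istft

fs = 44100
mix = np.random.randn(fs * 2)                       # 2 s of "mixture" audio (toy)

f, t, X = stft(mix, fs=fs, nperseg=1024)            # mixture STFT
mask = np.random.rand(*X.shape)                     # placeholder for a DNN mask in [0, 1]

source = mask * X                                   # estimated source (e.g. vocals)
residual = (1.0 - mask) * X                         # everything else

gain = 0.3                                          # de-emphasize the separated source
_, remix = istft(gain * source + residual, fs=fs, nperseg=1024)
```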