
    Spatially and Temporally Directed Noise Cancellation Using Federated Learning

    Machine learning models can be trained to cancel noise of diverse types or spectral characteristics, e.g., traffic noise, background chatter, etc. Such models are trained on data that includes labeled noise waveforms, which is an expensive and time-consuming procedure. Further, the effectiveness of such models is limited when canceling types of noise absent from the training data. Trained models also occupy significant amounts of memory, which limits their use in consumer devices. This disclosure describes the use of federated learning techniques to train noise-canceling models locally at diverse device locations and times. With user permission, the trained models are tagged with timestamp and location, such that when a user device's time or location matches a particular noise-cancellation model, that model is provided to the device. Noise cancellation on the user device is then performed with a compact machine learning model suited to the device's time and location.
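    As a rough illustration of the serving step this disclosure describes, the sketch below shows one way a registry of context-tagged models might be keyed and queried. All names (ContextTag, ModelRegistry) and the hour-bucket/geohash tagging scheme are assumptions for illustration, not details from the source.

```python
# Minimal sketch (hypothetical API) of serving context-tagged noise models:
# locally trained models are published under a coarse (location, time) tag,
# and a device fetches the model whose tag matches its own context.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class ContextTag:
    location_cell: str   # coarse location, e.g. a geohash prefix (assumed)
    hour_bucket: int     # hour of day, 0-23 (assumed granularity)

class ModelRegistry:
    """Maps context tags to compact noise-cancellation model blobs."""

    def __init__(self):
        self._models: dict[ContextTag, bytes] = {}

    def publish(self, tag: ContextTag, model_blob: bytes) -> None:
        # Called after federated aggregation across devices sharing this context.
        self._models[tag] = model_blob

    def fetch_for_device(self, location_cell: str, now: datetime) -> bytes | None:
        # A device whose time/location matches a tag receives that model.
        tag = ContextTag(location_cell, now.hour)
        return self._models.get(tag)

# Usage: publish an aggregated model for one cell at the 8 a.m. bucket;
# a device in that cell at 8:15 a.m. retrieves it for on-device inference.
registry = ModelRegistry()
registry.publish(ContextTag("9q8yy", 8), b"<serialized compact model>")
blob = registry.fetch_for_device("9q8yy", datetime(2024, 5, 1, 8, 15))
```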

    Noise Suppression Based on RNN with a DBSCAN Classifier for Speech Enhancement

    In the field of noise suppression, methods combining hardware devices with DSP chips are widely used and can achieve excellent performance when device cost is not a constraint and the recording site is accessible. In addition, with the rapid development of deep learning, it is increasingly applied to audio and image processing, and many deep-learning-based noise suppression algorithms have emerged. These algorithms no longer rely on hardware devices, but they require large amounts of training data, which is crucial for their performance. Therefore, a noise suppression method with good generalization is proposed. First, a DBSCAN-based classifier is implemented to identify the proportions of various noise types according to MFCC features. Then, for each noise type, a 5-layer RNN is used to estimate a gain. Finally, the gains, weighted by the proportions obtained from the classifier, are applied to the corresponding frequency bands in order to eliminate noise.
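    The sketch below illustrates, under assumed shapes and parameters, the classify-then-weight structure of this pipeline: DBSCAN clusters MFCC frames to estimate noise proportions, and proportion-weighted per-band gains (estimated in the paper by a 5-layer RNN; random stand-ins here) are applied to the spectrogram. The DBSCAN parameters and all dimensions are illustrative, not the authors' settings.

```python
# Sketch of the classify-then-weight noise suppression pipeline.
import numpy as np
from sklearn.cluster import DBSCAN

def noise_proportions(mfcc_frames: np.ndarray) -> dict[int, float]:
    """Cluster MFCC frames; return the fraction of frames per noise cluster."""
    labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(mfcc_frames)
    valid = labels[labels != -1]              # drop DBSCAN outliers
    counts = np.bincount(valid)
    return {k: c / len(valid) for k, c in enumerate(counts) if c > 0}

def apply_weighted_gains(spec: np.ndarray,
                         gains_per_noise: dict[int, np.ndarray],
                         proportions: dict[int, float]) -> np.ndarray:
    """Blend per-noise band gains by their estimated proportions."""
    combined = np.zeros(spec.shape[0])
    for k, p in proportions.items():
        combined += p * gains_per_noise[k]    # gains in [0, 1], one per band
    return spec * combined[:, None]

# Toy usage: synthetic MFCC clusters stand in for three noise types; random
# gain vectors stand in for the per-noise RNN outputs.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 13)) * 10
mfccs = np.vstack([c + rng.normal(scale=0.5, size=(70, 13)) for c in centers])
props = noise_proportions(mfccs)
spec = np.abs(rng.normal(size=(257, 100)))    # 257 bands x 100 frames
gains = {k: rng.uniform(0.2, 1.0, size=257) for k in props}
enhanced = apply_weighted_gains(spec, gains, props)
```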

    Deep speech inpainting of time-frequency masks

    Transient loud intrusions, often occurring in noisy environments, can completely overpower the speech signal and lead to an inevitable loss of information. While existing algorithms for noise suppression can yield impressive results, their efficacy remains limited for very low signal-to-noise ratios or when parts of the signal are missing. To address these limitations, here we propose an end-to-end framework for speech inpainting, the context-based retrieval of missing or severely distorted parts of the time-frequency representation of speech. The framework is based on a convolutional U-Net trained via deep feature losses, obtained using speechVGG, a deep speech feature extractor pre-trained on an auxiliary word classification task. Our evaluation results demonstrate that the proposed framework can recover large portions of missing or distorted time-frequency representation of speech, up to 400 ms and 3.2 kHz in bandwidth. In particular, our approach provided a substantial increase in the STOI and PESQ objective metrics for the initially corrupted speech samples. Notably, using deep feature losses to train the framework led to the best results compared to conventional approaches. Comment: Accepted to InterSpeech 2020
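    The sketch below illustrates the deep-feature-loss idea in general terms: the inpainted and clean spectrograms are both passed through a frozen pre-trained feature extractor (speechVGG in the paper; a small stand-in module here), and per-layer L1 distances between activations are summed. The architecture, layer choices, and shapes are placeholders, not the authors' implementation.

```python
# Sketch of a deep feature loss for training a spectrogram-inpainting model.
import torch
import torch.nn as nn

class TinyExtractor(nn.Module):
    """Stand-in for a pre-trained speechVGG-style feature extractor."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU()),
        ])

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)          # collect per-layer activations
        return feats

def deep_feature_loss(extractor, inpainted, target):
    # The extractor stays frozen; only the inpainting network would be updated.
    with torch.no_grad():
        target_feats = extractor(target)
    pred_feats = extractor(inpainted)
    return sum(nn.functional.l1_loss(p, t)
               for p, t in zip(pred_feats, target_feats))

# Toy usage on a batch of 1-channel log-magnitude spectrograms.
extractor = TinyExtractor().eval()
for p in extractor.parameters():
    p.requires_grad_(False)
inpainted = torch.randn(2, 1, 128, 64, requires_grad=True)  # model output stand-in
clean = torch.randn(2, 1, 128, 64)
loss = deep_feature_loss(extractor, inpainted, clean)
loss.backward()   # gradients flow to the inpainted spectrogram only
```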