Objective Assessment of Machine Learning Algorithms for Speech Enhancement in Hearing Aids
Speech enhancement in assistive hearing devices has been an area of research for many decades. Noise reduction is particularly challenging because of the wide variety of noise sources and the non-stationarity of speech and noise. Digital signal processing (DSP) algorithms deployed in modern hearing aids for noise reduction rely on certain assumptions about the statistical properties of undesired signals. These assumptions can hinder accurate estimation of different noise types, which subsequently leads to suboptimal noise reduction. In this research, a relatively unexplored deep learning technique, the Recurrent Neural Network (RNN), is used to perform noise reduction and dereverberation for assisting hearing-impaired listeners. For noise reduction, the performance of the deep learning model was evaluated objectively and compared with that of open Master Hearing Aid (openMHA), a conventional signal-processing-based framework, and a Deep Neural Network (DNN)-based model. It was found that the RNN model can suppress noise and improve speech understanding better than the conventional hearing aid noise reduction algorithm and the DNN model. The same RNN model was shown to reduce reverberation components with proper training. A real-time implementation of the deep learning model is also discussed.
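The abstract gives no implementation details, but the core mechanism, an RNN predicting a time-frequency mask over noisy spectral magnitudes, can be sketched as follows. This is a minimal illustration in PyTorch under stated assumptions; the bin count, GRU size, and sigmoid mask formulation are choices made here, not the authors' architecture.

# Minimal sketch of RNN-based spectral masking for noise reduction.
# Assumptions (not from the paper): 257-bin STFT magnitudes, a 2-layer
# GRU predicting a [0, 1] mask applied to the noisy magnitude spectrum.
import torch
import torch.nn as nn

class MaskingRNN(nn.Module):
    def __init__(self, n_bins=257, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_bins, hidden, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, noisy_mag):            # (batch, frames, bins)
        h, _ = self.rnn(noisy_mag)
        return self.mask(h) * noisy_mag      # masked (enhanced) magnitudes

model = MaskingRNN()
noisy = torch.rand(1, 100, 257)              # dummy magnitude spectrogram
enhanced = model(noisy)
print(enhanced.shape)                        # torch.Size([1, 100, 257])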
ICASSP 2023 Acoustic Echo Cancellation Challenge
The ICASSP 2023 Acoustic Echo Cancellation Challenge is intended to stimulate
research in acoustic echo cancellation (AEC), which is an important area of
speech enhancement and is still a top issue in audio communication. This is the
fourth AEC challenge; it adds a second track for personalized acoustic echo
cancellation, reduces the algorithmic-plus-buffering latency to 20 ms, and
includes a full-band version of AECMOS. We open
source two large datasets to train AEC models under both single talk and double
talk scenarios. These datasets consist of recordings from more than 10,000 real
audio devices and human speakers in real environments, as well as a synthetic
dataset. We open source an online subjective test framework and provide an
objective metric for researchers to quickly test their results. The winners of
this challenge were selected based on the average mean opinion score (MOS)
achieved across all scenarios and the word accuracy (WAcc) rate.
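For illustration, the ranking criterion described above (average MOS across scenarios combined with WAcc) might be computed along these lines. The equal weighting and the MOS normalization below are assumptions made here; the challenge defines its own exact formula.

# Hedged sketch of a challenge-style ranking score: average the MOS
# values across scenarios, rescale from the 1-5 MOS range to [0, 1],
# and mix with word accuracy (WAcc). The 0.5/0.5 weighting is assumed.
def final_score(mos_by_scenario, wacc, mos_weight=0.5):
    avg_mos = sum(mos_by_scenario) / len(mos_by_scenario)
    return mos_weight * (avg_mos - 1) / 4 + (1 - mos_weight) * wacc

print(final_score([4.2, 3.9, 4.0], wacc=0.93))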
Deep speech inpainting of time-frequency masks
Transient loud intrusions, often occurring in noisy environments, can
completely overpower the speech signal and lead to an inevitable loss of
information. While existing algorithms for noise suppression can yield
impressive results, their efficacy remains limited for very low signal-to-noise
ratios or when parts of the signal are missing. To address these limitations,
here we propose an end-to-end framework for speech inpainting: the
context-based retrieval of missing or severely distorted parts of the
time-frequency representation of speech. The framework is based on a
convolutional U-Net trained via deep feature losses, obtained using speechVGG,
a deep speech feature extractor pre-trained on an auxiliary word classification
task. Our evaluation results demonstrate that the proposed framework can
recover large portions of the missing or distorted time-frequency
representation of speech, up to 400 ms in duration and 3.2 kHz in bandwidth.
In particular, our approach provided a substantial increase in the STOI and
PESQ objective metrics of the initially corrupted speech samples. Notably,
training the framework with deep feature losses led to the best results
compared to conventional approaches.
Comment: Accepted to InterSpeech 2020
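The deep feature loss mentioned above can be sketched as follows: compare the inpainted and reference spectrograms in the feature space of a pre-trained extractor. The toy extractor below merely stands in for speechVGG (its layers are hypothetical), and the per-layer L1 distance is an illustrative assumption.

# Sketch of a deep feature loss for training the inpainting network.
# FeatureExtractor is a stand-in for the pre-trained speechVGG model;
# in practice its weights would be loaded and frozen.
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
        ])

    def forward(self, x):                    # (batch, 1, freq, time)
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)                  # collect intermediate features
        return feats

def deep_feature_loss(extractor, estimate, reference):
    loss = 0.0
    for f_est, f_ref in zip(extractor(estimate), extractor(reference)):
        loss = loss + torch.mean(torch.abs(f_est - f_ref))  # L1 per layer
    return loss

extractor = FeatureExtractor().eval()        # would load speechVGG weights
est = torch.rand(1, 1, 128, 128)             # inpainted spectrogram (dummy)
ref = torch.rand(1, 1, 128, 128)             # clean spectrogram (dummy)
print(deep_feature_loss(extractor, est, ref))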
On real-time multi-stage speech enhancement systems
Recently, multi-stage systems have stood out among deep learning-based speech
enhancement methods. However, these systems tend to be highly complex,
requiring millions of parameters and powerful computational resources, which
limits their application to real-time processing on low-power devices.
Moreover, the contribution of various influencing factors to the success of
multi-stage systems remains unclear, which makes it difficult to reduce the
size of these systems. In this paper, we extensively investigate a lightweight
two-stage network with only 560k total parameters. It consists of a Mel-scale
magnitude masking model in the first stage and a complex spectrum mapping model
in the second stage. We first provide a consolidated view of the roles of the
gain power factor, the post-filter, and the training labels for the Mel-scale
masking model.
Then, we explore several training schemes for the two-stage network and provide
some insights into the superiority of the two-stage network. We show that the
proposed two-stage network trained by an optimal scheme achieves a performance
similar to that of DeepFilterNet2, an open-source model four times its size.
Comment: To appear at ICASSP 202
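A minimal sketch of the two-stage layout described above: stage 1 masks Mel-scale magnitudes, stage 2 maps the complex spectrum. Layer types and sizes are illustrative assumptions and do not reproduce the paper's 560k-parameter design.

# Two-stage sketch in PyTorch (shapes and layer sizes assumed).
import torch
import torch.nn as nn

class Stage1MelMask(nn.Module):
    def __init__(self, n_mels=64, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True)
        self.out = nn.Sequential(nn.Linear(hidden, n_mels), nn.Sigmoid())

    def forward(self, mel_mag):              # (batch, frames, mels)
        h, _ = self.rnn(mel_mag)
        return self.out(h) * mel_mag         # masked Mel-scale magnitudes

class Stage2ComplexMap(nn.Module):
    def __init__(self):
        super().__init__()
        # Real and imaginary parts stacked as two input channels.
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),
        )

    def forward(self, complex_spec):         # (batch, 2, frames, bins)
        return self.net(complex_spec)        # refined complex spectrum

mel = torch.rand(1, 100, 64)                 # dummy Mel magnitudes
spec = torch.rand(1, 2, 100, 257)            # dummy complex spectrum
print(Stage1MelMask()(mel).shape, Stage2ComplexMap()(spec).shape)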
Noise-Robust DSP-Assisted Neural Pitch Estimation with Very Low Complexity
Pitch estimation is an essential step of many speech processing algorithms,
including speech coding, synthesis, and enhancement. Recently, pitch estimators
based on deep neural networks (DNNs) have been outperforming
well-established DSP-based techniques. Unfortunately, these new estimators can
be impractical to deploy in real-time systems, both because of their relatively
high complexity, and the fact that some require significant lookahead. We show
that a hybrid estimator using a small deep neural network (DNN) with
traditional DSP-based features can match or exceed the performance of pure
DNN-based models, with a complexity and algorithmic delay comparable to
traditional DSP-based algorithms. We further demonstrate that this hybrid
approach can provide benefits for a neural vocoding task.
Comment: Submitted to ICASSP 2024, 5 pages
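The hybrid recipe, classic DSP features feeding a small DNN, can be sketched as below. The choice of lag-normalized autocorrelation features and the network size are assumptions for illustration; the paper's exact features may differ.

# Sketch of a hybrid pitch estimator: frame-wise autocorrelation (DSP)
# features scored by a small DNN over discrete pitch candidates.
import numpy as np
import torch
import torch.nn as nn

def autocorr_features(frame, max_lag=256):
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac[:max_lag] / (ac[0] + 1e-8)       # lag-normalized autocorrelation
    return ac.astype(np.float32)

class PitchDNN(nn.Module):
    def __init__(self, max_lag=256, n_candidates=180):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(max_lag, 128), nn.ReLU(),
            nn.Linear(128, n_candidates),     # logits over pitch bins
        )

    def forward(self, feats):
        return self.net(feats)

frame = np.random.randn(512)                 # dummy audio frame
feats = torch.from_numpy(autocorr_features(frame)).unsqueeze(0)
print(PitchDNN()(feats).argmax(dim=-1))      # index of the best pitch bin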
Efficient Monaural Speech Enhancement using Spectrum Attention Fusion
Speech enhancement is a demanding task in automated speech processing
pipelines, focusing on separating clean speech from noisy channels.
Transformer-based models have recently surpassed RNN and CNN models in speech
enhancement, but they are much more computationally expensive and require far
more high-quality training data, which is hard to come by.
In this paper, we present an improvement for speech enhancement models that
maintains the expressiveness of self-attention while significantly reducing
model complexity, which we have termed Spectrum Attention Fusion. We carefully
construct a convolutional module to replace several self-attention layers in a
speech Transformer, allowing the model to more efficiently fuse spectral
features. Our proposed model achieves results comparable to or better than
SOTA models on the Voice Bank + DEMAND dataset, with significantly fewer
parameters (0.58M).
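The central idea, replacing self-attention layers with a convolutional module that produces attention-style weights over the spectrogram, might look roughly like this. Kernel sizes, channel counts, and the sigmoid gating are assumptions made here, not the paper's Spectrum Attention Fusion module.

# Sketch of a convolutional attention-style fusion block: a depthwise
# conv gathers local time-frequency context, a pointwise conv mixes
# channels, and a sigmoid gate reweights the spectral features.
import torch
import torch.nn as nn

class ConvSpectrumFusion(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.weights = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                      groups=channels),                 # depthwise: local T-F context
            nn.Conv2d(channels, channels, kernel_size=1),  # pointwise: channel mixing
            nn.Sigmoid(),
        )

    def forward(self, x):                    # (batch, channels, frames, bins)
        return x * self.weights(x)           # fused spectral features

x = torch.rand(1, 32, 100, 257)
print(ConvSpectrumFusion()(x).shape)         # torch.Size([1, 32, 100, 257])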
CheapNET: Improving Light-weight speech enhancement network by projected loss function
Noise suppression and echo cancellation are critical in speech enhancement
and essential for smart devices and real-time communication. Deployed in voice
processing front-ends and edge devices, these algorithms must ensure efficient
real-time inference with low computational demands. Traditional edge-based
noise suppression often uses MSE-based amplitude spectrum mask training, but
this approach has limitations. We introduce a novel projection loss function,
diverging from MSE, to enhance noise suppression. This method uses projection
techniques to isolate key audio components from noise, significantly improving
model performance. For echo cancellation, the function enables direct
predictions on LAEC pre-processed outputs, substantially enhancing performance.
Our noise suppression model achieves near state-of-the-art results with only
3.1M parameters and a computational load of 0.4 GFlops/s. Moreover, our echo
cancellation model outperforms replicated industry-leading models, introducing
a new perspective on speech enhancement.
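A projection-based loss of the kind described, isolating the component of the estimate aligned with the clean target and penalizing the orthogonal residual, can be sketched as follows. The abstract does not give CheapNET's exact formulation, so this decomposition and the residual-to-signal ratio are assumptions.

# Sketch of a projection-style loss: project the estimate onto the
# clean target, treat the orthogonal residual as the noise component,
# and penalize residual energy relative to the projected signal.
import torch

def projection_loss(estimate, target, eps=1e-8):
    dot = torch.sum(estimate * target, dim=-1, keepdim=True)
    energy = torch.sum(target * target, dim=-1, keepdim=True) + eps
    s_proj = (dot / energy) * target         # component aligned with target
    residual = estimate - s_proj             # noise-like component
    return torch.mean(residual ** 2) / (torch.mean(s_proj ** 2) + eps)

est = torch.rand(2, 16000)                   # dummy enhanced waveforms
tgt = torch.rand(2, 16000)                   # dummy clean references
print(projection_loss(est, tgt))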