Bitwise Source Separation on Hashed Spectra: An Efficient Posterior Estimation Scheme Using Partial Rank Order Metrics
This paper proposes an efficient bitwise solution to the single-channel
source separation task. Most dictionary-based source separation algorithms rely
on iterative update rules at run time, which become computationally costly,
especially with an overcomplete dictionary and sparse encoding, which tend to
give better separation results. To avoid this cost we propose a
bitwise scheme on hashed spectra that leads to an efficient posterior
probability calculation. For each source, the algorithm uses a partial rank
order metric to extract robust features that form a binarized dictionary of
hashed spectra. Then, for a mixture spectrum, its hash code is compared with
each source's hashed dictionary in one pass. This simple voting-based
dictionary search allows a fast and iteration-free estimation of ratio masking
at each bin of a signal spectrogram. We verify that the proposed BitWise Source
Separation (BWSS) algorithm produces sensible source separation results for the
single-channel speech denoising task, with 6-8 dB mean SDR. To our knowledge,
this is the first dictionary-based algorithm for this task that is completely
iteration-free in both training and testing.
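The one-pass voting idea above can be illustrated with a minimal sketch. The hashing scheme here (binarizing random pairwise magnitude comparisons, a form of partial rank order metric) and the reduction of the per-frame mask to a single similarity ratio are simplifications for illustration; the dictionaries, pair choices, and data below are all toy stand-ins, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def rank_order_hash(spectra, pairs):
    """Binarize spectra via partial rank-order comparisons:
    bit k is 1 if bin i_k has larger magnitude than bin j_k."""
    i, j = pairs
    return spectra[..., i] > spectra[..., j]

def bitwise_similarity(query, dictionary):
    """Fraction of matching bits per entry (1 - normalized Hamming distance)."""
    return (query[None, :] == dictionary).mean(axis=1)

# Toy hashed dictionaries of training spectra for two sources
F, B = 64, 128                                 # spectral bins, hash bits
pairs = (rng.integers(0, F, B), rng.integers(0, F, B))
dict_s1 = rank_order_hash(np.abs(rng.normal(size=(50, F))), pairs)
dict_s2 = rank_order_hash(np.abs(rng.normal(size=(50, F))), pairs)

# One-pass voting for a mixture frame: hash it once, compare against every
# dictionary entry, and turn the best-match scores into a soft mask weight
mix = np.abs(rng.normal(size=F))
h = rank_order_hash(mix, pairs)
score1 = bitwise_similarity(h, dict_s1).max()
score2 = bitwise_similarity(h, dict_s2).max()
mask_s1 = score1 / (score1 + score2)           # in [0, 1], no iterations
```

Because the search is a single pass of bit comparisons and a vote, there are no iterative update rules at test time, which is the efficiency argument the abstract makes.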
Learning to Separate Voices by Spatial Regions
We consider the problem of audio voice separation for binaural applications,
such as earphones and hearing aids. While today's neural networks perform
remarkably well at separating sources with two microphones, they assume a
known or fixed maximum number of sources, K. Moreover, today's models are
trained in a supervised manner, using training data synthesized from generic
sources, environments, and human head shapes.
This paper intends to relax both these constraints at the expense of a slight
alteration in the problem definition. We observe that, when a received mixture
contains too many sources, it is still helpful to separate them by region,
i.e., isolating signal mixtures from each conical sector around the user's
head. This requires learning the fine-grained spatial properties of each
region, including the signal distortions imposed by a person's head. We propose
a two-stage self-supervised framework in which overheard voices from earphones
are pre-processed to extract relatively clean personalized signals, which are
then used to train a region-wise separation model. Results show promising
performance, underscoring the importance of personalization over a generic
supervised approach. (Audio samples are available at our project website:
https://uiuc-earable-computing.github.io/binaural/.) We believe this result
could help real-world applications in selective hearing, noise cancellation,
and audio augmented reality.
Comment: Accepted to ICML 2022.
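To make the "conical sector" idea concrete: each interaural time difference (ITD) between the two ears corresponds to a cone of confusion around the head, which is why regions are conical rather than angular wedges. The sketch below assigns a binaural frame to a sector from its ITD; the sample rate, head geometry, sector count, and cross-correlation method are illustrative assumptions, not the paper's learned, personalized model.

```python
import numpy as np

FS = 16_000           # sample rate in Hz (assumed)
EAR_DIST = 0.18       # approximate ear-to-ear distance in meters (assumed)
SPEED = 343.0         # speed of sound in m/s

def estimate_itd(left, right, fs=FS):
    """Estimate the interaural time difference (seconds) from the
    cross-correlation peak between the two ear signals."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)
    return lag / fs

def itd_to_region(itd, n_regions=4):
    """Map an ITD to one of n conical sectors around the head."""
    max_itd = EAR_DIST / SPEED
    sin_theta = np.clip(itd / max_itd, -1.0, 1.0)
    theta = np.arcsin(sin_theta)  # lateral angle in [-pi/2, pi/2]
    edges = np.linspace(-np.pi / 2, np.pi / 2, n_regions + 1)
    return int(np.clip(np.searchsorted(edges, theta) - 1, 0, n_regions - 1))

# Toy example: the same source, arriving 5 samples later at the right ear,
# so the source lies on the listener's left
src = np.random.default_rng(2).normal(size=1024)
left = src
right = np.concatenate([np.zeros(5), src[:-5]])
region = itd_to_region(estimate_itd(left, right))
```

A learned model additionally captures the fine-grained distortions a specific head imposes, which a pure geometric mapping like this cannot.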
Collaborative Deep Learning for Speech Enhancement: A Run-Time Model Selection Method Using Autoencoders
We show that a Modular Neural Network (MNN) can combine various speech
enhancement modules, each of which is a Deep Neural Network (DNN) specialized
on a particular enhancement job. Unlike an ordinary ensemble technique that
averages over model variations, the proposed MNN selects the best module for
the unseen test signal, producing a greedy ensemble. We see this as
Collaborative Deep Learning (CDL), because it can reuse various already-trained
DNN models without any further refinement. In the proposed MNN, selecting the
best module at run time is challenging. To this end, we employ a speech
AutoEncoder (AE) as an arbitrator, whose input and output are trained to be as
similar as possible if its input is clean speech. Therefore, the AE can gauge
the quality of each module-specific denoised result from its AE reconstruction
error: a low error means the module output is similar to clean speech. We
propose an MNN structure with various modules, each specialized in a specific
noise type, gender, and input Signal-to-Noise Ratio (SNR) value, and
empirically show that it almost always works better than an arbitrarily chosen
DNN module and is sometimes as good as an oracle result.
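The arbitration step can be sketched in a few lines. Here the autoencoder is stood in for by a fixed linear encode/decode pair and the "modules" are trivial gain functions; in the paper's setting these would be a speech-trained AE and specialized DNN denoisers, so everything below is a toy illustration of the selection logic only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a speech autoencoder: a fixed linear projection.
# A real AE trained only on clean speech reconstructs speech-like
# inputs well and everything else poorly.
W = rng.normal(size=(32, 8)) / np.sqrt(32)

def ae_reconstruction_error(x):
    """Round-trip x through the (toy) autoencoder and measure the MSE;
    a low error signals a more speech-like (cleaner) signal."""
    z = x @ W             # encode
    x_hat = z @ W.T       # decode
    return float(np.mean((x - x_hat) ** 2))

def select_best_module(noisy, modules):
    """Run every enhancement module, keep the output the AE deems
    most speech-like (lowest reconstruction error)."""
    outputs = [m(noisy) for m in modules]
    errors = [ae_reconstruction_error(y) for y in outputs]
    return outputs[int(np.argmin(errors))], errors

# Two hypothetical modules (toy gains, not real DNN denoisers)
modules = [lambda x: x * 0.95, lambda x: x * 0.1]
noisy = rng.normal(size=32)
best, errors = select_best_module(noisy, modules)
```

The key design point is that the arbitrator never needs the clean reference at test time: the AE's reconstruction error serves as a proxy for output quality, which is what makes run-time model selection possible.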