SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning
In many situations, we would like to hear desired sound events (SEs) while
being able to ignore interference. Target sound extraction (TSE) aims at
tackling this problem by estimating the sound of target SE classes in a mixture
while suppressing all other sounds. We can achieve this with a neural network
that extracts the target SEs by conditioning it on clues representing the
target SE classes. Two types of clues have been proposed, i.e., target SE class
labels and enrollment sound samples similar to the target sound. Systems based
on SE class labels can directly optimize embedding vectors representing the SE
classes, resulting in high extraction performance. However, extending these
systems to the extraction of new SE classes not encountered during training is
not easy. Enrollment-based approaches extract SEs by finding sounds in the
mixtures that share similar characteristics to the enrollment. These approaches
do not explicitly rely on SE class definitions and can thus handle new SE
classes. In this paper, we introduce a TSE framework, SoundBeam, that combines
the advantages of both approaches. We also perform an extensive evaluation of
the different TSE schemes using synthesized and real mixtures, which shows the
potential of SoundBeam.
Comment: Submitted to IEEE/ACM Trans. Audio, Speech, and Language Processing
How does end-to-end speech recognition training impact speech enhancement artifacts?
Jointly training a speech enhancement (SE) front-end and an automatic speech
recognition (ASR) back-end has been investigated as a way to mitigate the
influence of processing distortion generated by single-channel SE on
ASR. In this paper, we investigate the effect of such joint training on the
signal-level characteristics of the enhanced signals from the viewpoint of the
decomposed noise and artifact errors. The experimental analyses provide two
novel findings: 1) ASR-level training of the SE front-end reduces the artifact
errors while increasing the noise errors, and 2) simply interpolating the
enhanced and observed signals, which achieves a similar effect of reducing
artifacts and increasing noise, improves ASR performance without jointly
modifying the SE and ASR modules, even for a strong ASR back-end using a WavLM
feature extractor. Our findings provide a better understanding of the effect of
joint training and a novel insight for designing an ASR-agnostic SE front-end.
Comment: 5 pages, 1 figure, 1 table
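The second finding above, interpolating the enhanced and observed signals, can be illustrated with a minimal sketch. The function name `interpolate_signals`, the parameter `weight`, and the toy sample values are assumptions made for illustration; the abstract does not state how the interpolation weight is chosen.

```python
def interpolate_signals(enhanced, observed, weight):
    """Blend an enhanced signal with the unprocessed observation.

    weight = 1.0 returns the enhanced signal unchanged and weight = 0.0
    returns the observation. The parameterisation is an assumption made
    for illustration, not taken from the paper.
    """
    if len(enhanced) != len(observed):
        raise ValueError("signals must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Sample-wise convex combination of the two signals.
    return [weight * e + (1.0 - weight) * o
            for e, o in zip(enhanced, observed)]


# Toy waveforms (hypothetical values): mixing some of the observation back in
# trades fewer processing artifacts for more residual noise.
observed = [0.2, -0.4, 0.6, -0.1]
enhanced = [0.1, -0.3, 0.5, 0.0]
blended = interpolate_signals(enhanced, observed, weight=0.8)
```

Because the blend is applied to the front-end output only, a sketch like this leaves both the SE and ASR modules untouched, which is the property the abstract highlights.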
Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization
Sound event localization frameworks based on deep neural networks have shown
increased robustness with respect to reverberation and noise in comparison to
classical parametric approaches. In particular, recurrent architectures that
incorporate temporal context into the estimation process seem to be well-suited
for this task. This paper proposes a novel approach to sound event localization
by utilizing an attention-based sequence-to-sequence model. These types of
models have been successfully applied to problems in natural language
processing and automatic speech recognition. In this work, a multi-channel
audio signal is encoded to a latent representation, which is subsequently
decoded to a sequence of estimated directions-of-arrival. Here, attention
allows the model to capture temporal dependencies in the audio signal by focusing on
specific frames that are relevant for estimating the activity and
direction-of-arrival of sound events at the current time-step. The framework is
evaluated on three publicly available datasets for sound event localization. It
yields superior localization performance compared to state-of-the-art methods
in both anechoic and reverberant conditions.
Comment: Published in Proceedings of the 28th European Signal Processing
Conference (EUSIPCO), 202
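The idea of focusing on the encoded frames most relevant to the current time-step can be sketched with generic dot-product attention. This is an illustration of the general mechanism only: the abstract does not specify the scoring function or the network layers, so `attend`, the dot-product score, and the toy frame vectors are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, frames):
    """Weight encoded frames by their similarity to a decoder query.

    Dot-product scoring is an assumption; the paper may use a different
    attention function.
    """
    # One similarity score per encoded frame.
    scores = [sum(q * f for q, f in zip(query, frame)) for frame in frames]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the frames.
    dim = len(frames[0])
    context = [sum(w * frame[d] for w, frame in zip(weights, frames))
               for d in range(dim)]
    return weights, context


# Three hypothetical encoded frames and one decoder query.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, frames)
```

In a sequence-to-sequence localizer, a context vector of this kind would feed the decoder step that outputs the direction-of-arrival estimate for the current time-step.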
Neural network-based virtual microphone estimation with virtual microphone and beamformer-level multi-task loss
Array processing performance depends on the number of microphones available.
Virtual microphone estimation (VME) has been proposed to increase the number of
microphone signals artificially. Neural network-based VME (NN-VME) trains an NN
with a VM-level loss to predict a signal at a microphone location that is
available during training but not at inference. However, this training
objective may not be optimal for a specific array processing back-end, such as
beamforming. An alternative approach is to use a training objective considering
the array-processing back-end, such as a loss on the beamformer output. This
approach may generate signals optimal for beamforming but not physically
grounded. To combine the advantages of both approaches, this paper proposes a
multi-task loss for NN-VME that combines both VM-level and beamformer-level
losses. We evaluate the proposed multi-task NN-VME on multi-talker
underdetermined conditions and show that it achieves a 33.1 % relative WER
improvement compared to using only real microphones and 10.8 % compared to
using a prior NN-VME approach.
Comment: 5 pages, 2 figures, 1 table
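A multi-task loss of the kind described above can be sketched as a weighted sum of a VM-level term and a beamformer-level term. Everything specific here is an assumption for illustration: the mean-squared-error losses, the channel-averaging stand-in for the beamformer, and the `vm_weight` parameter are not taken from the paper.

```python
def mse(estimate, reference):
    """Mean squared error between two equal-length signals."""
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)


def average_beamformer(channels):
    """Stand-in beamformer (assumption): sample-wise average of the channels."""
    count = len(channels)
    return [sum(samples) / count for samples in zip(*channels)]


def multi_task_loss(vm_estimate, vm_reference, real_mic, target, vm_weight=0.5):
    """Combine a VM-level loss with a beamformer-level loss.

    vm_weight balances the two terms; its value and the use of MSE for both
    terms are illustrative assumptions.
    """
    # VM-level term: keeps the estimate close to the true microphone signal.
    vm_loss = mse(vm_estimate, vm_reference)
    # Beamformer-level term: scores the output of the array-processing back-end.
    bf_output = average_beamformer([real_mic, vm_estimate])
    bf_loss = mse(bf_output, target)
    return vm_weight * vm_loss + (1.0 - vm_weight) * bf_loss


# Hypothetical two-sample signals: a poor virtual-microphone estimate is
# penalised by both terms.
loss = multi_task_loss([0.0, 0.0], [2.0, 2.0], [2.0, 2.0], [2.0, 2.0])
```

Setting `vm_weight` to 1.0 in this sketch recovers a purely VM-level objective, and 0.0 a purely beamformer-level one, which mirrors the two approaches the abstract contrasts.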
Efficacy and Complications of Emergent Transcatheter Arterial Embolization for the Management of Intractable Uterine Bleeding
Objective: Transcatheter arterial embolization (TAE), including uterine artery embolization (UAE), is effective for the management of obstetric and gynecologic hemorrhage. Some adverse effects have been reported with TAE, such as amenorrhea, endometrial trauma, and subsequent infertility. Herein we report the efficacy and complications of emergent TAE for the management of severe intractable uterine bleeding at our institute.
Methods: From 2010 to 2019, thirty-eight patients underwent emergent TAE for intractable uterine bleeding. We evaluated the efficacy and complications of TAE, including changes in menstruation, fertility, and pregnancy outcomes, in perinatal patients (group A; n=23) and in patients with gynecologic hemorrhage (group B; n=15).
Results: Group A included 7 cases of retained placenta, 4 cases of postpartum hemorrhage, 2 cases of placenta accreta, 2 cases of uterine artery pseudoaneurysm, 2 cases of cervical pregnancy, 1 case of cesarean scar pregnancy, and 5 cases of unexplained hemorrhage. The median age of group A was 37. Group B included 4 cases of uterine artery pseudoaneurysm, 2 cases of uterine arteriovenous malformation, 3 cases of uterine fibroids, 1 case of adenomyosis, and 5 cases of unexplained hemorrhage. The median age of group B was 39. The first attempt at TAE successfully controlled hemorrhage in 33 of 38 patients (86.8%) without major complications, and the remaining 5 patients required an additional attempt at TAE to control hemorrhage. One patient (2.6%) had transient buttock pain and foot ischemia. Among the 33 patients who had adequate follow-up care, all resumed regular menstruation. The median time to resume regular menstruation after TAE was 3 months (range, 1-13 months) in group A (n=20) and 1 month (range, 1-6 months) in group B (n=13). Four patients had 6 pregnancies in total: 3 full-term live births, 2 missed abortions, and 1 artificial abortion. Among the 13 patients who desired pregnancy, 3 (23%) conceived spontaneously.
Conclusions: This retrospective study showed that emergent TAE may be effective and safe in treating intractable uterine bleeding, with a high success rate. Ovarian and endometrial function were preserved, based on the relatively early return of menstruation. Further prospective investigations with a large number of patients are needed to confirm the preservation of ovarian function, fertility, and pregnancy outcome in reproductive-aged women.