64 research outputs found

    SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning

    In many situations, we would like to hear desired sound events (SEs) while being able to ignore interference. Target sound extraction (TSE) aims at tackling this problem by estimating the sound of target SE classes in a mixture while suppressing all other sounds. We can achieve this with a neural network that extracts the target SEs by conditioning it on clues representing the target SE classes. Two types of clues have been proposed, i.e., target SE class labels and enrollment sound samples similar to the target sound. Systems based on SE class labels can directly optimize embedding vectors representing the SE classes, resulting in high extraction performance. However, extending these systems to the extraction of new SE classes not encountered during training is not easy. Enrollment-based approaches extract SEs by finding sounds in the mixtures that share similar characteristics to the enrollment. These approaches do not explicitly rely on SE class definitions and can thus handle new SE classes. In this paper, we introduce a TSE framework, SoundBeam, that combines the advantages of both approaches. We also perform an extensive evaluation of the different TSE schemes using synthesized and real mixtures, which shows the potential of SoundBeam. Comment: Submitted to IEEE/ACM Trans. Audio, Speech, and Language Processing

    How does end-to-end speech recognition training impact speech enhancement artifacts?

    Jointly training a speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end has been investigated as a way to mitigate the influence of processing distortion generated by single-channel SE on ASR. In this paper, we investigate the effect of such joint training on the signal-level characteristics of the enhanced signals from the viewpoint of the decomposed noise and artifact errors. The experimental analyses provide two novel findings: 1) ASR-level training of the SE front-end reduces the artifact errors while increasing the noise errors, and 2) simply interpolating the enhanced and observed signals, which achieves a similar effect of reducing artifacts and increasing noise, improves ASR performance without jointly modifying the SE and ASR modules, even for a strong ASR back-end using a WavLM feature extractor. Our findings provide a better understanding of the effect of joint training and a novel insight for designing an ASR-agnostic SE front-end. Comment: 5 pages, 1 figure, 1 table

    Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization

    Sound event localization frameworks based on deep neural networks have shown increased robustness with respect to reverberation and noise in comparison to classical parametric approaches. In particular, recurrent architectures that incorporate temporal context into the estimation process seem to be well-suited for this task. This paper proposes a novel approach to sound event localization by utilizing an attention-based sequence-to-sequence model. These types of models have been successfully applied to problems in natural language processing and automatic speech recognition. In this work, a multi-channel audio signal is encoded to a latent representation, which is subsequently decoded to a sequence of estimated directions-of-arrival. Herein, attention allows the model to capture temporal dependencies in the audio signal by focusing on specific frames that are relevant for estimating the activity and direction-of-arrival of sound events at the current time-step. The framework is evaluated on three publicly available datasets for sound event localization. It yields superior localization performance compared to state-of-the-art methods in both anechoic and reverberant conditions. Comment: Published in Proceedings of the 28th European Signal Processing Conference (EUSIPCO), 202

    Neural network-based virtual microphone estimation with virtual microphone and beamformer-level multi-task loss

    Array processing performance depends on the number of microphones available. Virtual microphone estimation (VME) has been proposed to increase the number of microphone signals artificially. Neural network-based VME (NN-VME) trains an NN with a VM-level loss to predict a signal at a microphone location that is available during training but not at inference. However, this training objective may not be optimal for a specific array processing back-end, such as beamforming. An alternative approach is to use a training objective considering the array-processing back-end, such as a loss on the beamformer output. This approach may generate signals optimal for beamforming but not physically grounded. To combine the advantages of both approaches, this paper proposes a multi-task loss for NN-VME that combines both VM-level and beamformer-level losses. We evaluate the proposed multi-task NN-VME on multi-talker underdetermined conditions and show that it achieves a 33.1% relative WER improvement compared to using only real microphones and 10.8% compared to using a prior NN-VME approach. Comment: 5 pages, 2 figures, 1 table

    Efficacy and Complications of Emergent Transcatheter Arterial Embolization for the Management of Intractable Uterine Bleeding

    Objective: Transcatheter arterial embolization (TAE), including uterine artery embolization (UAE), is effective for the management of obstetric and gynecologic hemorrhage. Some adverse effects have been reported with TAE, such as amenorrhea, endometrial trauma, and subsequent infertility. Herein we report the efficacy and complications of emergent TAE for the management of severe intractable uterine bleeding at our institute. Methods: From 2010 to 2019, thirty-eight patients underwent emergent TAE for intractable uterine bleeding. We evaluated the efficacy and complications of TAE, including a change in menstruation, fertility, and pregnancy outcomes in perinatal patients (group A; n=23) and in patients with gynecologic hemorrhage (group B; n=15). Results: Group A included 7 cases of retained placenta, 4 cases of postpartum hemorrhage, 2 cases of placenta accreta, 2 cases of uterine artery pseudoaneurysm, 2 cases of cervical pregnancy, 1 case of cesarean scar pregnancy, and 5 cases of unexplained hemorrhage. The median age of group A was 37. Group B included 4 cases of uterine artery pseudoaneurysm, 2 cases of uterine arteriovenous malformation, 3 cases of uterine fibroids, 1 case of adenomyosis, and 5 cases of unexplained hemorrhage. The median age of group B was 39. The first attempt at TAE successfully controlled hemorrhage in 33 of 38 patients (86.8%) without major complications, and the remaining 5 patients required an additional attempt at TAE to control hemorrhage. One patient (2.6%) had transient buttock pain and foot ischemia. Among the 33 patients who had adequate follow-up care, all patients resumed regular menstruation. The median time to resume regular menstruation after TAE was 3 months (range, 1-13 months) in group A (n=20) and 1 month (range, 1-6 months) in group B (n=13). Four patients had 6 pregnancies in total: 3 full-term live births, 2 missed abortions, and 1 artificial abortion. Among the 13 patients who desired pregnancy, 3 (23%) conceived spontaneously. Conclusions: This retrospective study showed that emergent TAE may be effective and safe in treating intractable uterine bleeding with a high success rate. Ovarian and endometrial function were preserved based on the relatively early return of menstruation. Further prospective investigations with a large number of patients are needed to confirm the preservation of ovarian function, fertility, and pregnancy outcome in reproductive-aged women.