    Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

    We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain. The model maintains an information highway to flow an over-complete input representation through multiple FSB-LSTM modules. Each FSB-LSTM module consists of a full-band block to model spectro-temporal patterns at all frequencies and a sub-band block to model patterns within each sub-band, where each of the two blocks takes a down-sampled representation as input and returns an up-sampled discriminative representation to be added to the block input via a residual connection. The model is designed to have a low algorithmic complexity, a small run-time buffer and a very low algorithmic latency, at the same time producing a strong enhancement performance on a noisy-reverberant speech enhancement task even if the hop size is as low as 22 ms. Comment: in ICASSP 202

    TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

    We propose TF-GridNet for speech separation. The model is a novel multi-path deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several multi-path blocks, each consisting of an intra-frame full-band module, a sub-band temporal module, and a cross-frame self-attention module. It is trained to perform complex spectral mapping, where the real and imaginary (RI) components of input signals are stacked as features to predict target RI components. We first evaluate it on monaural anechoic speaker separation. Without using data augmentation or dynamic mixing, it obtains a state-of-the-art 23.5 dB improvement in scale-invariant signal-to-distortion ratio (SI-SDR) on WSJ0-2mix, a standard dataset for two-speaker separation. To show its robustness to noise and reverberation, we evaluate it on monaural reverberant speaker separation using the SMS-WSJ dataset and on noisy-reverberant speaker separation using WHAMR!, and obtain state-of-the-art performance on both datasets. We then extend TF-GridNet to multi-microphone conditions through multi-microphone complex spectral mapping, and integrate it into a two-DNN system with a beamformer in between (named MISO-BF-MISO in earlier studies), where the beamformer proposed in this paper is a novel multi-frame Wiener filter computed based on the outputs of the first DNN. State-of-the-art performance is obtained on the multi-channel tasks of SMS-WSJ and WHAMR!. Besides speaker separation, we apply the proposed algorithms to speech dereverberation and noisy-reverberant speech enhancement. State-of-the-art performance is obtained on a dereverberation dataset and on the dataset of the recent L3DAS22 multi-channel speech enhancement challenge. Comment: In submission. A sound demo is available at https://zqwang7.github.io/demos/TF-GridNet-demo/index.htm
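
The SI-SDR improvement quoted in the abstract above can be illustrated with a minimal sketch of the commonly used definition of the metric. This is not code from the paper: the function names `si_sdr` and `si_sdr_improvement` are hypothetical, signals are assumed to be equal-length sequences of float samples, and the estimate is assumed to be already aligned with its reference.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference signal must not be all zeros")
    # Project the estimate onto the reference to get the scaled target part.
    scale = sum(e * r for e, r in zip(estimate, reference)) / ref_energy
    target = [scale * r for r in reference]
    # Everything in the estimate that is not the scaled target counts as distortion.
    distortion = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    if distortion_energy == 0.0:
        return math.inf
    if target_energy == 0.0:
        return -math.inf
    return 10.0 * math.log10(target_energy / distortion_energy)


def si_sdr_improvement(estimate, mixture, reference):
    """Gain in SI-SDR (dB) of the estimate over the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy signals (assumed values, for illustration only).
reference = [1.0, 0.0, -1.0, 0.0]
mixture = [1.0, 1.0, -1.0, 1.0]    # reference plus strong interference
estimate = [1.0, 0.1, -1.0, 0.1]   # reference plus weak residual
improvement_db = si_sdr_improvement(estimate, mixture, reference)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate by any non-zero constant leaves the score unchanged, which is what "scale-invariant" refers to.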

    Joint unsupervised and supervised learning for context-aware language identification

    Language identification (LID) recognizes the language of a spoken utterance automatically. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with an LID task only. However, training the model to recognize speech requires additional text labels, and acquiring these labels is costly. To overcome this problem, we propose context-aware language identification using a combination of unsupervised and supervised learning without any text labels. The proposed method learns the context of speech through a masked language modeling (MLM) loss and is simultaneously trained to determine the language of the utterance with a supervised learning loss. The proposed joint learning was found to reduce the error rate by 15.6% compared to a model with the same structure trained by supervised-only learning on a subset of the VoxLingua107 dataset consisting of sub-three-second utterances in 11 languages. Comment: Accepted by ICASSP 202
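
The joint objective described in the abstract above, a supervised language-classification loss optimized together with a masked-prediction loss, might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `cross_entropy` and `joint_loss`, the weighting factor `mlm_weight`, the use of discrete target indices for masked frames, and the averaging over masked positions are all hypothetical choices that the abstract does not specify.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log-probability of the target class under a softmax."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target_index]


def joint_loss(lid_logits, language_index, mlm_logits, mlm_targets, mlm_weight=1.0):
    """Supervised LID loss plus a weighted masked-prediction loss.

    lid_logits: one score per candidate language for the utterance.
    mlm_logits: one list of scores per masked position (assumed discrete targets).
    mlm_weight: hypothetical balancing factor between the two terms.
    """
    supervised = cross_entropy(lid_logits, language_index)
    masked = [cross_entropy(scores, target)
              for scores, target in zip(mlm_logits, mlm_targets)]
    unsupervised = sum(masked) / len(masked) if masked else 0.0
    return supervised + mlm_weight * unsupervised


# Toy example: 2 candidate languages, 1 masked position with 4 candidate units.
loss = joint_loss([0.0, 0.0], 0, [[0.0, 0.0, 0.0, 0.0]], [1])
```

Setting `mlm_weight` to zero reduces the sketch to supervised-only training, the baseline that the abstract compares against.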

    Influence of Warm Isostatic Press Process on Mechanical Properties of a Part Fabricated by Metal Material Extrusion Process

    Material extrusion (ME) using a filament including metal powders has recently attracted considerable attention because it allows the production of metal parts at low cost. However, like other additive manufacturing processes, metal ME suffers from the problem of internal pores. In this study, warm isostatic pressure (WIP), a post-process used to downsize or remove the pores in polymer ME, was employed in metal ME to improve the mechanical properties of the finished part. It was confirmed experimentally that the tensile strength and the strain at the ultimate tensile strength were increased by WIP. However, the hardness tests gave two different results. On a microscopic scale, there was no change in hardness because the temperature of the WIP process was not high enough to change the microstructure, while on a macroscopic scale, the hardness changed owing to the collapse of the pores within the material under the indenter load. In specimens with relatively large pores, the hardness sensitivity increases with a larger indenter. Finally, factors affecting the WIP process parameters in metal ME were discussed.
