Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling
We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture
that integrates full- and sub-band (FSB) modeling, for single- and
multi-channel speech enhancement in the short-time Fourier transform (STFT)
domain. The model maintains an information highway to flow an over-complete
input representation through multiple FSB-LSTM modules. Each FSB-LSTM module
consists of a full-band block to model spectro-temporal patterns at all
frequencies and a sub-band block to model patterns within each sub-band, where
each of the two blocks takes a down-sampled representation as input and returns
an up-sampled discriminative representation to be added to the block input via
a residual connection. The model is designed to have a low algorithmic
complexity, a small run-time buffer and a very low algorithmic latency, at the
same time producing a strong enhancement performance on a noisy-reverberant
speech enhancement task even if the hop size is as low as ms. Comment: in ICASSP 202
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation
We propose TF-GridNet for speech separation. The model is a novel multi-path
deep neural network (DNN) integrating full- and sub-band modeling in the
time-frequency (T-F) domain. It stacks several multi-path blocks, each
consisting of an intra-frame full-band module, a sub-band temporal module, and
a cross-frame self-attention module. It is trained to perform complex spectral
mapping, where the real and imaginary (RI) components of input signals are
stacked as features to predict target RI components. We first evaluate it on
monaural anechoic speaker separation. Without using data augmentation and
dynamic mixing, it obtains a state-of-the-art 23.5 dB improvement in
scale-invariant signal-to-distortion ratio (SI-SDR) on WSJ0-2mix, a standard
dataset for two-speaker separation. To show its robustness to noise and
reverberation, we evaluate it on monaural reverberant speaker separation using
the SMS-WSJ dataset and on noisy-reverberant speaker separation using WHAMR!,
and obtain state-of-the-art performance on both datasets. We then extend
TF-GridNet to multi-microphone conditions through multi-microphone complex
spectral mapping, and integrate it into a two-DNN system with a beamformer in
between (named as MISO-BF-MISO in earlier studies), where the beamformer
proposed in this paper is a novel multi-frame Wiener filter computed based on
the outputs of the first DNN. State-of-the-art performance is obtained on the
multi-channel tasks of SMS-WSJ and WHAMR!. Besides speaker separation, we apply
the proposed algorithms to speech dereverberation and noisy-reverberant speech
enhancement. State-of-the-art performance is obtained on a dereverberation
dataset and on the dataset of the recent L3DAS22 multi-channel speech
enhancement challenge. Comment: In submission. A sound demo is available at
https://zqwang7.github.io/demos/TF-GridNet-demo/index.htm
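The SI-SDR figure quoted in the TF-GridNet abstract above refers to a metric that can be sketched in a few lines. The following is a minimal illustration of the commonly used definition (project the estimate onto the reference, then compare the energy of the projection with the energy of the residual). It is not the authors' evaluation code, and the function name `si_sdr` and the toy signals are assumptions made here for illustration.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch).

    Both arguments are equal-length sequences of real-valued samples.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0:
        raise ValueError("reference signal must not be all zeros")

    # Optimal scaling of the reference so that it best matches the estimate.
    scale = sum(e * r for e, r in zip(estimate, reference)) / ref_energy
    target = [scale * r for r in reference]

    # Everything in the estimate that the scaled reference does not explain.
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)

    if residual_energy == 0:
        return math.inf   # estimate is a scaled copy of the reference
    if target_energy == 0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10 * math.log10(target_energy / residual_energy)


# Toy signals (hypothetical): a reference and a slightly distorted estimate.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]
print(round(si_sdr(estimate, reference), 2))
```

Because of the projection step, rescaling the estimate by any non-zero gain leaves the value unchanged, which is why the metric is called scale-invariant. An "improvement" in SI-SDR, as reported in the abstract, is the difference between this value for the separated output and for the unprocessed mixture.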
Joint unsupervised and supervised learning for context-aware language identification
Language identification (LID) recognizes the language of a spoken utterance
automatically. According to recent studies, LID models trained with an
automatic speech recognition (ASR) task perform better than those trained with
a LID task only. However, we need additional text labels to train the model to
recognize speech, and acquiring the text labels is costly. In order to
overcome this problem, we propose context-aware language identification using a
combination of unsupervised and supervised learning without any text labels.
The proposed method learns the context of speech through masked language
modeling (MLM) loss and simultaneously trains to determine the language of the
utterance with supervised learning loss. The proposed joint learning was found
to reduce the error rate by 15.6% compared to the same structure model trained
by supervised-only learning on a subset of the VoxLingua107 dataset consisting
of sub-three-second utterances in 11 languages. Comment: Accepted by ICASSP 202
Efficacy of PEEK Cages and Plate Augmentation in Three-Level Anterior Cervical Fusion of Elderly Patients
Influence of Warm Isostatic Press Process on Mechanical Properties of a Part Fabricated by Metal Material Extrusion Process
Material extrusion (ME) using a filament including metal powders has recently attracted considerable attention because it allows the production of metal parts at low cost. However, like other additive manufacturing processes, metal ME suffers from the problem of internal pores. In this study, warm isostatic pressure (WIP), a post-process used to downsize or remove the pores in polymer ME, was employed in metal ME to improve the mechanical properties of the finished part. It was confirmed experimentally that the tensile strength and the strain at the ultimate tensile strength were increased by WIP. However, from hardness tests, two different results were obtained. On a microscopic scale, there was no change in hardness because the temperature of the WIP process was not high enough to change the microstructure, while on a macroscopic scale, the hardness changed owing to the collapse of the pores within the material under the indenter load. In specimens with relatively large pores, the hardness sensitivity increases with a larger indenter. Finally, factors affecting the WIP process parameters in metal ME were discussed.