Speaker Re-identification with Speaker Dependent Speech Enhancement
While the use of deep neural networks has significantly boosted speaker
recognition performance, it is still challenging to separate speakers in poor
acoustic environments. Here, speech enhancement methods have traditionally
improved performance, and recent work has shown that adapting speech
enhancement can lead to further gains. This paper introduces a novel approach
that cascades speech enhancement and speaker recognition. In the first step, a
speaker embedding vector is generated, which is used in the second step to
enhance the speech quality and re-identify the speakers. Models are trained in
an integrated framework with joint optimisation. The proposed approach is
evaluated using the VoxCeleb1 dataset, which aims to assess speaker recognition
in real-world situations. In addition, three types of noise at different
signal-to-noise ratios were added for this work. The obtained results show that
the proposed approach using speaker-dependent speech enhancement can yield
better speaker recognition and speech enhancement performance than two
baselines in various noise conditions.
Comment: Accepted for presentation at Interspeech202
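Adding noise at a chosen signal-to-noise ratio can be illustrated with a minimal sketch. It is not the authors' data pipeline: the function names `mix_at_snr` and `measure_snr_db` are hypothetical, and the signals are plain Python lists of samples rather than real audio.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio is `snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # From SNR_dB = 10 * log10(P_speech / P_noise), solve for the target noise power.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]

def measure_snr_db(speech, noisy):
    """Measure the SNR of `noisy` against the clean `speech` reference."""
    residual = [y - s for s, y in zip(speech, noisy)]
    speech_energy = sum(s * s for s in speech)
    noise_energy = sum(r * r for r in residual)
    return 10 * math.log10(speech_energy / noise_energy)

# Toy signals (assumed values, for illustration only).
speech = [1.0, -1.0, 0.5, -0.5] * 10
noise = [0.3, 0.1, -0.2, 0.4] * 10
noisy = mix_at_snr(speech, noise, snr_db=5.0)
```

Lower `snr_db` values give a larger noise gain, which is how a single clean recording can be reused across several noise conditions.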
Robust Speaker Recognition Using Speech Enhancement And Attention Model
In this paper, a novel architecture for speaker recognition is proposed by
cascading speech enhancement and speaker processing. Its aim is to improve
speaker recognition performance when speech signals are corrupted by noise.
Instead of individually processing speech enhancement and speaker recognition,
the two modules are integrated into one framework by a joint optimisation using
deep neural networks. Furthermore, to increase robustness against noise, a
multi-stage attention mechanism is employed to highlight the speaker related
features learned from context information in the time and frequency domains. To
evaluate speaker identification and verification performance of the proposed
approach, we test it on VoxCeleb1, one of the most widely used benchmark
datasets. Moreover, the robustness of our proposed approach is also tested on
VoxCeleb1 data corrupted by three types of interference (general
noise, music, and babble) at different signal-to-noise ratio (SNR) levels. The
obtained results show that the proposed approach using speech enhancement and
multi-stage attention models outperforms two strong baselines that do not use
them in most acoustic conditions in our experiments.
Comment: Accepted by Odyssey 202
Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification
Identifying multiple speakers without knowing where a speaker's voice is in a
recording is a challenging task. In this paper, a hierarchical attention
network is proposed to solve a weakly labelled speaker identification problem.
The use of a hierarchical structure, consisting of a frame-level encoder and a
segment-level encoder, aims to learn speaker related information locally and
globally. Speech streams are segmented into fragments. The frame-level encoder
with attention learns features and highlights the target-related frames
locally, and outputs a fragment-based embedding. The segment-level encoder works
with a second attention layer to emphasize the fragments likely related to
target speakers. The global information is finally collected from the
segment-level module to predict speakers via a classifier. To evaluate the
effectiveness of the proposed approach, artificial datasets based on
Switchboard Cellular part1 (SWBC) and VoxCeleb1 are constructed in two
conditions, where speakers' voices are overlapped and not overlapped. Compared
with two baselines, the obtained results show that the proposed approach can
achieve better performance. Moreover, further experiments are conducted to
evaluate the impact of utterance segmentation. The results show that a
reasonable segmentation can slightly improve identification performance.
Comment: Accepted for presentation at Interspeech202
H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model
In this paper, a hierarchical attention network to generate utterance-level
embeddings (H-vectors) for speaker identification is proposed. Since different
parts of an utterance may have different contributions to speaker identities,
the use of hierarchical structure aims to learn speaker related information
locally and globally. In the proposed approach, a frame-level encoder and
attention are applied to segments of an input utterance to generate individual
segment vectors. Then, segment-level attention is applied to the segment
vectors to construct an utterance representation. To evaluate the effectiveness
of the proposed approach, the NIST SRE 2008 Part1 dataset is used for training,
and two datasets, Switchboard Cellular part1 and CallHome American English
Speech, are used to evaluate the quality of extracted utterance embeddings on
speaker identification and verification tasks. In comparison with two
baselines, X-vector and X-vector+Attention, the obtained results show that
H-vectors can achieve significantly better performance. Furthermore, the
extracted utterance-level embeddings are more discriminative than those of the
two baselines when mapped into a 2D space using t-SNE.
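The two-level pooling described above (frame-level attention within each segment, then segment-level attention across segments) can be sketched as follows. This is a simplified illustration, not the paper's model: the learned encoders are omitted, attention scores are assumed to be plain dot products with fixed query vectors, and the names `attention_pool`, `hierarchical_embedding`, `frame_query`, and `segment_query` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, query):
    """Weighted average of `vectors`; weights are softmax(dot(vector, query))."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

def hierarchical_embedding(frames, segment_len, frame_query, segment_query):
    """Pool frames into segment vectors, then pool segment vectors into one embedding."""
    # Split the utterance into consecutive, non-overlapping segments (an assumption).
    segments = [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]
    # Local step: one vector per segment from frame-level attention.
    segment_vectors = [attention_pool(seg, frame_query)[0] for seg in segments]
    # Global step: utterance vector from segment-level attention.
    return attention_pool(segment_vectors, segment_query)

# Toy 2-dimensional frame features (assumed values, for illustration only).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]
embedding, segment_weights = hierarchical_embedding(
    frames, segment_len=3, frame_query=[1.0, 0.0], segment_query=[0.0, 1.0]
)
```

The returned `segment_weights` show how much each segment contributes to the utterance-level vector. In a trained model the queries and encoders would be learned rather than fixed.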
Quality Risk Evaluation of the Food Supply Chain Using a Fuzzy Comprehensive Evaluation Model and Failure Mode, Effects, and Criticality Analysis
Evaluating the quality risk level in the food supply chain can reduce quality information asymmetry and food quality incidents and promote nationally integrated regulations for food quality. To evaluate it, this paper constructs a quality risk evaluation indicator system for the food supply chain based on an extensive literature review. Furthermore, a mathematical model based on the fuzzy comprehensive evaluation model (FCEM) and failure mode, effects, and criticality analysis (FMECA) for evaluating the quality risk level in the food supply chain is developed. A computational experiment aimed at verifying the effectiveness and feasibility of this proposed model is conducted on the basis of a questionnaire survey. The results suggest that this model can be used as a general guideline to assess the quality risk level in the food supply chain and achieve the most important objective of providing a reference for the public and private sectors when making decisions on food quality management.
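The core step of a fuzzy comprehensive evaluation can be sketched as a weighted combination of indicator memberships. The abstract does not state the paper's exact operator or how FMECA feeds into the weights, so this sketch assumes the common weighted-average operator; the indicator weights, membership matrix, grade labels, and function names are all hypothetical.

```python
def fuzzy_evaluate(weights, membership):
    """Combine indicator weights W with membership matrix R as B = W . R, normalised."""
    if len(weights) != len(membership):
        raise ValueError("one weight is required per indicator row")
    n_grades = len(membership[0])
    scores = [
        sum(w * row[j] for w, row in zip(weights, membership))
        for j in range(n_grades)
    ]
    total = sum(scores)
    return [s / total for s in scores]

def risk_grade(scores, grades):
    """Pick the grade with the highest membership (maximum-membership principle)."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return grades[best]

# Hypothetical inputs: three indicators rated against three risk grades.
grades = ["low", "medium", "high"]
weights = [0.5, 0.3, 0.2]
membership = [
    [0.1, 0.6, 0.3],  # indicator 1
    [0.2, 0.5, 0.3],  # indicator 2
    [0.7, 0.2, 0.1],  # indicator 3
]
scores = fuzzy_evaluate(weights, membership)
overall = risk_grade(scores, grades)
```

With these assumed values the combined membership is highest for the "medium" grade. In practice the membership rows would come from survey responses and the weights from an expert or criticality-based weighting scheme.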