Robust Speaker Recognition Using Speech Enhancement And Attention Model
In this paper, a novel architecture for speaker recognition is proposed by
cascading speech enhancement and speaker processing. Its aim is to improve
speaker recognition performance when speech signals are corrupted by noise.
Instead of individually processing speech enhancement and speaker recognition,
the two modules are integrated into one framework by a joint optimisation using
deep neural networks. Furthermore, to increase robustness against noise, a
multi-stage attention mechanism is employed to highlight the speaker related
features learned from context information in time and frequency domain. To
evaluate speaker identification and verification performance of the proposed
approach, we test it on VoxCeleb1, one of the most widely used benchmark
datasets. Moreover, the robustness of the proposed approach is also tested on
VoxCeleb1 data corrupted by three types of interference (general noise,
music, and babble) at different signal-to-noise ratio (SNR) levels. The
obtained results show that the proposed approach using speech enhancement and
multi-stage attention models outperforms two strong baselines not using them in
most acoustic conditions in our experiments.
Comment: Accepted by Odyssey 202
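The abstract above evaluates robustness by corrupting speech with interference at fixed SNR levels. The following is a minimal sketch of how such a mixture is commonly built, not the authors' data pipeline: the function names (`power`, `mix_at_snr`, `measured_snr_db`) and the synthetic sine and cosine signals are illustrative assumptions.

```python
import math


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals snr_db, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # SNR(dB) = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """Recover the SNR of a mixture when the clean speech is known."""
    residual = [m - s for m, s in zip(mixture, speech)]
    return 10 * math.log10(power(speech) / power(residual))


# Synthetic stand-ins for a speech waveform and an interference waveform.
speech = [math.sin(0.1 * t) for t in range(1000)]
noise = [math.cos(1.7 * t) for t in range(1000)]
noisy = mix_at_snr(speech, noise, snr_db=5.0)
print(round(measured_snr_db(speech, noisy), 3))
```

Scaling the noise rather than the speech keeps the speech level constant across SNR conditions, which is one common convention. The abstract does not say which convention was used.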
Speaker Re-identification with Speaker Dependent Speech Enhancement
While the use of deep neural networks has significantly boosted speaker
recognition performance, it is still challenging to separate speakers in poor
acoustic environments. Here, speech enhancement methods have traditionally
improved performance. Recent work has shown that adapting speech
enhancement can lead to further gains. This paper introduces a novel approach
that cascades speech enhancement and speaker recognition. In the first step, a
speaker embedding vector is generated, which is used in the second step to
enhance the speech quality and re-identify the speakers. Models are trained in
an integrated framework with joint optimisation. The proposed approach is
evaluated using the VoxCeleb1 dataset, which aims to assess speaker recognition
in real-world situations. In addition, three types of noise at different
signal-to-noise ratios were added for this work. The obtained results show that
the proposed approach using speaker-dependent speech enhancement can yield
better speaker recognition and speech enhancement performance than two
baselines in various noise conditions.
Comment: Accepted for presentation at Interspeech202
Joint-tree model and the maximum genus of graphs
The vertex v of a graph G is called a 1-critical-vertex for the maximum genus
of the graph, or simply a 1-critical-vertex, if G − v is a connected
graph and γM(G − v) = γM(G) − 1. In this paper, through the
joint-tree model, we obtain some types of 1-critical-vertex and establish the
upper embeddability of the Spiral Snm
The maximum genus of graphs with diameter three
This paper shows that if G is a simple graph with diameter three then G is up-embeddable unless G is either a Δ2-graph (Fig. 1) or a Δ3-graph (Fig. 2) with ξ(G) = 2, i.e., the maximum genus γM(G) = (β(G) − 2)/2
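The closing formula above is the ξ(G) = 2 case of the standard identity γM(G) = (β(G) − ξ(G))/2 (Xuong's characterisation), where β(G) = |E| − |V| + 1 is the cycle rank of a connected graph and ξ(G) is its Betti deficiency. The sketch below only evaluates that arithmetic. The function names are hypothetical, and ξ(G) is taken as an input, not computed from the graph.

```python
def cycle_rank(num_vertices, num_edges):
    """Cycle rank beta(G) = |E| - |V| + 1 of a connected graph."""
    return num_edges - num_vertices + 1


def max_genus(beta, xi):
    """Maximum genus (beta - xi) / 2, given the Betti deficiency xi."""
    if xi < 0 or xi > beta or (beta - xi) % 2 != 0:
        raise ValueError("xi must lie in [0, beta] and have the same parity as beta")
    return (beta - xi) // 2


def is_upper_embeddable(beta, xi):
    """A graph is upper embeddable when its maximum genus equals floor(beta / 2)."""
    return max_genus(beta, xi) == beta // 2


# A connected graph with 4 vertices and 6 edges has cycle rank 3.
beta_small = cycle_rank(4, 6)

# With xi = 2 the maximum genus falls one short of floor(beta / 2).
print(max_genus(6, 2), is_upper_embeddable(6, 2))
```

Upper embeddability corresponds to ξ(G) ≤ 1, which is why the exceptional graphs in the abstract, with ξ(G) = 2, are exactly the ones that are not up-embeddable.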
Vertex Splitting and Upper Embeddable Graphs
A weak minor of a graph G is a graph obtained from G by a sequence of
edge-contraction operations on G. A weak-minor-closed family of upper
embeddable graphs is a set G of upper embeddable graphs such that, for each graph G
in G, every weak minor of G is also in G. Up to now, there are few results
providing necessary and sufficient conditions for characterizing the upper
embeddability of graphs. In this paper, we study the relation between the
vertex splitting operation and the upper embeddability of graphs; provide not
only a necessary and sufficient condition for characterizing the upper
embeddability of graphs, but also a way to construct a weak-minor-closed family
of upper embeddable graphs from the bouquet of circles; and extend a result in
J. Graph Theory obtained by L. Nebeský. In addition, the algorithmic complexity of
determining the upper embeddability of a graph can be reduced considerably by the
results obtained in this paper
Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification
Identifying multiple speakers without knowing where a speaker's voice is in a
recording is a challenging task. In this paper, a hierarchical attention
network is proposed to solve a weakly labelled speaker identification problem.
The use of a hierarchical structure, consisting of a frame-level encoder and a
segment-level encoder, aims to learn speaker related information locally and
globally. Speech streams are segmented into fragments. The frame-level encoder
with attention learns features and highlights the target-related frames
locally, and outputs a fragment-based embedding. The segment-level encoder works
with a second attention layer to emphasize the fragments probably related to
target speakers. The global information is finally collected from the segment-level
module to predict speakers via a classifier. To evaluate the effectiveness of
the proposed approach, artificial datasets based on Switchboard Cellular part1
(SWBC) and VoxCeleb1 are constructed in two conditions, where speakers' voices
are overlapped and not overlapped. Compared with two baselines, the obtained
results show that the proposed approach can achieve better performance.
Moreover, further experiments are conducted to evaluate the impact of utterance
segmentation. The results show that a reasonable segmentation can slightly
improve identification performance.
Comment: Accepted for presentation at Interspeech202
H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model
In this paper, a hierarchical attention network to generate utterance-level
embeddings (H-vectors) for speaker identification is proposed. Since different
parts of an utterance may have different contributions to speaker identities,
the use of hierarchical structure aims to learn speaker related information
locally and globally. In the proposed approach, a frame-level encoder and
attention are applied to segments of an input utterance to generate individual
segment vectors. Then, segment-level attention is applied to the segment
vectors to construct an utterance representation. To evaluate the effectiveness
of the proposed approach, the NIST SRE 2008 Part1 dataset is used for training, and
two datasets, Switchboard Cellular part1 and CallHome American English Speech,
are used to evaluate the quality of extracted utterance embeddings on speaker
identification and verification tasks. In comparison with two baselines,
X-vector and X-vector+Attention, the obtained results show that H-vectors can
achieve significantly better performance. Furthermore, the extracted
utterance-level embeddings are more discriminative than the two baselines when
mapped into a 2D space using t-SNE
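The two-level pooling described above (frame-level attention within each segment, then segment-level attention across segments) can be sketched as follows. This is an illustration under strong assumptions, not the H-vector model: there are no learned encoders, the attention scores are plain dot products with fixed query vectors, and every name (`attention_pool`, `hierarchical_embedding`, `frame_query`, `segment_query`) is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Weighted average of `vectors`, weighted by softmax of dot products with `query`."""
    scores = [sum(x * q for x, q in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def hierarchical_embedding(frames, segment_length, frame_query, segment_query):
    """Pool frames into segment vectors, then pool segment vectors into one embedding."""
    segments = [frames[i:i + segment_length]
                for i in range(0, len(frames), segment_length)]
    segment_vectors = [attention_pool(seg, frame_query)[0] for seg in segments]
    return attention_pool(segment_vectors, segment_query)


# Six toy 2-dimensional "frames", grouped into three segments of two frames each.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5], [0.4, 0.6]]
embedding, segment_weights = hierarchical_embedding(
    frames, segment_length=2, frame_query=[1.0, 0.0], segment_query=[0.0, 1.0])
print(embedding, segment_weights)
```

In a trained model the query vectors and the frame features would both be learned. Here they are fixed so that the flow from frames to segment vectors to a single utterance-level vector is easy to trace.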
Dynamic Hand Gesture-Featured Human Motor Adaptation in Tool Delivery using Voice Recognition
Human-robot collaboration has benefited users with higher efficiency towards
interactive tasks. Nevertheless, most collaborative schemes rely on complicated
human-machine interfaces, which might lack the requisite intuitiveness compared
with natural limb control. We also expect to understand human intent with low
training data requirements. In response to these challenges, this paper
introduces an innovative human-robot collaborative framework that seamlessly
integrates hand gesture and dynamic movement recognition, voice recognition,
and a switchable control adaptation strategy. These modules provide a
user-friendly approach that enables the robot to deliver the tools as per user
need, especially when the user is working with both hands. Therefore, users can
focus on their task execution without additional training in the use of
human-machine interfaces, while the robot interprets their intuitive gestures.
The proposed multimodal interaction framework is executed in the UR5e robot
platform equipped with a RealSense D435i camera, and the effectiveness is
assessed through a soldering circuit board task. The experiment results have
demonstrated superior performance in hand gesture recognition, where the static
hand gesture recognition module achieves an accuracy of 94.3%, while the
dynamic motion recognition module reaches 97.6% accuracy. Compared with human
solo manipulation, the proposed approach facilitates higher efficiency tool
delivery, without significantly distracting from human intents.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible
Mariculture may intensify eutrophication but lower N/P ratios: a case study based on nutrients and dual nitrate isotope measurements in Sansha Bay, southeastern China
The mariculture industry has grown rapidly worldwide over the past few decades. The industry helps meet growing food demands and may provide an effective means of carbon sequestration; however, it may harm the marine ecological environment, and the extent of its impact depends on the type of mariculture. Here we focus on the impact of mariculture on the nutrient status and eutrophication in Sansha Bay, which is a typical aquaculture harbor in southeastern China that employs a combination of shellfish and seaweed farming. Nutrient concentrations and dual nitrate isotopes were measured in Sansha Bay during the winter of 2021. The average concentrations of nitrate and phosphate were 31.3 ± 10.5 and 2.26 ± 0.84 µM, respectively, indicating that the water was in a eutrophic state. However, the N/P ratios were relatively low (14.3 ± 2.2). Nitrate isotope measurements were 8.8‰–11.9‰ for δ15N-NO3− and 2.2‰–6.0‰ for δ18O-NO3−. Source analysis based on the nitrate isotope measurements indicates that nitrate in Sansha Bay is derived mainly from the excretion of organisms and sewage discharge from mariculture. The isotopic fractionation model of nitrate assimilation by organisms indicates that surface waters in Sansha Bay experience strong biological uptake of nitrate, which is likely related to seaweed farming in winter. The low N/P ratios may be attributed to excessive nitrogen uptake (relative to phosphorus) during shellfish and seaweed farming, as well as nitrogen removal through sediment denitrification, which is fueled by the sinking of particulate organic matter from mariculture. Overall, our study shows that mariculture activities dominated by shellfish and seaweed cultivation in Sansha Bay may exacerbate eutrophication but reduce N/P ratios in the water column in aquaculture areas
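As a rough check on the reported nutrient figures, the sketch below divides the mean nitrate concentration by the mean phosphate concentration and compares the result with the canonical Redfield N:P ratio of 16:1. The Redfield reference is an assumption here, since the abstract does not name its benchmark. The sketch also uses nitrate alone and a ratio of means, so it is not expected to reproduce the reported 14.3 ± 2.2 exactly. The function name is hypothetical.

```python
# Canonical Redfield N:P molar ratio; an assumed benchmark, not named in the abstract.
REDFIELD_N_TO_P = 16.0


def n_to_p_ratio(nitrogen_um, phosphate_um):
    """Molar N/P ratio from concentrations given in the same unit (here µM)."""
    if phosphate_um <= 0:
        raise ValueError("phosphate concentration must be positive")
    return nitrogen_um / phosphate_um


# Mean nitrate and phosphate concentrations quoted in the abstract (µM).
ratio_of_means = n_to_p_ratio(31.3, 2.26)
print(round(ratio_of_means, 1), ratio_of_means < REDFIELD_N_TO_P)
```

The ratio of the two means comes out near 13.8, below 16 and in line with the abstract's description of the N/P ratios as relatively low.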