73 research outputs found

    Robust Speaker Recognition Using Speech Enhancement And Attention Model

    In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of processing speech enhancement and speaker recognition separately, the two modules are integrated into one framework by joint optimisation using deep neural networks. Furthermore, to increase robustness against noise, a multi-stage attention mechanism is employed to highlight the speaker-related features learned from context information in the time and frequency domains. To evaluate the speaker identification and verification performance of the proposed approach, we test it on VoxCeleb1, one of the most widely used benchmark datasets. Moreover, the robustness of the proposed approach is also tested on VoxCeleb1 data corrupted by three types of interference (general noise, music, and babble) at different signal-to-noise ratio (SNR) levels. The obtained results show that the proposed approach using speech enhancement and multi-stage attention models outperforms two strong baselines that do not use them in most acoustic conditions in our experiments. Comment: Accepted by Odyssey 202
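A minimal sketch of how a clean signal can be corrupted at a chosen SNR level, as in the evaluation described above. The abstract does not give the mixing procedure, so the standard power-ratio definition of SNR is assumed here, and the function names `mix_at_snr` and `measured_snr_db` are hypothetical.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat or trim the noise so that it matches the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the target noise power.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise

def measured_snr_db(speech, mixture):
    """SNR of `mixture` relative to the clean `speech`, in dB."""
    residual = np.asarray(mixture, dtype=float) - np.asarray(speech, dtype=float)
    return 10 * np.log10(np.mean(np.square(speech)) / np.mean(residual ** 2))

# Synthetic stand-ins for a speech clip and a shorter noise clip (assumed data).
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.standard_normal(4000)
mixture = mix_at_snr(speech, noise, snr_db=5.0)
```

Sweeping `snr_db` over several values and noise types gives the kind of condition grid the abstract refers to.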

    Speaker Re-identification with Speaker Dependent Speech Enhancement

    While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here speech enhancement methods have traditionally allowed improved performance. Recent work has shown that adapting speech enhancement can lead to further gains. This paper introduces a novel approach that cascades speech enhancement and speaker recognition. In the first step, a speaker embedding vector is generated, which is used in the second step to enhance the speech quality and re-identify the speakers. Models are trained in an integrated framework with joint optimisation. The proposed approach is evaluated using the VoxCeleb1 dataset, which aims to assess speaker recognition in real-world situations. In addition, three types of noise at different signal-to-noise ratios were added for this work. The obtained results show that the proposed approach using speaker-dependent speech enhancement can yield better speaker recognition and speech enhancement performance than two baselines in various noise conditions. Comment: Accepted for presentation at Interspeech202

    Joint-tree model and the maximum genus of graphs

    A vertex v of a graph G is called a 1-critical-vertex for the maximum genus of the graph, or simply a 1-critical-vertex, if G − v is a connected graph and γM(G − v) = γM(G) − 1. In this paper, through the joint-tree model, we obtain some types of 1-critical-vertex and establish the upper embeddability of the Spiral Snm

    The maximum genus of graphs with diameter three

    This paper shows that if G is a simple graph with diameter three then G is up-embeddable unless G is either a Δ2-graph (Fig. 1) or a Δ3-graph (Fig. 2) with ξ(G) = 2, i.e., the maximum genus γM(G) = (β(G) − 2)/2
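The closing identity above is the case ξ(G) = 2 of the standard formula for the maximum genus of a connected graph. It is written out here for reference. The abstract does not define its symbols, so the usual readings are assumed: β(G) is the cycle rank and ξ(G) is the Betti deficiency.

```latex
\gamma_M(G) = \frac{\beta(G) - \xi(G)}{2},
\qquad
\beta(G) = |E(G)| - |V(G)| + 1,
\qquad
G \text{ is upper embeddable} \iff \xi(G) \le 1 .
```

Substituting ξ(G) = 2 gives γM(G) = (β(G) − 2)/2, which is one less than the value ⌊β(G)/2⌋ attained by an upper embeddable graph with even β(G).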

    Vertex Splitting and Upper Embeddable Graphs

    A weak minor of a graph G is a graph obtained from G by a sequence of edge-contraction operations. A weak-minor-closed family of upper embeddable graphs is a set of upper embeddable graphs such that every weak minor of each graph in the set is also in the set. Up to now, there are few results providing necessary and sufficient conditions for characterizing the upper embeddability of graphs. In this paper, we study the relation between the vertex splitting operation and the upper embeddability of graphs. We provide not only a necessary and sufficient condition for characterizing the upper embeddability of graphs, but also a way to construct a weak-minor-closed family of upper embeddable graphs from the bouquet of circles, and we extend a result in J. Graph Theory obtained by L. Nebeský. In addition, the algorithmic complexity of determining the upper embeddability of a graph can be reduced considerably by the results obtained in this paper

    Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

    Identifying multiple speakers without knowing where a speaker's voice is in a recording is a challenging task. In this paper, a hierarchical attention network is proposed to solve a weakly labelled speaker identification problem. The use of a hierarchical structure, consisting of a frame-level encoder and a segment-level encoder, aims to learn speaker-related information locally and globally. Speech streams are segmented into fragments. The frame-level encoder with attention learns features, highlights the target-related frames locally, and outputs a fragment-based embedding. The segment-level encoder works with a second attention layer to emphasize the fragments probably related to target speakers. The global information is finally collected from the segment-level module to predict speakers via a classifier. To evaluate the effectiveness of the proposed approach, artificial datasets based on Switchboard Cellular part1 (SWBC) and VoxCeleb1 are constructed in two conditions, where speakers' voices are overlapped and not overlapped. Compared with two baselines, the obtained results show that the proposed approach can achieve better performance. Moreover, further experiments are conducted to evaluate the impact of utterance segmentation. The results show that a reasonable segmentation can slightly improve identification performance. Comment: Accepted for presentation at Interspeech202

    H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model

    In this paper, a hierarchical attention network to generate utterance-level embeddings (H-vectors) for speaker identification is proposed. Since different parts of an utterance may contribute differently to speaker identity, the hierarchical structure aims to learn speaker-related information locally and globally. In the proposed approach, a frame-level encoder and attention are applied to segments of an input utterance to generate individual segment vectors. Then, segment-level attention is applied to the segment vectors to construct an utterance representation. To evaluate the effectiveness of the proposed approach, the NIST SRE 2008 Part1 dataset is used for training, and two datasets, Switchboard Cellular part1 and CallHome American English Speech, are used to evaluate the quality of the extracted utterance embeddings on speaker identification and verification tasks. In comparison with two baselines, X-vector and X-vector+Attention, the obtained results show that H-vectors can achieve significantly better performance. Furthermore, the extracted utterance-level embeddings are more discriminative than those of the two baselines when mapped into a 2D space using t-SNE
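The two-level pooling described above can be illustrated with a small NumPy sketch. This is not the authors' architecture: the encoders are omitted, the attention queries are random vectors standing in for learned parameters, and the names `attention_pool` and `hierarchical_embedding` are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(vectors, query):
    """Weighted sum of the rows of `vectors`, with weights softmax(vectors @ query)."""
    weights = softmax(vectors @ query)
    return weights @ vectors, weights

def hierarchical_embedding(frames, segment_len, frame_query, segment_query):
    """Pool frames into segment vectors, then segment vectors into one utterance vector."""
    n_segments = len(frames) // segment_len  # a trailing partial segment is dropped
    segment_vectors = []
    for i in range(n_segments):
        segment = frames[i * segment_len:(i + 1) * segment_len]
        vector, _ = attention_pool(segment, frame_query)  # frame-level attention
        segment_vectors.append(vector)
    segment_vectors = np.stack(segment_vectors)
    # Segment-level attention over the pooled segment vectors.
    return attention_pool(segment_vectors, segment_query)

# Assumed toy input: 100 frames of 8-dimensional features, split into 5 segments.
rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 8))
frame_query = rng.standard_normal(8)
segment_query = rng.standard_normal(8)
embedding, segment_weights = hierarchical_embedding(frames, 20, frame_query, segment_query)
```

The segment weights show which parts of the utterance dominate the final embedding, which is the intuition behind weighting segments unequally.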

    Dynamic Hand Gesture-Featured Human Motor Adaptation in Tool Delivery using Voice Recognition

    Human-robot collaboration has given users higher efficiency in interactive tasks. Nevertheless, most collaborative schemes rely on complicated human-machine interfaces, which might lack the requisite intuitiveness compared with natural limb control. We also expect to understand human intent with low training data requirements. In response to these challenges, this paper introduces a human-robot collaborative framework that integrates hand gesture and dynamic movement recognition, voice recognition, and a switchable control adaptation strategy. These modules provide a user-friendly approach that enables the robot to deliver tools as the user needs them, especially when the user is working with both hands. Users can therefore focus on their task execution without additional training in the use of human-machine interfaces, while the robot interprets their intuitive gestures. The proposed multimodal interaction framework is implemented on a UR5e robot platform equipped with a RealSense D435i camera, and its effectiveness is assessed through a circuit board soldering task. The experimental results demonstrate strong performance in hand gesture recognition: the static hand gesture recognition module achieves an accuracy of 94.3%, while the dynamic motion recognition module reaches 97.6% accuracy. Compared with human solo manipulation, the proposed approach facilitates more efficient tool delivery without significantly distracting from human intents. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    Mariculture may intensify eutrophication but lower N/P ratios: a case study based on nutrients and dual nitrate isotope measurements in Sansha Bay, southeastern China

    The mariculture industry has grown rapidly worldwide over the past few decades. The industry helps meet growing food demands and may provide an effective means of carbon sequestration; however, it may harm the marine ecological environment, and the extent of its impact depends on the type of mariculture. Here we focus on the impact of mariculture on the nutrient status and eutrophication in Sansha Bay, which is a typical aquaculture harbor in southeastern China that employs a combination of shellfish and seaweed farming. Nutrient concentrations and dual nitrate isotopes were measured in Sansha Bay during the winter of 2021. The average concentrations of nitrate and phosphate were 31.3 ± 10.5 and 2.26 ± 0.84 µM, respectively, indicating that the water was in a eutrophic state. However, the N/P ratios were relatively low (14.3 ± 2.2). Nitrate isotope measurements were 8.8‰–11.9‰ for δ15N-NO3− and 2.2‰–6.0‰ for δ18O-NO3−. Source analysis based on the nitrate isotope measurements indicates that nitrate in Sansha Bay is derived mainly from the excretion of organisms and sewage discharge from mariculture. The isotopic fractionation model of nitrate assimilation by organisms indicates that surface waters in Sansha Bay experience strong biological uptake of nitrate, which is likely related to seaweed farming in winter. The low N/P ratios may be attributed to excessive nitrogen uptake (relative to phosphorus) during shellfish and seaweed farming, as well as nitrogen removal through sediment denitrification, which is fueled by the sinking of particulate organic matter from mariculture. Overall, our study shows that mariculture activities dominated by shellfish and seaweed cultivation in Sansha Bay may exacerbate eutrophication but reduce N/P ratios in the water column in aquaculture areas
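As a quick arithmetic check on the figures above, the mean concentrations can be compared with the canonical Redfield N:P ratio of 16:1. Using Redfield as the reference is an assumption, since the abstract does not say which baseline "relatively low" refers to. The abstract also does not say how its N/P value was computed, so a ratio of mean nitrate to mean phosphate need not equal the reported 14.3.

```python
# Values quoted in the abstract (µM); variable names are illustrative only.
mean_nitrate_um = 31.3
mean_phosphate_um = 2.26
reported_mean_np = 14.3

# Canonical Redfield molar N:P ratio, used here as an assumed reference point.
REDFIELD_N_TO_P = 16.0

# Ratio of the two mean concentrations (nitrate only, not total dissolved nitrogen).
ratio_of_means = mean_nitrate_um / mean_phosphate_um

# Both figures fall below the reference ratio.
below_redfield = ratio_of_means < REDFIELD_N_TO_P and reported_mean_np < REDFIELD_N_TO_P

print(f"nitrate/phosphate ratio of means: {ratio_of_means:.2f}")
print(f"below Redfield ({REDFIELD_N_TO_P:.0f}:1): {below_redfield}")
```

Both the ratio of means and the reported value sit below 16, consistent with the abstract's description of low N/P ratios.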