4,349 research outputs found

    Robust Speaker Recognition Using Speech Enhancement And Attention Model

    Get PDF
    In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of individually processing speech enhancement and speaker recognition, the two modules are integrated into one framework by a joint optimisation using deep neural networks. Furthermore, to increase robustness against noise, a multi-stage attention mechanism is employed to highlight the speaker related features learned from context information in time and frequency domain. To evaluate speaker identification and verification performance of the proposed approach, we test it on the dataset of VoxCeleb1, one of mostly used benchmark datasets. Moreover, the robustness of our proposed approach is also tested on VoxCeleb1 data when being corrupted by three types of interferences, general noise, music, and babble, at different signal-to-noise ratio (SNR) levels. The obtained results show that the proposed approach using speech enhancement and multi-stage attention models outperforms two strong baselines not using them in most acoustic conditions in our experiments.Comment: Acceptted by Odyssey 202

    A Hybrid Differential Evolution Approach to Designing Deep Convolutional Neural Networks for Image Classification

    Full text link
    Convolutional Neural Networks (CNNs) have demonstrated their superiority in image classification, and evolutionary computation (EC) methods have recently been surging to automatically design the architectures of CNNs to save the tedious work of manually designing CNNs. In this paper, a new hybrid differential evolution (DE) algorithm with a newly added crossover operator is proposed to evolve the architectures of CNNs of any lengths, which is named DECNN. There are three new ideas in the proposed DECNN method. Firstly, an existing effective encoding scheme is refined to cater for variable-length CNN architectures; Secondly, the new mutation and crossover operators are developed for variable-length DE to optimise the hyperparameters of CNNs; Finally, the new second crossover is introduced to evolve the depth of the CNN architectures. The proposed algorithm is tested on six widely-used benchmark datasets and the results are compared to 12 state-of-the-art methods, which shows the proposed method is vigorously competitive to the state-of-the-art algorithms. Furthermore, the proposed method is also compared with a method using particle swarm optimisation with a similar encoding strategy named IPPSO, and the proposed DECNN outperforms IPPSO in terms of the accuracy.Comment: Accepted by The Australasian Joint Conference on Artificial Intelligence 201

    Semantic Tagging with Deep Residual Networks

    Get PDF
    We propose a novel semantic tagging task, sem-tagging, tailored for the purpose of multilingual semantic parsing, and present the first tagger using deep residual networks (ResNets). Our tagger uses both word and character representations and includes a novel residual bypass architecture. We evaluate the tagset both intrinsically on the new task of semantic tagging, as well as on Part-of-Speech (POS) tagging. Our system, consisting of a ResNet and an auxiliary loss function predicting our semantic tags, significantly outperforms prior results on English Universal Dependencies POS tagging (95.71% accuracy on UD v1.2 and 95.67% accuracy on UD v1.3).Comment: COLING 2016, camera ready versio
    • …
    corecore