4,349 research outputs found
Robust Speaker Recognition Using Speech Enhancement And Attention Model
In this paper, a novel architecture for speaker recognition is proposed by
cascading speech enhancement and speaker processing. Its aim is to improve
speaker recognition performance when speech signals are corrupted by noise.
Instead of individually processing speech enhancement and speaker recognition,
the two modules are integrated into one framework by a joint optimisation using
deep neural networks. Furthermore, to increase robustness against noise, a
multi-stage attention mechanism is employed to highlight the speaker related
features learned from context information in time and frequency domain. To
evaluate speaker identification and verification performance of the proposed
approach, we test it on the dataset of VoxCeleb1, one of mostly used benchmark
datasets. Moreover, the robustness of our proposed approach is also tested on
VoxCeleb1 data when being corrupted by three types of interferences, general
noise, music, and babble, at different signal-to-noise ratio (SNR) levels. The
obtained results show that the proposed approach using speech enhancement and
multi-stage attention models outperforms two strong baselines not using them in
most acoustic conditions in our experiments.Comment: Acceptted by Odyssey 202
A Hybrid Differential Evolution Approach to Designing Deep Convolutional Neural Networks for Image Classification
Convolutional Neural Networks (CNNs) have demonstrated their superiority in
image classification, and evolutionary computation (EC) methods have recently
been surging to automatically design the architectures of CNNs to save the
tedious work of manually designing CNNs. In this paper, a new hybrid
differential evolution (DE) algorithm with a newly added crossover operator is
proposed to evolve the architectures of CNNs of any lengths, which is named
DECNN. There are three new ideas in the proposed DECNN method. Firstly, an
existing effective encoding scheme is refined to cater for variable-length CNN
architectures; Secondly, the new mutation and crossover operators are developed
for variable-length DE to optimise the hyperparameters of CNNs; Finally, the
new second crossover is introduced to evolve the depth of the CNN
architectures. The proposed algorithm is tested on six widely-used benchmark
datasets and the results are compared to 12 state-of-the-art methods, which
shows the proposed method is vigorously competitive to the state-of-the-art
algorithms. Furthermore, the proposed method is also compared with a method
using particle swarm optimisation with a similar encoding strategy named IPPSO,
and the proposed DECNN outperforms IPPSO in terms of the accuracy.Comment: Accepted by The Australasian Joint Conference on Artificial
Intelligence 201
Semantic Tagging with Deep Residual Networks
We propose a novel semantic tagging task, sem-tagging, tailored for the
purpose of multilingual semantic parsing, and present the first tagger using
deep residual networks (ResNets). Our tagger uses both word and character
representations and includes a novel residual bypass architecture. We evaluate
the tagset both intrinsically on the new task of semantic tagging, as well as
on Part-of-Speech (POS) tagging. Our system, consisting of a ResNet and an
auxiliary loss function predicting our semantic tags, significantly outperforms
prior results on English Universal Dependencies POS tagging (95.71% accuracy on
UD v1.2 and 95.67% accuracy on UD v1.3).Comment: COLING 2016, camera ready versio
Recommended from our members
Monaural speech separation with deep learning using phase modelling and capsule networks
The removal of background noise from speech audio is a problem with high practical relevance. A variety of deep learning approaches have been applied to it in recent years, most of which operate on a magnitude spectrogram representation of a noisy recording to estimate the isolated speaking voice. This work investigates ways to include phase information, which is commonly discarded, firstly within a convolutional neural network (CNN) architecture, and secondly by applying capsule networks, to our knowledge the first time capsules have been used in source separation. We present a Circular Loss function, which takes into account the periodic nature of phase. Our results show that the inclusion of phase information leads to an improvement in the quality of speech separation. We also find that in our experiments convolutional neural networks outperform capsule networks at speech separation
- …