7 research outputs found

Speaker recognition with hybrid features from a deep belief network

Author: AR Mohamed
Artur S. d’Avila Garcez
C Burges
Emmanouil Benetos
F Richardson
GE Hinton
H Ali
H Lee
Hazrat Ali
L Deng
N Dehak
N Roux Le
Son N. Tran
T Kinnunen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/08/2016

Learning representations from audio data has shown advantages over handcrafted features such as mel-frequency cepstral coefficients (MFCCs) in many audio applications. In most representation learning approaches, connectionist systems have been used to learn and extract latent features from fixed-length data. In this paper, we propose an approach that combines the learned features and the MFCC features for the speaker recognition task and that can be applied to audio scripts of different lengths. In particular, we study the use of features from different levels of a deep belief network (DBN) for quantizing the audio data into vectors of audio word counts. These vectors represent audio scripts of different lengths, which makes it easier to train a classifier. We show experimentally that the audio word count vectors generated from a mixture of DBN features at different layers give better performance than the MFCC features. We can also achieve further improvement by combining the audio word count vectors and the MFCC features.
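The quantization step described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not say how the audio words are formed, so plain k-means stands in as the quantizer, random arrays stand in for DBN and MFCC frame features, and the names `fit_codebook`, `audio_word_counts` and `hybrid_vector` are hypothetical. The point it shows is that clips with different numbers of frames map to count vectors of the same length.

```python
import numpy as np

def fit_codebook(frames, n_words, n_iters=20, seed=0):
    """Plain k-means over frame-level features (a stand-in quantizer, assumed)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen distinct frames.
    idx = rng.choice(len(frames), size=n_words, replace=False)
    centroids = frames[idx].copy()
    for _ in range(n_iters):
        # Squared Euclidean distances, shape (n_frames, n_words).
        dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_words):
            members = frames[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return centroids

def audio_word_counts(frames, centroids):
    """Assign each frame to its nearest audio word and count the assignments."""
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    return np.bincount(labels, minlength=len(centroids))

def hybrid_vector(counts, mfcc_frames):
    """Concatenate a normalised count histogram with mean MFCCs (assumed scheme)."""
    hist = counts / counts.sum()
    return np.concatenate([hist, mfcc_frames.mean(axis=0)])

# Synthetic stand-ins: 16-dim "learned" frame features, 13-dim "MFCC" frames.
rng = np.random.default_rng(0)
codebook = fit_codebook(rng.normal(size=(500, 16)), n_words=8)

short_clip = rng.normal(size=(40, 16))    # 40 frames
long_clip = rng.normal(size=(130, 16))    # 130 frames
short_vec = audio_word_counts(short_clip, codebook)
long_vec = audio_word_counts(long_clip, codebook)

# Both clips now have 8-dimensional representations despite different lengths.
hybrid = hybrid_vector(short_vec, rng.normal(size=(40, 13)))
```

The fixed-length vectors could then be passed to any standard classifier; how the paper weights or normalises the two feature groups is not stated in the abstract.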

City Research Online

Crossref

University of Tasmania Open Access Repository

Queen Mary Research Online

Supervised classification for object identification in urban areas using satellite imagery

Author: Ali Hazrat
Awan Adnan Ali
Khan Sanaullah
Khan Shahid
Rahman Atiq ur
Shafique Omer
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/08/2018

This paper presents a useful method for classification in satellite imagery. The approach is based on pixel-level analysis employing textural features such as correlation, homogeneity, energy and contrast. In this study, gray-scale images are used to train the classification model. For supervised classification, two techniques are employed, namely the Support Vector Machine (SVM) and Naive Bayes. With textural features computed from gray-scale images, Naive Bayes performs better, with an overall accuracy of 76% compared to 68% achieved by SVM. The computational time is evaluated by running the experiment with two different window sizes, 50x50 and 70x70. The required computational time on a single image is found to be 27 seconds for a window size of 70x70 and 45 seconds for a window size of 50x50.
Comment: 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET
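The four texture features named in this abstract are commonly derived from a gray-level co-occurrence matrix (GLCM). The sketch below assumes that is the intended source of the features, which the abstract does not confirm. The function names `glcm` and `texture_features` are hypothetical, a single horizontal offset is used, energy is taken as the square root of the sum of squared entries (one common convention), and correlation falls back to 1.0 for a constant window where it is otherwise undefined.

```python
import numpy as np

def glcm(window, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix for one non-negative pixel offset."""
    h, w = window.shape
    ref = window[:h - dy, :w - dx]   # reference pixels
    nbr = window[dy:, dx:]           # neighbours at offset (dy, dx)
    counts = np.zeros((levels, levels), dtype=np.float64)
    # Accumulate one count per (reference level, neighbour level) pair.
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    return counts / counts.sum()

def texture_features(p):
    """Contrast, homogeneity, energy and correlation of a normalised GLCM."""
    i, j = np.indices(p.shape)
    contrast = (p * (i - j) ** 2).sum()
    homogeneity = (p / (1.0 + (i - j) ** 2)).sum()
    energy = np.sqrt((p ** 2).sum())
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    if sd_i * sd_j > 0:
        correlation = ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j)
    else:
        correlation = 1.0  # assumed fallback for a constant window
    return {"contrast": contrast, "homogeneity": homogeneity,
            "energy": energy, "correlation": correlation}

# A tiny 4-level gray-scale window standing in for a 50x50 or 70x70 patch.
window = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [0, 2, 2, 2],
                   [2, 2, 3, 3]])
features = texture_features(glcm(window, levels=4))

# A constant window has no contrast and maximal homogeneity and energy.
flat = texture_features(glcm(np.zeros((5, 5), dtype=int), levels=4))
```

One feature dictionary per window would form a row of the training matrix for a classifier such as SVM or Naive Bayes. Smaller windows mean more windows per image, which is consistent with the longer running time reported for 50x50.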

arXiv.org e-Print Archive

Crossref

Deep learning methods in speaker recognition: a review

Author: Beke András
Szaszák György
Sztahó Dávid
Publication date: 14/11/2019

This paper summarizes applied deep learning (DL) practices in the field of speaker recognition, covering both verification and identification. Speaker recognition has been a widely studied topic in speech technology. Much research has been carried out, yet little progress has been achieved in the past 5-6 years. However, as deep learning techniques advance in most machine learning fields, they are replacing the former state-of-the-art methods in speaker recognition too. DL appears to have become the current state-of-the-art solution for both speaker verification and identification. The standard x-vectors, in addition to i-vectors, are used as baselines in most of the novel works. The increasing amount of gathered data opens up the territory to DL methods, which are most effective when such data is available.

arXiv.org e-Print Archive