Search CORE

1,039 research outputs found

Speaker recognition by means of restricted Boltzmann machine adaptation

Author: Ghahabi Esfahani Omid
Hernando Pericás Francisco Javier
Safari Pooyan
Publication venue: 'Servicio de Publicaciones de la Universidad Autonoma de Madrid'
Publication date: 01/01/2016
Field of study

Restricted Boltzmann Machines (RBMs) have shown success in speaker recognition. In this paper, RBMs are investigated in a framework comprising a universal model training and model adaptation. Taking advantage of RBM unsupervised learning algorithm, a global model is trained based on all available background data. This general speaker-independent model, referred to as URBM, is further adapted to the data of a specific speaker to build speaker-dependent model. In order to show its effectiveness, we have applied this framework to two different tasks. It has been used to discriminatively model target and impostor spectral features for classification. It has been also utilized to produce a vector-based representation for speakers. This vector-based representation, similar to i-vector, can be further used for speaker recognition using either cosine scoring or Probabilistic Linear Discriminant Analysis (PLDA). The evaluation is performed on the core test condition of the NIST SRE 2006 database.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

From features to speaker vectors by means of restricted Boltzmann machine adaptation

Author: Ghahabi Esfahani Omid
Hernando Pericás Francisco Javier
Safari Pooyan
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2016
Field of study

Restricted Boltzmann Machines (RBMs) have shown success in different stages of speaker recognition systems. In this paper, we propose a novel framework to produce a vector-based representation for each speaker, which will be referred to as RBM-vector. This new approach maps the speaker spectral features to a single fixed-dimensional vector carrying speaker-specific information. In this work, a global model, referred to as Universal RBM (URBM), is trained taking advantage of RBM unsupervised learning capabilities. Then, this URBM is adapted to the data of each speaker in the development, enrolment and evaluation datasets. The network connection weights of the adapted RBMs are further concatenated and subject to a whitening with dimension reduction stage to build the speaker vectors. The evaluation is performed on the core test condition of the NIST SRE 2006 database, and it is shown that RBM-vectors achieve 15% relative improvement in terms of EER compared to i-vectors using cosine scoring. The score fusion with i-vector attains more than 24% relative improvement. The interest of this result for score fusion yields on the fact that both vectors are produced in an unsupervised fashion and can be used instead of i-vector/PLDA approach, when no data label is available. Results obtained for RBM-vector/PLDA framework is comparable with the ones from i-vector/PLDA. Their score fusion achieves 14% relative improvement compared to i-vector/PLDA.Peer ReviewedPostprint (published version

Crossref

UPCommons. Portal del coneixement obert de la UPC

Transfer Learning for Speech and Language Processing

Author: Wang Dong
Zheng Thomas Fang
Publication venue
Publication date: 19/11/2015
Field of study

Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks. For example in speech recognition, an acoustic model trained for one language can be used to recognize speech in another language, with little or no re-training data. Transfer learning is closely related to multi-task learning (cross-lingual vs. multilingual), and is traditionally studied in the name of `model adaptation'. Recent advance in deep learning shows that transfer learning becomes much easier and more effective with high-level abstract features learned by deep models, and the `transfer' can be conducted not only between data distributions and data types, but also between model structures (e.g., shallow nets and deep nets) or even model types (e.g., Bayesian models and neural models). This review paper summarizes some recent prominent research towards this direction, particularly for speech and language processing. We also report some results from our group and highlight the potential of this very interesting research field.Comment: 13 pages, APSIPA 201

arXiv.org e-Print Archive

Crossref

A Review of Physical Human Activity Recognition Chain Using Sensors

Author: Alshorman Buthaynah
AlShorman Omar
Masadeh Mahmoud S
Publication venue: IAES Indonesia Section
Publication date: 23/09/2020
Field of study

In the era of Internet of Medical Things (IoMT), healthcare monitoring has gained a vital role nowadays. Moreover, improving lifestyle, encouraging healthy behaviours, and decreasing the chronic diseases are urgently required. However, tracking and monitoring critical cases/conditions of elderly and patients is a great challenge. Healthcare services for those people are crucial in order to achieve high safety consideration. Physical human activity recognition using wearable devices is used to monitor and recognize human activities for elderly and patient. The main aim of this review study is to highlight the human activity recognition chain, which includes, sensing technologies, preprocessing and segmentation, feature extractions methods, and classification techniques. Challenges and future trends are also highlighted.

Indonesian Journal of Electrical Engineering and Informatics (IJEEI)

Deep learning backend for single and multisession i-vector speaker recognition

Author: Ghahabi Esfahani Omid
Hernando Pericás Francisco Javier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

The lack of labeled background data makes a big performance gap between cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring baseline techniques for i-vectors in speaker recognition. Although there are some unsupervised clustering techniques to estimate the labels, they cannot accurately predict the true labels and they also assume that there are several samples from the same speaker in the background data that could not be true in reality. In this paper, the authors make use of Deep Learning (DL) to fill this performance gap given unlabeled background data. To this goal, the authors have proposed an impostor selection algorithm and a universal model adaptation process in a hybrid system based on deep belief networks and deep neural networks to discriminatively model each target speaker. In order to have more insight into the behavior of DL techniques in both single- and multisession speaker enrollment tasks, some experiments have been carried out in this paper in both scenarios. Experiments on National Institute of Standards and Technology 2014 i-vector challenge show that 46% of this performance gap, in terms of minimum of the decision cost function, is filled by the proposed DL-based system. Furthermore, the score combination of the proposed DL-based system and PLDA with estimated labels covers 79% of this gap.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC