A study on features for speaker recognition by ASAM model

Abstract

For multi-speaker recognition, deep-learning-based frameworks have made significant progress in separating multi-speaker mixed speech, but they still fail to provide satisfactory solutions in complex auditory scenes. A unified auditory selection framework with attention and memory (ASAM) can address this problem. First, the sound characteristics of a specific speaker are accumulated into a life-long memory during the training phase, while a speech perceptor is trained to extract temporal sound features and to update the memory online when new speech is perceived. The learned memory is then used to interact with the mixture input, attending to and filtering the target speaker's frequencies out of the mixture stream. Finally, the network is trained to minimize the reconstruction error of the attended speech. In this study, a single speaker's voice was extracted from a speech segment containing multiple speakers using the ASAM model, and speaker recognition was then performed with an LSTM neural network. In the LSTM network, three kinds of features, MFCC, GFCC, and GBFB, are used for speaker identification and the resulting recognition performance is compared.
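As a minimal illustration of the recognition stage described above (not taken from the paper itself), the sketch below extracts MFCC features with librosa and feeds them to a small LSTM classifier in PyTorch; the file name, speaker count, and hyperparameters are placeholder assumptions, and GFCC or GBFB features would replace the MFCC step in the same pipeline.

    # Illustrative sketch only: MFCC extraction + LSTM speaker classifier.
    # The ASAM separation stage is assumed to have produced a single-speaker
    # utterance already; GFCC/GBFB extraction would plug in at the same point.
    import librosa
    import torch
    import torch.nn as nn

    def mfcc_features(wav_path, n_mfcc=20):
        """Load a separated single-speaker utterance and return MFCC frames (T, n_mfcc)."""
        y, sr = librosa.load(wav_path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, T)
        return torch.tensor(mfcc.T, dtype=torch.float32)        # (T, n_mfcc)

    class SpeakerLSTM(nn.Module):
        """LSTM speaker classifier: feature frames -> speaker logits."""
        def __init__(self, n_feats=20, hidden=128, n_speakers=10):
            super().__init__()
            self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_speakers)

        def forward(self, x):            # x: (batch, T, n_feats)
            _, (h, _) = self.lstm(x)     # h: (1, batch, hidden)
            return self.out(h[-1])       # (batch, n_speakers)

    # Usage (hypothetical file): feats = mfcc_features("speaker_utt.wav").unsqueeze(0)
    #                            logits = SpeakerLSTM()(feats)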
