1,350 research outputs found
BOOSTED BINARY FEATURES FOR NOISE-ROBUST SPEAKER VERIFICATION
The standard approach to speaker verification is to extract cepstral features from the speech spectrum and model them by generative or discriminative techniques. We propose a novel approach where a set of client-specific binary features carrying maximal discriminative information specific to the individual client are estimated from an ensemble of pair-wise comparisons of frequency components in magnitude spectra, using Adaboost algorithm. The final classifier is a simple linear combination of these selected features. Experiments on the XM2VTS database strictly according to a standard evaluation protocol have shown that although the proposed framework yields comparatively lower performance on clean speech, it significantly outperforms the state-of-the-art MFCC-GMM system in mismatched conditions with training on clean speech and testing on speech corrupted by four types of additive noise from the standard Noisex-92 database
Computer Graphics and Video Features for Speaker Recognition
Tato práce popisuje netradiční metodu rozpoznávání řečníka pomocí příznaků a alogoritmů používaných převážně v počítačovém vidění. V úvodu jsou shrnuty potřebné teoretické znalosti z oblasti počítačového rozpoznávání. Jako aplikace grafických příznaků v rozpoznávání řečníka jsou detailněji popsány již známé BBF příznaky. Tyto jsou vyhodnoceny nad standardními řečovými databázemi TIMIT a NIST SRE 2010. Experimentální výsledky jsou shrnuty a porovnány se standardními metodami. V závěru jsou jsou navrženy možné směry budoucí práce.We describe a non-traditional method for speaker recognition that uses features and algorithms used mainly for computer vision. Important theoretical knowledge of computer recognition is summarized first. The Boosted Binary Features are described and explored as an already proposed method, that has roots in computer vision. This method is evaluated on standard speaker recognition databases TIMIT and NIST SRE 2010. Experimental results are given and compared to standard methods. Possible directions for future work are proposed at the end.
A Fast Parts-based Approach to Speaker Verification using Boosted Slice Classifiers
Speaker verification on portable devices like smartphones is gradually becoming popular. In this context, two issues need to be considered: 1) such devices have relatively limited computation resources, and 2) they are liable to be used everywhere, possibly in very noisy, uncontrolled environments. This work aims to address both these issues by proposing a computationally efficient yet robust speaker verification system. This novel parts-based system draws inspiration from face and object detection systems in the computer vision domain. The system involves boosted ensembles of simple threshold-based classifiers. It uses a novel set of features extracted from speech spectra, called “slice features”. The performance of the proposed system was evaluated through extensive studies involving a wide range of experimental conditions using the TIMIT, HTIMIT and MOBIO corpus, against standard cepstral features and Gaussian Mixture Model-based speaker verification systems
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that stills
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks
- …