17 research outputs found
Voice Based Database Management System Using DSP Processor
Customers' data and information are commonly secured through PIN/ID protection, which requires users to authenticate themselves by entering a password. There are, however, cases of fraud and theft in which the password is easily discovered. An alternative is biometric identification, which uses an individual's biometric characteristics; because these are unique, they can be used to authenticate a user's access rights. This invention implements speaker verification for an attendance monitoring system on a DSP processor. First, the speech signal passes through a pre-processing phase that removes background noise; the signal's features are then extracted with the Mel Frequency Cepstral Coefficients (MFCC) method, using Hamming windowing, and matched against the reference speech in the database. The speaker is identified by comparing the test speaker's speech signal against the stored references. The main focus of this invention is speaker verification, in which the speech signal of an unknown speaker is compared with a database of known speakers' utterances; speaker identification is used to build the database and to identify students when maintaining the attendance record. An LCD display interfaced with the processor shows which students are present for a particular subject, so the system can be used for monitoring student attendance. A list of defaulting students is also computed according to the attendance criteria and maintained in an MS Excel sheet. As future work, monthly attendance could be reported to parents by a text message to their mobile phones, using a GSM module interfaced to the processor.
DOI: 10.17762/ijritcc2321-8169.150511
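The MFCC front end described above (pre-emphasis, Hamming-windowed framing, mel filterbank, DCT) can be sketched in NumPy as follows. The sample rate, frame sizes, and filter counts here are illustrative assumptions, not settings from the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=8000, frame_len=256, hop=128, n_filt=20, n_ceps=13):
    # Pre-emphasis attenuates low-frequency background energy.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window (part of the MFCC
    # front end itself, ahead of the matching stage).
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, frame_len)) ** 2 / frame_len
    # Triangular mel filterbank between 0 Hz and Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filt + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filt, frame_len // 2 + 1))
    for i in range(1, n_filt + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    feat = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies into cepstra.
    n = np.arange(n_filt)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filt))
    return feat @ dct.T
```

Verification would then score the cepstral features of a test utterance against each enrolled speaker's stored features.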
GENDER INDEPENDENT DISCRIMINATIVE SPEAKER RECOGNITION IN I–VECTOR SPACE
Speaker recognition systems attain their best accuracy when
trained with gender dependent features and tested with known
gender trials. In real applications, however, gender labels are
often not given. In this work we illustrate the design of a system
that does not make use of the gender labels both in training
and in test, i.e. a completely Gender Independent (GI)
system. It relies on discriminative training, where the trials
are i–vector pairs, and the discrimination is between the hypotheses
that the pair of feature vectors in a trial belongs to
the same speaker or to different speakers. We demonstrate
that this pairwise discriminative training can be interpreted as
a procedure that estimates the parameters of the best (second
order) approximation of the log–likelihood ratio score function,
and that a pairwise SVM can be used for training a gender
independent system. Our results show that a pairwise GI
SVM, while saving memory and execution time, achieves state–of–the–art
performance on the most recent NIST evaluations, comparable
to a Gender Dependent (GD) system.
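A minimal sketch of the pairwise discriminative idea above: each trial is an i–vector pair, expanded into symmetric second-order features so that a linear model approximates the quadratic log-likelihood ratio. Plain logistic regression stands in for the pairwise SVM (both fit a linear separator here), and the i-vectors are synthetic; the dimensions and noise levels are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def pair_features(x, y):
    # Symmetric second-order expansion of a trial pair (x, y): cross
    # terms, within-vector terms, a first-order term, and a bias,
    # mirroring the quadratic approximation of the LLR score function.
    return np.concatenate([x * y, x * x + y * y, x + y, [1.0]])

def make_trials(n_spk=20, n_per=4, dim=10, noise=0.5):
    # Synthetic "i-vectors": speaker mean plus within-speaker noise.
    means = rng.normal(size=(n_spk, dim))
    ivecs = means[:, None, :] + noise * rng.normal(size=(n_spk, n_per, dim))
    X, t = [], []
    for s in range(n_spk):
        # Same-speaker (target) trials.
        X.append(pair_features(ivecs[s, 0], ivecs[s, 1])); t.append(1.0)
        X.append(pair_features(ivecs[s, 2], ivecs[s, 3])); t.append(1.0)
        # Different-speaker (non-target) trials.
        o = (s + 1) % n_spk
        X.append(pair_features(ivecs[s, 0], ivecs[o, 1])); t.append(0.0)
        X.append(pair_features(ivecs[s, 2], ivecs[o, 3])); t.append(0.0)
    return np.array(X), np.array(t)

def train(X, t, lr=0.01, steps=2000):
    # Logistic regression on the pairwise features (a stand-in for the
    # pairwise SVM; no gender labels are used anywhere).
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        w -= lr * X.T @ (p - t) / len(t)
    return w

X, t = make_trials()
w = train(X, t)
accuracy = np.mean(((X @ w) > 0) == (t > 0.5))
```

Note that nothing in the feature map or the training depends on speaker gender, which is the point of the GI design.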
Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding
In this letter, we propose a vocal tract length (VTL) perturbation method for
text-dependent speaker verification (TD-SV), in which a set of TD-SV systems
are trained, one for each VTL factor, and score-level fusion is applied to make
a final decision. Next, we explore the bottleneck (BN) feature extracted by
training deep neural networks with a self-supervised objective, autoregressive
predictive coding (APC), for TD-SV and compare it with the well-studied
speaker-discriminant BN feature. The proposed VTL method is then applied to APC
and speaker-discriminant BN features. In the end, we combine the VTL
perturbation systems trained on MFCC and the two BN features in the score
domain. Experiments are performed on the RedDots challenge 2016 database of
TD-SV using short utterances with Gaussian mixture model-universal background
model and i-vector techniques. Results show the proposed methods significantly
outperform the baselines.
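The two ingredients of the VTL method above can be sketched as follows: a warping of the frequency axis, applied once per warp factor to build each subsystem, and a score-level fusion across the subsystems. The piecewise-linear warping form, the cutoff, and the equal fusion weights are illustrative assumptions; the letter does not pin these down here:

```python
import numpy as np

# Hypothetical set of VTL warp factors, one TD-SV system per factor.
vtl_factors = [0.88, 0.92, 0.96, 1.00, 1.04, 1.08, 1.12]

def warp_frequency(f, alpha, f_max=4000.0):
    # Piecewise-linear VTL warping: scale frequencies by alpha below a
    # cutoff, then interpolate linearly so that f_max maps to f_max.
    cutoff = 0.875 * f_max * min(1.0, 1.0 / alpha)
    f = np.asarray(f, float)
    lo = alpha * f
    hi = alpha * cutoff + (f_max - alpha * cutoff) * (f - cutoff) / (f_max - cutoff)
    return np.where(f <= cutoff, lo, hi)

def fuse_scores(score_matrix, weights=None):
    # Score-level fusion across the per-warp-factor systems: a weighted
    # average of the rows (equal weights by default).
    s = np.asarray(score_matrix, float)
    w = np.full(s.shape[0], 1.0 / s.shape[0]) if weights is None else np.asarray(weights)
    return w @ s
```

Each subsystem would compute features on the warped frequency axis, score every trial, and `fuse_scores` would combine the per-system score rows into the final decision score.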
Efficient iterative mean shift based cosine dissimilarity for multi-recording speaker clustering
Speaker clustering is an important task in many applications, such as speaker diarization and speech recognition. Speaker clustering can be done within a single multi-speaker recording (diarization) or across a set of different recordings. In this work we are interested in the latter case, and we propose a simple iterative Mean Shift (MS) algorithm to address it. Traditionally, the MS algorithm is based on the Euclidean distance; we propose to use the cosine distance instead, yielding a new version of the MS algorithm. We report results, measured by speaker and cluster impurity, on the NIST SRE 2008 datasets.
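A cosine-based mean shift iteration of the kind described above can be sketched as follows. Normalizing to the unit sphere makes the cosine similarity an inner product; the flat-kernel threshold and the mode-merging tolerance are illustrative assumptions, not values from the paper:

```python
import numpy as np

def cosine_mean_shift(X, sim_threshold=0.7, n_iter=20):
    # Mean shift on the unit hypersphere with a flat kernel: each mode
    # moves to the re-normalized mean of the data points whose cosine
    # similarity to it exceeds the threshold.
    Z = X / np.linalg.norm(X, axis=1, keepdims=True)
    M = Z.copy()
    for _ in range(n_iter):
        neighbors = (M @ Z.T) > sim_threshold
        means = neighbors.astype(float) @ Z
        M = means / np.linalg.norm(means, axis=1, keepdims=True)
    # Merge modes that converged to (almost) the same direction.
    labels = -np.ones(len(M), dtype=int)
    k = 0
    for i in range(len(M)):
        if labels[i] < 0:
            labels[(M @ M[i] > 0.99) & (labels < 0)] = k
            k += 1
    return labels
```

For multi-recording clustering, `X` would hold one utterance-level representation per recording, and the merged modes give the speaker clusters.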
Memory-aware i-vector extraction by means of subspace factorization
Most of the state–of–the–art speaker recognition systems use i–
vectors, a compact representation of spoken utterances. Since the “standard” i–vector extraction procedure requires large memory structures, we recently presented the Factorized Sub-space Estimation (FSE) approach, an efficient technique that dramatically reduces the memory needs for i–vector extraction, and is also fast and accurate compared to other proposed approaches. FSE is based on the approximation of the matrix T, representing the speaker variability sub–space, by means of the product of appropriately designed matrices.
In this work, we introduce and evaluate a further approximation
of the matrices that most contribute to the memory costs of the FSE approach, showing that comparable system accuracy can be obtained using less than half of the FSE memory, which corresponds to a more than 60-fold memory reduction with respect to the standard method of i–vector extraction.
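The memory argument above rests on replacing one large matrix with a product of thin ones. As a sketch, a truncated SVD serves as the simplest stand-in for the paper's structured factorization of T; the sizes below are illustrative, not the paper's configuration:

```python
import numpy as np

def low_rank_factorize(T, rank):
    # Truncated SVD: T (CF x R) is replaced by two thin factors A, B
    # with T ~ A @ B, a stand-in for the FSE-style product of matrices.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]

# Hypothetical sizes: C Gaussians, F feature dims, R subspace columns.
C, F, R, rank = 64, 10, 40, 10
rng = np.random.default_rng(0)
# Build a T that is exactly rank-`rank`, so the factorization is lossless.
T = rng.normal(size=(C * F, rank)) @ rng.normal(size=(rank, R))
A, B = low_rank_factorize(T, rank)

full_cost = T.size               # floats stored by the standard T
factored_cost = A.size + B.size  # floats stored by the two factors
```

Storing `A` and `B` instead of `T` is what shrinks the i–vector extractor's footprint; the paper's further approximation targets the factors that still dominate this cost.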
Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification
There are a number of studies about extraction of bottleneck (BN) features
from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases
and triphone states for improving the performance of text-dependent speaker
verification (TD-SV). However, only moderate success has been achieved. A recent
study [1] presented a time contrastive learning (TCL) concept to explore the
non-stationarity of brain signals for classification of brain states. Speech
signals have similar non-stationarity property, and TCL further has the
advantage of having no need for labeled data. We therefore present a TCL based
BN feature extraction method. The method uniformly partitions each speech
utterance in a training dataset into a predefined number of multi-frame
segments. Each segment in an utterance corresponds to one class, and class
labels are shared across utterances. DNNs are then trained to discriminate all
speech frames among the classes to exploit the temporal structure of speech. In
addition, we propose a segment-based unsupervised clustering algorithm to
re-assign class labels to the segments. TD-SV experiments were conducted on the
RedDots challenge database. The TCL-DNNs were trained using speech data of
fixed pass-phrases that were excluded from the TD-SV evaluation set, so the
learned features can be considered phrase-independent. We compare the
performance of the proposed TCL bottleneck (BN) feature with those of
short-time cepstral features and BN features extracted from DNNs discriminating
speakers, pass-phrases, speaker+pass-phrase, as well as monophones whose labels
and boundaries are generated by three different automatic speech recognition
(ASR) systems. Experimental results show that the proposed TCL-BN outperforms
cepstral features and speaker+pass-phrase discriminant BN features, and its
performance is on par with those of ASR derived BN features. Moreover, ....
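The label construction at the heart of TCL, as described above, can be sketched directly: each utterance is uniformly partitioned into a fixed number of contiguous multi-frame segments, each segment becomes one class, and the class indices repeat across utterances (the segment count below is an illustrative choice):

```python
import numpy as np

def tcl_labels(n_frames, n_segments=10):
    # Uniformly partition an utterance of n_frames frames into
    # n_segments contiguous multi-frame segments; every frame in
    # segment k receives class label k. Because the labels are shared
    # across utterances, no manual annotation is needed.
    edges = np.linspace(0, n_frames, n_segments + 1).astype(int)
    labels = np.empty(n_frames, dtype=int)
    for k in range(n_segments):
        labels[edges[k]:edges[k + 1]] = k
    return labels
```

A DNN trained to classify frames into these segment classes exploits the temporal structure of speech; the proposed clustering step would then re-assign the segment labels before BN features are read off an internal layer.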