149 research outputs found
LPC and its derivatives for stuttered speech recognition
Stuttering, or stammering, is a disruption of the normal flow of speech by dysfluencies, which can be repetitions or prolongations of phonemes or syllables. Stuttering cannot be permanently cured, though it may go into remission, or stutterers can learn to shape their speech into fluent speech with appropriate speech pathology treatment. Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Line Spectral Frequencies (LSF) were used for feature extraction, while a Multilayer Perceptron (MLP) was used as the classifier. The samples used were obtained from UCLASS (University College London Archive of Stuttered Speech) release 1. The LPCC-MLP system had the highest overall sensitivity and precision and the lowest overall misclassification rate. The LPCC-MLP system nevertheless had difficulty with F3: its sensitivity for F3 was negligible, its precision only moderate, and its misclassification rate, though comparatively small, was still above 10%.
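As a rough illustration of the LPC-based analysis this abstract relies on, the autocorrelation (Levinson-Durbin) method and an LPC-to-LPCC conversion can be sketched in Python. The function names, the recursion convention chosen for the cepstral conversion, and the AR test signal are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def lpc(x, order):
    """Estimate LPC coefficients a (with a[0] = 1) for one analysis frame,
    using the autocorrelation method and the Levinson-Durbin recursion."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        a[1:i + 1] = prev[1:i + 1] + k * prev[i - 1::-1]
        err *= 1.0 - k * k                  # prediction error shrinks each step
    return a

def lpcc(a, n_ceps):
    """Convert LPC polynomial coefficients to cepstral coefficients (LPCC),
    using one common convention: c_n = -a_n - (1/n) * sum_k k * c_k * a_{n-k}."""
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n < len(a) else 0.0
        for k in range(1, n):
            if n - k < len(a):
                acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]
```

Fitting `lpc(x, 2)` to a synthetic AR(2) signal recovers its coefficients approximately; the resulting `lpcc` vector is what would be fed to an MLP classifier in a setup like the one described.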
Real time speaker recognition using MFCC and VQ
Speaker Recognition is a process of automatically recognizing who is speaking on the basis of the individual information included in speech waves. Speaker Recognition is one of the most useful biometric recognition techniques in this world where insecurity is a major threat. Many organizations like banks, institutions, industries, etc. are currently using this technology for providing greater security to their vast databases. Speaker Recognition mainly involves two modules, namely feature extraction and feature matching. Feature extraction is the process that extracts a small amount of data from the speaker’s voice signal that can later be used to represent that speaker. Feature matching involves the actual procedure to identify the unknown speaker by comparing the extracted features from his/her voice input with the ones that are already stored in our speech database. In feature extraction we find the Mel Frequency Cepstrum Coefficients, which are based on the known variation of the human ear’s critical bandwidths with frequency, and these are vector quantized using the LBG algorithm, resulting in the speaker-specific codebook.
In feature matching we find the VQ distortion between the input utterance of an unknown speaker and the codebooks stored in our database. Based on this VQ distortion we decide whether to accept/reject the unknown speaker’s identity. The system I implemented in my work is 80% accurate in recognizing the correct speaker. In the second phase we implement real-time speaker recognition using MFCC and VQ on a TMS320C6713 DSP board. We analyze the workload and identify the most time-consuming operations.
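The codebook-training step this abstract describes can be sketched with a minimal LBG implementation in numpy. This is a generic illustration under assumed parameters (codebook size, perturbation, iteration count), not the authors' actual code:

```python
import numpy as np

def lbg_codebook(feats, size, eps=0.01, iters=20):
    """Train a VQ codebook with LBG binary splitting followed by k-means-style
    refinement. `size` should be a power of two."""
    codebook = feats.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split every centroid into a slightly perturbed pair
        codebook = np.vstack([codebook + eps, codebook - eps])
        for _ in range(iters):
            # Assign each feature vector to its nearest centroid
            d = np.linalg.norm(feats[:, None, :] - codebook[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # Move each non-empty centroid to the mean of its cell
            for j in range(len(codebook)):
                if np.any(labels == j):
                    codebook[j] = feats[labels == j].mean(axis=0)
    return codebook

def vq_distortion(feats, codebook):
    """Average distance from each feature vector to its nearest codeword."""
    d = np.linalg.norm(feats[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

With one codebook trained per enrolled speaker, an unknown utterance is scored by `vq_distortion` against every codebook, and the lowest-distortion speaker wins, which is the accept/reject basis the abstract mentions.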
Some Commonly Used Speech Feature Extraction Algorithms
Speech is a complex, naturally acquired human motor ability. It is characterized in adults by the production of about 14 different sounds per second via the harmonized actions of roughly 100 muscles. Speaker recognition is the capability of software or hardware to receive a speech signal, identify the speaker present in it and recognize that speaker afterwards. Feature extraction is accomplished by changing the speech waveform into a parametric representation at a relatively reduced data rate for subsequent processing and analysis. Therefore, acceptable classification is derived from excellent, high-quality features. Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Line Spectral Frequencies (LSF), Discrete Wavelet Transform (DWT) and Perceptual Linear Prediction (PLP) are the speech feature extraction techniques discussed in this chapter. These methods have been tested in a wide variety of applications, giving them a high level of reliability and acceptability. Researchers have made several modifications to the above techniques to make them less susceptible to noise, more robust and less time-consuming. In conclusion, none of the methods is superior to the others; the area of application determines which method to select.
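Of the techniques surveyed, the MFCC pipeline (pre-emphasis, framing, windowing, FFT, mel filterbank, log, DCT) can be sketched end to end in numpy. The parameter choices below (25 ms frames, 10 ms hop, 26 filters, 13 coefficients) are common defaults assumed for illustration, not values from this chapter:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=512, n_filters=26, n_ceps=13):
    # Pre-emphasis boosts high frequencies
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame into 25 ms windows with a 10 ms hop, apply a Hamming window
    flen, hop = int(0.025 * sr), int(0.010 * sr)
    n_frames = 1 + (len(x) - flen) // hop
    frames = np.stack([x[i * hop : i * hop + flen] for i in range(n_frames)])
    frames *= np.hamming(flen)
    # Power spectrum of each frame
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular mel-spaced filterbank
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fb[i - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    log_energies = np.log(spec @ fb.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_energies @ dct.T
```

Each row of the returned array is the MFCC vector for one frame, the representation most of the systems in this collection classify.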
Voice signature based Speaker Recognition
Magister Scientiae - MSc (Computer Science). Personal identification and the protection of data are important issues because of the ubiquitousness of computing, and these have thus become interesting areas of research in the field of computer science. Previously people have used a variety of ways to identify an individual and protect themselves, their property and their information. This they did mostly by means of locks, passwords, smartcards and biometrics. Verifying individuals by using their physical or behavioural features is more secure than using other data such as passwords or smartcards, because everyone has unique features which distinguish him or her from others. Furthermore, the biometrics of a person are difficult to imitate or steal. Biometric technologies represent a significant component of a comprehensive digital identity solution and play an important role in security. The technologies that support identification and authentication of individuals are based on either their physiological or their behavioural characteristics. Live data, in this instance the human voice, is the topic of this research. The aim is to recognize a person’s voice and to identify the user by verifying that his/her voice matches a record of his/her voice-signature in a system’s database. To address the main research question, “What is the best way to identify a person by his/her voice signature?”, design science research was employed. This methodology is used to develop an artefact for solving a problem. Initially, a pilot study was conducted using visual representations of voice signatures, to check whether it is possible to identify speakers without using feature extraction or matching methods. Subsequently, experiments were conducted with 6300 data sets derived from the Texas Instruments and Massachusetts Institute of Technology audio database. Two methods of feature extraction were considered, mel frequency cepstrum coefficient and linear prediction cepstral coefficient extraction, and for classification the Support Vector Machines method was used. The three approaches were compared in terms of their effectiveness, and it was found that the system using the mel frequency cepstrum coefficient for feature extraction gave marginally better results for speaker recognition.
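The Support Vector Machines classification step used in that comparison can be illustrated with a from-scratch linear SVM trained by sub-gradient descent on the hinge loss. In practice a library implementation would be used; this sketch, with assumed hyperparameters, only shows the principle:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimal linear SVM: stochastic sub-gradient descent on
    lam/2 * ||w||^2 + hinge loss. Labels y must be in {-1, +1}."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:
                # Hinge loss is active: push w toward correct classification
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:
                # Only the regularizer contributes to the sub-gradient
                w -= lr * lam * w
    return w, b

def predict(X, w, b):
    return np.sign(X @ w + b)
```

In a speaker-recognition setting, `X` would hold per-utterance feature vectors (e.g. averaged MFCCs or LPCCs) and one such classifier would be trained per speaker pair or in a one-vs-rest scheme.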
A hybrid HMM/ANN based approach for online signature verification
2007-2008 > Academic research: refereed > Refereed conference paper. Version of Record, Published.
Automatic Identity Recognition Using Speech Biometric
Biometric technology refers to the automatic identification of a person using physical or behavioral traits associated with him/her. This technology can be an excellent candidate for developing intelligent systems such as speaker identification, facial recognition, signature verification, etc. Biometric technology can be used to design and develop automatic identity recognition systems, which are in high demand and can be used in banking systems, employee identification, immigration, e-commerce, etc. The first phase of this research focuses on the development of an automatic identity recognizer using speech biometric technology based on Artificial Intelligence (AI) techniques provided in MATLAB. In phase one, speech data was collected from 20 participants (10 male and 10 female) in order to develop the recognizer. The speech data include utterances recorded for the English language digits (0 to 9), where each participant recorded each digit 3 times, resulting in a total of 600 utterances for all participants. In phase two, speech data was collected from 100 participants (50 male and 50 female) in order to develop the recognizer. The speech data is divided into text-dependent and text-independent data, whereby each participant selected his/her full name and recorded it 30 times, which makes up the text-independent data. On the other hand, the text-dependent data is represented by a short Arabic language story that contains 16 sentences, whereby every sentence was recorded by every participant 5 times. As a result, this new corpus contains 3000 (30 utterances * 100 speakers) sound files that represent the text-independent data using their full names and 8000 (16 sentences * 5 utterances * 100 speakers) sound files that represent the text-dependent data using the short story.
For phase one of developing the automatic identity recognizer using speech, the 600 utterances underwent the feature extraction and feature classification phases. The speech-based automatic identity recognition system is based on the dominant feature extraction technique, known as the Mel-Frequency Cepstral Coefficient (MFCC). For the feature classification phase, the system is based on the Vector Quantization (VQ) algorithm. Based on our experimental results, the highest accuracy achieved is 76%. The experimental results have shown acceptable performance, which can be improved further in phase two using a larger speech data size and better-performing classification techniques such as the Hidden Markov Model (HMM).
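The VQ-based classification decision described above can be sketched as a closed-set identification plus an open-set accept/reject check. The function names, the distance measure and the threshold mechanism are generic assumptions for illustration, not details from this work:

```python
import numpy as np

def vq_distortion(feats, codebook):
    """Average Euclidean distance from each feature vector to its
    nearest codeword in the codebook."""
    d = np.linalg.norm(feats[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def identify(feats, codebooks):
    """Closed-set identification: return the enrolled speaker whose
    codebook yields the lowest average VQ distortion, plus all scores."""
    scores = {spk: vq_distortion(feats, cb) for spk, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

def verify(feats, claimed_codebook, threshold):
    """Open-set accept/reject: accept the claimed identity only if the
    distortion falls below a tuned threshold."""
    return vq_distortion(feats, claimed_codebook) < threshold
```

Here `codebooks` maps each enrolled speaker to a codebook trained on that speaker's MFCC vectors; the reported 76% accuracy corresponds to how often `identify` picks the true speaker.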
Text dependent speaker recognition using MFCC and LBG VQ
Speaker Recognition is a process of automatically recognizing who is speaking on the basis of the individual information included in speech waves. Speaker Recognition is one of the most useful biometric recognition techniques in this world where insecurity is a major threat. Many organizations like banks, institutions, industries, etc. are currently using this technology for providing greater security to their vast databases. Speaker Recognition mainly involves two modules, namely feature extraction and feature matching. Feature extraction is the process that extracts a small amount of data from the speaker’s voice signal that can later be used to represent that speaker. Feature matching involves the actual procedure to identify the unknown speaker by comparing the extracted features from his/her voice input with the ones that are already stored in our speech database. In feature extraction we find the Mel Frequency Cepstrum Coefficients, which are based on the known variation of the human ear’s critical bandwidths with frequency, and these are vector quantized using the LBG algorithm, resulting in the speaker-specific codebook.
In feature matching we find the VQ distortion between the input utterance of an unknown speaker and the codebooks stored in our database. Based on this VQ distortion we decide whether to accept/reject the unknown speaker’s identity. The system I implemented in my work is 80% accurate in recognizing the correct speaker.
- …