876 research outputs found
Continuous speech recognition with modified learning vector quantization algorithm and two-level DP-matching
PROCEEDINGS OF IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSIN
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Visio
Speaker independent isolated word recognition
The work presented in this thesis concerns the recognition of
isolated words using a pattern matching approach. In such a system,
an unknown speech utterance, which is to be identified, is
transformed into a pattern of characteristic features. These
features are then compared with a set of pre-stored reference
patterns that were generated from the vocabulary words. The unknown
word is identified as that vocabulary word for which the reference
pattern gives the best match.
One of the major difficul ties in the pattern comparison process is
that speech patterns, obtained from the same word, exhibit non-linear
temporal fluctuations and thus a high degree of redundancy. The
initial part of this thesis considers various dynamic time warping
techniques used for normalizing the temporal differences between
speech patterns. Redundancy removal methods are also considered, and
their effect on the recognition accuracy is assessed.
Although the use of dynamic time warping algorithms provide
considerable improvement in the accuracy of isolated word recognition
schemes, the performance is ultimately limited by their poor ability
to discriminate between acoustically similar words. Methods for
enhancing the identification rate among acoustically similar words,
by using common pattern features for similar sounding regions, are
investigated.
Pattern matching based, speaker independent systems, can only operate
with a high recognition rate, by using multiple reference patterns
for each of the words included in the vocabulary. These patterns are
obtained from the utterances of a group of speakers. The use of
multiple reference patterns, not only leads to a large increase in
the memory requirements of the recognizer, but also an increase in
the computational load. A recognition system is proposed in this
thesis, which overcomes these difficulties by (i) employing vector
quantization techniques to reduce the storage of reference patterns,
and (ii) eliminating the need for dynamic time warping which reduces
the computational complexity of the system.
Finally, a method of identifying the acoustic structure of an
utterance in terms of voiced, unvoiced, and silence segments by using
fuzzy set theory is proposed. The acoustic structure is then
employed to enhance the recognition accuracy of a conventional
isolated word recognizer
Biometrics
Biometrics uses methods for unique recognition of humans based upon one or more intrinsic physical or behavioral traits. In computer science, particularly, biometrics is used as a form of identity access management and access control. It is also used to identify individuals in groups that are under surveillance. The book consists of 13 chapters, each focusing on a certain aspect of the problem. The book chapters are divided into three sections: physical biometrics, behavioral biometrics and medical biometrics. The key objective of the book is to provide comprehensive reference and text on human authentication and people identity verification from both physiological, behavioural and other points of view. It aims to publish new insights into current innovations in computer systems and technology for biometrics development and its applications. The book was reviewed by the editor Dr. Jucheng Yang, and many of the guest editors, such as Dr. Girija Chetty, Dr. Norman Poh, Dr. Loris Nanni, Dr. Jianjiang Feng, Dr. Dongsun Park, Dr. Sook Yoon and so on, who also made a significant contribution to the book
Implementation and Evaluation of Acoustic Distance Measures for Syllables
Munier C. Implementation and Evaluation of Acoustic Distance Measures for Syllables. Bielefeld (Germany): Bielefeld University; 2011.In dieser Arbeit werden verschiedene akustische Ähnlichkeitsmaße für Silben motiviert und anschließend evaluiert. Der Mahalanobisabstand als lokales Abstandsmaß für einen Dynamic-Time-Warping-Ansatz zum Messen von akustischen Abständen hat die Fähigkeit, Silben zu unterscheiden. Als solcher erlaubt er die Klassifizierung von Silben mit einer Genauigkeit, die für die Klassifizierung von kleinen akustischen Einheiten üblich ist (60 Prozent für eine Nächster-Nachbar-Klassifizierung auf einem Satz von zehn Silben für Samples eines einzelnen Sprechers). Dieses Maß kann durch verschiedene Techniken verbessert werden, die jedoch seine Ausführungsgeschwindigkeit verschlechtern (Benutzen von mehr Mischverteilungskomponenten für die Schätzung von Kovarianzen auf einer Gaußschen Mischverteilung, Benutzen von voll besetzten Kovarianzmatrizen anstelle von diagonalen Kovarianzmatrizen). Durch experimentelle Evaluierung wird deutlich, dass ein gut funktionierender Algorithmus zur Silbensegmentierung, welcher eine akkurate Schätzung von Silbengrenzen erlaubt, für die korrekte Berechnung von akustischen Abständen durch die in dieser Arbeit entwickelten Ähnlichkeitsmaße unabdingbar ist. Weitere Ansätze für Ähnlichkeitsmaße, die durch ihre Anwendung in der Timbre-Klassifizierung von Musikstücken motiviert sind, zeigen keine adäquate Fähigkeit zur Silbenunterscheidung.In this work, several acoustic similarity measures for syllables are motivated and successively evaluated. The Mahalanobis distance as local distance measure for a dynamic time warping approach to measure acoustic distances is a measure that is able to discriminate syllables and thus allows for syllable classification with an accuracy that is common to the classification of small acoustic units (60 percent for a nearest neighbor classification of a set of ten syllables using samples of a single speaker). This measure can be improved using several techniques that however impair the execution speed of the distance measure (usage of more mixture density components for the estimation of covariances from a Gaussian mixture model, usage of fully occupied covariance matrices instead of diagonal covariance matrices). Through experimental evaluation it becomes evident that a decently working syllable segmentation algorithm allowing for accurate syllable border estimations is essential to the correct computation of acoustic distances by the similarity measures developed in this work. Further approaches for similarity measures which are motivated by their usage in timbre classification of music pieces do not show adequate syllable discrimination abilities
- …