540 research outputs found
Fingerprint Verification System Using Support Vector Machine
Efficient fingerprint verification system is needed in many places for personal identification to
access physical facilities, information etc. This paper proposes robust verification system based on
features extracted from human fingerprints and a pattern classifier called Support Vector Machine
(SVM). Three set of features are fused together and passed to the classifier. The fused feature is used to
train the system for effective verification of users fingerprint images. The result obtained after testing
100 fingerprints is very encouraging
Speech recognition in noisy environments using a switching linear dynamic model for feature enhancement
The performance of automatic speech recognition systems strongly decreases whenever the speech signal is disturbed by background noise. We aim to improve noise robustness focusing on all major levels of speech recognition: feature extraction, feature enhancement, and speech modeling. Different auditory modeling concepts, speech enhancement techniques, training strategies, and model architectures are implemented in an in-car digit and spelling recognition task. We prove that joint speech and noise modeling with a global Switching Linear Dynamic Model (SLDM) capturing the dynamics of speech, and a Linear Dynamic Model (LDM) for noise, prevails over state-of-theart speech enhancement techniques. Furthermore we show that the baseline recognizer of the Interspeech Consonant Challenge 2008 can be outperformed by SLDM feature enhancement for almost all of the noisy testsets
Learning Dynamics in Linear VAE: Posterior Collapse Threshold, Superfluous Latent Space Pitfalls, and Speedup with KL Annealing
Variational autoencoders (VAEs) face a notorious problem wherein the
variational posterior often aligns closely with the prior, a phenomenon known
as posterior collapse, which hinders the quality of representation learning. To
mitigate this problem, an adjustable hyperparameter and a strategy for
annealing this parameter, called KL annealing, are proposed. This study
presents a theoretical analysis of the learning dynamics in a minimal VAE. It
is rigorously proved that the dynamics converge to a deterministic process
within the limit of large input dimensions, thereby enabling a detailed
dynamical analysis of the generalization error. Furthermore, the analysis shows
that the VAE initially learns entangled representations and gradually acquires
disentangled representations. A fixed-point analysis of the deterministic
process reveals that when exceeds a certain threshold, posterior
collapse becomes inevitable regardless of the learning period. Additionally,
the superfluous latent variables for the data-generative factors lead to
overfitting of the background noise; this adversely affects both generalization
and learning convergence. The analysis further unveiled that appropriately
tuned KL annealing can accelerate convergence.Comment: 24 pages, 5 figure
Bio-motivated features and deep learning for robust speech recognition
Mención Internacional en el tÃtulo de doctorIn spite of the enormous leap forward that the Automatic Speech
Recognition (ASR) technologies has experienced over the last five years
their performance under hard environmental condition is still far from
that of humans preventing their adoption in several real applications.
In this thesis the challenge of robustness of modern automatic speech
recognition systems is addressed following two main research lines.
The first one focuses on modeling the human auditory system to
improve the robustness of the feature extraction stage yielding to novel
auditory motivated features. Two main contributions are produced.
On the one hand, a model of the masking behaviour of the Human
Auditory System (HAS) is introduced, based on the non-linear filtering
of a speech spectro-temporal representation applied simultaneously
to both frequency and time domains. This filtering is accomplished
by using image processing techniques, in particular mathematical
morphology operations with an specifically designed Structuring Element
(SE) that closely resembles the masking phenomena that take
place in the cochlea. On the other hand, the temporal patterns of
auditory-nerve firings are modeled. Most conventional acoustic features
are based on short-time energy per frequency band discarding
the information contained in the temporal patterns. Our contribution
is the design of several types of feature extraction schemes based on
the synchrony effect of auditory-nerve activity, showing that the modeling
of this effect can indeed improve speech recognition accuracy in
the presence of additive noise. Both models are further integrated into
the well known Power Normalized Cepstral Coefficients (PNCC).
The second research line addresses the problem of robustness in
noisy environments by means of the use of Deep Neural Networks
(DNNs)-based acoustic modeling and, in particular, of Convolutional
Neural Networks (CNNs) architectures. A deep residual network
scheme is proposed and adapted for our purposes, allowing Residual
Networks (ResNets), originally intended for image processing tasks,
to be used in speech recognition where the network input is small
in comparison with usual image dimensions. We have observed that
ResNets on their own already enhance the robustness of the whole system
against noisy conditions. Moreover, our experiments demonstrate
that their combination with the auditory motivated features devised
in this thesis provide significant improvements in recognition accuracy
in comparison to other state-of-the-art CNN-based ASR systems
under mismatched conditions, while maintaining the performance in
matched scenarios.
The proposed methods have been thoroughly tested and compared
with other state-of-the-art proposals for a variety of datasets and
conditions. The obtained results prove that our methods outperform
other state-of-the-art approaches and reveal that they are suitable for
practical applications, specially where the operating conditions are
unknown.El objetivo de esta tesis se centra en proponer soluciones al problema
del reconocimiento de habla robusto; por ello, se han llevado a cabo
dos lÃneas de investigación.
En la primera lÃınea se han propuesto esquemas de extracción de caracterÃsticas novedosos, basados en el modelado del comportamiento
del sistema auditivo humano, modelando especialmente los fenómenos
de enmascaramiento y sincronÃa. En la segunda, se propone mejorar
las tasas de reconocimiento mediante el uso de técnicas de
aprendizaje profundo, en conjunto con las caracterÃsticas propuestas.
Los métodos propuestos tienen como principal objetivo, mejorar la
precisión del sistema de reconocimiento cuando las condiciones de
operación no son conocidas, aunque el caso contrario también ha sido
abordado.
En concreto, nuestras principales propuestas son los siguientes:
Simular el sistema auditivo humano con el objetivo de mejorar
la tasa de reconocimiento en condiciones difÃciles, principalmente
en situaciones de alto ruido, proponiendo esquemas de
extracción de caracterÃsticas novedosos.
Siguiendo esta dirección, nuestras principales propuestas se detallan a continuación:
• Modelar el comportamiento de enmascaramiento del sistema
auditivo humano, usando técnicas del procesado de
imagen sobre el espectro, en concreto, llevando a cabo el
diseño de un filtro morfológico que captura este efecto.
• Modelar el efecto de la sincronà que tiene lugar en el nervio
auditivo.
• La integración de ambos modelos en los conocidos Power
Normalized Cepstral Coefficients (PNCC).
La aplicación de técnicas de aprendizaje profundo con el objetivo
de hacer el sistema más robusto frente al ruido, en particular
con el uso de redes neuronales convolucionales profundas, como
pueden ser las redes residuales.
Por último, la aplicación de las caracterÃsticas propuestas en
combinación con las redes neuronales profundas, con el objetivo
principal de obtener mejoras significativas, cuando las condiciones
de entrenamiento y test no coinciden.Programa Oficial de Doctorado en Multimedia y ComunicacionesPresidente: Javier Ferreiros López.- Secretario: Fernando DÃaz de MarÃa.- Vocal: Rubén Solera Ureñ
Generalized Approximate Survey Propagation for High-Dimensional Estimation
In Generalized Linear Estimation (GLE) problems, we seek to estimate a signal
that is observed through a linear transform followed by a component-wise,
possibly nonlinear and noisy, channel. In the Bayesian optimal setting,
Generalized Approximate Message Passing (GAMP) is known to achieve optimal
performance for GLE. However, its performance can significantly degrade
whenever there is a mismatch between the assumed and the true generative model,
a situation frequently encountered in practice. In this paper, we propose a new
algorithm, named Generalized Approximate Survey Propagation (GASP), for solving
GLE in the presence of prior or model mis-specifications. As a prototypical
example, we consider the phase retrieval problem, where we show that GASP
outperforms the corresponding GAMP, reducing the reconstruction threshold and,
for certain choices of its parameters, approaching Bayesian optimal
performance. Furthermore, we present a set of State Evolution equations that
exactly characterize the dynamics of GASP in the high-dimensional limit
Histogram equalization for robust text-independent speaker verification in telephone environments
Word processed copy.
Includes bibliographical references
- …