Confusion modelling for lip-reading
Lip-reading is mostly used as a means of communication by people with hearing difficulties. Recent work has explored the automation of this process, with the aim of building a speech recognition system entirely driven by lip movements. However, this work has so far produced poor results because of factors such as high variability of speaker features, difficulties in mapping from visual features to speech sounds, and high co-articulation of visual features.
The work in this thesis is motivated by previous work in dysarthric speech recognition [Morales, 2009]. Dysarthric speakers have poor control over their articulators, which often leads to a reduced phonemic repertoire. The premise of this thesis is that recognition of the visual speech signal is a similar problem to recognition of dysarthric speech: in both cases some information about the speech signal has been lost, and this brings about a systematic pattern of errors in the decoded output.
This work attempts to exploit the systematic nature of these errors by modelling them in the framework of a weighted finite-state transducer cascade. Results indicate that the technique can achieve slightly lower error rates than the conventional approach. In addition, the thesis explores some more general questions about automated lip-reading.
Automatic speech recognition: from study to practice
Today, automatic speech recognition (ASR) is widely used for purposes such as robotics, multimedia, and medical and industrial applications. Although much research has been carried out in this field over the past decades, there is still considerable room for improvement. To start working in this area, a thorough knowledge of ASR systems, including their weak points and problems, is essential. In addition, practical experience reliably reinforces theoretical understanding. With this in mind, this master's thesis first reviews the principal structure of standard HMM-based ASR systems from a technical point of view, covering feature extraction, acoustic modeling, language modeling and decoding. The most significant challenges in ASR systems are then discussed; these concern the characteristics of the internal components and the external factors that affect ASR performance. Furthermore, a Spanish-language recognizer was implemented using the HTK toolkit. Finally, two open research lines, drawn from a study of the literature in the field, are suggested for future work.
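The decoding step of an HMM-based recognizer is usually carried out with the Viterbi algorithm. The following is a minimal sketch for a discrete-observation HMM in the log domain; it is not the HTK decoder used in the thesis, which works on continuous-density models, and the two-state toy model, its probabilities and the symbol names are hypothetical.

```python
import math


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(observations, states, start, trans, emit):
    """Most likely state sequence of a discrete HMM, computed in the log domain.

    start[s] = P(s at the first frame), trans[s][n] = P(n | s),
    emit[s][o] = P(o | s). Log probabilities avoid underflow on long inputs.
    """
    # delta[s]: log score of the best path ending in state s at the current frame.
    delta = {s: safe_log(start[s]) + safe_log(emit[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        pointer, new_delta = {}, {}
        for nxt in states:
            prev = max(states, key=lambda s: delta[s] + safe_log(trans[s][nxt]))
            pointer[nxt] = prev
            new_delta[nxt] = delta[prev] + safe_log(trans[prev][nxt]) + safe_log(emit[nxt][obs])
        backpointers.append(pointer)
        delta = new_delta
    # Trace back from the best final state.
    state = max(states, key=lambda s: delta[s])
    best_score = delta[state]
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path, best_score


# Hypothetical left-to-right model: s1 mostly emits "a", s2 mostly emits "b".
states = ["s1", "s2"]
start = {"s1": 1.0, "s2": 0.0}
trans = {"s1": {"s1": 0.6, "s2": 0.4}, "s2": {"s1": 0.0, "s2": 1.0}}
emit = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.2, "b": 0.8}}
path, score = viterbi(["a", "a", "b", "b"], states, start, trans, emit)
print(path)  # ['s1', 's1', 's2', 's2']
```

In a full recognizer the states would belong to phone models chained through a pronunciation lexicon and language model, but the recursion is the same.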
Hidden Markov models and neural networks for speech recognition
The hidden Markov model (HMM) is one of the most successful modeling approaches for acoustic events in speech recognition, and more recently it has proven useful for several problems in biological sequence analysis. Although the HMM is good at capturing the temporal nature of processes such as speech, it has a very limited capacity for recognizing complex patterns involving more than first-order dependencies in the observed data sequences. This is due to the first-order state process and the assumption that observations are conditionally independent given the state. Artificial neural networks (NNs) are almost the opposite: they cannot model dynamic, temporally extended phenomena very well, but are good at static classification and regression tasks. Combining the two frameworks in a sensible way can therefore lead to a more powerful model with better classification abilities. The overall aim of this work has been to develop a probabilistic hybrid of hidden Markov models and neural networks and ..
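The abstract is cut off before it says how the two frameworks are combined. One widely used hybrid scheme, shown here purely as an illustration and not as the model developed in this work, lets a neural network estimate HMM state posteriors for each frame and converts them into scaled likelihoods that replace the HMM's emission densities. The single linear layer, its weights and the state priors below are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw network outputs into probabilities that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_log_likelihoods(frame, weights, biases, priors):
    """HMM emission scores computed from network posteriors.

    The network (here a single linear layer) estimates P(q | x) for each
    HMM state q. By Bayes' rule P(q | x) / P(q) = p(x | q) / p(x), i.e. the
    likelihood scaled by p(x). Since p(x) is the same for every state, it
    does not change which state sequence Viterbi decoding prefers.
    """
    scores = [sum(w * x for w, x in zip(row, frame)) + b
              for row, b in zip(weights, biases)]
    posteriors = softmax(scores)
    return [math.log(p) - math.log(prior) for p, prior in zip(posteriors, priors)]


# Hypothetical two-state example with a 2-dimensional feature frame.
frame = [1.0, -1.0]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]
priors = [0.8, 0.2]  # e.g. relative state frequencies in training alignments
emission_scores = scaled_log_likelihoods(frame, weights, biases, priors)
print(emission_scores)
```

The resulting scores can be plugged into a standard Viterbi decoder in place of log p(x | q); dividing by the priors matters because a network trained on frame labels otherwise favours frequent states.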
Advances in Character Recognition
This book presents advances in character recognition. It consists of 12 chapters that cover a wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field and for anyone interested in the subject.
Design of hardware architectures for HMM–based signal processing systems with applications to advanced human-machine interfaces
In this thesis a new approach is described for the development of human–computer interfaces. In particular
the thesis focuses on pattern recognition systems based on hidden Markov models (HMMs). The research started from the development of techniques for natural-language speech recognition systems, and the HMM was chosen as the main algorithmic tool for building the system. After this early work, the goal was extended to the development of a hardware architecture providing a reconfigurable tool that can be used in any pattern recognition task, not only in speech recognition.
The work therefore focuses on the development of dedicated hardware architectures, but some new results were also obtained on the classification of electroencephalographic (EEG) signals using HMMs.
First, a system–level architecture was developed for use in HMM–based pattern recognition systems; it was conceived so that it can work as a stand–alone system. A flexible and fully reconfigurable hardware HMM processor was then described in VHDL, and the design was successfully simulated. A parallel array of these processors forms the core processing block of the architecture.
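The parallel array of HMM processors is the heart of the architecture. As a software-only illustration, and an assumption about how such an array is typically used for classification rather than a description of the VHDL design, each class model can be scored on the same observation sequence with the forward algorithm, independently of the others, and the class with the highest likelihood wins. The model dictionaries, class names and worker count are hypothetical; in CPython the threads only mirror the structure of a processor array and give no real speed-up because of the global interpreter lock.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def forward_log_likelihood(model, observations):
    """log P(observations | model) for a discrete HMM, via the forward algorithm."""
    start, trans, emit = model["start"], model["trans"], model["emit"]
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][observations[0]]) for i in range(n)]
    for obs in observations[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][obs])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def classify(models, observations, workers=4):
    """Score every class model on the same sequence and return the best class.

    Each model is scored by its own worker, loosely mirroring an array of
    HMM processors evaluating candidate models side by side.
    """
    names = list(models)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(
            lambda name: forward_log_likelihood(models[name], observations), names))
    best = names[scores.index(max(scores))]
    return best, dict(zip(names, scores))


# Hypothetical class models over the binary symbols 0 and 1.
models = {
    "rising": {"start": [1.0, 0.0],
               "trans": [[0.5, 0.5], [0.0, 1.0]],
               "emit": [[0.9, 0.1], [0.2, 0.8]]},
    "flat": {"start": [1.0], "trans": [[1.0]], "emit": [[0.5, 0.5]]},
}
label, scores = classify(models, [0, 1])
print(label)  # rising
```

Since the per-model computations share no state, they map naturally onto independent hardware units; only the final comparison needs to gather all the scores.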
Two suitable FPGA-based rapid-prototyping platforms were then identified as targets for the implementation tests. Different configurations of parallel HMM processor arrays were set up and mapped onto the target FPGAs, and the solutions offering the best balance between performance and resource utilization were selected for further analysis.
A software HMM–based pattern recognition system was chosen as the reference for verifying the functionality of the implemented subsystems, and a set of tests was designed to check that the hardware worked correctly. The implemented system was compared with the reference system on the basis of the test results, which showed that its behavior matched expectations and that the required functionality was achieved.
Finally, the parallel HMM array implementation was tested on two real–world applications: a speech recognition task and a brain–computer interface task. In both cases the architecture proved functionally suitable and powerful enough to handle the task without problems. Applying hardware processing to speech recognition opens new perspectives in the design of such systems because of the large reduction in execution time. The brain–computer interface application introduces a new approach to EEG classification and suggests that interfaces based on the classification of spontaneous thought may become possible in the future.
The work started in this thesis could evolve in many directions. One is the complete implementation of the developed architecture as a stand–alone reconfigurable system for accelerating any kind of HMM–based pattern recognition task. The potential performance of such a system could enable highly complex real–time pattern recognition systems, and hence truly multimodal interfaces with applications ranging from space systems to assistive systems for people with disabilities.
A detection-based pattern recognition framework and its applications
The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation.
Inspired by studies in modern cognitive psychology and by real-world pattern recognition systems, a detection-based pattern recognition framework is proposed as an alternative solution for some complicated pattern recognition problems. Primitive features are first detected and the task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined, and high-level context is incorporated as additional information at certain stages.
A detection-based framework is a "divide-and-conquer" design paradigm for pattern recognition problems: it decomposes a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Information fusion strategies then integrate the evidence from a lower level to form the evidence at the next higher level, and this fusion continues until the top level is reached. A detection-based framework has several advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts can be optimized separately; (2) parallel and distributed computational components for primitive feature detection, so that in such a component-based framework any primitive component can be replaced by a new one while the others remain unchanged; (3) incremental information integration; (4) high-level context information as an additional information source, which can be combined with bottom-up processing at any stage.
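The detect-then-fuse scheme can be made concrete with log-likelihood ratios (LLRs). The sketch below is an illustrative assumption, not the detectors or fusion methods proposed in the dissertation: each primitive detector is an equal-variance Gaussian likelihood-ratio test on one scalar feature, lower-level evidence is fused by a weighted sum of LLRs, and high-level context enters as prior log-odds at the verification step. The attribute names, feature values, Gaussian parameters and prior are hypothetical.

```python
import math


def gaussian_llr(x, mean_present, mean_absent, std):
    """Log-likelihood ratio log p(x | present) - log p(x | absent) for a scalar
    feature modelled by two Gaussians with a shared standard deviation."""
    return ((x - mean_absent) ** 2 - (x - mean_present) ** 2) / (2 * std ** 2)


def fuse(llrs, weights=None):
    """Combine lower-level evidence into one higher-level LLR.

    With unit weights the sum is the exact LLR if the lower-level evidence is
    conditionally independent given the hypothesis; other weights are a
    simple heuristic for correlated detectors.
    """
    if weights is None:
        weights = [1.0] * len(llrs)
    return sum(w * llr for w, llr in zip(weights, llrs))


def verify(llr, prior_log_odds=0.0, threshold=0.0):
    """Accept the hypothesis if the posterior log-odds exceed the threshold.

    Adding prior log-odds (e.g. from high-level context) to an LLR gives the
    posterior log-odds by Bayes' rule; threshold 0 is the minimum-error decision.
    """
    return llr + prior_log_odds > threshold


# Hypothetical level-1 detectors: (observed feature, mean if present, mean if absent, std).
primitives = {
    "voicing": (1.8, 2.0, 0.0, 1.0),
    "nasality": (0.9, 1.0, -1.0, 1.0),
}
level1 = {name: gaussian_llr(*params) for name, params in primitives.items()}
level2 = fuse(list(level1.values()))  # evidence for a higher-level unit
context = math.log(0.3 / 0.7)         # hypothetical prior log-odds from context
print(level1, level2, verify(level2, context))
```

Because each detector only has to return an LLR, any one of them can be retrained or replaced without touching the fusion code, which is the modularity advantage listed above.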
This dissertation presents the basic principles, criteria and techniques for detector design and hypothesis verification based on statistical detection and decision theory, and investigates evidence fusion strategies. Several novel detection algorithms and evidence fusion methods were proposed, and their effectiveness was demonstrated in automatic speech recognition and broadcast news video segmentation systems. We believe such a detection-based framework can be employed in more applications in the future.
Ph.D. Committee Chair: Lee, Chin-Hui; Committee Member: Clements, Mark; Committee Member: Ghovanloo, Maysam; Committee Member: Romberg, Justin; Committee Member: Yuan, Min
3D multiresolution statistical approaches for accelerated medical image and volume segmentation
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
Medical volume segmentation has attracted many researchers, and many techniques have been developed for medical imaging, including segmentation and other imaging processes. This research focuses on the implementation of a segmentation system that uses several techniques, together or individually, to segment medical volumes; the system takes as input a stack of 2D slices or a full 3D volume acquired from medical scanners.
Two main approaches to segmenting medical volumes were implemented in this research: multi-resolution analysis and statistical modeling. Multi-resolution analysis was used mainly for feature extraction: higher-dimensional discontinuities (line or curve singularities) were extracted from medical images using modified multi-resolution transforms such as the ridgelet and curvelet transforms.
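Ridgelet and curvelet transforms are elaborate directional multiresolution transforms and are not reproduced here. As a much simpler stand-in that shows the basic multiresolution idea (an illustration, not the thesis's method), one level of the orthonormal 2D Haar wavelet transform splits an image into a half-resolution approximation and three detail sub-bands that respond to edges in different orientations. The function name `haar_2d` and the sub-band names are hypothetical.

```python
def haar_2d(image):
    """One level of the orthonormal 2D Haar wavelet transform.

    image: list of equal-length rows; height and width must be even.
    Each 2x2 block (tl tr / bl br) yields one coefficient per sub-band,
    so every sub-band has half the height and half the width of the input.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("image height and width must be even")
    names = ("approx", "vertical_edges", "horizontal_edges", "diagonal")
    bands = {name: [] for name in names}
    for r in range(0, height, 2):
        rows = {name: [] for name in names}
        for c in range(0, width, 2):
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            rows["approx"].append((tl + tr + bl + br) / 2)            # scaled local average
            rows["vertical_edges"].append((tl - tr + bl - br) / 2)    # left-right change
            rows["horizontal_edges"].append((tl + tr - bl - br) / 2)  # top-bottom change
            rows["diagonal"].append((tl - tr - bl + br) / 2)          # diagonal change
        for name in names:
            bands[name].append(rows[name])
    return bands


# A vertical step edge inside the first 2x2 column block.
image = [[0, 8, 8, 8] for _ in range(4)]
bands = haar_2d(image)
print(bands["vertical_edges"])  # [[-8.0, 0.0], [-8.0, 0.0]]
```

Applying `haar_2d` again to `bands["approx"]` gives the next, coarser level; because the transform is orthonormal, the total energy of the coefficients equals that of the input image.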
The second approach is statistical modeling: hidden Markov models were enhanced to segment medical slices automatically, accurately, reliably and with lossless results. The drawback of Markov models here is their long computation time. This was addressed with feature reduction and dimensionality reduction techniques, including Principal Component Analysis and Gaussian pyramids, which were used to accelerate the slowest block of the proposed system. These reduction techniques were also applied effectively together with 3D volume segmentation techniques such as 3D wavelets and 3D hidden Markov models.
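Gaussian pyramids are among the reduction techniques used to speed up the HMM stage. Below is a minimal sketch of one common construction, a Burt–Adelson style REDUCE step with the 5-tap binomial kernel [1, 4, 6, 4, 1]/16; the kernel, the replicate-border handling and the function names are assumptions, not details taken from the thesis. Running the HMM on a coarser level cuts the number of 2D observations by roughly a factor of four per level.

```python
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # binomial approximation of a Gaussian


def _blur_1d(values):
    """Convolve a 1D signal with KERNEL, replicating the border samples."""
    n = len(values)
    return [
        sum(k * values[min(max(i + off, 0), n - 1)]
            for off, k in zip(range(-2, 3), KERNEL))
        for i in range(n)
    ]


def reduce_level(image):
    """One REDUCE step: separable Gaussian blur, then keep every second
    row and column, roughly quartering the number of pixels."""
    rows = [_blur_1d(row) for row in image]            # blur along each row
    cols = [_blur_1d(list(col)) for col in zip(*rows)]  # blur down each column
    blurred = [list(row) for row in zip(*cols)]         # transpose back
    return [row[::2] for row in blurred[::2]]


def gaussian_pyramid(image, levels):
    """Return [original, level 1, ..., level `levels`], each about half the size."""
    pyramid = [image]
    for _ in range(levels):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid


slice_2d = [[float((r * 7 + c * 3) % 11) for c in range(16)] for r in range(16)]
for level in gaussian_pyramid(slice_2d, 2):
    print(len(level), "x", len(level[0]))
```

Blurring before subsampling suppresses the fine detail that would otherwise alias into the coarser levels; the same separable scheme extends to 3D volumes by also blurring and subsampling along the slice axis.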
The system was tested and validated through several procedures: a comparison with predefined results, validation by specialists, and finally a survey, explaining the techniques and the results, that was completed by end users. The validation showed that Markov model segmentation outperformed all other techniques in most patients' cases. The curvelet transform also produced promising segmentation results, and end users rated it above the Markov models because of the long processing time that hidden Markov models require.