Acoustic model of English spoken by Portuguese speakers (Modelo acústico de língua inglesa falada por portugueses)
Master's project in Computer Engineering, presented to the University of Lisbon through the Faculty of Sciences, 2007. In the context of robust speech recognition based on Hidden Markov Models (HMMs), this work describes methodologies and experiments aimed at recognizing foreign speakers. Speech recognition necessarily involves acoustic models, which reflect how a language is pronounced and articulated by modelling the sequence of sounds produced during speech. This modelling rests on minimal speech segments, the phones, whose pronunciation is represented by sets of symbols (phonetic alphabets). The representation of these symbols, their articulation and their pronunciation are studied in articulatory and acoustic phonetics. Words can therefore be described by analysing their constituent units, the phones. A speech recognizer interprets the input signal, speech, as a sequence of encoded symbols. To do so, the signal is split into observations of roughly 10 milliseconds each, restricting the analysis to an interval over which the characteristics of a sound segment do not vary. Acoustic models estimate the probability that a given observation corresponds to a given unit, and it is through models of the vocabulary units to be recognized that these sound fragments can be joined back together. The models developed in this work are HMMs, so called because they build on Markov chains (Markov, 1856-1922): sequences of states in which each state is conditioned on the previous one. Applied to this domain, the approach requires building a set of models, one for each class of sounds to be recognized, which are then trained on training data.
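The abstract describes cutting the input signal into observations of roughly 10 ms. A minimal sketch of that segmentation step follows, assuming the audio is a plain list of samples with a known sample rate in Hz. The function name `frame_signal` and the optional `hop_ms` parameter are illustrative and do not come from the thesis. Practical front ends usually compute features (for example MFCCs) from each frame, often over overlapping windows such as 25 ms windows shifted by 10 ms. That feature computation is not shown.

```python
def frame_signal(samples, sample_rate, frame_ms=10.0, hop_ms=None):
    """Split a 1-D sequence of audio samples into fixed-length frames.

    frame_ms: frame length in milliseconds (about 10 ms, as in the abstract).
    hop_ms: shift between frame starts; defaults to frame_ms (no overlap).
    Trailing samples that do not fill a whole frame are dropped.
    """
    if hop_ms is None:
        hop_ms = frame_ms
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame and hop lengths must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop
    return frames
```

For example, one second of 8 kHz telephone-band audio gives 100 non-overlapping frames of 80 samples. With `frame_ms=25, hop_ms=10` it gives 98 overlapping frames.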
The data consist of audio files and their word-level transcriptions, so that each transcription can be decomposed into phones and aligned with the sounds in the corresponding audio file. Using a state model, in which each state represents an observation or a described speech segment, the data are iteratively regrouped to build increasingly reliable statistical models that represent the speech units of a given language. Recognizing foreign speakers, whose pronunciation differs from that of the language the recognizer was designed for, can seriously degrade a recognizer's accuracy. This variation can be even more problematic than dialectal variation within a language, because it depends on each speaker's proficiency in the foreign language. Using a small amount of audio from foreign speakers to train new acoustic models, several experiments were carried out with corpora of Portuguese speakers speaking English, of European Portuguese, and of English. First, the behaviour of models of native English speakers and of native Portuguese speakers was examined separately on the test corpora (native and non-native test sets). Next, another model was trained using both the audio of Portuguese speakers speaking English and that of native English speakers as the training corpus. Another experiment used adaptation techniques such as Maximum Likelihood Linear Regression (MLLR). MLLR adapts an initial model to a given speaker characteristic, in this case the foreign accent. Using a small amount of data representing the characteristic to be modelled, the technique computes a set of transformations that are then applied to the model being adapted.
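The last sentences describe MLLR as computing transformations from a small amount of adaptation data and applying them to an existing model. In standard MLLR, the transformation is affine and acts on the Gaussian mean vectors, mu' = A mu + b, with one (A, b) pair typically shared by a regression class of Gaussians. The minimal sketch below shows only that application step. Estimating A and b by maximizing the likelihood of the adaptation data is omitted. The function name is hypothetical, and the thesis may have used a toolkit implementation instead.

```python
def apply_mllr_mean_transform(means, A, b):
    """Apply an MLLR-style affine transform mu' = A @ mu + b to Gaussian means.

    means: list of mean vectors (lists of floats), all of dimension d.
    A: d x d matrix given as a list of rows; b: bias vector of length d.
    Estimating A and b (maximum likelihood on adaptation data) is not shown.
    """
    d = len(b)
    if len(A) != d or any(len(row) != d for row in A):
        raise ValueError("A must be a d x d matrix matching b")
    adapted = []
    for mu in means:
        if len(mu) != d:
            raise ValueError("mean dimension does not match the transform")
        # Row i of A dotted with mu, plus the bias term b[i].
        adapted.append([sum(A[i][j] * mu[j] for j in range(d)) + b[i]
                        for i in range(d)])
    return adapted
```

Because every Gaussian in a regression class shares the same transform, even Gaussians with no adaptation frames of their own are moved. This is why the method works with little accented data.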
Phonetic modelling was also explored by studying how a foreign speaker pronounces the foreign language, in this case a Portuguese speaker speaking English. This study was carried out with the help of a linguist. The linguist defined a phone set, obtained by mapping the English phone inventory onto Portuguese, that represents the English spoken by Portuguese speakers from a particular prestige group. Given the large variability in pronunciation, this group had to be defined according to the speakers' level of literacy. The study was then used to create a new model trained on the corpora of Portuguese speakers speaking English and of native Portuguese speakers. The result is a native Portuguese recognizer that can also recognize English terms. Within the speech recognition theme, this project also addressed the collection of European Portuguese corpora and the compilation of a European Portuguese lexicon. For corpus acquisition, the author was involved in extracting and preparing telephone speech data for the later training of new European Portuguese acoustic models. The European Portuguese lexicon was compiled with a semi-automatic incremental method. Pronunciations were generated automatically for batches of 10,000 words, and a linguist reviewed and corrected each batch. Each reviewed batch was then used to improve the rules for automatic pronunciation generation.
The tremendous growth of technology has increased the need to integrate spoken language technologies into everyday applications, providing easy and natural access to information. These applications vary in nature and in their user interfaces.
Besides voice-enabled Internet portals or tourist information systems, automatic speech recognition systems can be used in the home, where the TV and other appliances could be voice-controlled without keyboards or mice. They can also be used in mobile phones and palm-sized computers for hands-free and eyes-free operation. Developing these systems raises several known difficulties. One of them concerns recognizer accuracy when dealing with non-native speakers, whose phonetic pronunciation of a given language differs. A non-native accent can be more problematic than dialectal variation within the language. This mismatch depends on the individual speaker's proficiency and mother tongue. Consequently, when the speaker's native language is not the one used to train the recognizer, recognition performance drops considerably. In this thesis, we examine the problem of non-native speech in a speaker-independent, large-vocabulary recognizer for which a small amount of non-native data was used for training. Several experiments were performed using Hidden Markov Models trained on speech corpora containing native European Portuguese speakers, native English speakers, and English spoken by native European Portuguese speakers. Initially, the behaviour of a native English model and of a non-native English speakers' model was explored. Then, a model was trained as a pool of accents, using different corpus weights for native English speakers and for English spoken by Portuguese speakers. For adaptation, the Maximum Likelihood Linear Regression (MLLR) method was used. We also explored how European Portuguese speakers pronounce English by studying the correspondences between the phone sets of the foreign and target languages. The result was a new phone set derived from the mapping between the English and Portuguese phone sets.
Then a new model was trained on data of English spoken by Portuguese speakers and on native Portuguese data. Within the speech recognition theme, this work has two further purposes. The first is collecting Portuguese corpora. The second is supporting the compilation of a Portuguese lexicon by adopting methods and algorithms that generate phonetic pronunciations automatically. The collected corpora were processed to train acoustic models for use in the Exchange 2007 domain, namely in Outlook Voice Access.
Voice input/output capabilities at Perception Technology Corporation
Condensed resumes of key company personnel at the Perception Technology Corporation are presented. The staff has expertise in speech recognition, speech synthesis, speaker authentication, and language identification. The capabilities of the company's hardware and software engineers are also included.
Design of hardware architectures for HMM–based signal processing systems with applications to advanced human-machine interfaces
In this thesis, a new approach to the development of human–computer interfaces is described. In particular,
it addresses pattern recognition systems that use Hidden Markov Models for classification.
The research started from the development of new techniques for building spontaneous (natural) speech
recognition systems. The Hidden Markov Model (HMM) was chosen as the main algorithmic tool for building
the system. After the preliminary work, the goal was extended to the development of a hardware
architecture providing a reconfigurable tool that can be used not only for speech recognition but for any
HMM-based pattern recognition task.
The whole work is thus focused on the development of dedicated hardware architectures, but also some
new results have been obtained on the classification of electroencephalographic signals through the use of
HMMs.
First, a system–level architecture was developed for use in HMM-based pattern recognition
systems. The architecture was designed to work as a stand–alone system. Then a flexible, completely
reconfigurable hardware HMM processor was described in VHDL, and the design was successfully
simulated. A parallel array of these processors forms the core processing block of the developed
architecture.
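The abstract does not say which recursion the VHDL HMM processor evaluates. One common way to use HMMs as a classifier, assumed here for illustration only, is to score an observation sequence against each class model with the forward algorithm and pick the best-scoring model. Each model is scored independently, which is the kind of work a parallel array of processors can divide among its elements. The sketch works in the log domain to avoid underflow. It takes precomputed per-frame emission log-likelihoods, so it does not commit to a particular output distribution. All function names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(log_init, log_trans, log_emit):
    """Forward algorithm in the log domain (use -math.inf for log 0).

    log_init[i]: log P(state i at t = 0)
    log_trans[i][j]: log P(state j at t + 1 | state i at t)
    log_emit[t][j]: log p(observation t | state j), precomputed per frame
    Returns log p(observation sequence | model).
    """
    n = len(log_init)
    alpha = [log_init[j] + log_emit[0][j] for j in range(n)]
    for t in range(1, len(log_emit)):
        alpha = [log_sum_exp([alpha[i] + log_trans[i][j] for i in range(n)])
                 + log_emit[t][j] for j in range(n)]
    return log_sum_exp(alpha)


def classify(models, log_emits):
    """Return the name of the model with the highest forward log-likelihood.

    models: dict name -> (log_init, log_trans)
    log_emits: dict name -> per-frame emission log-likelihoods for that model
    """
    scores = {name: forward_log_likelihood(li, lt, log_emits[name])
              for name, (li, lt) in models.items()}
    return max(scores, key=scores.get)
```

In a speech task, each model would be a word or phone HMM. In the EEG task, each model would be one mental-state class. The same scoring applies to both, which is consistent with the abstract's claim that one architecture serves both applications.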
Next, two suitable FPGA-based rapid prototyping platforms were selected as targets for the
implementation tests. Several configurations of parallel HMM processor arrays were mapped onto the
target FPGAs, and the configurations offering the best trade-off between performance and resource
utilization were selected for further analysis.
A software HMM-based pattern recognition system was then chosen as the reference for verifying the
functionality of the implemented subsystems. A set of tests was designed to check that the hardware met
the initial specifications. Comparing the implemented system with the reference system on the basis of
the test results showed that its behaviour was the expected one and that the required functionality was
correctly achieved.
Finally, the parallel HMM array implementation was applied to two real–world tasks: speech recognition
and a brain–computer interface. In both cases the architecture proved functionally suitable and powerful
enough to handle the task without problems. Applying hardware processing to speech recognition opens new
perspectives in the design of such systems because of the large performance gain in terms of execution
time. The application to brain–computer interfaces is particularly interesting because it introduces a
new approach to EEG classification, suggesting that interfaces based on the classification of
spontaneous thought could be developed in the future.
The work started with this thesis could evolve in many directions. Effort could be spent on implementing
the developed architecture as a stand–alone reconfigurable system for accelerating any kind of HMM-based
pattern recognition task. The potential performance of such a system could open the way to highly
complex real-time pattern recognition systems, and thus to truly multimodal interfaces, with applications
ranging from space systems to assistive systems for people with disabilities.
Whole Word Phonetic Displays for Speech Articulation Training
The main objective of this dissertation is to investigate and develop speech recognition technologies for speech training for people with hearing impairments. During the course of this work, a computer aided speech training system for articulation speech training was also designed and implemented. The speech training system places emphasis on displays to improve children's pronunciation of isolated Consonant-Vowel-Consonant (CVC) words, with displays at both the phonetic level and whole word level. This dissertation presents two hybrid methods for combining Hidden Markov Models (HMMs) and Neural Networks (NNs) for speech recognition. The first method uses NN outputs as posterior probability estimators for HMMs. The second method uses NNs to transform the original speech features to normalized features with reduced correlation. Based on experimental testing, both of the hybrid methods give higher accuracy than standard HMM methods. The second method, using the NN to create normalized features, outperforms the first method in terms of accuracy. Several graphical displays were developed to provide real time visual feedback to users, to help them to improve and correct their pronunciations.
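The first hybrid method uses NN outputs as posterior probability estimators for HMMs. The abstract does not give the exact formulation. A standard way to do this, assumed here, is to divide each state posterior P(q|x) by the state prior P(q), which is usually estimated from training-label frequencies. By Bayes' rule this gives p(x|q)/p(x), a "scaled likelihood" that the HMM decoder can use in place of p(x|q). The function name and the floor value below are illustrative.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Convert NN state posteriors P(q|x) into scaled log-likelihoods.

    By Bayes' rule, p(x|q) / p(x) = P(q|x) / P(q). The factor p(x) is the
    same for every state at a given frame, so it does not change which
    state sequence (or word) scores best.
    floor guards against log(0) for zero posteriors or priors.
    """
    if len(posteriors) != len(priors):
        raise ValueError("one prior per posterior is required")
    return [math.log(max(p, floor)) - math.log(max(pr, floor))
            for p, pr in zip(posteriors, priors)]
```

Dividing by the prior matters because a network trained on unbalanced data favours frequent states. Without this correction, the decoder would double-count the state frequencies that the HMM transition structure already models.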
Visual recognition of American sign language using hidden Markov models
Thesis (M.S.), Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 48-52). By Thad Eugene Starner.
Speaker independent isolated word recognition
The work presented in this thesis concerns the recognition of
isolated words using a pattern matching approach. In such a system,
an unknown speech utterance, which is to be identified, is
transformed into a pattern of characteristic features. These
features are then compared with a set of pre-stored reference
patterns that were generated from the vocabulary words. The unknown
word is identified as that vocabulary word for which the reference
pattern gives the best match.
One of the major difficulties in the pattern comparison process is
that speech patterns obtained from the same word exhibit non-linear
temporal fluctuations and thus a high degree of redundancy. The
initial part of this thesis considers various dynamic time warping
techniques used for normalizing the temporal differences between
speech patterns. Redundancy removal methods are also considered, and
their effect on the recognition accuracy is assessed.
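Dynamic time warping, the technique discussed above, normalizes temporal differences by finding the lowest-cost alignment between two patterns. A minimal sketch of the basic symmetric recursion follows, with no slope constraints or path weighting. The thesis compares several DTW variants, and those details are not reproduced here. Features are scalars for brevity. For real feature vectors, pass a vector distance as `dist`. The names `dtw_distance` and `recognize` are hypothetical.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping distance between two feature sequences.

    Basic recursion: D[i][j] = d(a_i, b_j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1]),
    without slope constraints or path weighting.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(unknown, templates):
    """Template matching: return the word whose reference pattern warps closest."""
    return min(templates, key=lambda w: dtw_distance(unknown, templates[w]))
```

The quadratic table explains the computational load mentioned further on. Every comparison costs O(n·m), and that cost is repeated for every reference pattern of every vocabulary word.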
Although the use of dynamic time warping algorithms provides
considerable improvement in the accuracy of isolated word recognition
schemes, the performance is ultimately limited by their poor ability
to discriminate between acoustically similar words. Methods for
enhancing the identification rate among acoustically similar words,
by using common pattern features for similar sounding regions, are
investigated.
Pattern-matching-based, speaker-independent systems can only operate
with a high recognition rate by using multiple reference patterns
for each of the words included in the vocabulary. These patterns are
obtained from the utterances of a group of speakers. The use of
multiple reference patterns leads not only to a large increase in
the memory requirements of the recognizer but also to an increase in
the computational load. This thesis proposes a recognition system
that overcomes these difficulties by (i) employing vector
quantization techniques to reduce the storage of reference patterns,
and (ii) eliminating the need for dynamic time warping, which reduces
the computational complexity of the system.
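Vector quantization reduces storage by replacing each feature vector with the index of its nearest entry in a shared codebook. A reference pattern then becomes a short list of small integers instead of a list of full vectors. A minimal sketch of that encoding step follows, using squared Euclidean distance. The abstract does not specify the thesis's distance measure or codebook design procedure (k-means/LBG is a common choice), so neither is shown, and the function names are hypothetical.

```python
def nearest_codeword(vector, codebook):
    """Index of the codebook entry closest to vector (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((v - c) ** 2 for v, c in zip(vector, codebook[k])))


def quantize(sequence, codebook):
    """Replace each feature vector in a pattern by the index of its nearest codeword."""
    return [nearest_codeword(v, codebook) for v in sequence]
```

With a 256-entry codebook, for instance, each frame of a stored pattern fits in one byte, however many coefficients the original feature vector had.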
Finally, a method based on fuzzy set theory is proposed for identifying
the acoustic structure of an utterance in terms of voiced, unvoiced,
and silence segments. The acoustic structure is then employed to
enhance the recognition accuracy of a conventional isolated word
recognizer.
Mining of Textual Data from the Web for Speech Recognition
The preliminary goals of this project were to become familiar with language modeling for speech recognition and with techniques for acquiring text data from the Web. The text introduces basic speech recognition techniques and describes statistical language models in detail, with particular attention to criteria for evaluating the quality of language models and speech recognition systems. It also covers data mining models and techniques, especially information retrieval. Specific problems of acquiring data from the Web are discussed and, in contrast, the Google search engine is introduced. Part of the project was the design and implementation of a system for mining text from the Web, which is described in detail. However, the main goal of this work was to determine whether data acquired from the Web can improve speech recognition. The described techniques therefore seek an optimal way to use Web data to improve both sample language models and models deployed in real recognition systems. The experiments use the retrieved Web data to update sample language models.
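The abstract does not say how the Web data were combined with the existing models. One common approach, assumed here for illustration, is to linearly interpolate a baseline language model with one trained on Web text. The interpolation weight is chosen to minimize perplexity on held-out text, perplexity being a standard criterion for language model quality. The sketch works on per-word probabilities that the two models have already assigned to the same held-out words. It assumes both models are smoothed, so no probability is zero. All names are hypothetical.

```python
import math


def interpolate(p_base, p_web, lam):
    """Linear interpolation of two LM probabilities for the same word in context."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_base + (1.0 - lam) * p_web


def perplexity(probs):
    """Perplexity of a test text, given the probability of each of its words."""
    if not probs:
        raise ValueError("need at least one word probability")
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def best_lambda(base_probs, web_probs, grid=None):
    """Grid-search the weight that minimizes held-out perplexity.

    Assumes smoothed models: every probability must be > 0 (log(0) is undefined).
    """
    grid = grid or [i / 10 for i in range(11)]
    return min(grid, key=lambda lam: perplexity(
        [interpolate(pb, pw, lam) for pb, pw in zip(base_probs, web_probs)]))
```

A weight of 1.0 means the Web model does not help on the held-out text. A weight below 1.0 indicates that the Web data contribute.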
Automatic Home Appliance Switching Using Speech Recognition Software and Embedded System
In most homes, electrical appliances are controlled and operated manually, which can be difficult when tiredness, disability, physical variation (height, aging, etc.) or inadequate skill stands in the way. This study aims to implement a better and more flexible means of controlling home appliances through an automated switching mechanism based on speech recognition. Acoustic signals picked up by a microphone are processed by a speech recognition application, which generates digital signals that are passed to a microcontroller. The microcontroller in turn dispatches commands to the relays to which the home appliances are connected. The goal of using speech commands to automate the switching of home appliances was achieved, and the approach proved to be a more convenient means of switching home appliances.
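The abstract describes a pipeline in which the recognizer's output becomes digital signals that a microcontroller turns into relay commands. The paper's command vocabulary and message format are not given, so both are assumptions in the sketch below. The sketch uses a small fixed set of hypothetical phrases and a hypothetical 2-byte message (relay channel, then 1 to energize or 0 to release). It covers only the PC-side mapping step. The link to the microcontroller (for example a serial port) and the firmware are not shown.

```python
# Hypothetical command vocabulary: recognized phrase -> (relay channel, on/off).
COMMANDS = {
    "light on": (1, True),
    "light off": (1, False),
    "fan on": (2, True),
    "fan off": (2, False),
}


def command_to_bytes(recognized_text, commands=COMMANDS):
    """Translate a recognized phrase into a 2-byte message for the microcontroller.

    Byte 0 is the relay channel; byte 1 is 1 (energize) or 0 (release).
    Returns None for phrases outside the vocabulary, so nothing is switched.
    The message format is an illustrative assumption, not the paper's protocol.
    """
    key = " ".join(recognized_text.lower().split())  # normalize case and spacing
    if key not in commands:
        return None
    channel, on = commands[key]
    return bytes([channel, 1 if on else 0])
```

Rejecting out-of-vocabulary phrases, rather than guessing the closest command, is a deliberate choice here: a misrecognition then leaves the appliances unchanged instead of switching the wrong one.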
- …