Acoustic model of English spoken by Portuguese speakers
Master's project report in Informatics Engineering, presented to the Universidade de Lisboa through the Faculdade de Ciências, 2007. In the context of robust speech recognition based on Hidden Markov Models (HMMs), this work describes methodologies and experiments aimed at the recognition of foreign speakers. Speech recognition necessarily involves acoustic models. Acoustic models reflect the way we pronounce and articulate a language, modelling the sequence of sounds emitted during speech. This modelling rests on minimal speech segments, the phones, for which there are sets of symbols and alphabets representing their pronunciation; articulatory and acoustic phonetics studies these symbols, their articulation and their pronunciation. Words can be described by analysing their constituent units, the phones. A speech recognizer interprets the input signal, speech, as a sequence of coded symbols. To do so, the signal is split into observations of roughly 10 milliseconds each, reducing the analysis window to the interval over which the characteristics of a sound segment do not vary. Acoustic models estimate the probability that a given observation corresponds to a given entity; it is therefore through models of the entities in the vocabulary to be recognized that these sound fragments can be reassembled. The models developed in this work are based on HMMs, so named because they build on Markov chains (after Andrey Markov, 1856-1922): sequences of states in which each state is conditioned on its predecessor. Applied to our domain, this means building a set of models, one for each class of sounds to be recognized, which are then trained on training data.
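The frame-by-frame decoding idea above, each state conditioned only on its predecessor, can be illustrated with a minimal Viterbi sketch in Python. All states, observation symbols and probabilities below are invented toy values, not the models of this work.

```python
# Minimal Viterbi sketch over an HMM: each state stands for one phone
# class, each observation for one 10 ms acoustic frame (toy values).

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations."""
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Markov property: each state depends only on its predecessor.
            prob, prev = max(
                (V[-2][p] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

# Two toy phone classes and two quantized observation symbols.
states = ["ph_a", "ph_b"]
start_p = {"ph_a": 0.6, "ph_b": 0.4}
trans_p = {"ph_a": {"ph_a": 0.7, "ph_b": 0.3},
           "ph_b": {"ph_a": 0.4, "ph_b": 0.6}}
emit_p = {"ph_a": {"low": 0.9, "high": 0.1},
          "ph_b": {"low": 0.2, "high": 0.8}}

best_path = viterbi(["low", "low", "high"], states, start_p, trans_p, emit_p)
```

In a real recognizer the observations are acoustic feature vectors and the emission probabilities come from trained Gaussian mixtures, but the search over state sequences works the same way.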
The data consist of audio files and their word-level transcriptions, so that each transcription can be decomposed into phones and aligned with the corresponding sounds in the audio file. Using a state model, in which each state represents an observation or described speech segment, the data are progressively regrouped into ever more reliable statistical models that represent the speech entities of a given language. Recognition of foreign speakers, whose accents differ from the language the recognizer was built for, can seriously degrade a recognizer's accuracy. This variation can be even more problematic than dialectal variation within a language, because it depends on each speaker's command of the foreign language. Using a small amount of audio from foreign speakers to train new acoustic models, several experiments were carried out with corpora of Portuguese speakers speaking English, of European Portuguese and of English. First, the behaviour of the native-English and native-Portuguese models was explored separately, tested against both native and non-native test corpora. Next, another model was trained using, simultaneously, the audio of Portuguese speakers speaking English and that of native English speakers. A further experiment applied adaptation techniques, such as Maximum Likelihood Linear Regression (MLLR). MLLR adapts an initial model to a particular speaker characteristic, in this case a foreign accent: from a small amount of data representing the characteristic to be modelled, it estimates a set of transformations that are then applied to the model being adapted.
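As a rough illustration of the MLLR idea, an affine transform estimated from a little adaptation data is applied to each Gaussian mean of the model. The sketch below uses a plain-Python matrix-vector product and invented numbers; real MLLR estimates the transform by maximum likelihood over regression classes.

```python
# Sketch of applying an MLLR transform: adapted mean = A @ mean + b,
# where A and b would be estimated from non-native adaptation data
# (all values here are invented).

def apply_mllr(mean, A, b):
    """Return the adapted mean A @ mean + b (plain-Python matvec)."""
    return [sum(A[i][j] * mean[j] for j in range(len(mean))) + b[i]
            for i in range(len(b))]

# A 2-dimensional toy Gaussian mean and a toy transform.
A = [[1.0, 0.1],
     [0.0, 0.9]]
b = [0.5, -0.2]
adapted = apply_mllr([2.0, 3.0], A, b)
# adapted is approximately [2.8, 2.5]
```

Because one transform is shared by many Gaussians, only a small amount of accented speech is needed to move the whole model towards the new speaker characteristic.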
Phonetic modelling was also explored, studying how a foreign speaker pronounces the foreign language, in this case a Portuguese speaker speaking English. This study was carried out with the help of a linguist, who defined a phone set, the result of mapping the English phone inventory onto the Portuguese one, representing the English spoken by Portuguese speakers of a particular prestige group. Given the great variability of pronunciations, this group had to be defined according to the speakers' level of literacy. The study was then used to train a new model on the corpora of Portuguese speakers speaking English and of native Portuguese speakers, yielding a native-Portuguese recognizer in which English terms can also be recognized. Within the broader theme of speech recognition, the project also covered the collection of European Portuguese corpora and the compilation of a European Portuguese lexicon. On the corpus-acquisition side, the author was involved in extracting and preparing telephone speech data for the subsequent training of new European Portuguese acoustic models. The European Portuguese lexicon was compiled with a semi-automatic, incremental method: pronunciations were generated automatically in batches of ten thousand words, each batch was reviewed and corrected by a linguist, and each reviewed batch was then used to improve the automatic pronunciation-generation rules.

The tremendous growth of technology has increased the need to integrate spoken language technologies into our daily applications, providing easy and natural access to information. These applications are of different natures, with different user interfaces.
Besides voice-enabled Internet portals or tourist information systems, automatic speech recognition can be used in the home, where the TV and other appliances could be voice controlled, replacing keyboard or mouse interfaces, or in mobile phones and palm-sized computers for hands-free, eyes-free operation. Developing such systems raises several known difficulties. One concerns the recognizer's accuracy in dealing with non-native speakers, whose phonetic pronunciations of a given language differ. A non-native accent can be more problematic than a dialect variation of the language; the mismatch depends on the individual's speaking proficiency and mother tongue. Consequently, when the speaker's native language is not the one the recognizer was trained on, there is a considerable loss in recognition performance. In this thesis, we examine the problem of non-native speech in a speaker-independent, large-vocabulary recognizer in which a small amount of non-native data was used for training. Several experiments were performed using Hidden Markov models trained with speech corpora containing European Portuguese native speakers, English native speakers, and English spoken by European Portuguese native speakers. Initially, the behaviour of a native-English model and of a non-native English speakers' model was explored. Then, using different corpus weights for the English native speakers and the English spoken by Portuguese speakers, a model was trained as a pool of accents. Among adaptation techniques, the Maximum Likelihood Linear Regression (MLLR) method was used. We also explored how European Portuguese speakers pronounce the English language, studying the correspondences between the phone sets of the foreign and target languages. The result was a new phone set, the product of the mapping between the English and Portuguese phone sets.
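The phone-set mapping just described can be pictured as a simple substitution table: each English phone is rewritten as the phone a Portuguese speaker typically produces. The entries below are invented illustrations, not the linguist's actual inventory.

```python
# Toy sketch of mapping an English phone string into an accented
# phone set; the correspondences here are invented examples.

EN_TO_PT_ACCENTED = {
    "th": "t",   # dental fricative often realised as a stop
    "ih": "i",   # lax vowel merged with the tense one
    "r":  "R",   # rhotic replaced by the Portuguese variant
}

def map_pronunciation(en_phones):
    """Rewrite an English phone sequence with the accented phone set."""
    return [EN_TO_PT_ACCENTED.get(p, p) for p in en_phones]

accented = map_pronunciation(["th", "ih", "s"])
```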
Then a new model was trained with data of English spoken by Portuguese speakers and native Portuguese data. Beyond speech recognition itself, this work had two further purposes: collecting Portuguese corpora and supporting the compilation of a Portuguese lexicon, adopting methods and algorithms to generate phonetic pronunciations automatically. The collected corpora were processed in order to train acoustic models for use in the Exchange 2007 domain, namely in Outlook Voice Access.
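The semi-automatic, incremental lexicon method, batches of automatically generated pronunciations reviewed by a linguist and fed back into the generation rules, could be sketched as below. The letter-to-phone rules and words are invented stand-ins, far simpler than a real grapheme-to-phoneme system.

```python
# Sketch of the incremental lexicon loop: generate a batch of
# pronunciations by rule, take a linguist's corrections, and refine
# the rules before the next batch (toy rules and words).

def generate_pronunciation(word, rules):
    """Apply letter-to-phone rules, longest grapheme first."""
    phones, i = [], 0
    while i < len(word):
        for grapheme in sorted(rules, key=len, reverse=True):
            if word.startswith(grapheme, i):
                phones.append(rules[grapheme])
                i += len(grapheme)
                break
        else:
            phones.append("?")   # unknown letter: flagged for review
            i += 1
    return " ".join(phones)

rules = {"ch": "S", "a": "a", "o": "o", "c": "k", "s": "s"}
batch = ["casa", "chao"]
lexicon = {w: generate_pronunciation(w, rules) for w in batch}

# The linguist reviews the batch; corrections feed back into the rules.
corrections = {"casa": "k a z a"}   # intervocalic "s" is voiced
lexicon.update(corrections)
rules["s"] = "z"                    # refined rule for the next batch
```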
Analysis and modeling of non-native speech for automatic speech recognition
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 75-77). By Karen Livescu, S.M.
Telephone-Based Conversational Speech Recognition in the Jupiter Domain
This paper describes our experiences with developing a telephone-based speech recognizer as part of a conversational system in the weather information domain. This system has been used to collect spontaneous speech data, which has proven extremely valuable for research in a number of different areas. After describing the corpus we have collected, we describe the development of the recognizer's vocabulary, pronunciations, language and acoustic models, and report on its current performance under several different conditions.
Using automatic speech recognition to evaluate Arabic to English transliteration
Increased travel and international communication have led to an increased need for the transliteration of Arabic proper names for people, places, technical terms and organisations. A variety of Arabic-to-English transliteration systems are available, such as Unicode, the Buckwalter Arabic transliteration, and ArabTeX. Transliteration tables have been developed and used by researchers for many years, but there have been only limited attempts to evaluate and compare different transliteration systems. This thesis investigates whether speech recognition technology could be used to evaluate different Arabic-English transliteration systems. To that end there were five main objectives: first, to investigate the possibility of using English speech recognition engines to recognize Arabic words; second, to establish the possibility of automatically transliterating diacritised Arabic words in order to create a vocabulary for the speech recognition engine; third, to explore the possibility of automatically generating transliterations of non-diacritised Arabic words; fourth, to construct a general method to compare and evaluate different transliterations; and finally, to test the system and use it to experiment with new transliteration ideas.
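As a toy illustration of what such a transliteration table looks like, the sketch below romanises a few Arabic letters in the spirit of the Buckwalter scheme. Only a handful of letters are included, and the table should be checked against the full published mapping before any real use.

```python
# Table-based Arabic romanisation sketch (partial, Buckwalter-style).

BUCKWALTER = {
    "\u0627": "A",  # alif
    "\u0628": "b",  # ba
    "\u062A": "t",  # ta
    "\u0633": "s",  # sin
    "\u0644": "l",  # lam
    "\u0645": "m",  # mim
}

def transliterate(arabic):
    """Map each Arabic letter through the table; keep unknowns as-is."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in arabic)

romanised = transliterate("\u0633\u0644\u0627\u0645")  # the word salaam
```

Comparing systems then amounts to swapping the table and measuring, for instance, how well a speech recognizer handles the resulting vocabulary.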
Towards multi-domain speech understanding with flexible and dynamic vocabulary
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001. Includes bibliographical references (p. 201-208). In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and a flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dialog. Such a system can detect the presence of any out-of-vocabulary (OOV) words and automatically hypothesize the pronunciation, spelling and meaning of each. These can be confirmed with the user, and the new words are subsequently incorporated into the recognizer lexicon for future use. This thesis describes our work towards realizing such a vision, using a multi-stage architecture. Our work focuses on organizing the application of linguistic constraints in order to accommodate multiple domain topics and a dynamic vocabulary at the spoken input. The philosophy is to apply exclusively below-word-level linguistic knowledge at the initial stage. Such knowledge is domain-independent and general to all of the English language, and hence broad enough to support any unknown words that may appear at the input, as well as input from several topic domains. At the same time, the initial pass narrows the search space for the next stage, where domain-specific knowledge residing at the word level or above is applied. In the second stage, we envision several parallel recognizers, each with higher-order language models tailored specifically to its domain. A final decision algorithm selects a final hypothesis from the set of parallel recognizers. Part of our contribution is the development of a novel first stage which attempts to maximize linguistic constraints using only below-word-level information.
The goals are to prevent sequences of unknown words from being pruned away prematurely while maintaining performance on in-vocabulary items, as well as to reduce the search space for later stages. Our solution coordinates the application of various subword-level knowledge sources. The recognizer lexicon is implemented with an inventory of linguistically motivated units called morphs, which are syllables augmented with spelling and word position. This first stage is designed to output a phonetic network so that we are not committed to the initial hypotheses. This adds robustness, as later stages can propose words directly from phones. To maximize performance of the first stage, much of our focus has centered on integrating a set of hierarchical sublexical models into this first pass. To do this, we utilize the ANGIE framework, which supports a trainable context-free grammar and is designed to acquire subword-level and phonological information statistically. Its models can generalize knowledge about word structure, learned from in-vocabulary data, to previously unseen words. We explore methods for collapsing the ANGIE models into a finite-state transducer (FST) representation, which enables these complex models to be efficiently integrated into recognition. The ANGIE-FST needs to encapsulate the hierarchical knowledge of ANGIE and replicate ANGIE's ability to support previously unobserved phonetic sequences ... By Grace Chung, Ph.D.
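A drastically simplified sketch of the staged idea, with stage one emitting a phone network and a later stage proposing words directly from phones, might look like the code below. The lexicon and phone labels are invented toys; none of the ANGIE or FST machinery is represented.

```python
# Two-stage sketch: stage one leaves alternative phones per position
# in a network; stage two matches lexicon pronunciations against
# paths through that network (toy data only).

from itertools import product

LEXICON = {
    "bit": ["b", "ih", "t"],
    "bat": ["b", "ae", "t"],
}

def words_from_network(phone_network):
    """Return lexicon words whose pronunciation is a path in the network."""
    paths = {tuple(p) for p in product(*phone_network)}
    return sorted(w for w, pron in LEXICON.items() if tuple(pron) in paths)

# Stage one was unsure about the vowel, so both options survive.
network = [["b"], ["ih", "ae"], ["t"]]
hypotheses = words_from_network(network)
```

Keeping the network rather than a single phone string is what lets later stages recover words the first pass did not commit to.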
Arabic goal-oriented conversational agents using semantic similarity techniques
Conversational agents (CAs) are computer programs that interact with humans in conversation. Goal-oriented conversational agents (GO-CAs) are programs that interact with humans to serve a specific domain of interest; their importance has increased recently, spanning technology, science and marketing. Several types of CA are used in industry; some are simple, with limited usage, others sophisticated. Generally, most CAs have been built to serve English speakers, and only a few for Arabic, owing to the complexity of the Arabic language and the scarcity of researchers spanning both linguistics and computing. This thesis covered two types of GO-CA: the traditional pattern-matching goal-oriented CA (PMGO-CA) and the semantic goal-oriented CA (SGO-CA).
Pattern-matching conversational agent (PMGO-CA) techniques are widely used in industry because of their flexibility and high performance. However, they are labour-intensive, difficult to maintain or update, and need continuous housekeeping to manage users' utterances (especially when instructions or knowledge change). They also lack any machine intelligence.
Semantic conversational agent (SGO-CA) techniques use humanly constructed knowledge bases such as WordNet to measure word and sentence similarity. Such measures have been researched extensively for English, and very little for Arabic.
In this thesis, the researcher developed a new methodology for Arabic conversational agents (covering both pattern-matching and semantic CAs), from scripting and knowledge engineering through architecture, implementation and evaluation. New tools to measure word and sentence similarity were also constructed. To test the performance of these CAs, a domain representing Iraqi passport services was built. Both CAs were evaluated and tested by domain experts using dedicated evaluation metrics. The evaluation showed very promising results and the viability of the system for real-life use.
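The thesis builds WordNet-based similarity tools; purely as a stand-in, the sketch below shows the simplest possible sentence-similarity measure, token overlap, which semantic measures refine by also crediting near-synonyms that share no surface form.

```python
# Baseline sentence similarity by token (Jaccard) overlap; semantic
# measures improve on this by scoring related, non-identical words.

def jaccard_similarity(s1, s2):
    """Similarity in [0, 1] based on shared tokens only."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

score = jaccard_similarity("renew my passport", "renew a passport")
# shared {renew, passport} out of 4 distinct tokens, so score is 0.5
```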
Formal Linguistic Models and Knowledge Processing. A Structuralist Approach to Rule-Based Ontology Learning and Population
2013 - 2014. The main aim of this research is to propose a structuralist approach to knowledge processing by means of ontology learning and population, starting from unstructured and structured texts. The suggested method includes distributional semantic approaches and NL formalization theories in order to develop a framework that relies upon deep linguistic analysis... [edited by author]