430 research outputs found
From Fuzzy Expert System to Artificial Neural Network: Application to Assisted Speech Therapy
This chapter addresses the following question: what are the advantages of extending a fuzzy expert system (FES) to an artificial neural network (ANN) within a computer-based speech therapy system (CBST)? We briefly describe the key concepts and principles behind the FES and the ANN and their applications in assisted speech therapy. We explain the importance of an intelligent system in designing an appropriate model for real-life situations. We present data from a 1-year application of these concepts in the field of assisted speech therapy. Using an artificial intelligence system to improve speech allows the design of a pronunciation training program that can be individualized based on specialty needs, previous experiences, and the child's prior therapeutic progress. Neural networks add great value when dealing with data that do not match a previously designed pattern. An integrated approach that combines the FES and the ANN allows our system to accomplish three main objectives: (1) develop a personalized therapy program; (2) gradually take over some human expert duties; (3) use "self-learning" capabilities, a component traditionally reserved for humans. The results demonstrate the viability of the hybrid approach in the context of speech therapy, and the approach can be extended when designing similar applications.
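The FES-to-ANN combination described above can be sketched in miniature: fuzzy membership functions grade a pronunciation score, and a small neural layer combines the fuzzy degrees into a difficulty estimate used to pick the next exercise. All names, membership shapes, and weights below are illustrative assumptions, not the chapter's actual system.

```python
import math

def triangular(x, a, b, c):
    """Triangular fuzzy membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(error_rate):
    """Map a phoneme error rate in [0, 1] to fuzzy degrees (low/medium/high)."""
    return [
        triangular(error_rate, -0.5, 0.0, 0.5),  # "low error"
        triangular(error_rate, 0.0, 0.5, 1.0),   # "medium error"
        triangular(error_rate, 0.5, 1.0, 1.5),   # "high error"
    ]

def ann_difficulty(fuzzy_in, weights, bias):
    """Single sigmoid neuron standing in for a trained ANN layer."""
    z = sum(w * x for w, x in zip(weights, fuzzy_in)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Example: a child with a 0.7 error rate on a target phoneme.
degrees = fuzzify(0.7)
difficulty = ann_difficulty(degrees, weights=[-2.0, 0.5, 2.0], bias=0.0)
```

In a real CBST the neuron would be a trained network and the fuzzy rules would come from the human expert, which is exactly the "gradual replacement" the chapter argues for.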
Computational and Numerical Simulations
Computational and Numerical Simulations is an edited book comprising 20 chapters. The book covers recent research devoted to numerical simulations of physical and engineering systems. It presents both new theories and their applications, bridging theoretical investigations and their practical use by engineers across branches of science. Numerical simulations play a key role in both theoretical and application-oriented research.
Rhythmic unit extraction and modelling for automatic language identification
This paper deals with an approach to Automatic Language Identification based on rhythmic modelling. Besides phonetics and phonotactics, rhythm is one of the most promising features for language identification, even if its extraction and modelling are not straightforward; one of the main problems to address is what to model. In this paper, a rhythm-extraction algorithm is described: using a vowel detection algorithm, rhythmic units related to syllables are segmented. Several parameters are extracted (consonantal duration, vowel duration, cluster complexity) and modelled with a Gaussian mixture. Experiments are performed on read speech in 7 languages (English, French, German, Italian, Japanese, Mandarin, and Spanish). Results reach up to 86 ± 6% correct discrimination between stress-timed, mora-timed, and syllable-timed classes of languages, and 67 ± 8% correct language identification on average for the 7 languages with utterances of 21 seconds. These results are discussed and compared with those obtained with a standard acoustic Gaussian mixture modelling approach (88 ± 5% correct identification for the 7-language identification task).
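The rhythm-modelling pipeline above can be sketched as follows: each segmented rhythmic unit yields a (consonantal duration, vowel duration, cluster complexity) vector, and each rhythm class is scored by a Gaussian model. For brevity, a single diagonal Gaussian per class stands in for the paper's Gaussian mixtures, and all parameter values are made up.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """Log-likelihood of vector x under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def classify(features, class_models):
    """Average per-unit log-likelihood; return the best-scoring rhythm class."""
    scores = {
        name: sum(gaussian_log_likelihood(f, m, v) for f in features) / len(features)
        for name, (m, v) in class_models.items()
    }
    return max(scores, key=scores.get)

# Toy class models over (consonantal duration, vowel duration, complexity).
models = {
    "stress-timed":   ([0.09, 0.06, 2.5], [0.001, 0.001, 0.8]),
    "syllable-timed": ([0.06, 0.08, 1.4], [0.001, 0.001, 0.4]),
}
units = [[0.05, 0.09, 1.0], [0.07, 0.08, 2.0]]  # segmented rhythmic units
predicted = classify(units, models)
```

A full system would fit mixtures per language on training data (e.g. via EM) rather than use hand-set means and variances.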
Modelo acústico de língua inglesa falada por portugueses (Acoustic model of English spoken by Portuguese speakers)
Master's project report in Informatics Engineering, presented to the University of Lisbon through the Faculty of Sciences, 2007. In the context of robust speech recognition based on Hidden Markov Models (HMMs), this work describes methodologies and experiments aimed at recognizing foreign speakers. Speech recognition necessarily involves acoustic models. Acoustic models reflect the way we pronounce and articulate a language, modelling the sequence of sounds emitted during speech. This modelling rests on minimal speech segments, the phones, for which there are sets of symbols (alphabets) representing their pronunciation; articulatory and acoustic phonetics studies the representation of these symbols and their articulation and pronunciation. We can describe words by analysing the units that constitute them, the phones. A speech recognizer interprets the input signal, the speech, as a sequence of coded symbols. To do so, the signal is fragmented into observations of roughly 10 milliseconds each, reducing the analysis window to the interval over which the characteristics of a sound segment do not vary. Acoustic models give us a notion of the probability that a given observation corresponds to a given entity; it is therefore through models of the entities of the vocabulary to be recognized that these sound fragments can be reassembled. The models developed in this work are based on HMMs, so called because they build on Markov chains (Markov, 1856-1922): sequences of states in which each state is conditioned on its predecessor. In our domain, this means building a set of models, one for each class of sounds to be recognized, trained on training data.
The data are audio files and their word-level transcriptions, so that each transcription can be decomposed into phones and aligned with the corresponding sounds in the audio file. Using a state model, in which each state represents an observation or described speech segment, the data are progressively regrouped into increasingly reliable statistical models that represent the speech entities of a given language. Recognition of foreign speakers whose pronunciation differs from the language for which the recognizer was designed can be a serious problem for recognizer accuracy. This variation can be even more problematic than dialectal variation within a language, because it depends on each speaker's knowledge of the foreign language. Using a small amount of audio from foreign speakers to train new acoustic models, several experiments were carried out with corpora of Portuguese speakers speaking English, of European Portuguese, and of English. Initially, the behaviour of the native English and native Portuguese models was explored separately, testing each against the test corpora (a native test set and a non-native test set). Next, another model was trained using, simultaneously, the audio of Portuguese speakers speaking English and that of native English speakers as the training corpus. A further experiment applied adaptation techniques, such as Maximum Likelihood Linear Regression (MLLR), which adapts an initial model to a given speaker characteristic, in this case the foreign accent: given a small amount of data representing the characteristic to be modelled, it computes a set of transformations that are applied to the model being adapted.
Phonetic modelling was also explored, studying how a foreign speaker pronounces the foreign language, in this case a Portuguese speaker speaking English. This study was carried out with the help of a linguist, who defined a set of phones, the result of mapping the English phone inventory onto the Portuguese one, representing the English spoken by Portuguese speakers of a given prestige group. Given the great variability of pronunciations, this group had to be defined according to the speakers' literacy level. The study was later used to create a new model trained with the corpora of Portuguese speakers speaking English and of native Portuguese speakers, yielding a native Portuguese recognizer in which the recognition of English terms is possible. Within the theme of speech recognition, this project also addressed the collection of corpora for European Portuguese and the compilation of a European Portuguese lexicon. In corpus acquisition, the author was involved in extracting and preparing telephone speech data for the subsequent training of new European Portuguese acoustic models. The European Portuguese lexicon was compiled with a semi-automatic incremental method: pronunciations were generated automatically for groups of ten thousand words, each group was reviewed and corrected by a linguist, and each reviewed group was then used to improve the automatic pronunciation generation rules. The tremendous growth of technology has increased the need to integrate spoken language technologies into our daily applications, providing easy and natural access to information. These applications are of different natures, with different user interfaces.
Besides voice-enabled Internet portals or tourist information systems, automatic speech recognition systems can be used in home settings, where the TV and other appliances could be voice controlled, discarding keyboard or mouse interfaces, or in mobile phones and palm-sized computers for hands-free and eyes-free operation. The development of these systems faces several known difficulties. One of them concerns recognizer accuracy when dealing with non-native speakers who have different phonetic pronunciations of a given language. A non-native accent can be more problematic than a dialect variation of the language. This mismatch depends on the individual's speaking proficiency and the speaker's mother tongue. Consequently, when the speaker's native language is not the same as the one used to train the recognizer, there is a considerable loss in recognition performance. In this thesis, we examine the problem of non-native speech in a speaker-independent, large-vocabulary recognizer for which a small amount of non-native data was used in training. Several experiments were performed using Hidden Markov models trained with speech corpora containing European Portuguese native speakers, English native speakers, and English spoken by European Portuguese native speakers. Initially, the behaviour of an English native model and a non-native English speakers' model was explored. Then, using different corpus weights for the English native speakers and the English spoken by Portuguese speakers, a model was trained as a pool of accents. Among adaptation techniques, the Maximum Likelihood Linear Regression (MLLR) method was used. It was also studied how European Portuguese speakers pronounce English, examining the correspondences between the phone sets of the foreign and target languages. The result was a new phone set, a consequence of the mapping between the English and Portuguese phone sets.
Then a new model was trained with English-spoken-by-Portuguese data and Portuguese native data. Concerning speech recognition, this work had two other purposes: collecting Portuguese corpora and supporting the compilation of a Portuguese lexicon, adopting methods and algorithms to generate phonetic pronunciations automatically. The collected corpora were processed in order to train acoustic models to be used in the Exchange 2007 domain, namely in Outlook Voice Access.
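The core mechanics of the MLLR adaptation used in this thesis can be sketched simply: a shared affine transform (A, b), estimated from a small amount of accented speech, is applied to the Gaussian mean of every HMM state, shifting the native model toward the non-native accent. The transform values below are illustrative placeholders, not estimated ones.

```python
def mllr_adapt_means(state_means, A, b):
    """Apply mean' = A @ mean + b to each state's Gaussian mean vector."""
    adapted = {}
    for state, mean in state_means.items():
        adapted[state] = [
            sum(A[i][j] * mean[j] for j in range(len(mean))) + b[i]
            for i in range(len(b))
        ]
    return adapted

# Two-dimensional toy acoustic means for two HMM states.
means = {"phone_a": [1.0, 2.0], "phone_b": [0.5, -1.0]}
A = [[1.1, 0.0], [0.0, 0.9]]   # mild per-dimension scaling
b = [0.2, -0.1]                # global shift
adapted = mllr_adapt_means(means, A, b)
```

In real MLLR the transform is estimated by maximum likelihood over the adaptation data, often with one transform per regression class of states rather than a single global one.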
Impact of dialect use on a basic component of learning to read
Can some black-white differences in reading achievement be traced to differences in language background? Many African American children speak a dialect that differs from the mainstream dialect emphasized in school. We examined how use of alternative dialects affects decoding, an important component of early reading and marker of reading development. Behavioral data show that use of the alternative pronunciations of words in different dialects affects reading aloud in developing readers, with larger effects for children who use more African American English. Mechanisms underlying this effect were explored with a computational model, investigating factors affecting reading acquisition. The results indicate that the achievement gap may be due in part to differences in task complexity: children whose home and school dialects differ are at greater risk for reading difficulties because tasks such as learning to decode are more complex for them
DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation
In recent years, audio-driven 3D facial animation has gained significant
attention, particularly in applications such as virtual reality, gaming, and
video conferencing. However, accurately modeling the intricate and subtle
dynamics of facial expressions remains a challenge. Most existing studies
approach the facial animation task as a single regression problem, which often
fails to capture the intrinsic inter-modal relationship between speech signals
and 3D facial animation and overlooks their inherent consistency. Moreover, due
to the limited availability of 3D audio-visual datasets, approaches trained on
small samples generalize poorly, which degrades performance. To address these
issues, we propose a cross-modal dual-learning framework, termed DualTalker,
which aims to improve data usage efficiency and to capture cross-modal
dependencies. The framework is
trained jointly with the primary task (audio-driven facial animation) and its
dual task (lip reading) and shares common audio/motion encoder components. Our
joint training framework facilitates more efficient data usage by leveraging
information from both tasks and explicitly capitalizing on the complementary
relationship between facial motion and audio to improve performance.
Furthermore, we introduce an auxiliary cross-modal consistency loss to mitigate
the potential over-smoothing underlying the cross-modal complementary
representations, enhancing the mapping of subtle facial expression dynamics.
Through extensive experiments and a perceptual user study conducted on the VOCA
and BIWI datasets, we demonstrate that our approach outperforms current
state-of-the-art methods both qualitatively and quantitatively. We have made
our code and video demonstrations available at
https://github.com/sabrina-su/iadf.git
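The cross-modal consistency idea in this abstract can be sketched as follows: the shared audio and motion encoders should produce nearby latent codes for the same clip, so a mean-squared distance between the two latents is added to the two task losses. This is an illustrative guess at the loss's form; the vectors and the exact formulation in the paper may differ.

```python
def consistency_loss(audio_latent, motion_latent):
    """Mean squared distance between the audio and motion latent codes."""
    assert len(audio_latent) == len(motion_latent)
    return sum((a - m) ** 2 for a, m in zip(audio_latent, motion_latent)) / len(audio_latent)

# Toy latent codes for one clip, as the shared encoders might emit.
audio_z = [0.2, -0.1, 0.4]
motion_z = [0.1, 0.0, 0.5]
loss = consistency_loss(audio_z, motion_z)
```

In training, this auxiliary term would be weighted and summed with the animation and lip-reading losses, pulling the two modality representations toward agreement.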
The Impact of AI on Teaching and Learning in Higher Education Technology
Thanks to AI, students can now study whenever and wherever they like. Personalized feedback on assignments, quizzes, and other assessments can be generated using AI algorithms and utilised as a teaching tool to help students succeed. This study examines the impact of artificial intelligence on teaching and learning in higher education, focusing on how new technologies affect student learning and educational institutions. With the rapid adoption of new technologies in higher education and recent technological advancements, it is possible to forecast the future of higher education in a world where artificial intelligence is ubiquitous. Administration, student support, teaching, and learning can all benefit from these technologies; we identify some challenges that higher education institutions and students may face and consider potential research directions.
Integrating Language Identification to improve Multilingual Speech Recognition
The process of determining the language of a speech utterance is called Language Identification (LID). This task can be very challenging as it has to take into account various language-specific aspects, such as phonetic, phonotactic, vocabulary, and grammar-related cues. In multilingual speech recognition we try to find the most likely word sequence that corresponds to an utterance where the language is not known a priori. This is a considerably harder task compared to monolingual speech recognition, and it is common to use LID to estimate the current language. In this project we present two general approaches for LID and describe how to integrate them into multilingual speech recognizers. The first approach uses hierarchical multilayer perceptrons to estimate language posterior probabilities given the acoustics, in combination with hidden Markov models. The second approach evaluates the output of a multilingual speech recognizer to determine the spoken language. The research is applied to the MediaParl speech corpus that was recorded at the Parliament of the canton of Valais, where people switch from Swiss French to Swiss German or vice versa. Our experiments show that, on that particular data set, LID can be used to significantly improve the performance of multilingual speech recognizers. We also point out that ASR-dependent LID approaches yield the best performance due to higher-level cues, and that our systems perform much worse on non-native data.
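The first LID approach above, posterior estimation given the acoustics, reduces at decision time to a simple rule: accumulate frame-level language posteriors (as an MLP would emit) over the utterance and pick the highest-scoring language. The per-frame posterior values and language tags below are made up for illustration.

```python
import math

def identify_language(frame_posteriors, languages):
    """Accumulate log posteriors per language over all frames; return the argmax."""
    totals = {lang: 0.0 for lang in languages}
    for frame in frame_posteriors:
        for lang, p in zip(languages, frame):
            totals[lang] += math.log(max(p, 1e-10))  # floor to avoid log(0)
    return max(totals, key=totals.get)

langs = ["fr-CH", "de-CH"]
frames = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]  # per-frame [P(fr), P(de)]
best = identify_language(frames, langs)
```

A multilingual recognizer would then restrict decoding to the identified language's models, or use the running estimate to handle code switching mid-utterance.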
- …