Voice Activity Detection and Garbage Modelling for a Mobile Automatic Speech Recognition Application

Author: Ishaq Muhammad
Publication date: 23/01/2017

State-of-the-art automatic speech recognition systems are now used in various industries all over the world. Most of them use a customized version of a speech recognition system. The need for different versions arises from differences in speech commands, lexicon, language and work environment. It is essential for a speech recognizer to provide accurate and precise outputs in every working environment. However, the performance of a speech recognizer degrades quickly when noise is present in the work environment and when out-of-vocabulary (OOV) words are spoken to it. This thesis consists of three tasks that improve an automatic speech recognition application for mobile devices: building a new acoustic model, improving the current voice activity detection, and garbage modelling of OOV words. First, a Finnish acoustic model is trained for a company called Devoca Oy. The training data was recorded in different warehouse environments to improve real-world speech recognition accuracy. Second, Gammatone and Gabor features are extracted from the input speech frame to improve the voice activity detection (VAD). These features are applied to the VAD decision module of Pocketsphinx and to a new neural-network classifier, which classify each frame as speech or non-speech. Lastly, a garbage model is developed for the OOV words. This model recognizes words from outside the grammar and marks them as unknown on the application interface. The thesis evaluates the success of these three tasks with a Finnish audio database and reports the overall improvement in the word error rate.

Aaltodoc Publication Archive

Large vocabulary speech recognition in noisy environments

Author: Jabloun Firas
Publication venue: Bilkent University
Publication date: 01/01/1998
Field of study

Ankara: Department of Electrical and Electronics Engineering and Institute of Engineering and Sciences, Bilkent University, 1998. Thesis (Master's), Bilkent University, 1998. Includes bibliographical references (leaves 48-52).

A new set of speech feature parameters based on multirate subband analysis and the Teager Energy Operator (TEO) is developed. The speech signal is first divided into nonuniform subbands on the mel scale using a multirate filter bank, then the Teager energies of the subsignals are estimated. Finally, the feature vector is constructed by log compression and inverse DCT computation. The new feature parameters (TEOCEP) give robust speech recognition performance in car engine noise, which has a low-pass nature. In this thesis, we also present some solutions to the problem of large vocabulary speech recognition. Triphone-based Hidden Markov Models (HMMs) are used to model the vocabulary words. Although the straightforward parallel search strategy gives good recognition performance, the processing time required is found to be long and impractical. Therefore another search strategy with similar performance is described. Subvocabularies are developed during the training session to reduce the total number of words considered in the search process. The search is then performed in a tree structure by investigating one subvocabulary instead of all the words.

Bilkent University Institutional Repository

Speech Recognition

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

Chapters in the first part of the book cover the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications that are able to operate in real-world environments, such as mobile communication services and smart homes.

Directory of Open Access Books (DOAB)

Parametric processing of audio signals in the context of hearing aids (Traitement paramétrique des signaux audio dans le contexte des prothèses auditives)

Author: Trabelsi Abdelaziz
Publication date: 01/01/2008

Moving-average model -- Autoregressive model -- Autoregressive moving-average model -- Remark on the link between AR, MA and ARMA -- Evaluation of the parameters of an AR(p) process -- Order selection criteria for an AR(p) model -- Notion of spectral envelope -- Methods developed in the frequency domain -- Methods developed in the correlation domain -- Noise reduction in the frequency domain -- A two-microphone algorithm for speech enhancement -- State of the art -- Zelinski's approach in the case of two-microphone arrangement -- Two-microphone speech enhancement system -- Performance evaluation and results -- Noise reduction in the correlation domain -- Estimation of the noise power -- Compensation for the effects of noise -- Improvement of the compensation procedure -- Prospects for further development -- Parametric processing in the presence of noise -- Layout of the combined processing -- Improving the accuracy of the noise variance estimator

PolyPublie

Speech reconstruction from articulatory movement for laryngectomees

Author: Cao Beiming
Publication date: 12/07/2024

Laryngectomees are individuals who have had their larynx surgically removed as treatment for laryngeal cancer. They lose the ability to vocalize speech but can still articulate after the surgery. As a result, they rely on alternative methods of communication (e.g., alaryngeal speech). However, alaryngeal speech produces an unnatural-sounding voice, which discourages them from speaking and can cause social isolation and even depression. Silent speech interfaces (SSIs) convert non-audio human bio-signals (e.g., tongue and lip movement) to speech, and have the potential to reconstruct speech with a natural-sounding voice and even speaker identity. Although the concept of SSI and its feasibility have been demonstrated in the field, SSI development still faces a few major challenges, including small data size, lack of algorithms for laryngectomees, and lack of wearable devices for daily use. A series of studies were conducted to address these challenges. This dissertation contributes to the field in several respects, including novel algorithms and approaches for articulation-to-speech mapping, new knowledge to improve the design of SSIs, and evaluation of newly developed wearable devices.

Texas ScholarWorks

Recent Applications in Graph Theory

Publication venue: 'IntechOpen'
Publication date: 27/07/2022

Graph theory, being a rigorously investigated field of combinatorial mathematics, is adopted by a wide variety of disciplines addressing a plethora of real-world applications. Advances in graph algorithms and software implementations have made graph theory accessible to a larger community of interest. Ever-increasing interest in machine learning and model deployments for network data demands a coherent selection of topics rewarding a fresh, up-to-date summary of the theory and fruitful applications to probe further. This volume is a small yet unique contribution to graph theory applications and modeling with graphs. The subjects discussed include information hiding using graphs, dynamic graph-based systems to model and control cyber-physical systems, graph reconstruction, average distance neighborhood graphs, and pure and mixed-integer linear programming formulations to cluster networks.

Directory of Open Access Books (DOAB)

Brain-Computer Interface

Publication venue: 'IntechOpen'
Publication date: 27/07/2022

Brain-computer interfacing (BCI) with the use of advanced artificial intelligence identification is a rapidly growing new technology that allows a silently commanding brain to manipulate devices ranging from smartphones to advanced articulated robotic arms when physical control is not possible. BCI can be viewed as a collaboration between the brain and a device via the direct passage of electrical signals from neurons to an external system. The book provides a comprehensive summary of conventional and novel methods for processing brain signals. The chapters cover a range of topics including noninvasive and invasive signal acquisition, signal processing methods, deep learning approaches, and implementation of BCI in experimental problems.

Directory of Open Access Books (DOAB)

Using auxiliary sources of knowledge for automatic speech recognition

Author: Magimai Doss Mathew
Publication venue: Lausanne, EPFL
Publication date: 01/01/2005

Standard hidden Markov model (HMM) based automatic speech recognition (ASR) systems usually use cepstral features as acoustic observations and phonemes as subword units. The speech signal exhibits a wide range of variability, for example due to environmental variation and speaker variation. This leads to different kinds of mismatch, such as mismatch between acoustic features and acoustic models, or mismatch between acoustic features and pronunciation models (given the acoustic models). The main focus of this work is on integrating auxiliary knowledge sources into standard ASR systems so as to make the acoustic models more robust to the variabilities in the speech signal. We refer to the sources of knowledge that are able to provide additional information about the sources of variability as auxiliary sources of knowledge. The auxiliary knowledge sources that have been primarily investigated in the present work are auxiliary features and auxiliary subword units. Auxiliary features are secondary sources of information that lie outside the standard cepstral features. They can be estimated from the speech signal (e.g., pitch frequency, short-term energy and rate-of-speech), or obtained from additional measurements (e.g., articulator positions or visual information). They are correlated with the standard acoustic features, and thus can aid in estimating better acoustic models, which would be more robust to the variabilities present in the speech signal. The auxiliary features that have been investigated are pitch frequency, short-term energy and rate-of-speech. These features can be modelled in standard ASR either by concatenating them to the standard acoustic feature vectors or by using them to condition the emission distribution (as done in gender-based acoustic modelling). We have studied these two approaches within the framework of hybrid HMM/artificial neural network based ASR, dynamic Bayesian network based ASR and the TANDEM system on different ASR tasks.
Our studies show that by modelling auxiliary features along with standard acoustic features, the performance of the ASR system can be improved in both clean and noisy conditions. We have also proposed an approach to evaluate the adequacy of the baseform pronunciation model of words. This approach allows us to compare different acoustic models as well as to extract pronunciation variants. Through the proposed approach to evaluating the baseform pronunciation model, we show that the matching and discriminative properties of a single baseform pronunciation can be improved by integrating auxiliary knowledge sources in standard ASR. Standard ASR systems usually use phonemes as the subword units in a Markov chain to model words. In the present thesis, we also study a system where word models are described by two parallel chains of subword units: one for phonemes and the other for graphemes (phoneme-grapheme based ASR). Models for both types of subword units are jointly learned using maximum likelihood training. During recognition, decoding is performed using either or both of the subword unit chains. In doing so, we have used graphemes as auxiliary subword units. The main advantage of using graphemes is that the word models can be defined easily using the orthographic transcription, thus being relatively noise free as compared to word models based upon phoneme units. At the same time, there are drawbacks to using graphemes as subword units, since there is a weak correspondence between the grapheme and the phoneme in languages such as English. Experimental studies conducted for American English on different ASR tasks have shown that the proposed phoneme-grapheme based ASR system can perform better than the standard ASR system that uses only phonemes as its subword units. Furthermore, while modelling context-dependent graphemes (similar to context-dependent phonemes), we observed that context-dependent graphemes behave like phonemes.
ASR studies conducted on different tasks showed that by modelling context-dependent graphemes only (without any phonetic information), performance competitive with the state-of-the-art context-dependent phoneme-based ASR system can be obtained.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Text mining and natural language processing for the early stages of space mission design

Author: Berquand Audrey
Publication date: 01/01/2021

Final thesis submitted December 2021 - degree awarded in 2022.

A considerable amount of data related to space mission design has been accumulated since artificial satellites started to venture into space in the 1950s. This data has today become an overwhelming volume of information, triggering a significant knowledge reuse bottleneck at the early stages of space mission design. Meanwhile, virtual assistants, text mining and Natural Language Processing techniques have become pervasive to our daily life. The work presented in this thesis is one of the first attempts to bridge the gap between the worlds of space systems engineering and text mining. Several novel models are thus developed and implemented here, targeting the structuring of accumulated data through an ontology, but also tasks commonly performed by systems engineers such as requirement management and heritage analysis. A first collection of documents related to space systems is gathered for the training of these methods. Eventually, this work aims to pave the way towards the development of a Design Engineering Assistant (DEA) for the early stages of space mission design. It is also hoped that this work will actively contribute to the integration of text mining and Natural Language Processing methods in the field of space mission design, enhancing current design processes.

STAX (Strathclyde Repository)

Space Communications: Theory and Applications. Volume 3: Information Processing and Advanced Techniques. A Bibliography, 1958 - 1963

Author: Bickford L. C., Filipowsky R. F.

Annotated bibliography on information processing and advanced communication techniques - theory and applications of space communication

NASA Technical Reports Server