360 research outputs found
A hybrid neural network based speech recognition system for pervasive environments
One of the major drawbacks to using speech as the input to a pervasive environment is the need to balance accuracy against high processing overheads. This paper presents an Arabic speech recognition system (called UbiqRec), which addresses this issue by providing a natural and intuitive way of communicating within ubiquitous environments while balancing processing time, memory and recognition accuracy. A hybrid approach has been used which incorporates spectrographic information, singular value decomposition, concurrent self-organizing maps (CSOM) and pitch contours for Arabic phoneme recognition. The approach employs a separate self-organizing map (SOM) for each Arabic phoneme, joined in parallel to form a CSOM. The performance results confirm that with suitable preprocessing of the data, including extraction of distinct power spectral densities (PSD) and singular value decomposition, the training time for the CSOM was reduced by 89%. The empirical results also showed that overall recognition accuracy did not fall below 91%.
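The preprocessing step the abstract describes, compressing power spectral densities with singular value decomposition before training, can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the frame count, bin count, and retained rank are assumptions.

```python
import numpy as np

def reduce_features(psd_frames, k=8):
    """Compress a matrix of per-frame power spectral densities
    (frames x frequency bins) to k dimensions via truncated SVD.
    A sketch of the kind of preprocessing the abstract describes;
    shapes and rank here are illustrative assumptions."""
    U, s, Vt = np.linalg.svd(psd_frames, full_matrices=False)
    return U[:, :k] * s[:k]  # k-dimensional feature per frame

rng = np.random.default_rng(0)
psd = rng.random((100, 129))           # 100 frames, 129 PSD bins (assumed)
compressed = reduce_features(psd, k=8)
print(compressed.shape)                # (100, 8)
```

Training each per-phoneme SOM on these low-rank features rather than raw spectra is what would account for the reported reduction in training time.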
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Automatic speech recognition (ASR) has recently become an important challenge
for deep learning (DL). It requires large-scale training datasets and high
computational and storage resources. Moreover, DL techniques, and machine
learning (ML) approaches in general, assume that training and testing data come
from the same domain, with the same input feature space and data distribution
characteristics. This assumption, however, does not hold in some real-world
artificial intelligence (AI) applications. There are also situations where real
data are challenging, expensive, or rare to gather, so that the data
requirements of DL models cannot be met. Deep transfer learning (DTL) has been
introduced to overcome these issues: it helps develop high-performing models
from real datasets that are small or slightly different from, but related to,
the training data. This paper presents a comprehensive survey of DTL-based ASR
frameworks to shed light on the latest developments and to help academics and
professionals understand current challenges. Specifically, after presenting the
DTL background, a well-designed taxonomy is adopted to organize the state of
the art. A critical analysis is then conducted to identify the limitations and
advantages of each framework. A comparative study then highlights the current
challenges before deriving opportunities for future research.
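The core DTL idea the survey covers, reusing a model trained on one domain when target-domain data are scarce, can be sketched minimally: freeze a pretrained feature extractor and train only a small new head on the target data. Everything here is a toy stand-in (the "pretrained" extractor is a fixed random projection, the dataset is synthetic), not any framework from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained, frozen feature extractor (hypothetical: in a
# real DTL setup this would be the lower layers of a source-domain ASR
# model, kept fixed while only the new head is trained).
W_frozen = rng.standard_normal((20, 5))
def extract(x):
    return np.tanh(x @ W_frozen)  # frozen features: never updated

# Small target-domain dataset (toy, two classes).
X = rng.standard_normal((40, 20))
y = (X[:, 0] > 0).astype(float)
F = extract(X)

# Train only a new logistic-regression head on the frozen features.
w, b = np.zeros(5), 0.0
losses = []
for _ in range(500):
    p = 1 / (1 + np.exp(-(F @ w + b)))
    losses.append(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))
    w -= 0.5 * (F.T @ (p - y) / len(y))  # cross-entropy gradient step
    b -= 0.5 * np.mean(p - y)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because only the head's few parameters are updated, such a setup can fit a small target dataset without the large-scale data the full model would need, which is the motivation the survey describes.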
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals, methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications able to operate in real-world environments, such as mobile communication services and smart homes.
Central Kurdish Automatic Speech Recognition using Deep Learning
Automatic Speech Recognition (ASR), an active field of speech processing, is nowadays used in real applications implemented with a variety of techniques, among which artificial neural networks are the most popular. Increasing performance and making these systems robust to noise are among the current challenges. This paper addresses the development of an ASR system for the Central Kurdish language (CKB) using transfer learning with Deep Neural Networks (DNNs). Mel-Frequency Cepstral Coefficients (MFCCs) are used to extract features from the speech signal, and a Long Short-Term Memory (LSTM) network with a Connectionist Temporal Classification (CTC) output layer forms the Acoustic Model (AM), trained on the AsoSoft CKB speech dataset. In addition, an N-gram language model is built on a large collected text dataset of about 300 million tokens. The text corpus is also used to extract a dynamic lexicon model that contains over 2.5 million CKB words. The obtained results show that the DNN improves on classical statistical models. The proposed method achieves a 0.22% word error rate by combining transfer learning and language model adaptation, which is superior to the best previously reported result for CKB.
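A CTC output layer, as used in this acoustic model, emits one label distribution per frame; the standard way to turn the frame-wise best path into a transcript is to merge repeated labels and drop blanks. The sketch below shows that greedy (best-path) decoding step with a toy alphabet; it illustrates the general CTC convention, not the paper's actual implementation.

```python
import numpy as np

BLANK = 0  # CTC blank index (a common convention; assumed here)

def ctc_greedy_decode(logits, id_to_char):
    """Collapse the frame-wise argmax path: merge repeated labels,
    then remove blanks. A standard CTC best-path decoder sketch."""
    path = np.argmax(logits, axis=1)   # best label per frame
    out, prev = [], BLANK
    for p in path:
        if p != prev and p != BLANK:   # new non-blank label starts
            out.append(id_to_char[p])
        prev = p
    return "".join(out)

# Toy frame-wise scores over {blank, 'a', 'b'} (illustrative only).
logits = np.array([
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.8, 0.1],    # 'a' (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # 'b'
])
print(ctc_greedy_decode(logits, {1: "a", 2: "b"}))  # -> ab
```

The blank label is what lets CTC distinguish a genuine double letter (label, blank, same label) from one label held over several frames, which is why the blank is removed only after repeats are merged.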
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that allow data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.
Understanding the phonetics of neutralisation: a variability-field account of vowel/zero alternations in a Hijazi dialect of Arabic
This thesis throws new light on issues debated in the experimental literature on
neutralisation. They concern the extent of phonetic merger (the completeness
question) and the empirical validity of the phonetic effect (the genuineness
question). Regarding the completeness question, I present acoustic and perceptual
analyses of vowel/zero alternations in Bedouin Hijazi Arabic (BHA) that appear to
result in neutralisation. The phonology of these alternations exemplifies two
neutralisation scenarios bearing on the completeness question. Until now, these
scenarios have been investigated separately within small-scale studies. Here I look
more closely at both, testing hypotheses involving the acoustics-perception
relation and the phonetics-phonology relation.
I then discuss the genuineness question from an experimental and statistical
perspective. Experimentally, I devise a paradigm that manipulates important
variables claimed to influence the phonetics of neutralisation. Statistically, I reanalyse
neutralisation data reported in the literature from Turkish and Polish. I
apply different pre-analysis procedures which, I argue, can partly explain the
mixed results in the literature.
My inquiry into these issues leads me to challenge some of the discipline’s
accepted standards for characterising the phonetics of neutralisation. My
assessment draws on insights from different research fields including statistics,
cognition, neurology, and psychophysics. I suggest alternative measures that are
both cognitively and phonetically more plausible. I implement these within a new
model of lexical representation and phonetic processing, the Variability Field
Model (VFM). In the VFM, phonetic data are examined as intervals based on
just-noticeable differences (jnds) rather than as single data points. This
allows for a deeper understanding
of phonetic variability. The model combines prototypical and episodic schemes
and integrates linguistic, paralinguistic, and extra-linguistic effects. The thesis also
offers a VFM-based analysis of a set of neutralisation data from BHA.
In striving for a better understanding of the phonetics of neutralisation, the thesis
raises important issues pertaining to the way we approach phonetic questions,
generate and analyse data, and interpret and evaluate findings.
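The interval idea at the heart of the VFM can be illustrated in miniature: treat each measurement as a jnd-wide interval and count two values as distinct only if their intervals do not overlap. This is a deliberately simplified illustration of the general principle; the thesis's actual model is richer, and the durations and jnd value below are hypothetical.

```python
def as_interval(x, jnd):
    """Represent a phonetic measurement as a jnd-wide interval
    rather than a single point (illustrating the general idea only)."""
    return (x - jnd / 2, x + jnd / 2)

def perceptually_distinct(a, b, jnd):
    """Two measurements count as distinct only if their
    jnd-based intervals do not overlap."""
    lo_a, hi_a = as_interval(a, jnd)
    lo_b, hi_b = as_interval(b, jnd)
    return hi_a < lo_b or hi_b < lo_a

# Hypothetical vowel durations (ms) with an assumed 10 ms jnd:
print(perceptually_distinct(84.0, 90.0, 10.0))   # False: intervals overlap
print(perceptually_distinct(84.0, 100.0, 10.0))  # True: clearly separated
```

Under such a scheme, a small acoustic difference between two categories need not count as incomplete neutralisation if it falls within a single jnd, which is the kind of reassessment of standard measures the abstract describes.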