76 research outputs found
Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages
This paper provides an overall introduction to our Automatic Speech Recognition (ASR) systems for Southeast Asian languages. As little existing work has been carried out on these regional languages, several difficulties must be addressed before building such systems: limited speech and text resources, a lack of linguistic knowledge, and so on. This work takes Bahasa Indonesia and Thai as examples to illustrate strategies for collecting the various resources required for building ASR systems.
Comment: Published at the 2017 IEEE International Conference on Orange Technologies (ICOT 2017).
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that allow data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling, and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.
On Developing an Automatic Speech Recognition System for Commonly used English Words in Indian English
Speech is one of the easiest and fastest ways to communicate. Recognition of speech by computer across languages is a challenging task, and the accuracy of automatic speech recognition (ASR) systems remains one of the key challenges even after years of research. Accuracy varies with speaker and language variability, vocabulary size, and noise, as well as with design issues such as the speech database, the feature extraction technique, and the performance evaluation method. This paper describes the development of a speaker-independent, isolated-word automatic speech recognition system for Indian English. The acoustic model is built using Carnegie Mellon University (CMU) Sphinx tools. The corpus is based on the most commonly used English words in everyday life, and the speech database includes recordings of 76 Punjabi speakers (north-west Indian English accent). After testing, the system obtained an accuracy of 85.20% when trained using 128 Gaussian Mixture Models (GMMs).
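Accuracy figures like the 85.20% above are conventionally derived from a word-level edit-distance alignment between the reference transcript and the recognizer's hypothesis. A minimal sketch of that standard computation (not the authors' code; the example phrases are invented):

```python
def word_errors(ref, hyp):
    """Minimum substitutions + deletions + insertions (word-level
    Levenshtein distance) between reference and hypothesis word lists."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # match / substitution
    return d[len(ref)][len(hyp)]

def accuracy(ref, hyp):
    """Word accuracy in percent: 100 * (1 - errors / reference length)."""
    return 100.0 * (1 - word_errors(ref, hyp) / len(ref))
```

For example, comparing the reference "turn on the light" with the hypothesis "turn the lights" yields two errors (one deletion, one substitution), i.e. 50% accuracy over the four reference words.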
Tamil Speech Recognition using Semi Continuous Models
In this paper, a novel approach to implementing semi-continuous Tamil speech recognition based on Hidden Markov Models is discussed. Tamil and other Indian languages share phonological features that are rich in vowel and consonant realizations, and the same phone has different realizations in different words. This can be handled by modeling phones in context, so triphone models were chosen as the sub-word units for acoustic training. The system is trained on a speech corpus covering 37 Tamil phones and consisting of 0.35 hours of speech. Training was done using Carnegie Mellon University (CMU)'s SphinxTrain acoustic model trainer, and the accuracy of the training is measured by decoding with PocketSphinx.
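The phone-in-context idea can be illustrated by expanding a flat phone sequence into triphone units, one per phone, each carrying its left and right neighbor. A minimal sketch in the "l-c+r" notation used by Sphinx/HTK-style trainers (the phone names are illustrative, not the paper's 37-phone Tamil set):

```python
def to_triphones(phones):
    """Expand a phone sequence into context-dependent triphone names
    in 'left-center+right' notation; utterance boundaries simply have
    no left or right context."""
    units = []
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i < len(phones) - 1 else None
        name = p
        if left:
            name = f"{left}-{name}"
        if right:
            name = f"{name}+{right}"
        units.append(name)
    return units
```

For instance, the sequence ["k", "a", "l"] expands to ["k+a", "k-a+l", "a-l"], so the acoustic model can learn a separate realization of "a" for each surrounding context.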
Comparative Analysis of Arabic Vowels using Formants and an Automatic Speech Recognition System
Arabic, the world's second most spoken language by number of speakers, has not received much attention from the traditional speech processing research community. This study is concerned with the analysis of vowels in the Modern Standard Arabic dialect. The first and second formant values of these vowels are investigated, and the differences and similarities between the vowels are explored using consonant-vowel-consonant (CVC) utterances. For this purpose, a Hidden Markov Model (HMM) based recognizer is built to classify the vowels, and its performance is analyzed to help understand the similarities and dissimilarities between the phonetic features of the vowels. The vowels are also analyzed in both the time and frequency domains, and the consistent findings of the analysis are expected to enable future Arabic speech processing tasks such as vowel and speech recognition and classification.
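Formant values such as F1 and F2 are commonly estimated from the angles of the complex roots of a linear-prediction (LPC) polynomial fitted to the waveform. A hedged sketch of that standard technique (not the study's actual analysis pipeline; the sampling rate, LPC order, and 90 Hz low-frequency cutoff are assumptions):

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients [1, a1, ..., ap] via the autocorrelation
    method and the Levinson-Durbin recursion."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i + 1] += k * a[i - 1::-1][:i]  # standard reflection update
        err *= (1.0 - k * k)
    return a

def formants(x, fs, order=8):
    """Estimate formant frequencies (Hz) as the angles of the LPC
    poles in the upper half-plane; roots near DC are discarded."""
    a = lpc(x * np.hamming(len(x)), order)
    roots = [r for r in np.roots(a) if r.imag > 0]
    freqs = sorted(np.angle(r) * fs / (2 * np.pi) for r in roots)
    return [f for f in freqs if f > 90]
```

On a synthetic damped sinusoid with a single 700 Hz resonance, this sketch recovers a formant estimate close to 700 Hz; real vowel analysis would add pre-emphasis, framing, and bandwidth checks.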
Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion
The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8
Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR transcribes all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and makes a deep analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund and the Galician Regional Government (GRC2014/024, "Consolidation of Research Units: AtlantTIC Project", CN2012/160).
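The evaluation's notion of a detection, a term plus start/end times and a confidence score, is matched against reference occurrences. A simplified scoring sketch (not the official ALBAYZIN metric, which uses ATWV-style term weighting; here a thresholded detection is a hit if its midpoint falls inside a reference occurrence of the same term, and all terms and times are invented):

```python
def score_detections(refs, dets, threshold=0.5):
    """refs: list of (term, start, end) reference occurrences.
    dets: list of (term, start, end, score) system detections.
    Returns (hits, misses, false_alarms)."""
    matched = [False] * len(refs)
    hits = false_alarms = 0
    for term, start, end, score in dets:
        if score < threshold:
            continue  # below the decision threshold: not asserted
        mid = (start + end) / 2
        for i, (rterm, rstart, rend) in enumerate(refs):
            if not matched[i] and rterm == term and rstart <= mid <= rend:
                matched[i] = True  # each reference can be hit once
                hits += 1
                break
        else:
            false_alarms += 1  # no unmatched reference overlaps
    misses = matched.count(False)
    return hits, misses, false_alarms
```

With two reference occurrences of a term and three detections (one correct, one spurious, one below threshold), the sketch reports one hit, one miss, and one false alarm.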
A Review of Accent-Based Automatic Speech Recognition Models for E-Learning Environment
The adoption of electronic learning (e-learning) as a method of disseminating knowledge in the global educational system is growing at a rapid rate, and has created a shift in knowledge acquisition from conventional classrooms and tutors to distributed e-learning techniques that enable access to various learning resources much more conveniently and flexibly. However, notwithstanding the adaptive advantages of the learner-centric content of e-learning programmes, the distributed e-learning environment has adopted a few international languages as the languages of communication among participants, despite the various accents (mother-tongue influence) among these participants. Adjusting to and accommodating these various accents has brought about the introduction of accent-based automatic speech recognition into e-learning to resolve the effects of accent differences. This paper reviews over 50 research papers to determine the progress made in the design and implementation of accent-based automatic speech recognition models for e-learning between 2001 and 2021. The analysis shows that 50% of the models reviewed adopted the English language, 46.50% adopted the major Chinese and Indian languages, and 3.50% adopted the Swedish language as the mode of communication. The majority of ASR models are therefore centred on European, American, and Asian accents, while excluding the accent peculiarities associated with less technologically resourced continents.
Computer-based stuttered speech detection system using Hidden Markov Model
Stuttering has attracted extensive research interest over the past decades. Most of the available stuttering diagnostic and assessment techniques use human perceptual judgment of overt stuttered speech characteristics. Conventionally, stuttering severity is diagnosed by manually counting the occurrences of disfluencies in a pre-recorded therapist-patient conversation, a time-consuming task that is subjective, inconsistent, and prone to error across clinics. Therefore, this thesis proposes a computerized system that deploys an HMM-based speech recognition technique to detect stuttered speech disfluencies. Continuous Malay digit strings were used as the training and testing sets for fluency detection. The Hidden Markov Model (HMM) is a robust and powerful statistical acoustic modeling technique; with efficient training algorithms (the forward-backward and Baum-Welch algorithms) and recognition algorithms, as well as flexibility in model topology and the incorporation of other knowledge sources, HMMs have been successfully applied to a wide range of tasks. In this thesis, a set of normal-voice digit-string recordings is used to train the HMMs, and pseudo-stuttered speech was collected as the testing set for the proposed system. The experimental results were compared with assessments made by a Speech Language Pathologist (SLP) from the Clinic of Audiology and Speech Sciences of Universiti Kebangsaan Malaysia (UKM). The proposed system achieved 100% average syllable-repetition detection accuracy and 86.605% average sound-prolongation detection accuracy, and the SLP agreed with the results generated by the software. The system can be further enhanced to detect stuttering in everyday speech. Microsoft Visual C++ 6.0 and GoldWave were used to develop the software, which runs in a Windows environment.
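The forward-backward machinery mentioned above rests on the forward algorithm, which accumulates state likelihoods over time to give the probability of an observation sequence under an HMM. A minimal sketch for a discrete-output HMM (the toy parameters in the usage note are illustrative only, not the thesis's models):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: P(obs | model) for a discrete-output HMM.
    pi: initial state probabilities, shape (N,)
    A:  state transition matrix, shape (N, N), A[i, j] = P(j | i)
    B:  emission matrix, shape (N, M), B[i, o] = P(symbol o | state i)
    obs: sequence of observed symbol indices."""
    alpha = pi * B[:, obs[0]]            # initialize with first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate, then emit
    return alpha.sum()                   # sum over final states
```

For a two-state model with pi = [0.6, 0.4], A = [[0.7, 0.3], [0.4, 0.6]], B = [[0.9, 0.1], [0.2, 0.8]], the sequence [0, 1] has likelihood 0.209; a practical recognizer would work in log space or rescale alpha to avoid underflow on long utterances.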