A review of Yorùbá Automatic Speech Recognition
Automatic Speech Recognition (ASR) has recorded appreciable progress in both technology and application. Despite this progress, a wide performance gap still exists between human speech recognition (HSR) and ASR, which has inhibited its full adoption in real-life situations. This paper presents a brief review of research progress on Yorùbá ASR, focusing on variability as a factor contributing to the performance gap between HSR and ASR, with a view to examining the advances recorded, identifying the major obstacles, and charting a way forward for the development of a Yorùbá ASR comparable to those of other tone languages and of developed nations. This is done through an extensive survey of the ASR literature with a focus on Yorùbá. Although appreciable progress has been recorded in the advancement of ASR in the developed world, the reverse is the case for most developing nations, especially those of Africa. Yorùbá, like most African languages, lacks both the human and material resources needed for the development of a functional ASR system, much less for taking advantage of its potential benefits. Results reveal that attaining the ultimate goal of ASR performance comparable to the human level requires a deep understanding of variability factors.
MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning
In this paper, we present a methodology for linguistic feature extraction, focusing particularly on automatically syllabifying words in multiple languages, designed to be compatible with a forced-alignment tool, the Montreal Forced Aligner (MFA). Our method focuses on the extraction of phonetic transcriptions from text, stress marks, and a unified automatic syllabification in both the textual and phonetic domains. The system was built with open-source components and resources. Through an ablation study, we demonstrate the efficacy of our approach in automatically syllabifying words from several languages (English, French, and Spanish). Additionally, we apply the technique to the transcriptions of the CMU ARCTIC dataset, generating valuable annotations available online\footnote{\url{https://github.com/noetits/MUST_P-SRL}} that are ideal for speech representation learning, speech unit discovery, and disentanglement of speech factors in several speech-related fields.
Comment: Accepted for publication at EMNLP 202
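To illustrate the kind of phonetic-domain syllabification the abstract describes, here is a minimal sketch of onset-maximising syllabification. It is not the authors' MUST_P-SRL implementation: the vowel inventory and the set of legal onsets below are toy assumptions for illustration only.

```python
# Hypothetical sketch of maximal-onset syllabification (not the paper's
# actual algorithm): each vowel anchors a syllable nucleus, and as many
# preceding consonants as form a legal onset attach to the next syllable.

VOWELS = {"AA", "AE", "AH", "EH", "IY", "UW"}                 # toy ARPAbet subset
LEGAL_ONSETS = {(), ("T",), ("R",), ("T", "R"), ("S",), ("P",), ("P", "R")}

def syllabify(phones):
    """Group a phone sequence into syllables using the maximal-onset principle."""
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if not nuclei:
        return [phones]                       # no vowel: return unsplit
    syllables, start = [], 0
    for cur, nxt in zip(nuclei, nuclei[1:]):
        cluster = phones[cur + 1:nxt]         # consonants between two nuclei
        # Give the following syllable the longest legal onset from the cluster.
        split = len(cluster)
        while split > 0 and tuple(cluster[len(cluster) - split:]) not in LEGAL_ONSETS:
            split -= 1
        boundary = nxt - split
        syllables.append(phones[start:boundary])
        start = boundary
    syllables.append(phones[start:])
    return syllables

print(syllabify(["S", "T", "R", "AE", "P", "IY"]))
# → [['S', 'T', 'R', 'AE'], ['P', 'IY']]
```

A production system would of course use language-specific phonotactics (and, as the paper notes, must stay consistent between the text and phonetic domains) rather than a hand-listed onset set.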
SMaTTS: standard malay text to speech system
This paper presents a rule-based text-to-speech (TTS) synthesis system for Standard Malay (SM), namely SMaTTS. The proposed system uses the sinusoidal method and some pre-recorded wave files to generate speech. The use of a phone database significantly decreases the amount of computer memory used, making the system very light and embeddable. The overall system comprised two phases. The first, Natural Language Processing (NLP), consisted of the high-level processing of text analysis, phonetic analysis, text normalization, and a morphophonemic module. The morphophonemic module was designed specifically for SM to overcome a few problems in defining rules for the SM orthographic system before text can be passed to the DSP module. The second phase, Digital Signal Processing (DSP), operated on the low-level process of speech waveform generation. An intelligible and adequately natural-sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. An SM phoneme set and an inclusive phone database have been constructed carefully for this phone-based speech synthesizer. By applying generative phonology, comprehensive letter-to-sound (LTS) rules and a pronunciation lexicon have been developed for SMaTTS. For the evaluation tests, a Diagnostic Rhyme Test (DRT) word list was compiled, and several experiments were performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system, as well as the room for improvement, is thoroughly discussed.
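The MOS analysis mentioned above reduces to simple arithmetic over listener ratings. As an illustrative sketch (not taken from the paper; the panel ratings below are invented), a MOS with a normal-approximation confidence interval can be computed as:

```python
# Illustrative MOS computation (not the paper's data): listeners rate
# synthesized utterances on the standard 1 (bad) to 5 (excellent) scale,
# and the Mean Opinion Score is the sample mean of those ratings.
import statistics

def mos(ratings):
    """Return (mean score, 95% CI half-width) for a list of 1-5 ratings."""
    mean = statistics.fmean(ratings)
    # Normal approximation; adequate for typical MOS panel sizes.
    half_width = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5
    return mean, half_width

scores = [4, 5, 3, 4, 4, 5, 3, 4]   # hypothetical panel of 8 listeners
m, ci = mos(scores)
print(f"MOS = {m:.2f} +/- {ci:.2f}")   # → MOS = 4.00 +/- 0.52
```

Scores around 4.0 are conventionally read as "good" quality; the confidence interval indicates how much weight the panel size can support.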
Emotions and Strategies for Preparation of Emotional Speech Database
Abstract. The exploration of how we as human beings react to the world and interact with it and each other remains one of the greatest challenges. The ability to recognize the emotional state of a person is perhaps the most important requirement for successful interpersonal social interaction. An automatic emotional speech recognition system can be characterized by the features used, the emotional categories investigated, the methods used to collect speech utterances, the languages covered, and the type of classifier used in the experiment. A well-defined database is a necessary precondition for improving the performance of automatic emotional speech recognition systems. This paper explores the theories that explain the social and cognitive roles of emotions and mental states and their expression in human behavior and communication. The paper describes the planning and construction of a native-language emotional speech database of acted emotional speech, covering the number of speakers, recording strategies, conversion, etc.; an alternative approach is also briefly addressed. Such a database would also contribute to research on intonation and emotion.