11,067 research outputs found

PENGENALAN UCAPAN KATA AWAL PADA AYAT AL-QURAN MENGGUNAKAN DYNAMIC TIME WARPING (Speech Recognition of the First Word of Al-Quran Verses Using Dynamic Time Warping)

Author: Nurridayanti Patmala
Publication date: 17/04/2018

Speech recognition has a wide range of applications today, from security systems and medical devices to education, including religious education. Speech recognition technology can serve as an alternative supporting medium for memorizing the Al-Quran. This research implements the Dynamic Time Warping algorithm in an isolated speech recognition system to recognize the first word of each ayah of the Al-Quran. The recognition process has two main phases: feature extraction and feature matching. Feature extraction obtains features from the speech signal using Mel Frequency Cepstral Coefficients, whereas feature matching uses Dynamic Time Warping to select the recognition result with the minimum feature cost. Experiments with 5 speakers as models and 3 kinds of speech templates show that Dynamic Time Warping can recognize speech spoken by the same person, is very sensitive to speech spoken by a different person, and that the same person tends to pronounce an ayah with consistent variation.
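The feature-matching step named in this abstract can be sketched as follows. This is a minimal illustration and not the author's implementation: it assumes each utterance has already been converted to a sequence of feature vectors (for example MFCC frames), uses Euclidean distance as the local cost, and the names `dtw_cost`, `recognize`, `word_a`, and `word_b` are hypothetical.

```python
import math


def dtw_cost(a, b):
    """Return the DTW alignment cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance (assumed)
            acc[i][j] = local + min(
                acc[i - 1][j],      # a advances, b repeats a frame
                acc[i][j - 1],      # b advances, a repeats a frame
                acc[i - 1][j - 1],  # both advance
            )
    return acc[n][m]


def recognize(query, templates):
    """Return the label of the template with the minimum DTW cost to the query."""
    return min(templates, key=lambda label: dtw_cost(query, templates[label]))


# Toy one-dimensional "features"; real MFCC frames would have more coefficients.
templates = {
    "word_a": [(0.0,), (1.0,), (2.0,)],
    "word_b": [(5.0,), (5.0,), (6.0,)],
}
query = [(0.0,), (0.0,), (1.0,), (2.0,)]  # word_a spoken more slowly
best = recognize(query, templates)
```

Because the warping path may repeat frames, the slower query still aligns to `word_a` at zero cost, which is the property that makes DTW suitable for isolated-word matching with varying speaking rates.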

Repository UPI

Assessment of time frequency warping for use as a reference degradation for assessing synthetic speech

Author: Burrell M.D.

At present there is no standard assessment method for rating and comparing the quality of synthesized speech. This study assesses the suitability of Time Frequency Warping (TFW) modulation for use as a reference device for assessing synthesized speech. TFW modulation introduces timing errors into natural speech that produce perceptual errors similar to those found in synthetic speech. It is proposed that TFW modulation, used in conjunction with a listening effort test, would provide a standard assessment method for rating the quality of synthesized speech. This study identifies the most suitable TFW modulation variable parameter for assessing synthetic speech and assesses the results of several tests that rate examples of synthesized speech in terms of the TFW variable parameter and listening effort. The study also attempts to identify the attributes of speech that differentiate synthetic, TFW-modulated, and natural speech.

Aston Publications Explorer

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification

Author: Chen Yixiang
Li Lantian
Wang Dong
Zheng Thomas Fang
Publication date: 07/06/2017

For practical automatic speaker verification (ASV) systems, replay attacks pose a real risk: ASV systems are easily fooled when an attacker replays a pre-recorded speech signal of the genuine speaker. An effective replay detection method is therefore highly desirable. In this study, we investigate a major difficulty in replay detection: the over-fitting problem caused by variability factors in the speech signal. An F-ratio probing tool is proposed, and three variability factors are investigated using this tool: speaker identity, speech content, and playback and recording device. The analysis shows that the device is the most influential factor and carries the highest over-fitting risk. A frequency warping approach is studied to alleviate the over-fitting problem, as verified on the ASVspoof 2017 database.
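To make the idea of an F-ratio concrete, here is a minimal sketch. It assumes the common textbook definition (variance of the group means divided by the average within-group variance) for a single scalar feature. The paper's probing tool may be defined differently, and the function name `f_ratio` and the toy data are hypothetical.

```python
from statistics import fmean


def f_ratio(groups):
    """Variance of group means divided by the mean within-group variance.

    `groups` is a list of lists, one list of scalar feature values per
    condition (for example, one list per recording device). A larger value
    means the feature separates the conditions more strongly.
    Raises ZeroDivisionError if every group has zero variance.
    """
    means = [fmean(g) for g in groups]
    grand_mean = fmean(means)
    between = fmean([(mu - grand_mean) ** 2 for mu in means])
    within = fmean(
        [fmean([(x - mu) ** 2 for x in g]) for g, mu in zip(groups, means)]
    )
    return between / within


# Toy data: the same feature measured under two hypothetical conditions.
well_separated = f_ratio([[1.0, 3.0], [5.0, 7.0]])
overlapping = f_ratio([[1.0, 7.0], [3.0, 5.0]])
```

A factor whose groups yield a high ratio dominates the variation in the feature, which is the sense in which such a factor could drive over-fitting.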

arXiv.org e-Print Archive

Crossref

Application of Speech Recognition for Swiftlet Vocalizations

Author: Kamarul Hawari Ghazali
Saiful Nizam Tajuddin
Siti Nurzalikha Zaini Husni Zaini
Sunardi S.
Publication date: 01/01/2013

This research applies speech recognition techniques to swiftlet vocalizations. Swiftlet vocalizations need an automatic recognition system because the many types of swiftlet sounds used in the industry can currently be inspected only by human experts. The research uses Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and Dynamic Time Warping (DTW) for classification, and evaluates the accuracy and efficiency of combining the two techniques.
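As background for the MFCC step, the sketch below shows the mel-scale mapping that underlies the filterbank. It assumes the widely used formula mel = 2595 * log10(1 + f / 700). The abstract does not say which mel variant or filterbank settings were used, so the function names and the example values (20 Hz to 8000 Hz, 10 bands) are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 edge frequencies in Hz, equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + k * step) for k in range(n_bands + 2)]


# Hypothetical settings: 10 triangular bands between 20 Hz and 8000 Hz.
edges = mel_band_edges(20.0, 8000.0, 10)
```

The edges are dense at low frequencies and sparse at high frequencies, mirroring the perceptual resolution that MFCC features are designed to approximate.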

UMP Institutional Repository

Spectral analysis for nonstationary audio

Author: Meynard Adrien
Torrésani Bruno
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 23/08/2018

A new approach for the analysis of nonstationary signals is proposed, with a focus on audio applications. Following earlier contributions, nonstationarity is modeled via stationarity-breaking operators acting on Gaussian stationary random signals. The focus is on time warping and amplitude modulation, and an approximate maximum-likelihood approach based on suitable approximations in the wavelet transform domain is developed. This paper provides a theoretical analysis of the approximations and introduces JEFAS, a corresponding estimation algorithm. The latter is tested and validated on synthetic as well as real audio signals. Comment: IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, in press.

arXiv.org e-Print Archive

HAL AMU