    Speech recognition has a wide range of applications today, including security systems, medical devices, and education, religious education among them. Speech recognition technology can serve as a supporting medium for memorizing the Al-Quran. This research implements the Dynamic Time Warping algorithm in an isolated speech recognition system that recognizes the first word of each ayah of the Al-Quran. The recognition process consists of two main phases: feature extraction and feature matching. Feature extraction obtains features from the speech signal using Mel Frequency Cepstral Coefficients, whereas feature matching uses Dynamic Time Warping to find the recognition result with the minimum feature cost. Experiments with five speakers and three types of speech templates show that Dynamic Time Warping can recognize speech uttered by the same person, is very sensitive to speech uttered by a different person, and that the same person tends to recite an ayah with consistent variation.
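
    The feature-matching step can be illustrated with a minimal sketch. It assumes each utterance has already been converted to a 2-D array of per-frame features (for example MFCC vectors); the function names `dtw_cost` and `recognize`, the Euclidean frame distance, and the unconstrained step pattern are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def dtw_cost(a, b):
    """Cumulative DTW alignment cost between two feature sequences.

    a, b: 2-D arrays of shape (frames, coefficients), e.g. MFCC matrices.
    The Euclidean local distance is an assumption made for this sketch.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] holds the minimal cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            acc[i, j] = d + min(acc[i - 1, j],       # stay on frame j of b
                                acc[i, j - 1],       # stay on frame i of a
                                acc[i - 1, j - 1])   # advance both sequences
    return acc[n, m]

def recognize(query, templates):
    """Return the label of the minimum-cost template, plus all costs.

    templates: dict mapping a (hypothetical) word label to its feature array.
    """
    costs = {label: dtw_cost(query, feats) for label, feats in templates.items()}
    return min(costs, key=costs.get), costs
```

    Because the cost is accumulated along the best warping path, an utterance spoken more slowly than its template can still score a low cost, which is the property that makes the method suitable for isolated-word matching.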

    Assessment of time frequency warping for use as a reference degradation for assessing synthetic speech

    At present there is no standard assessment method for rating and comparing the quality of synthesized speech. This study assesses the suitability of Time Frequency Warping (TFW) modulation for use as a reference device for assessing synthesized speech. TFW modulation introduces timing errors into natural speech that produce perceptual errors similar to those found in synthetic speech. It is proposed that TFW modulation, used in conjunction with a listening effort test, would provide a standard assessment method for rating the quality of synthesized speech. This study identifies the most suitable TFW modulation variable parameter for assessing synthetic speech and assesses the results of several tests that rate examples of synthesized speech in terms of the TFW variable parameter and listening effort. The study also attempts to identify the attributes of speech that differentiate synthetic, TFW-modulated, and natural speech.

    A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification

    For practical automatic speaker verification (ASV) systems, replay attacks pose a real risk. ASV systems tend to be easily fooled by the replay of a pre-recorded speech signal of the genuine speaker. An effective replay detection method is therefore highly desirable. In this study, we investigate a major difficulty in replay detection: the over-fitting problem caused by variability factors in the speech signal. An F-ratio probing tool is proposed, and three variability factors are investigated using this tool: speaker identity, speech content, and playback and recording device. The analysis shows that the device is the most influential factor and contributes the highest over-fitting risk. A frequency warping approach is studied to alleviate the over-fitting problem, as verified on the ASVspoof 2017 database.
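
    The abstract does not define its F-ratio probing tool, so the following is only a sketch of the classical Fisher F-ratio (variance of the class means divided by the mean within-class variance), computed per feature dimension. The function name `f_ratio` and the idea of grouping features by one variability factor at a time (speaker, content, or device) are assumptions made for illustration.

```python
import numpy as np

def f_ratio(features, labels):
    """Classical per-dimension F-ratio (an assumed definition, see lead-in).

    features: array of shape (samples, dims)
    labels:   array of shape (samples,), one class id per sample,
              e.g. a device id when probing the device factor.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Mean and variance of every class, one row per class.
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    within = np.stack([features[labels == c].var(axis=0) for c in classes])
    between_var = means.var(axis=0)    # spread of the class means
    within_var = within.mean(axis=0)   # average spread inside a class
    # Note: a dimension with zero within-class variance yields inf or nan.
    return between_var / within_var
```

    A large value in some dimension means that the chosen factor explains much of the variation there, which is the kind of dependence a detector could over-fit to.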

    Application of Speech Recognition for Swiftlet Vocalizations

    This research applies speech recognition techniques to swiftlet vocalizations. A recognition system is needed because the many types of swiftlet sounds used in the industry can currently be identified only by human experts. The study uses Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and Dynamic Time Warping (DTW) for classification, and evaluates the accuracy and efficiency of combining the two techniques.

    Spectral analysis for nonstationary audio

    A new approach for the analysis of nonstationary signals is proposed, with a focus on audio applications. Following earlier contributions, nonstationarity is modeled via stationarity-breaking operators acting on Gaussian stationary random signals. The focus is on time warping and amplitude modulation, and an approximate maximum-likelihood approach based on suitable approximations in the wavelet transform domain is developed. This paper provides a theoretical analysis of the approximations and introduces JEFAS, a corresponding estimation algorithm. The latter is tested and validated on synthetic as well as real audio signals. Comment: IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, in press.
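
    The signal model described above can be sketched as a forward simulation: a stationary signal is time-warped and then amplitude-modulated. This is not the JEFAS estimator, only an illustration of the kind of nonstationarity it targets. The function name `warp_and_modulate`, the discrete warping function `gamma`, the linear interpolation, and the `sqrt(gamma')` normalization are assumptions made for this sketch.

```python
import numpy as np

def warp_and_modulate(x, gamma, amplitude):
    """Apply time warping, then amplitude modulation, to a sampled signal.

    x:         stationary input signal, shape (N,)
    gamma:     increasing warping function in sample units, shape (N,)
    amplitude: positive modulation envelope, shape (N,)
    Computes y[n] = amplitude[n] * sqrt(gamma'[n]) * x(gamma[n]),
    where x(.) is evaluated by linear interpolation (an assumption).
    """
    x = np.asarray(x, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    n = np.arange(len(x))
    dgamma = np.gradient(gamma)        # numerical derivative of the warp
    warped = np.interp(gamma, n, x)    # x evaluated at the warped times
    return np.asarray(amplitude, dtype=float) * np.sqrt(dgamma) * warped

# Usage: white Gaussian noise stands in for a stationary Gaussian signal.
rng = np.random.default_rng(0)
N = 1024
x = rng.standard_normal(N)
t = np.arange(N)
gamma = t + 20.0 * np.sin(2 * np.pi * t / N)   # derivative stays positive
amplitude = 1.0 + 0.5 * np.cos(2 * np.pi * t / N)
y = warp_and_modulate(x, gamma, amplitude)
```

    An estimation algorithm for this model would work in the opposite direction, recovering `gamma` and `amplitude` from `y` alone.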
