5 research outputs found

    A Novel Method For Speech Segmentation Based On Speakers' Characteristics

    Speech segmentation is the process of change point detection that partitions an input audio stream into regions, each of which corresponds to only one audio source or one speaker. One application is in speaker diarization systems. There are several methods for speaker segmentation; however, most speaker diarization systems use BIC-based segmentation methods. The main goal of this paper is to propose a new method for speaker segmentation that is faster than current methods (e.g., BIC) while retaining acceptable accuracy. Our proposed method is based on the pitch frequency of the speech. Its accuracy is similar to that of common speaker segmentation methods, but its computational cost is much lower. We show that our method is about 2.4 times faster than the BIC-based method, while the average accuracy of the pitch-based method is slightly higher than that of the BIC-based method. Comment: 14 pages, 8 figures
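    For context on the BIC baseline mentioned in this abstract, the following is a minimal sketch of a BIC change point test. It is not the paper's pitch-based method. It assumes a one-dimensional feature stream modeled by a single Gaussian per segment, and the names `delta_bic`, `best_change_point`, `margin`, and `lam` are hypothetical.

```python
import math


def _log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.log(max(var, 1e-12))


def delta_bic(xs, t, lam=1.0):
    """Delta-BIC for splitting xs at index t (1-D Gaussian assumption).

    Positive values favor two separate Gaussians over a single one.
    """
    n, n1, n2 = len(xs), t, len(xs) - t
    gain = 0.5 * (n * _log_var(xs) - n1 * _log_var(xs[:t]) - n2 * _log_var(xs[t:]))
    # Penalty 0.5 * (d + d(d+1)/2) * log(n) reduces to log(n) when d = 1.
    penalty = lam * math.log(n)
    return gain - penalty


def best_change_point(xs, margin=5, lam=1.0):
    """Return the split index with the largest positive delta-BIC, else None."""
    candidates = range(margin, len(xs) - margin + 1)
    t = max(candidates, key=lambda i: delta_bic(xs, i, lam))
    return t if delta_bic(xs, t, lam) > 0 else None
```

    Scanning every candidate index in this way is the kind of exhaustive search that makes BIC-based segmentation comparatively expensive on long streams.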

    Phonetic Segmentation using a Wavelet-based Speech Cepstral Features and Sparse Representation Classifier, Journal of Telecommunications and Information Technology, 2021, nr 4

    Speech segmentation is the process of dividing a speech signal into distinct acoustic blocks that could be words, syllables or phonemes. Phonetic segmentation is about finding the exact boundaries of the different phonemes that compose a specific speech signal. This problem is crucial for many applications, e.g. automatic speech recognition (ASR). In this paper we propose a new model-based, text-independent phonetic segmentation method based on wavelet packet speech parametrization features and the sparse representation classifier (SRC). Experiments were performed on two datasets: the first is an English one derived from the TIMIT corpus, while the second is an Arabic one derived from the Arabic speech corpus. Results showed that the proposed wavelet packet decomposition features outperform the MFCC features in the speech segmentation task, in terms of both F1-score and R-measure, on both datasets. Results also indicate that the SRC gives a higher hit rate than the well-known k-Nearest Neighbors (k-NN) classifier on the TIMIT dataset

    Malay articulation system for early screening diagnostic using hidden markov model and genetic algorithm

    Speech recognition is an important technology and can be a great aid for individuals with sight or hearing disabilities. There has been extensive research interest and development in this area over the past decades. However, its usage and exposure in Malaysia are still immature, even though there is demand from the medical and healthcare sector. The aim of this research is to assess the quality and impact of using a computerized method for early screening of speech articulation disorders among Malaysians, such as omission, substitution, addition and distortion in their speech. In this study, a statistical probabilistic approach using the Hidden Markov Model (HMM) has been adopted, with a newly designed Malay corpus for articulation disorder cases following the SAMPA and IPA guidelines. An improvement is made at the front-end processing for feature vector selection by applying a silence region calibration algorithm for start and end point detection. The classifier has also been modified significantly by incorporating Viterbi search with a Genetic Algorithm (GA) to obtain high accuracy in the recognition result and in lexical unit classification. The results were evaluated following National Institute of Standards and Technology (NIST) benchmarking. The tests show that recognition accuracy improved by 30% to 40% using the Genetic Algorithm technique compared with the conventional technique. A new corpus was built in this study, with verification and justification from a medical expert. In conclusion, a computerized method for early screening can ease human effort in tackling speech disorders, and the proposed Genetic Algorithm technique has been shown to improve recognition performance in terms of the search and classification tasks

    Automatic Segmentation of Speech Recorded in Unknown Noisy Channel Characteristics

    This paper investigates the problem of automatic segmentation of speech recorded in noisy channel corrupted environments. Using an HMM-based speech segmentation algorithm, speech enhancement and parameter compensation techniques previously proposed for robust speech recognition are evaluated and compared for improved segmentation in colored noise. Speech enhancement algorithms considered include: Generalized Spectral Subtraction, Nonlinear Spectral Subtraction, Ephraim-Malah MMSE enhancement, and Auto-LSP Constrained Iterative Wiener filtering. In addition, the Parallel Model Combination (PMC) technique is also compared for additive noise compensation. In telephone environments, we compare channel normalization techniques including Cepstral Mean Normalization (CMN) and Signal Bias Removal (SBR) and consider the coupling of channel compensation with front-end speech enhancement for improved automatic segmentation. Compensation performance is assessed for each method by automatically seg..
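    As an illustration of one channel normalization technique named in this abstract, the following is a minimal sketch of Cepstral Mean Normalization (CMN) under the usual per-utterance formulation: the mean of each cepstral coefficient over all frames is subtracted from every frame. It is not taken from the paper; the function name and the list-of-lists frame layout are assumptions.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames of one utterance.

    frames: list of equal-length lists, one list of cepstral coefficients
    per frame (hypothetical layout chosen for this sketch).
    """
    n = len(frames)
    dims = len(frames[0])
    # Mean of each cepstral coefficient across the whole utterance.
    means = [sum(frame[d] for frame in frames) / n for d in range(dims)]
    # A constant offset in the cepstral domain cancels out after subtraction.
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]
```

    A stationary convolutional channel appears as an additive constant in the cepstral domain, so two recordings that differ only by such an offset map to the same normalized features.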