65,245 research outputs found

    Wavelet transforms for non-uniform speech recognition

    An algorithm for nonuniform speech segmentation and its application in speech recognition systems is presented. A method based on the Modulated Gaussian Wavelet Transform-based Speech Analyser (MGWTSA) and a subsequent parametrization block transforms a uniformly sampled signal into a set of nonuniformly spaced frames, whose information is then fed into a speech recognition system. The algorithm places a frame to characterize the signal only where necessary, reducing the number of frames per signal as much as possible without an appreciable reduction in the recognition rate of the system. Peer Reviewed. Postprint (published version).
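The abstract does not give the MGWTSA internals, but the general idea of nonuniform frame placement can be sketched. The following is a minimal illustration, assuming a Morlet-style modulated Gaussian wavelet and a simple energy-change criterion for placing frame boundaries; the function names, scales, and threshold are illustrative, not taken from the paper:

```python
import numpy as np

def modulated_gaussian_wavelet(t, f0=2.0):
    # Modulated Gaussian (Morlet-style) wavelet: Gaussian envelope
    # times a complex exponential carrier at normalized frequency f0.
    return np.exp(-t ** 2 / 2) * np.exp(2j * np.pi * f0 * t)

def nonuniform_frames(signal, scales, energy_delta=0.2):
    """Place a frame boundary only where the multi-scale wavelet
    energy profile has changed appreciably since the last boundary,
    yielding nonuniformly spaced frames."""
    n = len(signal)
    energy = np.zeros(n)
    for s in scales:
        t = np.arange(-4 * s, 4 * s + 1) / s
        w = modulated_gaussian_wavelet(t)
        coeffs = np.convolve(signal, w, mode="same") / np.sqrt(s)
        energy += np.abs(coeffs) ** 2
    energy /= energy.max() + 1e-12  # normalize to [0, 1]
    boundaries = [0]
    for i in range(1, n):
        if abs(energy[i] - energy[boundaries[-1]]) > energy_delta:
            boundaries.append(i)
    boundaries.append(n)
    return boundaries
```

On a stationary stretch the energy profile barely changes, so few boundaries are emitted; around transients the profile moves quickly and frames cluster, which is the behaviour the abstract describes.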

    Continuous speech segmentation using local adaptive thresholding technique in the blocking block area method

    Continuous speech is a form of natural human speech that flows without clear boundaries between words. In continuous speech recognition, a segmentation process is needed to cut a sentence at the boundary of each word. Segmentation is an important step because speech is recognized from the word segments this process produces. The segmentation process in this study was carried out using a local adaptive thresholding technique in the blocking block area method. The study compares the performance of five local adaptive thresholding methods (Niblack, Sauvola, Bradley, Guanglei Xiong and Bernsen) for continuous speech segmentation, to identify the best method and its optimum parameter values. Based on the results, the Niblack method is the best for continuous speech segmentation in Indonesian, with an accuracy of 95%; its optimum parameter values are window = 75 and k = 0.2.
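The Niblack rule reported above (window = 75, k = 0.2) computes, for each frame, a threshold from the local mean and standard deviation of a 1-D contour. A minimal sketch of how such a threshold could segment a speech energy contour into word regions follows; the sign convention of k and the speech/non-speech decision rule are assumptions for illustration, not details from the paper:

```python
import numpy as np

def niblack_threshold(x, window=75, k=0.2):
    """Niblack local adaptive threshold over a 1-D energy contour:
    T[i] = local mean + k * local std inside a sliding window."""
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    thresh = np.empty(len(x))
    for i in range(len(x)):
        win = padded[i:i + window]
        thresh[i] = win.mean() + k * win.std()
    return thresh

def segment_words(energy, window=75, k=0.2):
    """Frames whose energy exceeds the local Niblack threshold are
    treated as speech; contiguous speech runs become word segments."""
    speech = energy > niblack_threshold(energy, window, k)
    segments, start = [], None
    for i, s in enumerate(speech):
        if s and start is None:
            start = i
        elif not s and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(speech)))
    return segments
```

Because the threshold adapts to the local window rather than the whole utterance, quiet words are not lost to a single global cutoff, which is the usual motivation for local over global thresholding.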

    Visual speech recognition and utterance segmentation based on mouth movement

    This paper presents a vision-based approach to recognizing speech without evaluating the acoustic signal. The proposed technique combines motion features and support vector machines (SVMs) to classify utterances. Segmentation of utterances is important in a visual speech recognition system, and this research proposes a video segmentation method to detect the start and end frames of isolated utterances in an image sequence. Frames corresponding to `speaking' and `silence' phases are identified from mouth movement information. The experimental results demonstrate that the proposed visual speech recognition technique yields high accuracy on a phoneme classification task. Potential applications of such a system include human-computer interfaces (HCI) for mobility-impaired users, lip-reading mobile phones, in-vehicle systems, and improved speech-based computer control in noisy environments.
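The abstract does not specify the motion features or the SVM configuration. The sketch below pairs a toy inter-frame-difference motion feature with a minimal linear SVM trained by the Pegasos subgradient method, just to show how "speaking" vs "silence" frames could be separated; every name and parameter here is illustrative:

```python
import numpy as np

def mouth_motion_feature(frames):
    """Toy motion feature: mean absolute inter-frame difference of the
    mouth-region pixels, one scalar per frame transition."""
    diffs = np.abs(np.diff(frames.astype(float), axis=0))
    return diffs.reshape(len(diffs), -1).mean(axis=1, keepdims=True)

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimal linear SVM trained with the Pegasos subgradient method.
    A constant feature is appended so the bias is learned with w.
    Labels y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Xa = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (Xa[i] @ w)
            w *= 1.0 - eta * lam  # regularization shrink
            if margin < 1:        # hinge-loss subgradient step
                w += eta * y[i] * Xa[i]
    return w

def predict(w, X):
    Xa = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xa @ w)
```

A static mouth region yields near-zero motion while articulation yields large inter-frame differences, so even this one-dimensional feature is typically linearly separable.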

    An experimental HMM-based postal OCR system

    It is almost universally accepted in speech recognition that phone- or word-level segmentation prior to recognition is neither feasible nor desirable, and in the dynamic (pen-based) handwriting recognition domain the success of segmentation-free techniques points to the same conclusion. In image-based handwriting recognition, however, this conclusion is far from firmly established, and the results presented in this paper show that systems employing character-level presegmentation can be more effective, even within the same HMM paradigm, than systems relying on sliding-window feature extraction. We describe two variants of a hidden Markov model system recognizing handwritten addresses on US mail, one with presegmentation and one without, and report results on the CEDAR data set.
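Both system variants above are HMM-based, and the core decoding step in any such recogniser is Viterbi search over hidden states. A generic log-domain Viterbi sketch over a toy discrete-observation HMM is shown below; the transition and emission numbers in the usage example are made up for illustration, not taken from the paper's models:

```python
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    """Log-domain Viterbi decoding: the most likely hidden-state
    sequence for a discrete observation sequence under an HMM.
    log_pi: initial log-probs, log_A: transition, log_B: emission."""
    T, N = len(obs), len(log_pi)
    delta = np.full((T, N), -np.inf)    # best log-prob ending in state j
    back = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] + log_A[:, j]
            back[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[back[t, j]] + log_B[j, obs[t]]
    # Trace back from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In a presegmented system each character image is decoded against per-character models; in a sliding-window system the same search runs over a longer observation stream, which is exactly the design contrast the paper evaluates.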

    Techniques for identifying silent intervals in speech recognition systems

    Classification of speech into voiced, unvoiced and silence (V/UV/S) regions is an important process in many speech processing applications such as speech synthesis, segmentation and speech recognition. Two such measures are investigated with respect to their ability to discern voiced/unvoiced and silence segments of speech: the Instantaneous Energy (IE) and the Local Time Correlation (LTC) method. Both IE and LTC are recently proposed techniques for nonstationary signal analysis and have been successfully applied to speech processing. A comparative study was made using these two algorithms to classify a given speech segment into two classes, voiced/unvoiced speech and silence, and both were used to remove silent intervals from speech samples. Experiments were carried out using Linear Predictive Coding (LPC) and Dynamic Time Warping (DTW) for isolated digit recognition in Bahasa Malaysia. Without silence removal, LPC-DTW gives a recognition accuracy of 98.28%; with detection and removal of silent intervals, both IE-LPC-DTW and LTC-LPC-DTW give a recognition accuracy of 98%. The system was then trained and tested on connected digit recognition, with the input digit strings segmented using the IE and LTC techniques. Connected digit recognition using IE-LPC-DTW achieved 93.3% digit accuracy and 78% string accuracy; using LTC-LPC-DTW, performance decreased to 93.2% and 77.7% respectively.
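The IE measure described above amounts to short-time frame energy. A minimal sketch of energy-based silent-interval removal over non-overlapping frames follows; the frame length and relative threshold are illustrative choices, not the thesis's settings:

```python
import numpy as np

def short_time_energy(x, frame_len=160):
    """Instantaneous energy per non-overlapping frame:
    E[m] = sum of x[n]^2 over the m-th frame."""
    n = (len(x) // frame_len) * frame_len
    frames = x[:n].reshape(-1, frame_len).astype(float)
    return (frames ** 2).sum(axis=1)

def remove_silence(x, frame_len=160, rel_thresh=0.05):
    """Keep only frames whose energy exceeds a fraction of the peak
    frame energy, concatenating the surviving (speech) audio."""
    e = short_time_energy(x, frame_len)
    n = (len(x) // frame_len) * frame_len
    frames = np.asarray(x[:n]).reshape(-1, frame_len)
    keep = e > rel_thresh * e.max()
    return frames[keep].reshape(-1)
```

In the connected-digit setting, the complementary low-energy runs (the dropped frames) mark the boundaries between digits, which is how an energy measure doubles as a segmenter.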

    Lexical segmentation and word recognition in fluent aphasia

    The current thesis reports a psycholinguistic study of lexical segmentation and word recognition in fluent aphasia. When listening to normal running speech we must identify individual words from a continuous stream before we can extract a linguistic message from it. Normal listeners resolve this segmentation problem without any noticeable difficulty. In this thesis I consider how fluent aphasic listeners perform lexical segmentation and whether any of their impaired comprehension of spoken language has its provenance in a failure to segment speech normally. The investigation comprised a series of 5 experiments which examined the processing of explicit acoustic and prosodic cues to word juncture, as well as features which affect listeners' segmentation of the speech stream implicitly, through inter-lexical competition of potential word matches. The data collected show that lexical segmentation of continuous speech is compromised in fluent aphasia. Word hypotheses do not always accrue appropriate activational information from all of the available sources within the time frame in which the segmentation problem is normally resolved. Fluent aphasic performance, although quantitatively impaired compared to normal, reflects an underlying normal competence; their processing seldom displays a qualitatively different profile from normal. They are able to engage frequency, morphological structure, and imageability as modulators of activation. Word class, a feature found to be influential in the normal resolution of segmentation, is not used by the fluent aphasics studied. In those cases of occasional failure to adequately resolve segmentation by automatic frequency-mediated activation, fluent aphasics invoke the metalinguistic influence of the real-world plausibility of alternative parses.

    Connectionist modelling of lexical segmentation and vocabulary acquisition

    Adults typically hear sentences in their native language as a sequence of separate words, and we might therefore assume that words in speech are physically separated in the way that they are perceived. However, when listening to an unfamiliar language we no longer experience sequences of discrete words, but rather hear a continuous stream of speech with boundaries separating only individual sentences or utterances. Theories of how adult listeners segment the speech stream into words emphasise the role that knowledge of individual words plays in the segmentation of speech. However, since words cannot be learnt until the speech stream can be segmented, it seems unlikely that infants can use word recognition to segment connected speech. For this reason, researchers have proposed a variety of strategies and cues that infants could use to identify word boundaries without being able to recognise the words that these boundaries delimit. This chapter describes computational simulations proposing ways in which these cues and strategies for the acquisition of lexical segmentation can be integrated with infants' acquisition of the meanings of words. The simulations reported here describe simple computational mechanisms and knowledge sources that may support these different aspects of language acquisition.