45 research outputs found

    A Dutch treatment of an elitist approach to articulatory-acoustic feature classification.

    A novel approach to articulatory-acoustic feature extraction has been developed to enhance the accuracy of classifying place and manner of articulation information. This elitist approach is tested on a corpus of spontaneous Dutch using two different systems, one trained on a subset of the same corpus, the other trained on a corpus from a different language (American English). The voicing and manner-of-articulation feature dimensions transfer relatively well between the two languages; place information, however, transfers less well. Manner-specific training can be used to improve the classification of articulatory place information.
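    The closing point of this abstract, that manner-specific training improves place classification, can be pictured as a two-stage classifier: predict manner per frame, then route each frame to a place classifier trained only on data of that manner. The sketch below is a minimal illustration of that idea, assuming scikit-learn-style estimators and pre-extracted per-frame acoustic features; the class labels and variable names are hypothetical, not the paper's actual setup.

```python
# Hypothetical sketch: manner-gated place classification of
# articulatory-acoustic features. Assumes numpy arrays X (frames x dims)
# with per-frame manner and place labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

MANNERS = ["vowel", "stop", "fricative", "nasal", "approximant"]

def train_manner_gated(X, manner_labels, place_labels):
    """Train one manner classifier plus a place classifier per manner."""
    manner_clf = LogisticRegression(max_iter=1000).fit(X, manner_labels)
    place_clfs = {}
    for m in MANNERS:
        idx = manner_labels == m
        if idx.sum() > 0:
            # Each place classifier sees only frames of one manner class,
            # so it can model manner-specific place cues.
            place_clfs[m] = LogisticRegression(max_iter=1000).fit(
                X[idx], place_labels[idx])
    return manner_clf, place_clfs

def predict_place(X, manner_clf, place_clfs):
    """Route each frame to the place classifier for its predicted manner."""
    manners = manner_clf.predict(X)
    return np.array([place_clfs[m].predict(x[None, :])[0]
                     for m, x in zip(manners, X)])
```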

    Articulatory feature recognition using dynamic Bayesian networks.

    We describe a dynamic Bayesian network for articulatory feature recognition. The model is intended to be a component of a speech recognizer that avoids the problems of conventional "beads-on-a-string" phoneme-based models. We demonstrate that the model recognizes articulatory features from the speech signal more accurately than a state-of-the-art neural network system. We also introduce a training algorithm that offers two major advances: it does not require time-aligned feature labels, and it allows the model to learn a set of asynchronous feature changes in a data-driven manner.
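    As a rough intuition for why a factored model escapes the "beads-on-a-string" problem, the sketch below runs an independent HMM-style forward pass per articulatory feature stream, so voicing, manner, and place can each change value at their own frame (asynchronously) rather than all switching at a single phone boundary. This is a simplification of a true dynamic Bayesian network, which would also couple the streams; the transition and observation matrices here are hypothetical placeholders.

```python
# Simplified illustration of per-stream inference: each articulatory
# feature gets its own forward pass, so feature values can change
# asynchronously across streams. A full DBN would add inter-stream
# dependencies on top of this.
import numpy as np

def forward(log_trans, log_obs):
    """log_trans: (S, S) transition log-probs; log_obs: (T, S) per-frame
    observation log-likelihoods. Returns filtered log-posteriors (T, S),
    assuming a uniform initial state distribution."""
    T, S = log_obs.shape
    alpha = np.zeros((T, S))
    alpha[0] = log_obs[0] - np.logaddexp.reduce(log_obs[0])
    for t in range(1, T):
        # log-sum-exp over previous states for each current state
        prev = alpha[t - 1][:, None] + log_trans
        alpha[t] = log_obs[t] + np.logaddexp.reduce(prev, axis=0)
        alpha[t] -= np.logaddexp.reduce(alpha[t])  # renormalize
    return alpha

# One independent pass per feature stream:
# posteriors = {f: forward(trans[f], obs_loglik[f])
#               for f in ("voicing", "manner", "place")}
```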

    Acoustic Modelling for Under-Resourced Languages

    Automatic speech recognition systems have so far been developed for only a few of the world's 4,000-7,000 languages. In this thesis we examine methods to rapidly create acoustic models for new, possibly under-resourced languages in a time- and cost-effective manner. To this end, we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.

    Multilingual Articulatory Features


    Articulatory Information for Robust Speech Recognition

    Current Automatic Speech Recognition (ASR) systems fall far short of human speech recognition performance because they lack robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, to propose different ways to address them, and finally to present an ASR architecture built on these robustness criteria. Acoustic variation adversely affects the performance of current phone-based ASR systems, in which speech is modeled as 'beads-on-a-string', the beads being the individual phone units. While phone units are distinctive in the cognitive domain, they vary in the physical domain, and this variation arises from a combination of factors including speaking style and speaking rate, a phenomenon commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variation by using contextualized phone units such as triphones. Articulatory phonology instead accounts for coarticulatory variation by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space. To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. At the initial stage of this research, a study was performed using synthetically generated speech to obtain a proof of concept that articulatory gestures can indeed be recognized from the speech signal. It was observed that having vocal tract constriction trajectories (TVs) as an intermediate representation facilitated the gesture recognition task. Since no natural speech database currently contains articulatory gesture annotation, an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs. Two natural speech databases, X-ray microbeam and Aurora-2, were annotated; the former was used to train a TV estimator and the latter to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observations: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs estimated from the acoustic speech signal. In this setup the articulatory gestures were modeled as hidden random variables, eliminating the need for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only help to account for coarticulatory variation but also significantly improve the noise robustness of ASR systems.
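    One concrete piece of the architecture described above is the use of two observation streams (MFCCs and estimated TVs) conditioned on the same hidden gesture variable. Under a conditional-independence assumption, the two streams' log-likelihoods simply add, as in the diagonal-Gaussian sketch below; the parameter names are illustrative stand-ins, not the dissertation's actual models.

```python
# Illustrative two-stream observation model: MFCC and TV vectors are
# scored independently given a hidden gesture state, and their
# log-likelihoods are summed (conditional independence assumption).
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def gesture_loglik(mfcc, tv, params):
    """params[g] holds per-gesture means/variances for both streams;
    returns a log-likelihood per gesture state g."""
    return {
        g: diag_gauss_loglik(mfcc, p["mfcc_mean"], p["mfcc_var"])
           + diag_gauss_loglik(tv, p["tv_mean"], p["tv_var"])
        for g, p in params.items()
    }
```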

    Feature extraction and event detection for automatic speech recognition


    A phonological study on English loanwords in Mandarin Chinese

    The general opinion about the way English borrowings enter Mandarin is that English words are preferably integrated into Mandarin via calquing, which includes a special case called Phonetic-Semantic Matching (PSM) (Zuckermann 2004), in which words are phonetically assimilated and semantically transferred at the same time. The reason is that Mandarin is written in Chinese characters, each of which has a single-syllable pronunciation and a self-contained meaning, so the meaning conveyed by the selection of characters may match the original English word. Some cases are agreed by many scholars to be PSM. However, as this study demonstrates, the semantics of the borrowing and the original word do not really match, a relation Novotná (1967) considered "artificial". This study analyses a corpus of 600 established English loanwords in Mandarin to test the hypothesis that semantic matching is not a significant factor in the loanword adaptation process because there is no semantic relation between the borrowed words and the characters used to record them. To measure the phonological similarity between the English input and the Mandarin output, a model of adult second-language perception, the Perceptual Assimilation Model (Best 1995a), is used as the framework for judging the phonemic match between the English word and the adapted Mandarin outcome. The meanings of the characters used in recording the loanwords are checked against The Dictionary of Modern Chinese to determine whether there are cases of semantic matching. The phonotactic adaptation of illicit sound sequences is also analysed within Optimality Theory (McCarthy 2002) to give a phonetic-phonological account of the adaptation process. In this way, the percentage of Phono-Semantic Matching in the corpus is obtained. As the corpus investigation shows, very few loanwords match both the phonological and the semantic quality of the original words; the most commonly acknowledged phono-semantic matching cases are merely phonetic loanwords. In conclusion, this paper argues that the semantic resources of the Chinese writing system are not a major factor in the integration of loanwords. Borrowing between languages with different writing systems is not much different from borrowing between languages with the same writing system, or without a writing system. Though the Chinese writing system interferes with borrowing, it is linguistic factors that determine the borrowing process and its results. Chinese characters are, to a large extent, conventional graphic signs, and phonetic value is the more significant factor in the loanword integration process.
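    To make the notion of "phonemic matching" concrete, one simple operationalization (not the thesis's own PAM-based procedure) is a weighted edit distance between the source and adapted phoneme strings, with cheaper substitutions for perceptually close phonemes. The inventory and cost table in the sketch below are hypothetical.

```python
# Hypothetical phonemic-similarity score: weighted Levenshtein distance
# over phoneme strings, with reduced substitution cost for perceptually
# close pairs. Phoneme symbols and costs are illustrative only.
CLOSE_PAIRS = {("b", "p"), ("d", "t"), ("g", "k"), ("v", "w"), ("l", "r")}

def sub_cost(a, b):
    if a == b:
        return 0.0
    return 0.5 if (a, b) in CLOSE_PAIRS or (b, a) in CLOSE_PAIRS else 1.0

def phoneme_distance(src, tgt, indel=1.0):
    """src, tgt: lists of phoneme symbols. Returns weighted edit distance."""
    m, n = len(src), len(tgt)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel
    for j in range(1, n + 1):
        d[0][j] = j * indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + indel,
                          d[i][j - 1] + indel,
                          d[i - 1][j - 1] + sub_cost(src[i - 1], tgt[j - 1]))
    return d[m][n]

# Rough demo: English 'coffee' vs. adapted Mandarin 'kafei'
# phoneme_distance(["k", "o", "f", "i"], ["k", "a", "f", "e", "i"])
```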

    Urdu Vowel System and Perception of English Vowels by Punjabi-Urdu Speakers

    A well-defined vocalic and consonantal system is a prerequisite for investigating the perception and production of a second language. The lack of a well-defined Urdu vowel system in the multilingual context of Pakistan motivated an investigation of the acoustic and phonetic properties of Urdu vowels. Because of the significant influence of a number of first languages, the study focuses on the Urdu spoken in Punjab, Pakistan. A production experiment reports the acoustic properties of the monophthongs and six diphthongs of Urdu. The results showed that Urdu distinguishes between short and long vowels and lacks open-mid front and open-mid back vowels. Since the central vowel is fairly open and retracted, the central vowel space appears to be empty. This was reflected in the difficulty Punjabi-Urdu speakers had in perceiving the central vowels of Standard Southern British English (SSBE). The acoustic and phonetic evidence partially supports the phonetic existence of diphthongs in Urdu. The acoustic investigation of the Urdu vowel system helped to predict the perceptual assimilation and classification patterns of SSBE vowels by Punjabi-Urdu speakers. A cross-language perceptual assimilation experiment and a free classification experiment were conducted in three different consonantal contexts to test the predictions of three mainstream models of L2 perception: SLM, PAM and L2LP. The assimilation patterns in the cross-language assimilation and category-goodness rating tasks varied according to familiarity with the target language. The patterns of perceptual assimilation failed to predict the perceptual similarity of the SSBE vowels in the auditory free classification task. The findings thus support the model predictions with regard to the role of the L1; however, acoustic similarities between L1 and L2 predict neither the patterns of cross-language perceptual assimilation nor perceptual similarity.
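    A common way to quantify the kind of acoustic similarity discussed above is to place each vowel category at its mean (F1, F2) point and map every L2 vowel to its nearest L1 category. The sketch below does exactly that; all formant values are invented placeholders, not measurements from this study, and a real analysis would also normalize formants across speakers.

```python
# Illustrative nearest-neighbour mapping of L2 (SSBE) vowels onto L1
# (Urdu) categories in F1/F2 space. Formant values are invented
# placeholders, not the study's data.
import math

URDU_VOWELS = {          # vowel: (mean F1 Hz, mean F2 Hz) -- hypothetical
    "i:": (300, 2300), "a:": (750, 1300), "u:": (320, 800),
}
SSBE_VOWELS = {
    "3:": (550, 1500),   # central NURSE vowel -- the problematic case
    "i:": (290, 2350),
}

def nearest_l1(f1, f2, l1=URDU_VOWELS):
    """Return the L1 vowel whose mean (F1, F2) is Euclidean-closest."""
    return min(l1, key=lambda v: math.dist((f1, f2), l1[v]))

for v, (f1, f2) in SSBE_VOWELS.items():
    print(v, "->", nearest_l1(f1, f2))
```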