4,292 research outputs found

    Monolingual and bilingual Spanish-Catalan speech recognizers developed from SpeechDat databases

    Under the SpeechDat specifications, the Spanish member of the SpeechDat consortium has recorded a Catalan database that includes one thousand speakers. This communication describes experimental work carried out using both the Spanish and the Catalan speech material. A speech recognition system has been trained for the Spanish language using a selection of the phonetically balanced utterances from the 4500 SpeechDat training sessions. Utterances with mispronounced or incomplete words and with intermittent noise were discarded. A set of 26 allophones was selected to account for the Spanish sounds, and clustered demiphones were used as context-dependent sub-lexical units. Following the same methodology, a recognition system was trained from the Catalan SpeechDat database. Catalan sounds were described with 32 allophones. Additionally, a bilingual recognition system was built for both the Spanish and Catalan languages. Clustering techniques were used to determine a set of allophones suitable for covering both languages simultaneously; 33 allophones were selected. The training material comprised the whole Catalan training material and the Spanish material coming from the Eastern region of Spain (the region where Catalan is spoken). The performance of the Spanish, Catalan and bilingual systems was assessed under the same framework. The Spanish system performs significantly better than the other systems owing to its better training. The bilingual system provides performance equivalent to that of the language-specific systems trained with the Eastern Spanish material or the Catalan SpeechDat corpus.

    German glide formation functionally viewed

    Glide formation, a process whereby an underlying high front vowel is realized as a palatal glide, is shown to occur only in unstressed prevocalic position in German, and to be blocked by specific surface restrictions such as *ji and *“j. Traditional descriptions of glide formation (including derivational as well as Optimality theoretic approaches) refer to the syllable in order to capture its conditions. The present study illustrates that glide formation (plus the distribution of long and short tense /i/) in German can better be captured in a Functional Phonology account (Boersma 1998) which makes reference to stress instead of the syllable and thus overcomes problems of former approaches

    Subphonemic and suballophonic consonant variation : the role of the phoneme inventory

    Consonants exhibit more variation in their phonetic realization than is typically acknowledged, but that variation is linguistically constrained. Acoustic analysis of both read and spontaneous speech reveals that consonants are not necessarily realized with the manner of articulation they would have in careful citation form. Although the variation is wider than one would imagine, it is limited by the phoneme inventory. The phoneme inventory of the language restricts the range of variation to protect the system of phonemic contrast. That is, consonants may stray phonetically into unfilled areas of the language's sound space. Listeners are seldom consciously aware of the consonant variation, and perceive the consonants phonemically as in their citation forms. A better understanding of surface phonetic consonant variation can help make predictions in theoretical domains and advances in applied domains

    The diachronic emergence of retroflex segments in three languages

    The present study shows that though retroflex segments can be considered articulatorily marked, there are perceptual reasons why languages introduce this class into their phoneme inventory. This observation is illustrated with the diachronic developments of retroflexes in Norwegian (North Germanic), Nyawaygi (Australian) and Minto-Nenana (Athapaskan). The developments in these three languages are modelled in a perceptually oriented phonological theory, since traditional articulatorily-based features cannot deal with such processes.

    Word recognition from tiered phonological models

    Phonologically constrained morphological analysis (PCMA) is the decomposition of words into their component morphemes conditioned by both orthography and pronunciation. This article describes PCMA and its application in large-vocabulary continuous speech recognition to enhance recognition performance in some tasks. Our experiments, based on the British National Corpus and the LOB Corpus for training data and WSJCAM0 for test data, show clearly that PCMA leads to smaller lexicon size, smaller language models, superior word lattices and a decrease in word error rates. PCMA seems to show most benefit in open-vocabulary tasks, where the productivity of a morph unit lexicon makes a substantial reduction in out-of-vocabulary rates.
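The out-of-vocabulary effect mentioned in this abstract can be illustrated with a toy sketch. It is not the PCMA procedure itself: it segments on spelling only and ignores pronunciation, and the lexicons, test words and function names (`oov_rate`, `can_segment`) are all invented for illustration.

```python
from functools import lru_cache


def oov_rate(tokens, is_covered):
    """Fraction of tokens that the lexicon cannot account for."""
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if not is_covered(tok)) / len(tokens)


def can_segment(word, morphs):
    """True if `word` can be split into a sequence of units from `morphs`.

    Orthography-only assumption: a real phonologically constrained
    analysis would also check the pronunciation of each unit.
    """
    @lru_cache(maxsize=None)
    def covered(i):
        if i == len(word):
            return True
        return any(word[i:j] in morphs and covered(j)
                   for j in range(i + 1, len(word) + 1))
    return covered(0)


# Hypothetical toy lexicons and test data.
word_lexicon = {"walk", "walked", "talk", "talking", "jump"}
morph_lexicon = {"walk", "talk", "jump", "ed", "ing", "s"}
test_tokens = ["walked", "talked", "jumping", "walks", "talk", "run"]

word_oov = oov_rate(test_tokens, lambda w: w in word_lexicon)
morph_oov = oov_rate(test_tokens, lambda w: can_segment(w, morph_lexicon))
print(f"word-lexicon OOV rate:  {word_oov:.2f}")
print(f"morph-lexicon OOV rate: {morph_oov:.2f}")
```

In this toy setting the morph lexicon covers unseen inflected forms such as "talked" and "jumping" by recombining known units, which is the kind of productivity the abstract credits for the lower out-of-vocabulary rate.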

    Native Speaker Perceptions of Accented Speech: The English Pronunciation of Macedonian EFL Learners

    The paper reports on the results of a study that aimed to describe the vocalic and consonantal features of the English pronunciation of Macedonian EFL learners as perceived by native speakers of English and to find out whether native speakers who speak different standard variants of English perceive the same segments as non-native. A specially designed web application was employed to gather two types of data: a) quantitative (frequency of segment variables and global foreign accent ratings on a 5-point scale), and b) qualitative (open-ended questions). The analysis of the results points to the three most frequent markers of foreign accent in the English speech of Macedonian EFL learners: final obstruent devoicing, vowel shortening and substitution of English dental fricatives with Macedonian dental plosives. It also reveals additional phonetic aspects poorly explained in the available reference literature, such as allophonic distributional differences between the two languages and intonational mismatch.

    Verification of feature regions for stops and fricatives in natural speech

    The presence of acoustic cues and their importance in speech perception have long remained debatable topics. In spite of several studies that exist in this field, very little is known about what exactly humans perceive in speech. This research takes a novel approach towards understanding speech perception. A new method, named three-dimensional deep search (3DDS), was developed to explore the perceptual cues of 16 consonant-vowel (CV) syllables, namely /pa/, /ta/, /ka/, /ba/, /da/, /ga/, /fa/, /Ta/, /sa/, /Sa/, /va/, /Da/, /za/, /Za/, from naturally produced speech. A verification experiment was then conducted to further verify the findings of the 3DDS method. For this purpose, the time-frequency coordinate that defines each CV was filtered out using the short-time Fourier transform (STFT), and perceptual tests were then conducted. A comparison between unmodified speech sounds and those without the acoustic cues was made. In most of the cases, the scores dropped from 100% to chance levels even at 12 dB SNR. This clearly emphasizes the importance of features in identifying each CV. The results confirm earlier findings that stops are characterized by a short-duration burst preceding the vowel by 10 cs in the unvoiced case, and appearing almost coincident with the vowel in the voiced case. As has been previously hypothesized, we confirmed that the F2 transition plays no significant role in consonant identification. 3DDS analysis labels the /sa/ and /za/ perceptual features as an intense frication noise around 4 kHz, preceding the vowel by 15-20 cs, with the /za/ feature being around 5 cs shorter in duration than that of /sa/; the /Sa/ and /Za/ events are found to be frication energy near 2 kHz, preceding the vowel by 17-20 cs. /fa/ has a relatively weak burst and frication energy over a wide band including 2-6 kHz, while /va/ has a cue in the 1.5 kHz mid-frequency region preceding the vowel by 7-10 cs. New information is established regarding /Da/ and /Ta/, especially with regard to the nature of their significant confusions.
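The idea of filtering out a time-frequency region with the STFT can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure: the function name `remove_tf_region`, the 256-sample frame, the Hann window, the rectangular region and the two-tone test signal are all hypothetical choices, and NumPy is assumed to be available.

```python
import numpy as np


def remove_tf_region(x, fs, t_range, f_range, n_fft=256):
    """Zero a rectangular time-frequency region of x via STFT and overlap-add.

    t_range is (start, end) in seconds, f_range is (low, high) in Hz.
    All parameter choices here are illustrative assumptions.
    """
    hop = n_fft // 2
    # Periodic Hann window: copies shifted by n_fft/2 sum to exactly 1,
    # so unmodified frames overlap-add back to the original signal.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    pad = n_fft
    tail = pad + (-len(x)) % hop
    padded = np.concatenate([np.zeros(pad), x, np.zeros(tail)])
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    band = (freqs >= f_range[0]) & (freqs <= f_range[1])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - n_fft + 1, hop):
        centre = (start + n_fft / 2 - pad) / fs  # frame centre in seconds
        spectrum = np.fft.rfft(window * padded[start:start + n_fft])
        if t_range[0] <= centre <= t_range[1]:
            spectrum[band] = 0.0  # remove the chosen frequency band
        out[start:start + n_fft] += np.fft.irfft(spectrum, n=n_fft)
    return out[pad:pad + len(x)]


# Synthetic stand-in for speech: a 500 Hz tone plus a 3 kHz tone.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 3000 * t)

# Remove 2.5-3.5 kHz between 0.4 s and 0.6 s; the rest is left untouched.
y = remove_tf_region(x, fs, (0.4, 0.6), (2500.0, 3500.0))
```

Inside the removed region only the 500 Hz tone remains, while frames outside it are reconstructed unchanged, which mirrors the comparison between unmodified sounds and sounds with a cue region taken out.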

    On the causes of compensation for coarticulation : evidence for phonological mediation

    This study examined whether compensation for coarticulation in fricative-vowel syllables is phonologically mediated or a consequence of auditory processes. Smits (2001a) had shown that compensation occurs for anticipatory lip rounding in a fricative caused by a following rounded vowel in Dutch. In a first experiment, the possibility that compensation is due to general auditory processing was investigated using nonspeech sounds. These did not cause context effects akin to compensation for coarticulation, although nonspeech sounds influenced speech sound identification in an integrative fashion. In a second experiment, a possible phonological basis for compensation for coarticulation was assessed by using audiovisual speech. Visual displays, which induced the perception of a rounded vowel, also influenced compensation for anticipatory lip rounding in the fricative. These results indicate that compensation for anticipatory lip rounding in fricative-vowel syllables is phonologically mediated. This result is discussed in the light of other compensation-for-coarticulation findings and general theories of speech perception.

    Research methods and intelligibility studies

    This paper first briefly reviews the concept of intelligibility as it has been employed in both English as a Lingua Franca (ELF) and world Englishes (WE) research. It then examines the findings of the Lingua Franca Core (LFC), a list of phonological features that empirical research has shown to be important for safeguarding mutual intelligibility between non-native speakers of English. The main point of the paper is to analyse these findings and demonstrate that many of them can be explained if three perspectives (linguistic, psycholinguistic and historical-variationist) are taken. This demonstration aims to increase the explanatory power of the concept of intelligibility by providing some theoretical background. An implication for ELF research is that at the phonological level, internationally intelligible speakers have a large number of features in common, regardless of whether they are non-native speakers or native speakers. An implication for WE research is that taking a variety-based, rather than a features-based, view of phonological variation and its connection with intelligibility is likely to be unhelpful, as intelligibility depends to some extent on the phonological features of individual speakers, rather than on the varieties per se