928 research outputs found

    Learning [Voice]

    The [voice] distinction between homorganic stops and fricatives is cued by a number of acoustic correlates, including voicing, segment duration, and preceding vowel duration. The present work looks at [voice] from a number of multidimensional perspectives. This dissertation's focus is a corpus study of the phonetic realization of [voice] in two English-learning infants aged 1;1 to 3;5. While preceding vowel duration has been studied before in infants, the other correlates of post-vocalic voicing investigated here (preceding F1, consonant duration, and closure voicing intensity) had not been measured before in infant speech. The study makes empirical contributions regarding the development of the production of [voice] in infants, not just from a surface-level perspective but also with implications for the phonetics-phonology interface in the adult and developing linguistic systems. Additionally, several methodological contributions are made in the use of large corpora and data modeling techniques. The study revealed that even in infants, F1 at the midpoint of a vowel preceding a voiced consonant was lower by roughly 50 Hz compared to a vowel before a voiceless consonant, which is in line with the effect found in adults. But while the effect has been considered most likely to be a physiological and nonlinguistic phenomenon in adults, it actually appeared to be correlated in the wrong direction with other aspects of [voice] here, casting doubt on a physiological explanation. Some of the consonant pairs had statistically significant differences in duration and closure voicing. Additionally, a preceding vowel duration difference was found, as well as a preliminary indication of a developmental trend that suggests the preceding vowel duration difference is being learned. The phonetics of adult speech is also considered. 
Results are presented from a dialectal corpus study of North American English and a lab speech experiment which clarifies the relationship between preceding vowel duration and flapping and the relationship between [voice] and F1 in preceding vowels. Fluent adult speech is also described and machine learning algorithms are applied to learning the [voice] distinction using multidimensional acoustic input plus some lexical knowledge

    Production and perception of Libyan Arabic vowels

    PhD Thesis. This study investigates the production and perception of Libyan Arabic (LA) vowels by native speakers and the relation between these major aspects of speech. The aim was to provide a detailed acoustic and auditory description of the vowels available in the LA inventory and to compare the phonetic features of these vowels with those of other Arabic varieties. A review of the relevant literature showed that the LA dialect has not been investigated experimentally. The small number of studies conducted in the last few decades have been based mainly on impressionistic accounts. This study consists of two main investigations: one concerned with vowel production and the other with vowel perception. In terms of production, the study focused on gathering the data necessary to define the vowel inventory of the dialect and to explore the qualitative and quantitative characteristics of the vowels contained in this inventory. Twenty native speakers of LA were recorded while reading target monosyllabic words in carrier sentences. Acoustic and auditory analyses were used in order to provide a fairly comprehensive and objective description of the vocalic system of LA. The results showed that phonologically short and long Arabic vowels vary significantly in quality as well as quantity, a finding which is increasingly being reported in experimental studies of other Arabic dialects. Short vowels in LA tend to be more centralised than has been reported for other Arabic vowels, especially with regard to short /a/. The study also looked at the effect of voicing in neighbouring consonants and vowel height on vowel duration, and the findings were compared to those of other varieties/languages. The perception part of the study explored the extent to which listeners use the same acoustic cues of length and quality in vowel perception that are evident in their production. 
This involved the use of continua from synthesised vowels which varied along duration and/or formant frequency dimensions. The continua were randomised and played to 20 native listeners who took part in an identification task. The results show that, when it comes to perception, Arabic listeners still rely mainly on quantity for the distinction between phonologically long and short vowels. That is, when presented with stimuli containing conflicting acoustic cues (formant frequencies that are typical of long vowels but with short duration or formant frequencies that are typical of short vowels but with long duration), listeners reacted consistently to duration rather than formant frequency. The results of both parts of the study provided some understanding of the LA vowel system. The production data allowed for a detailed description of the phonetic characteristics of LA vowels, and the acoustic space that they occupy was compared with those of other Arabic varieties. The perception data showed that production and perception do not always go hand in hand and that primary acoustic cues for the identification of vowels are dialect- and language-specific

    An acoustic-phonetic approach in automatic Arabic speech recognition

    In a large vocabulary speech recognition system the broad phonetic classification technique is used instead of detailed phonetic analysis to overcome the variability in the acoustic realisation of utterances. The broad phonetic description of a word is used as a means of lexical access, where the lexicon is structured into sets of words sharing the same broad phonetic labelling. This approach has been applied to a large vocabulary isolated word Arabic speech recognition system. Statistical studies have been carried out on 10,000 Arabic words (converted to phonemic form) involving different combinations of broad phonetic classes. Some particular features of the Arabic language have been exploited. The results show that vowels represent about 43% of the total number of phonemes. They also show that about 38% of the words can uniquely be represented at this level by using eight broad phonetic classes. When introducing detailed vowel identification the percentage of uniquely specified words rises to 83%. These results suggest that a fully detailed phonetic analysis of the speech signal is perhaps unnecessary. In the adopted word recognition model, the consonants are classified into four broad phonetic classes, while the vowels are described by their phonemic form. A set of 100 words uttered by several speakers has been used to test the performance of the implemented approach. In the implemented recognition model, three procedures have been developed, namely voiced-unvoiced-silence segmentation, vowel detection and identification, and automatic spectral transition detection between phonemes within a word. The accuracy of both the V-UV-S and vowel recognition procedures is almost perfect. A broad phonetic segmentation procedure has been implemented, which exploits information from the above mentioned three procedures. Simple phonological constraints have been used to improve the accuracy of the segmentation process. 
The resultant sequence of labels is used for lexical access to retrieve the word or a small set of words sharing the same broad phonetic labelling. When more than one candidate word is retrieved, a verification procedure is used to choose the most likely one
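
The lexical-access idea in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the class inventory (`S`, `F`, `N`, `L`, `V`), the romanised toy lexicon, and the function names are hypothetical and are not the classes, data, or code used in the thesis. The sketch only shows how words sharing a broad phonetic labelling fall into the same lexicon cell, and how keeping vowel identity can split those cells.

```python
from collections import defaultdict

# Hypothetical broad classes: S = stop, F = fricative, N = nasal, L = liquid/glide.
BROAD_CLASS = {
    "b": "S", "t": "S", "d": "S", "k": "S", "q": "S",
    "f": "F", "s": "F", "z": "F", "x": "F", "h": "F",
    "m": "N", "n": "N",
    "l": "L", "r": "L", "w": "L", "y": "L",
}
VOWELS = {"a", "i", "u"}


def broad_key(phonemes, detailed_vowels=False):
    """Map a phoneme sequence to a broad-class label string.

    With detailed_vowels=True each vowel keeps its identity (lowercase),
    otherwise all vowels collapse to the single label "V".
    """
    labels = []
    for p in phonemes:
        if p in VOWELS:
            labels.append(p if detailed_vowels else "V")
        else:
            labels.append(BROAD_CLASS[p])
    return "".join(labels)


def build_index(lexicon, detailed_vowels=False):
    """Group words by their broad-class key (the lexicon 'cells')."""
    index = defaultdict(list)
    for word, phonemes in lexicon.items():
        index[broad_key(phonemes, detailed_vowels)].append(word)
    return index


def unique_fraction(index):
    """Fraction of words that are alone in their cell."""
    total = sum(len(words) for words in index.values())
    unique = sum(1 for words in index.values() if len(words) == 1)
    return unique / total


# Toy lexicon (assumption): romanised forms, one symbol per phoneme.
LEXICON = {
    "kataba": list("kataba"),
    "kutiba": list("kutiba"),
    "darasa": list("darasa"),
    "salima": list("salima"),
}

coarse = build_index(LEXICON)
fine = build_index(LEXICON, detailed_vowels=True)
print(dict(coarse))
print(unique_fraction(coarse), unique_fraction(fine))
```

In this toy lexicon, "kataba" and "kutiba" share one cell under the coarse labelling and separate once vowel identity is kept, which mirrors the direction of the effect reported in the abstract. The toy proportions themselves carry no empirical meaning.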

    Phonological Awareness

    This study aims at a better understanding of human speech processing and specifically explores the task that infants face while learning their native language. It sheds light on 30 years of research into the developments in early infancy that allow word learning to proceed rapidly before two years of age. Infants are born with perceptual biases that facilitate attention to speech and the encoding of its properties. Over the first several months of life, infants' perceptual biases increasingly conform to native language patterns. By the end of this study, it is suggested that word learning is another bootstrapping phenomenon in developmental research. This does not mean that it can be reduced to perceptual learning. Instead, we argue that perceptual learning provides a foundation upon which abstract linguistic units can be built. Just as phonological patterns act as cues to morphological and syntactic structure, and just as naive concepts allow infants to learn more complex ones, perceptual learning allows segmentation and representation of word forms that, once mapped to concepts, bootstrap the process of word learning and lead to a qualitative improvement in its efficiency

    The production and perception of peripheral geminate/singleton coronal stop contrasts in Arabic

    Gemination is typologically common word-medially but is rare at the periphery of the word (word-initially and -finally). In line with this observation, prior research on production and perception of gemination has focused primarily on medial gemination. Much less is known about the production and perception of peripheral gemination. This PhD thesis reports on comprehensive articulatory, acoustic and perceptual investigations of geminate-singleton contrasts according to the position of the contrast in the word and in the utterance. The production component of the project investigated the articulatory and acoustic features of medial and peripheral gemination of voiced and voiceless coronal stops in Modern Standard Arabic and regional Arabic vernacular dialects, as produced by speakers from two geographically distant countries, Morocco and Lebanon. The perceptual experiment investigated how standard and dialectal Arabic gemination contrasts in each word position were categorised and discriminated by three groups of non-native listeners, each differing in their native language experience with gemination at different word positions. The first experiment used ultrasound and acoustic recordings to address the extent to which word-initial gemination in Moroccan and Lebanese dialectal Arabic is maintained, as well as the articulatory and acoustic variability of the contrast according to the position of the gemination contrast in the utterance (initial vs. medial) and between the two dialects. The second experiment compared the production of word-medial and -final gemination in Modern Standard Arabic as produced by Moroccan and Lebanese speakers. 
The aim of the perceptual experiment was to disentangle the contribution of phonological and phonetic effects of the listeners’ native languages on the categorisation and discrimination of non-lexical Moroccan gemination by three groups of non-native listeners varying in their phonological (native Lebanese group and heritage Lebanese group, for whom Moroccan is unintelligible, i.e., a non-native language) and phonetic-only (native English group) experience with gemination across the three word positions. The findings in this thesis contribute to the understanding of positional and dialectal effects on the production and perception of gemination contrasts, going beyond medial gemination (which was mainly included as a control) and illuminating in particular the typologically rare peripheral gemination

    Computational modelling of segmental and prosodic levels of analysis for capturing variation across Arabic dialects

    Dialect variation spans different linguistic levels of analysis. Two examples include the typical phonetic realisations produced and the typical range of intonational choices made by individuals belonging to a given dialect group. Taking the modelling principles of a specific automatic accent recognition system, the work here characterises and observes the variation that exists within these two specific levels of analysis among eight Arabic dialects. Using a method that has previously shown promising performance on English accent varieties, we first model the segmental level of analysis from recordings of Arabic speakers to capture the variation in the phonetic realisations of the vowels and consonants. In doing so, we show how powerful this model can be in distinguishing between Arabic dialects. This paper then shows how this modelling approach can be adapted to instead characterise prosodic variation among these same dialects from the same speech recordings. This allows us to inspect the relative power of the segmental and prosodic levels of analysis in separating the Arabic dialects. This work opens up the possibility of using these modelling frameworks to study the extent and nature of phonetic and prosodic variation across speech corpora

    Voicing Contrast in Najdi Arabic Stops: Implications for Laryngeal Realism

    Ph.D. (Integrated) Thesis. The present study investigates the phonetic and phonological aspects of the voicing contrast in stops in Najdi Arabic, a dialect that has been found to contrast prevoiced and aspirated stops. This study discusses the implications of the acoustic correlates of Voiceless and Voiced stops for the phonological representation of the voicing contrast in this variety and examines the connection between the acoustic signal and the distinctive features that specify the opposition by employing the types of evidence proposed in the realm of laryngeal realism. These types of evidence include the manifestation of acoustic correlates of stops in various positions, speech rate effect on aspiration and prevoicing, and the Voiceless and Voiced stops’ behaviour in stop-stop clusters across word boundaries in terms of regressive voicing assimilation. The manifestation of the acoustic correlates of Voiceless and Voiced stops shows that Voiceless stops are aspirated in the examined positions whereas Voiced stops show robust prevoicing in utterance-initial and utterance-medial contexts. The acoustic correlates also show that Voiceless stops are robustly accompanied by longer closure, longer burst, higher F0 and F1 onset, and lower burst intensity. Voiced stops, on the other hand, are robustly accompanied by shorter closure (utterance-medially), shorter burst, lower F0 and F1, and higher burst intensity. Speech rate affects both aspiration and prevoicing in Voiceless and Voiced stops, respectively. Prevoicing and aspiration are longer at a normal speech rate than at a fast speech rate. Stop-stop cluster results show that both Voiceless and Voiced stops trigger some (de)voicing in the preceding member of the cluster. The acoustic analysis reveals that Voiceless stops show voicing assimilation in F0/F1 and burst intensity but not in voicing in the closure. 
For Voiced stops, the results show a degree of devoicing in their closure but not in F0/F1 and burst intensity. The results suggest that Voiceless and Voiced stops in Najdi Arabic have features from both aspirating and voicing languages. This claim is supported by the three types of evidence implemented in this study. The assumption that both Voiceless and Voiced stops are specified implies that the voicing contrast in Najdi Arabic is overspecified in the phonology with two features, [spread glottis] and [voice]. Applying the numeric values of phonetic distinctive features proposed by Beckman et al. (2013), on a scale of 1 to 9 (1 means inactive, 9 means highly active), the present study claims that Voiced stops in Najdi Arabic are specified with [9 voice] while Voiceless stops are specified with [8 spread glottis], mainly because of the existence of moderate aspiration in utterance-initial Voiceless stops and the robust prevoicing found in utterance-initial and utterance-medial Voiced stops. The phonological repercussions of the proposed overspecification in the voicing contrast in Najdi Arabic are discussed with a specific focus on the inclusion of such a patterning in theoretical models of voicing

    The early phase of /ɹ/ production development in adult Japanese learners of English

    Although previous research indicates that Japanese speakers’ second-language (L2) perception and production of English /ɹ/ may improve with increased L2 experience, relatively little is known about the fine phonetic details of their /ɹ/ productions, especially during the early phase of L2 speech learning. This cross-sectional study examined acoustic properties of word-initial /ɹ/ from 60 Japanese learners with a length of residence (LOR) between one month and one year in Canada. Their performance was compared to that of 15 native speakers of English and 15 low-proficiency Japanese learners of English. Formant frequencies (F2 and F3) and F1 transition durations were evaluated under three task conditions—word reading, sentence reading, and timed picture description. Learners with as little as two to three months of residence demonstrated target-like F2 frequencies. In addition, increased LOR was predictive of more target-like transition durations. Although the learners showed some improvement in F3 as a function of LOR, they did so mainly at a controlled level of speech production. The findings suggest that during the early phase of L2 segmental development, production accuracy is task-dependent and is influenced by the availability of L1 phonetic cues for redeployment in L2

    Perception of English and Polish obstruents

    This work focuses on the voiced-voiceless contrast in the perception of English and Polish obstruents. The research methodology was based on acoustic manipulation of the temporal and spectral parameters involved in implementing the voicing contrast in the languages studied. Three groups of participants were compared: beginner learners of English, advanced users of English, and native speakers of English. The work consists of two theoretical parts, which illustrate the issues and contrast the strategies for implementing the voicing contrast in the languages studied, and a research part, which presents the methodology used and the analysis of the results. The first part addresses the role of speech perception in linguistic research. It touches on such aspects as the lack of a direct relation between the acoustic signal and the phonological category and the exceptional plasticity and adaptability of human speech perception, and it reports on proposals for a comprehensive account of how human speech perception works. In the subsequent subsections, the work discusses perception in the context of language contact, that is, the discrimination of acoustic contrasts that occur in a foreign language but are absent from the first language. Models describing this process are also reviewed, as are hypotheses describing potential success in mastering effective perception of perceptual contrasts occurring in a foreign language. The second part concentrates on temporal and acoustic differences in the implementation of voicing in English and Polish. It describes aspects such as Voice Onset Time, vowel duration, closure duration, frication duration, devoicing, and burst duration. The third, research part presents the material studied, the methodology used to manipulate the material, and the characteristics of the groups. Hypotheses based on the theoretical assumptions are then verified against the results obtained. 
The final part discusses the perceptual problems encountered by Poles learning English and draws pedagogical conclusions

    Arabic Continuous Speech Recognition System using Sphinx-4

    Speech is the most natural form of human communication, and speech processing has been one of the most exciting areas of signal processing. Speech recognition technology has made it possible for computers to follow human voice commands and understand human languages. The main goal of the speech recognition field is to develop techniques and systems that accept speech as input to a machine and process it for use in many applications. Arabic is one of the most widely spoken languages in the world: statistics show that it is the first language (mother tongue) of 206 million native speakers, ranking fourth after Mandarin, Spanish and English. In spite of its importance, research effort on Arabic Automatic Speech Recognition (ASR) is unfortunately still inadequate[7]. This thesis proposes and describes an efficient and effective framework for designing and developing a speaker-independent continuous automatic Arabic speech recognition system based on a phonetically rich and balanced speech corpus. The developed Arabic speech recognition system is based on the Carnegie Mellon University Sphinx tools. To build the system, we develop three basic components. The first is the dictionary, which contains all possible phonetic pronunciations of any word in the domain vocabulary. The second is the language model, which tries to capture the properties of a sequence of words by means of a probability distribution and to predict the next word in a speech sequence. The last is the acoustic model, which is created by taking audio recordings of speech and their text transcriptions and using software to create statistical representations of the sounds that make up each word. The system uses the rich and balanced database, which contains 367 sentences with a total of 14,232 words. The phonetic dictionary contains about 23,841 definitions corresponding to the database words. The language model contains 14,233 unigrams, 32,813 bigrams and 37,771 trigrams. 
The engine uses Hidden Markov Models (HMMs) with 3 emitting states for triphone-based acoustic models
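
As a rough illustration of what the n-gram counts in this abstract refer to, the following sketch counts n-grams over a toy corpus and derives an unsmoothed maximum-likelihood bigram estimate. The corpus, the sentence-boundary markers `<s>` and `</s>`, and the function names are assumptions introduced here; the thesis builds its language model with the Sphinx toolchain, and this sketch is not that procedure and applies no smoothing or back-off.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-grams over whitespace-tokenised sentences.

    Each sentence is padded with one start and one end marker
    (an assumption of this sketch).
    """
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def bigram_probability(sentences, prev, word):
    """Unsmoothed estimate of P(word | prev) = count(prev, word) / count(prev)."""
    unigrams = ngram_counts(sentences, 1)
    bigrams = ngram_counts(sentences, 2)
    if unigrams[(prev,)] == 0:
        return 0.0  # unseen history: no estimate without smoothing
    return bigrams[(prev, word)] / unigrams[(prev,)]


# Toy corpus (assumption): English placeholders standing in for transcriptions.
CORPUS = [
    "open the door",
    "open the window",
    "close the door",
]

print(len(ngram_counts(CORPUS, 1)), "distinct unigrams")
print(len(ngram_counts(CORPUS, 2)), "distinct bigrams")
print(bigram_probability(CORPUS, "the", "door"))
```

The distinct-n-gram totals printed here are the toy analogue of the unigram, bigram and trigram figures quoted in the abstract. A production language model would additionally smooth these estimates so that unseen word sequences do not receive zero probability.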