80 research outputs found

    Pitch accent in spoken-word recognition in Japanese

    Three experiments addressed the question of whether pitch-accent information may be exploited in the process of recognizing spoken words in Tokyo Japanese. In a two-choice classification task, listeners judged from which of two words, differing in accentual structure, isolated syllables had been extracted (e.g., ka from baka HL or gaka LH); most judgments were correct, and listeners’ decisions were correlated with the fundamental frequency characteristics of the syllables. In a gating experiment, listeners heard initial fragments of words and guessed what the words were; their guesses overwhelmingly had the same initial accent structure as the gated word, even when only the initial CV of the stimulus (e.g., na- from nagasa HLL or nagashi LHH) was presented. In addition, listeners were more confident in guesses with the same initial accent structure as the stimulus than in guesses with a different accent. In a lexical decision experiment, responses to spoken words (e.g., ame HL) were speeded by previous presentation of the same word (e.g., ame HL) but not by previous presentation of a word differing only in accent (e.g., ame LH). Together, these findings provide strong evidence that accentual information constrains the activation and selection of candidates for spoken-word recognition.
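The gating result above rests on a simple scoring step: does each guess begin with the same tone as the gated word? The following is a minimal Python sketch of that step, not the study's analysis code. It assumes accent patterns are coded as H/L strings with one letter per mora (e.g. `HLL` for nagasa) and that, in Tokyo Japanese, the tone of the first mora is enough to identify the initial accent structure (H for initial-accented HL..., L for LH...). `GatingTrial`, `same_initial_accent`, and `match_rate` are hypothetical names.

```python
# Hypothetical scoring sketch for a gating experiment (not the study's code).
# Accent patterns are assumed to be H/L strings, one letter per mora.
from dataclasses import dataclass


@dataclass
class GatingTrial:
    stimulus_accent: str  # pattern of the gated word, e.g. "HLL" for nagasa
    guess_accent: str     # pattern of the listener's guess


def same_initial_accent(a: str, b: str) -> bool:
    """True if both patterns start with the same tone, H or L.

    Assumes the first-mora tone identifies the initial accent structure
    (H = initial-accented HL..., L = LH...).
    """
    return a[:1] in ("H", "L") and a[:1] == b[:1]


def match_rate(trials: list[GatingTrial]) -> float:
    """Proportion of guesses sharing the stimulus's initial accent structure."""
    if not trials:
        raise ValueError("no trials to score")
    hits = sum(same_initial_accent(t.stimulus_accent, t.guess_accent) for t in trials)
    return hits / len(trials)
```

For example, scoring the guesses `HL`, `LH`, and `LHH` against the stimuli `HLL`, `LHH`, and `HLL` gives 2/3, since only the third guess starts with the wrong tone. Confidence ratings, which the study also analysed, are not modelled here.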

    The dynamics of Japanese prosody

    This dissertation explores aspects of Tokyo Japanese (henceforth Japanese) prosody through acoustic analysis and analysis-by-synthesis. It (1) revisits existing issues in Japanese prosody with minimal use of abstract notions and (2) tests whether the Parallel Encoding and Target Approximation (PENTA; Xu, 2005) framework is suitable for Japanese, a pitch-accent language. The first part of the dissertation considers the nature of lexical pitch accent by examining factors that affect the surface F0 realisation of an accent peak (Chapter 2) and by establishing the articulatory domain that hosts a tonal target in Japanese (Chapter 3). Next, interactions of pitch accent with other communicative functions are considered, specifically focus (Chapter 4) and sentence type (Chapter 5). Hypotheses based on the acoustic analyses of the previous chapters are then verified through analysis-by-synthesis with the articulatory synthesisers AMtrainer, PENTAtrainer1, and PENTAtrainer2 (Chapter 6). Chapter 2 provides conclusive evidence that Japanese is a two-tone language rather than one with three underlying tones, a question previously unresolved in the literature. Proponents of the two-tone hypothesis draw on perceptual evidence: when stimuli are played in isolation, native listeners can distinguish only two tone levels (High and Low). Production evidence, on the other hand, robustly reveals three distinct surface F0 levels. Using a series of linear regression analyses, I show that the third tone level can be interpreted as the result of pre-low raising, a common articulatory phenomenon. The F0 of an accent peak is inversely correlated with the F0 of the following low target: the peak is enhanced in preparation for the upcoming L. Interpreted together with native listeners’ inability to hear three tones in isolation, as repeatedly reported in previous studies, this establishes that Japanese has only H and L in its tonal inventory.
Chapter 3 establishes the syllable as the tone-bearing unit in Japanese tonal articulation. Because Japanese is often described as mora-timed, it has previously been unclear whether articulatory tonal targets are hosted by the mora or by the syllable. Comparing accented words of various syllable structures, I found that the F0 accent peak of CVCV words occurs consistently earlier than that of CVn/CVV words. CVCV words are longer in total duration, so their earlier F0 peak results from a shorter tone-bearing unit (i.e., two consecutive short morae/syllables). CVn/CVV words, on the other hand, have a later F0 peak because the articulatory target is hosted by a long syllable rather than by two short morae. I further verified the syllable hypothesis using two articulatory synthesisers, PENTAtrainer1 and PENTAtrainer2: treating the syllable as the tone-bearing unit requires fewer predictors yet provides better learning accuracy. Chapter 4 explores focus prosody in declarative sentences. Using a newly collected corpus of 6251 sentences that controls for accent condition, focus condition, sentence type, and sentence length, I challenge the widely held idea that post-focus compression of F0 range is accent-independent. It is currently accepted that, regardless of the accent condition of the focused word, the excursion size of the ‘initial rise’ that marks the beginning of the first word after focus is reduced. However, confining post-focus compression to the initial rise (which usually extends across only two morae) sets Japanese apart from languages such as English and Mandarin, where such compression is robust across the entire post-focus domain. I show that when F0 range is measured across a wider domain, compression is absent. Where post-focus compression is absent, the F0 trajectory appears to result from articulatory carryover effects.
This is interpreted as the result of weak articulatory strength in the post-focus domain, which explains the difference in F0 trajectories between long and short utterances. Chapter 5 builds on the previous chapter to consider, in addition, focus prosody in yes/no questions. I investigate what marks a yes/no question and how focus prosody differs between declarative and interrogative utterances. Acoustic analyses show that questions are marked by a final rise, but the exact shape of the rise depends on the accent condition of the sentence-final word. Compared with declarative sentences, yes/no questions differ in three key respects: a higher F0 level; the absence of post-focus compression, even in contexts where it is observed in statements; and on-focus F0 raising as the only robust focus marker. These findings indicate that interrogative focus prosody is not an amalgamation of focus markers and question markers, and they bear implications for the representation of Japanese intonation. Chapter 6 verifies the observations established thus far through analysis-by-synthesis. I demonstrate comparative modelling as a means of adjudicating between competing theories, using PENTAtrainer2, PENTAtrainer1, and AMtrainer. In terms of local fitting accuracy, AMtrainer yielded synthesis accuracy comparable to that of the PENTAtrainers. Finally, I further demonstrate the compatibility of PENTA with Japanese prosody, showing highly accurate F0 predictive analysis (when trained on the Chapter 2 production data) and highly satisfactory speaker-dependent synthesis accuracy (when trained on the Chapter 4 and 5 sentential data). Naturalness ratings show that the synthetic stimuli sound as natural as the natural stimuli, though questions generally sound less natural than statements. Reasons for this discrepancy are discussed with reference to the design of the stimuli.

    Mechanisms of vowel devoicing in Japanese

    The processes of vowel devoicing in Standard Japanese were examined with respect to the phonetic and phonological environments and the syllable structure of Japanese, in comparison with vowel reduction processes in other languages, in most of which vowel reduction occurs optionally in fast or casual speech. This thesis examined whether Japanese vowel devoicing was a phonetic phenomenon caused by glottal assimilation between a high vowel and its adjacent voiceless consonants, or a more phonologically controlled, compulsory process. Experimental results showed that Japanese high vowel devoicing must be analysed separately in two conditions, namely single and consecutive devoicing environments. Devoicing was almost compulsory, regardless of proposed blocking factors such as the type of preceding consonant, accentuation, and position in the utterance, as long as there was no devoiceable vowel in an adjacent mora (the single devoicing condition). Under consecutive devoicing conditions, however, blocking factors became effective and prevented some devoiceable vowels from becoming voiceless. The effect of speaking rate was also generally minimal in the single devoicing condition, but in the consecutive devoicing condition vowels were devoiced more at faster tempi than at slower tempi, producing many instances of consecutively devoiced vowels over two morae. Durational measurements showed that vowel devoicing involves not only a phonatory change but also slight durational reduction. However, the shorter duration of devoiced syllables was compensated for at the word level, so that the total duration of a word with devoiced vowels remained similar to that of a word without devoiced vowels, regardless of the number of devoiced vowels in the word. It must be noted that there was no clear-cut distinction between voiced and devoiced vowels: the phonetic realisation of a devoiced vowel could vary from fully voiced to completely voiceless.
A high vowel may be voiced in a typical devoicing environment, but its intensity is then significantly weaker than that of vowels in a non-devoicing environment, at all speaking tempi. The mean intensity differences between these environments were generally larger at faster tempi. These results imply that even when the vowel was voiced, its production shifted in favour of devoicing. In consecutive devoicing conditions, however, this process did not always apply: when some of the devoiceable vowels were devoiced in a consecutive devoicing environment, the intensities of the devoiceable vowels were not significantly lower than those of other vowels. The intensity measurements of voiced vowels in devoicing and non-devoicing environments suggested that Japanese vowel devoicing is part of an overall process of complex vowel weakening, with a completely devoiced vowel as the final state of that process. Japanese vowel devoicing is primarily a process of glottal assimilation, but the results in the consecutive devoicing condition showed that this process is constrained by Japanese syllable structure.
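The distinction between single and consecutive devoicing environments can be made concrete with a small labelling routine. This is a minimal sketch under simplifying assumptions that are ours, not the thesis's: a mora counts as devoiceable when its vowel is /i/ or /u/, its onset is voiceless, and the next mora's onset is voiceless; word-final and utterance-final devoicing are ignored; and morae are given as hypothetical (onset, vowel) tuples in romanisation. The names `VOICELESS`, `devoiceable`, and `label_devoicing` are illustrative.

```python
# Hypothetical sketch: labelling morae as single or consecutive devoicing
# environments. Simplifications (ours): only medial high vowels between
# voiceless consonants count; final devoicing is ignored.
from __future__ import annotations

VOICELESS = {"p", "t", "k", "s", "sh", "ts", "ch", "h", "f"}
HIGH_VOWELS = {"i", "u"}

Mora = tuple[str, str]  # (onset consonant, vowel); "" for an empty onset


def devoiceable(morae: list[Mora], i: int) -> bool:
    """High vowel with a voiceless onset, followed by a voiceless onset."""
    onset, vowel = morae[i]
    if vowel not in HIGH_VOWELS or onset not in VOICELESS:
        return False
    return i + 1 < len(morae) and morae[i + 1][0] in VOICELESS


def label_devoicing(morae: list[Mora]) -> list[str | None]:
    """Return "single", "consecutive", or None for each mora."""
    flags = [devoiceable(morae, i) for i in range(len(morae))]
    labels: list[str | None] = []
    for i, flag in enumerate(flags):
        if not flag:
            labels.append(None)
            continue
        # Consecutive if an adjacent mora is also devoiceable.
        neighbour = (i > 0 and flags[i - 1]) or (i + 1 < len(flags) and flags[i + 1])
        labels.append("consecutive" if neighbour else "single")
    return labels
```

With these assumptions, kita (`[("k", "i"), ("t", "a")]`) yields a single devoicing environment on its first mora, while kutsushita yields three consecutive ones. The labels describe environments only; as the abstract stresses, actual realisation in the consecutive condition is variable.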

    Consonant-centred quantity system in Estonian and Inari Saami (Konsonandikeskne vältesüsteem eesti ja inarisaami keeles)

    The electronic version of the dissertation does not include the publications. Quantity systems with three length categories for consonants can be found in only a small number of languages, all of which belong to the Finno-Ugric family: Estonian, Livonian, Inari Saami, and some other Saami languages. The focus of this dissertation is on two of them, Estonian and Inari Saami, the former belonging to the Finnic branch and the latter to the Saamic branch. Estonian exhibits a complex quantity system forming ternary length categories with vowels, consonants, or combinations of both. In Inari Saami, the ternary length distinction is found for consonants, while vocalic quantity shows binary oppositions. This thesis comprises experimental phonetic studies answering two main questions: how is ternary consonantal quantity in Estonian and Inari Saami realized phonetically, and how does quantity interact with segmental context?
The results showed that, in both languages, the three-way consonantal quantity is manifested in consonant durations that are longer in higher quantity degrees. Whereas in Estonian the first and second quantity degrees are further apart from each other, in Inari Saami the second and third degrees are more distinct. Cross-linguistic differences also appear in the relations between intervocalic consonants and neighboring vowels. In Estonian, the unstressed vowel following the consonant is shorter after a long or overlong consonant than after a short one. In Inari Saami, both the preceding stressed vowel and the following unstressed vowel become shorter as consonantal quantity increases. Fundamental frequency contours in Inari Saami are roughly the same in disyllabic words with different structures. Intensity measures, however, show greater differences between the vowels surrounding the consonant as the quantity of the consonant increases, whereas the intensity of the intervocalic sonorant consonant itself does not change across quantities. In Estonian, F0 is important for distinguishing the second and third quantity degrees, but for plosives, where F0 movement cannot be tracked, durational cues have also been found sufficient for quantity perception. The results of the articulatory study of this thesis show variation in the manifestation of quantity in Estonian geminate consonants due to varied segmental context. Some articulatory movements exhibit three-way patterns associated with the quantity categories (the duration of the lip closing gesture for the consonant and of the tongue transition gesture from the preceding vowel to the following vowel); for others, the first and second quantity degrees are opposed to the third, or the first degree is opposed to the second and third. Similar patterns were found in the acoustic data from spontaneous speech: the durational properties of ternary quantity are realized differently for different intervocalic consonants, and variation is also caused by coarticulatory effects of the surrounding vowels.
https://www.ester.ee/record=b524109
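The durational findings above (longer consonants in higher quantity degrees, and in Estonian a shorter following vowel) are the kind of pattern a per-degree summary would reveal. Below is a minimal sketch, assuming hypothetical token records of the form (quantity degree, consonant duration in ms, following-vowel duration in ms). The consonant-to-following-vowel (C/V2) ratio is our illustrative choice, not necessarily a measure used in the thesis, and `summarise_by_quantity` is a hypothetical name.

```python
# Hypothetical sketch: mean consonant duration and C/V2 duration ratio per
# quantity degree (Q1 short, Q2 long, Q3 overlong). The record layout and
# the C/V2 ratio are illustrative assumptions.
from collections import defaultdict
from statistics import mean


def summarise_by_quantity(tokens):
    """tokens: iterable of (quantity, consonant_ms, v2_ms) tuples.

    Returns {quantity: (mean consonant duration in ms, mean C/V2 ratio)},
    with quantity labels in sorted order.
    """
    groups = defaultdict(list)
    for quantity, c_ms, v2_ms in tokens:
        if v2_ms <= 0:
            raise ValueError("vowel duration must be positive")
        groups[quantity].append((c_ms, c_ms / v2_ms))
    return {
        q: (mean(c for c, _ in rows), mean(ratio for _, ratio in rows))
        for q, rows in sorted(groups.items())
    }
```

On data following the reported Estonian pattern, both columns would rise from Q1 to Q3. For Inari Saami, where the preceding stressed vowel also shortens, a ratio involving the first vowel would be needed as well.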

    Tonal placement in Tashlhiyt: How an intonation system accommodates to adverse phonological environments

    In most languages, words contain vowels, elements of high intensity with rich harmonic structure that enable the perceptual retrieval of pitch. By contrast, in Tashlhiyt, a Berber language, words can be composed entirely of voiceless segments. When an utterance consists of such words, the phonetic opportunity for executing intonational pitch movements is exceptionally limited. This book explores, in a series of production and perception experiments, how these typologically rare phonotactic patterns interact with intonational aspects of linguistic structure. It turns out that Tashlhiyt allows for highly flexible placement of tonal events. The observed intonational structures can be conceived of as different solutions to a functional dilemma between the requirement to realise meaningful pitch movements in certain positions and the extent to which segments lend themselves to a clear manifestation of these pitch movements.

    Jeddah Arabic intonation: an autosegmental-metrical approach

    PhD thesis. This thesis is a theoretical and instrumental investigation of intonation in Jeddah Arabic (JA), an urban Arabic variety spoken in western Saudi Arabia. The study is carried out in an attempt to establish the dialect’s prosodic properties and to widen the scope and volume of the literature on Arabic prosody, which would in turn aid the cross-dialectal comparison of prosodic and intonational patterns. The investigation is carried out within the Autosegmental-Metrical (AM) theory of intonation, a theory that has been reported to account for the intonational patterns of many languages. In AM theory, intonation is manifested through prominent F0 behaviour in interaction with phonological structure, so the theory maintains a close relationship between accent distribution and phonological/metrical structure. This F0 behaviour is examined acoustically through pitch level, range, and excursion size, in the form of increased peak height and excursion and the presence or absence of pitch compression marking intonational structure. In addition to pitch, other acoustic correlates such as duration and amplitude are examined. The thesis examines the different tunes, postlexical phrasing, and accent categories (contour shapes) that occur in the dialect. Moreover, as an integral part of AM analysis, it closely examines, both theoretically and acoustically, tonal alignment, accentuation, and information structure in this Arabic dialect. Data for the study were collected from 20 native male and female speakers of Jeddah Arabic. The data were then semi-automatically segmented and manually transcribed using a modified ToBI system for Arabic. It is found that JA speakers rely on both qualitative and quantitative detail to enhance intonationally important material that is conveyed prosodically.
The results also indicate that JA is a stress-accent language which, although similar to other languages in this group, contributes differently to general cross-linguistic prosodic variation. The dialect has prominent pitch accents that faithfully associate and align with stressed syllables and are distributed across two intonational levels above the prosodic word: the intermediate phrase and the intonational phrase. Both levels are marked by tonal and non-tonal correlates. Experimental evidence shows that, contrary to the typically reported correlates of these prosodic constituents, in JA intermediate-phrase boundaries show longer pre-boundary units than intonational-phrase boundaries. This non-tonal pattern at intermediate-phrase boundaries correlates with later alignment of the tone relative to the onset of the stressed syllable.

    Tonal alignment in Tokyo Japanese

    A large amount of evidence for regularities of tonal alignment in various languages has recently accumulated. However, there is still much disagreement about how these alignment regularities should be characterised and modelled. This thesis investigates tonal alignment in Tokyo Japanese with two objectives: to provide a thorough description of tonal alignment in Tokyo Japanese, including the well-known phenomenon of ososagari ('peak delay'), and to contribute to the current understanding of tonal alignment on the basis of empirical data from Tokyo Japanese. Three speech production experiments were performed. The first experiment examined the alignment of the F0 targets at the beginning of initial-accented words, varying the syllable/mora structure of the accented syllable. The results showed that both the F0 valley and the F0 peak were consistently aligned with specific segmental landmarks, and that the alignment of the F0 peak depended on the syllable/mora structure of the accented syllable. The second experiment explored how the alignment patterns found in the first experiment were affected by different speaking modes, namely fast speech rate, raised voice, and local emphasis. The orderly alignment behaviour found in the first experiment remained intact across speaking modes, although various small effects were found. The third experiment compared the F0 peak alignment of unaccented and non-initial-accented words with that of initial-accented words. Unaccented words showed consistent alignment of the F0 peak with a specific landmark, comparable to that of initial-accented words, whereas non-initial-accented words showed earlier alignment of the accentual F0 peak than initial-accented words.
The results of the current study as a whole demonstrate consistent, language-specific alignment of the F0 targets with specific places in the prosodic structure, alignment that is rather resistant to changes in speaking mode. Further durational analyses, together with the alignment data, also suggest that segments and tones are mutually synchronised. These findings provide further evidence that segmental anchoring is a necessary concept in accounting for alignment regularities.
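Alignment measurements like those described above are typically expressed as the signed time lag between an F0 turning point and a segmental landmark. Below is a minimal sketch for the peak case, assuming a pitch track has already been extracted as (time in s, F0 in Hz) pairs with unvoiced frames removed, and that a search window and a landmark time (for example, a vowel onset) come from a hypothetical segmentation. `peak_alignment_ms` is an illustrative name, not the thesis's procedure.

```python
# Hypothetical sketch: F0 peak alignment as the signed lag (ms) between the
# F0 maximum inside a search window and a segmental landmark. Assumes an
# already-extracted pitch track of (time_s, f0_hz) pairs, voiced frames only.

def peak_alignment_ms(track, window_start, window_end, landmark):
    """Return (peak_time_s, lag_ms); a positive lag means the peak follows the landmark."""
    frames = [(t, f0) for t, f0 in track if window_start <= t <= window_end]
    if not frames:
        raise ValueError("no voiced frames in the search window")
    # max() keeps the earliest frame if several share the highest F0.
    peak_time, _ = max(frames, key=lambda frame: frame[1])
    return peak_time, (peak_time - landmark) * 1000.0
```

The same function with `min` in place of `max` would locate an F0 valley. Real pitch tracks usually need smoothing and octave-error checks before a raw maximum is trustworthy, which this sketch omits.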
