271 research outputs found

    Tonal placement in Tashlhiyt: How an intonation system accommodates to adverse phonological environments

    In most languages, words contain vowels, elements of high intensity with rich harmonic structure, enabling the perceptual retrieval of pitch. By contrast, in Tashlhiyt, a Berber language, words can be composed entirely of voiceless segments. When an utterance consists of such words, the phonetic opportunity for the execution of intonational pitch movements is exceptionally limited. This book explores in a series of production and perception experiments how these typologically rare phonotactic patterns interact with intonational aspects of linguistic structure. It turns out that Tashlhiyt allows for a tremendously flexible placement of tonal events. Observed intonational structures can be conceived of as different solutions to a functional dilemma: the requirement to realise meaningful pitch movements in certain positions and the extent to which segments lend themselves to a clear manifestation of these pitch movements.

    Investigating the build-up of precedence effect using reflection masking

    The auditory processing level involved in the build‐up of precedence [Freyman et al., J. Acoust. Soc. Am. 90, 874–884 (1991)] has been investigated here by employing reflection masked threshold (RMT) techniques. Given that RMT techniques are generally assumed to address lower levels of auditory signal processing, such an approach represents a bottom‐up approach to the buildup of precedence. Three conditioner configurations measuring a possible buildup of reflection suppression were compared to the baseline RMT for four reflection delays ranging from 2.5 to 15 ms. No buildup of reflection suppression was observed for any of the conditioner configurations. Buildup of template (decrease in RMT for two of the conditioners), on the other hand, was found to be delay dependent. For five of six listeners, at reflection delays of 2.5 and 15 ms, RMT decreased relative to the baseline. For 5‐ and 10‐ms delays, no change in threshold was observed. It is concluded that the low‐level auditory processing involved in RMT is not sufficient to realize a buildup of reflection suppression. This confirms suggestions that higher level processing is involved in precedence effect (PE) buildup. The observed enhancement of reflection detection (RMT) may contribute to active suppression at higher processing levels.

    The dynamics of Japanese prosody

    This dissertation explores aspects of Tokyo Japanese (Japanese henceforth) prosody through acoustic analysis and analysis-by-synthesis. It 1) revisits existing issues in Japanese prosody with minimal use of abstract notions and 2) tests if the Parallel Encoding and Target Approximation (Xu, 2005) framework is suitable for Japanese, a pitch accent language. The first part of the dissertation considers the nature of lexical pitch accent through examining factors that affect the surface F0 realisation of an accent peak (Chapter 2) and establishing the articulatory domain that hosts a tonal target in Japanese (Chapter 3). Next, pitch accent interactions with other communicative functions are considered, specifically in terms of focus (Chapter 4) and sentence type (Chapter 5). Hypotheses using acoustic analyses from the previous chapters are then verified through analysis-by-synthesis with articulatory synthesisers AMtrainer, PENTAtrainer1, and PENTAtrainer2 (Chapter 6). Chapter 2 provides conclusive evidence of Japanese as a two-tone language as opposed to bearing three underlying tones in its phonology, an issue previously unresolved in the existing literature. Proponents of the two-tone hypothesis gather evidence from perception: when stimuli are played in isolation, native listeners can only distinguish two tone levels (High and Low). On the other hand, production evidence robustly reveals three distinct surface F0 levels. Using a series of linear regression analyses, I show that the third tone level could be interpreted as a result of pre-low raising, a common articulatory phenomenon. The F0 of an accent peak is inversely correlated with the F0 of the following low target, being an enhanced peak in preparation for the upcoming L. Interpreted together with native listeners’ inability to hear three tones when said in isolation, as repeatedly reported in previous studies, I establish that Japanese has only H and L in its tonal inventory.
Chapter 3 establishes the syllable as the tone-bearing unit in Japanese tonal articulation. Often described as a mora-timed language, it has been previously unclear whether articulatory tonal targets are hosted in a mora or a syllable in Japanese. When comparing accented words of various syllable structures I found that the F0 accent peak of CVCV words occurs consistently earlier than that of CVn/CVV words. CVCV words are longer in total duration so their earlier F0 peak is a result of a shorter tone-bearing unit (i.e. two consecutive short morae/syllables). CVn/CVV words on the other hand have a later peak F0 due to hosting an articulatory target as a long syllable, rather than two short morae. I further verified the syllable hypothesis using two articulatory synthesisers, PENTAtrainer1 and PENTAtrainer2. The syllable as a tone-bearing unit incurs fewer predictors but provides better learning accuracy. Chapter 4 explores focus prosody in declarative sentences. Using a newly collected corpus of 6251 sentences that controls for accent condition, focus condition, sentence type, and sentence length, I challenge the widely held idea that post-focus compression of F0 range is accent-independent. Currently it is generally accepted that regardless of the accent condition of the focused word, the excursion size of the ‘initial rise’ that marks the beginning of the first word after focus is reduced. However, confining the notion of post-focus compression to the initial rise (usually extending across only two morae) sets Japanese apart from other languages like English or Mandarin, where such compression is robust across the entire post-focus domain. I show that when F0 range is measured across a wider domain, compression is absent. Where post-focus compression is absent, the F0 trajectory appears to be a result of articulatory carryover effects.
This will be interpreted as a result of weak articulatory strength on the post-focus domain, explaining the difference in F0 trajectories in long and short utterances. Chapter 5 builds on the previous chapter to consider in addition the focus prosody in yes/no questions. I investigate what marks a yes/no question, and how focus prosody differs in declarative and interrogative utterances. Acoustic analyses show that questions are marked by a final rise, but the exact shape of such a rise depends on the accent condition of the sentence-final word. When compared to declarative sentences, the key differences in yes/no questions include: a higher F0 level; the absence of post-focus compression even in contexts otherwise observed in statements; and on-focus F0 raising as the only robust focus marker. These findings point to the fact that interrogative focus prosody is not an amalgamation of focus markers and question markers, and bear implications for the representation of Japanese intonation. Chapter 6 verifies observations established thus far through analysis-by-synthesis. I demonstrate comparative modeling as a means to adjudicate between competing theories using PENTAtrainer2, PENTAtrainer1 and AMtrainer. In terms of local fitting accuracy, AMtrainer yielded comparable synthesis accuracy to the PENTAtrainers. Finally, I further demonstrate the compatibility of PENTA with Japanese prosody showing highly accurate F0 predictive analysis (when trained with Chapter 2 production data), and highly satisfactory speaker-dependent synthesis accuracy (when trained with Chapter 4 and 5 sentential data). Naturalness judgment ratings show that the natural stimuli sound as natural as the synthetic stimuli, though questions generally sound less natural than statements. Reasons for this discrepancy are discussed with reference to the design of the stimuli.

    A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology of monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame which circumvents strict attribution of speaker-turns, by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turn-taking” behaviour.
Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
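
The time-aligned moving average idea described in this abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `tama`, the frame length, the step size, and the sample values are all hypothetical, chosen only to show how one shared frame grid yields comparable per-speaker series.

```python
from statistics import mean

def tama(samples, frame_len, step, t_end):
    """Average (time, value) samples over fixed, overlapping frames.

    The frame grid depends only on frame_len, step and t_end, so calling
    this once per speaker yields series that are aligned in time.
    """
    series = []
    start = 0.0
    while start < t_end:
        in_frame = [v for t, v in samples if start <= t < start + frame_len]
        # None marks a frame in which this speaker produced no samples.
        series.append(mean(in_frame) if in_frame else None)
        start += step
    return series

# Hypothetical pitch samples (seconds, Hz) for two speakers.
speaker_a = [(1.0, 200.0), (4.0, 220.0), (12.0, 240.0), (18.0, 260.0)]
speaker_b = [(3.0, 110.0), (9.0, 120.0), (16.0, 130.0)]

# Assumed frame length (10 s) and step (5 s); both are illustrative only.
frames_a = tama(speaker_a, frame_len=10.0, step=5.0, t_end=20.0)
frames_b = tama(speaker_b, frame_len=10.0, step=5.0, t_end=20.0)
```

Because both series share the same frame boundaries, they can be compared frame by frame, for example with a correlation or a time-series model.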

    The Pitch Range of Italians and Americans. A Comparative Study

    Linguistic experiments have investigated the nature of F0 span and level in cross-linguistic comparisons. However, only a few studies have focused on the elaboration of a generally agreed methodology that may provide a unifying approach to the analysis of pitch range (Ladd, 1996; Patterson and Ladd, 1999; Daly and Warren, 2001; Bishop and Keating, 2010; Mennen et al. 2012). Pitch variation is used in different languages to convey different linguistic and paralinguistic meanings that may range from the expression of sentence modality to the marking of emotional and attitudinal nuances (Grice and Baumann, 2007). A number of factors have to be taken into consideration when determining the existence of measurable and reliable differences in pitch values. Daly and Warren (2001) demonstrated the importance of some independent variables such as language, age, body size, speaker sex (female vs. male), socio-cultural background, regional accents, speech task (read sentences vs. spontaneous dialogues), sentence type (questions vs. statements) and measurement scales (Hertz, semitones, ERB etc.). Consistent with the model proposed by Mennen et al. (2012), my analysis of pitch range is based on the investigation of LTD (long-term distributional) and linguistic measures. LTD measures deal with the F0 distribution within a speaker’s contour (e.g. F0 minimum, F0 maximum, F0 mean, F0 median, standard deviation, F0 span) while linguistic measures are linked to specific targets within the contour, such as peaks and valleys (e.g. high and low landmarks) and preserve the temporal sequences of pitch contours. This investigation analyzed the characteristics of pitch range production and perception in English sentences uttered by Americans and Italians.
Four experiments were conducted to examine different phenomena: i) the contrast between measures of F0 level and span in utterances produced by Americans and Italians (experiments 1-2); ii) the contrast between the pitch range produced by males and females in L1 and L2 (experiment 1); iii) the F0 patterns in different sentence types, that is, yes-no questions, wh-questions, and exclamations (experiment 2); iv) listeners’ evaluations of pitch span in terms of ±interesting, ±excited, ±credible, ±friendly ratings of different sentence types (experiments 3-4); v) the correlation between pitch span of the sentences and the evaluations given by American and Italian listeners (experiment 3); vi) the listeners’ evaluations of pitch span values in manipulated stimuli, whose F0 span was re-synthesized under three conditions: narrow span, original span, and wide span (experiment 4); vii) the different evaluations given to the sentences by male and female listeners. The results of this investigation supported the following generalizations. First, pitch span more than level was found to be a cue for non-nativeness, because L2 speakers of English used a narrower span, compared to the native norm. What is more, the experimental data in the production studies indicated that the mode of sentences was better captured by F0 span than level. Second, the Italian learners of English were influenced by their L1 and transferred L1 pitch range variation into their L2. The English sentences produced by the Italians had overall higher pitch levels and narrower pitch span than those produced by the Americans. In addition, the Italians used overall higher pitch levels when speaking Italian and lower levels when speaking English. Conversely, their pitch span was generally higher in English and lower in Italian. 
When comparing productions in English, the Italian females used higher F0 levels than the American females; vice versa, the Italian males showed slightly lower F0 levels than the American males. Third, there was a systematic relation between pitch span values and the listeners’ evaluations of the sentences. The two groups of listeners (the Americans and the Italians) rated the stimuli with larger pitch span as more interesting, exciting and credible than the stimuli with narrower pitch span. Thus, the listeners relied on the perceived pitch span to differentiate among the stimuli. Fourth, both the American and the Italian speakers were considered more friendly when the pitch span of their sentences was widened (wide span manipulation) and less friendly when the pitch span was narrowed (narrow span manipulation). This happened in all the stimuli regardless of the native language of the speakers (American vs. Italian)
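
The long-term distributional (LTD) measures listed in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `ltd_measures` and the sample contour are hypothetical, span is taken here as the simple maximum minus minimum, and the semitone conversion uses the standard 12 · log2(max/min) formula. The study itself may define span differently.

```python
import math
from statistics import mean, median, stdev

def ltd_measures(f0_hz):
    """Summarise an F0 contour (in Hz) with distributional measures."""
    lo, hi = min(f0_hz), max(f0_hz)
    return {
        "min_hz": lo,
        "max_hz": hi,
        "mean_hz": mean(f0_hz),
        "median_hz": median(f0_hz),
        "sd_hz": stdev(f0_hz),               # sample standard deviation
        "span_hz": hi - lo,                  # assumed definition: max - min
        "span_st": 12 * math.log2(hi / lo),  # same span in semitones
    }

# Hypothetical contour: F0 values in Hz from voiced frames only.
contour = [100.0, 150.0, 200.0, 150.0]
measures = ltd_measures(contour)
```

Expressing span in semitones rather than Hertz makes speakers with different baseline pitch levels easier to compare, which is one reason the choice of scale matters in cross-speaker studies.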

    Correlates of linguistic rhythm in the speech signal

    Spoken languages have been classified by linguists according to their rhythmic properties, and psycholinguists have relied on this classification to account for infants’ capacity to discriminate languages. Although researchers have measured many speech signal properties, they have failed to identify reliable acoustic characteristics for language classes. This paper presents instrumental measurements based on a consonant/vowel segmentation for eight languages. The measurements suggest that intuitive rhythm types reflect specific phonological properties, which in turn are signaled by the acoustic/phonetic properties of speech. The data support the notion of rhythm classes and also allow the simulation of infant language discrimination, consistent with the hypothesis that newborns rely on a coarse segmentation of speech. A hypothesis is proposed regarding the role of rhythm perception in language acquisition

    Lexical and postlexical prominence in Tashlhiyt Berber and Moroccan Arabic

    Tashlhiyt Berber (Afro-Asiatic, Berber) and Moroccan Arabic (Afro-Asiatic, Semitic), two languages spoken in Morocco, have been in contact for over 1200 years. The influence of Berber languages on the lexicon and the segmental-phonological structure of Moroccan Arabic is well-documented, whereas possible similarities in the prosodic-phonological domain have not yet been addressed in detail. This thesis brings together evidence from production and perception to bear on the question of whether Tashlhiyt Berber and Moroccan Arabic also exhibit convergence in the domain of phonological prominence. Experimental results are interpreted as showing that neither language has lexical prominence asymmetries in the form of lexical stress. This lack of stress in Moroccan Arabic is unlike the undisputed presence of lexical stress in most other varieties of Arabic, which in turn suggests that this aspect of the phonology of Moroccan Arabic has resulted from contact with (Tashlhiyt) Berber. A further, theoretical contribution is made with respect to the possible correspondence between lexical and postlexical prominence structure from a typological point of view. One of the tenets of the Autosegmental Metrical approach to intonation analysis holds that prominence-marking intonational events (pitch accents) associate with lexically stressed syllables. Exactly how prominence marking is achieved in languages that lack lexical stress is poorly understood, and this thesis' discussion of postlexical prominence in Tashlhiyt Berber and Moroccan Arabic provides new insights that bear on this topic. A first set of production experiments investigates, for both languages, whether there are acoustic correlates to what some researchers have considered to be lexically stressed syllables. It is shown that neither language exhibits consistent acoustic enhancement of presumed stressed syllables relative to unstressed syllables.
The second set of production experiments reports on the prosodic characteristics of question word interrogatives in both languages. It is shown that question words are the locus of postlexical prominence-marking events that, however, do not exhibit association to a sub-lexical phonological unit. A final perception experiment serves the goal of showing how native speakers of Tashlhiyt Berber and Moroccan Arabic deal with the encoding of a postlexical prominence contrast that is parasitic on a lexical prominence contrast. This is achieved by means of a 'stress deafness' experiment, the results of which show that speakers of neither language can reliably encode a lexically-specified prominence difference. Results from all three types of experiment thus converge in suggesting that lexical prominence asymmetries are not specified in the phonology of either language.