51 research outputs found

    Production, perception and online processing of prominence in the post-focal domain

    This dissertation presents a fundamentally new and in-depth investigation of the distribution of prominence in different focal structures in two varieties of Italian (the one spoken in Udine and the one spoken in Bari), combining a categorical analysis with continuous prosodic parameters related to F0 and periodic energy. The results provide evidence that prominence in these varieties of Italian is conveyed by both a categorical three-way distinction and a gradual modulation: absence or presence of pitch movement in the distinction between background (post-focal position) and the focal conditions, and a gradual modification of energy and duration. The degree of prominence of words occurring in different focal structures was also investigated in perception. The reportedly different distribution of prominence found in questions for the variety of Italian spoken in Bari is shown to have an influence on the degree of perceived prominence. This influence is found in the comparison between the prominence ratings of Bari and Udine native speakers, as well as those of Bari native speakers and German native speakers with Italian as L2. Furthermore, the present dissertation tests the real-time processing of the pitch excursion registered in the post-focal region of questions in the Bari variety. Findings confirmed that the fine-grained changes in prominence are processed in real time. Moreover, results indicate that top-down expectations play a crucial role in modulating general cognitive processes. Overall, this thesis supports the view of prosodic prominence as characterised by a bundle of cues, probabilistically distributed in the listener’s perceptual space, which form top-down expectations that play a role both in offline perception and in online processing. Signal-based factors also play a role in perception and online processing, but can be overridden by expectations.

    Perceiving focus


    On the pragmatic and semantic functions of Estonian sentence prosody

    The goal of the dissertation was to investigate intonational correlates of information structure in a free word order language, Estonian. Information-structural categories such as focus or givenness are expressed by different grammatical means (e.g. pronoun, presence of accent, word order etc.) in different languages of the world (Chafe, 1976; 1987; Prince, 1981; 1992; Lambrecht, 1994; Gundel, 1999). The main cue of focus in intonation languages (e.g. English and German) is pitch accent (Halliday, 1967a; Ladd, 2008). In free word order languages, information structure affects the position of words in a sentence (É. Kiss, 1995) and sometimes it is even implied that word order in a free word order language might function like pitch accent in an intonation language (Lambrecht 1994: 240). The study reports on perception and production experiments on the effects of focus and givenness on Estonian sentence intonation. The aim of the experiments was to establish whether information structure has tonal correlates in Estonian, and if so, whether information structure or word order interacts more strongly with sentence intonation. A perception experiment showed that L1-Estonian listeners perceive pitch prominence as focus and accent shift as a change of sentence focus. A speech production study showed congruently that L1-Estonian speakers do use accent shift, and mark sentence focus with pitch accent. Another speech production experiment demonstrated that there is no phonetic difference between new information focus (e.g. “What did Lena draw?” – “Lena drew a whale.”) and corrective focus (e.g. “Lena drew a lion.” – “No! She drew a whale”). The last experiment showed that given information is signalled with varying F0 range, if followed by focus, but without a pitch accent, if preceded by focus. All the experiments revealed that word order has a weak influence on sentence intonation. Sentence intonation interacts with focus and givenness in Estonian. In conclusion, it is suggested that the pragmatic functions of word order, which apparently can be overridden by focus interpretation, are slightly different from the functions of pitch accent.

    Proceedings of the VIIth GSCP International Conference

    The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose Speech and Corpora as its main theme. The wide international origin of the 235 authors, from 21 countries and 95 institutions, led to papers on many different languages. The 89 papers of this volume reflect the themes of the conference: spoken corpora compilation and annotation, with the related technological fields; the relation between prosody and pragmatics; speech pathologies; and various papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to sound and video linked to papers (when downloaded).

    A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology of monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame which circumvents strict attribution of speaker-turns, by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turn-taking” behaviour. Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
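
The time-aligned moving average idea described in this abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `tama_series`, the frame length, the step size and the sample values are all assumptions made for illustration. Each speaker's feature samples (for example, mean pitch per utterance) are averaged within overlapping frames that share one time grid, so the two resulting series can be compared frame by frame.

```python
from statistics import mean

def tama_series(samples, frame_len, step, total_dur):
    """Average (time, value) samples inside overlapping frames.

    Frames start at 0, step, 2*step, ... and each covers
    [start, start + frame_len). The same grid is used for every
    speaker, which is what makes the series time-aligned.
    Frames with no samples yield None.
    All parameter values are hypothetical, not taken from the thesis.
    """
    series = []
    k = 0
    while k * step < total_dur:
        start = k * step
        end = start + frame_len
        in_frame = [v for t, v in samples if start <= t < end]
        series.append(mean(in_frame) if in_frame else None)
        k += 1
    return series

# Hypothetical per-utterance pitch values (seconds, Hz) for two speakers.
speaker_a = [(1.0, 100.0), (6.0, 120.0), (12.0, 140.0)]
speaker_b = [(2.0, 200.0), (13.0, 180.0)]

series_a = tama_series(speaker_a, frame_len=10.0, step=5.0, total_dur=15.0)
series_b = tama_series(speaker_b, frame_len=10.0, step=5.0, total_dur=15.0)
```

Because both series are indexed by the same frames, they could then be passed to any time series or correlation analysis; which analysis the thesis uses is not stated in the abstract.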

    The Phonetic Realization of Narrow Focus in English L1 and L2. Data from Production and Perception

    The typological differences between the two languages are reflected in the strategies adopted to mark sentence-level prominence. While English marks focus by modulating prosodic parameters (namely, pitch, duration and intensity), Italian normally resorts to word order strategies, benefitting from the freer word order admitted by its syntax. This study investigates the acquisition of the prosodic marking of narrow non-contrastive focus by Italian speakers of English L2. It was mainly aimed at: (a) determining and comparing the prosodic cues used by English native speakers and Italian speakers of English L2 when marking narrow focus; (b) verifying whether the Italian speakers are able to acquire the English prosodic strategies in focus marking as a function of their competence in English, progressively avoiding the focus marking strategies that characterize their L1 in favor of more native-like solutions; (c) investigating the phenomenon not only at the production level, but also from the point of view of perception. Consequently, this work is composed of a production and a perception study. The production study consisted of the acoustic analysis of native and non-native productions. The speech data were collected using a semi-spontaneous method, where speakers recorded a set of short sentences as replies to wh- questions, with the aim of eliciting sentences presenting narrow focus on subject or on verb. Three groups of speakers were recorded: English native speakers (NS), Italian native speakers with a higher competence in English L2 (NNS1), and Italian native speakers with a lower competence in English L2 (NNS2). A similar set of Italian L1 sentences was also elicited from the Italian speakers. The acoustical analysis was performed at sentence and word level, and it was mainly based on the measurement of fundamental frequency and duration. The results confirmed that English native speakers mark narrow focus mainly by modulating pitch. NNS1 showed progress towards the target model, by implementing an active use of pitch, although not perfectly matching the native one. Finally, NNS2 were not able to mark focus with the use of prosodic parameters. The analysis of the Italian L1 data set suggested that in Italian narrow non-contrastive focus is not marked prosodically. Not even duration, which in Italian is the prosodic cue normally used to mark prominence at word level, seems to play a role in signaling prominence at sentence level. The perception study was designed to verify whether the differences shown by the acoustical measurements could also have an impact on the listeners' perception. Two perception tests were designed, based on a two-alternative forced-choice paradigm, where listeners were asked to identify narrow focus by guessing the wh- question that had triggered each sentence. Experiment 1 presented natural sentences to two groups of listeners: 22 British native speakers and 22 Italian native listeners. The Italian native listeners were also presented with an extra set of stimuli, consisting of the Italian L1 data set. The results of Experiment 1 showed that English native listeners could correctly identify narrow focus even without extra contextual information. This happened for NS and NNS1, whereas the listeners could not recognize focus in the productions by NNS2. The Italian listeners could also detect focus well above chance level in the productions by NS. However, they failed to identify focus in the productions by NNS1 and NNS2. As for the Italian L1 data set, the Italian listeners failed to distinguish narrow focus, providing perceptual evidence for the hypothesis that Italians do not mark narrow focus by prosody. Experiment 2 was designed to investigate the effect of the differences in pitch modulation on the correct detection of narrow focus by English native listeners. In this case, the productions of the speakers were acoustically manipulated. The participants were 20 British English native speakers. In general, the results of Experiment 2 confirmed that pitch plays an important role in the recognition of narrow focus also from the perceptual point of view. This is particularly true for NS productions, while the listeners could not successfully identify focus in the modified non-native productions. The results of the production study and the perception study converged in showing that in English pitch plays an important role in the production and perception of narrow non-contrastive focus. As for non-native productions, NNS1 could approach the native model to a certain extent by modulating F0. From the perceptual point of view, their productions were effective enough to be successfully understood by English native listeners. In contrast, NNS2 had not managed to adopt the strategies of English, showing a poor prosodic characterization of the constituent in focus. As a consequence, the listeners could not identify focus in the NNS2 productions. These findings are particularly interesting not only for research in L2 phonetics, but also for their implications for language instruction, where prosody has only recently started to be studied and taught with renewed interest and momentum.

    Towards text-based prediction of phrasal prominence

    The objective of this thesis was text-based prediction of phrasal prominence. The task was motivated by the goal of more natural-sounding speech synthesis, because phrasal prominence, which describes the relative saliency of words within a phrase, is a natural part of spoken language. Following the majority of previous research, prominence is predicted at a binary level derived from a symbolic representation of pitch movements. In practice, new classifiers and new models from different fields of natural language processing were explored. Applicability of spatial and graph-based language models was tested by proposing such features as word vectors, a high-dimensional vector-space representation, and DegExt, a keyword weighting method. Support vector machines (SVMs) were used because they are well suited to supervised classification tasks with high-dimensional continuous-valued input. Linear inner product and non-linear radial basis function (RBF) were used as kernels. Furthermore, hidden Markov support vector machines (HM-SVMs) were evaluated to investigate benefits of sequential classification. The experiments on the widely used Boston University Radio News Corpus (BURNC) were successful in two major ways: Firstly, the non-linear support vector machine along with the best performing features achieved performance similar to that of the previous state-of-the-art approach reported by Rangarajan et al. [RNB06]. Secondly, newly proposed features based on word vectors moderately outperformed part-of-speech tags, which have invariably been the best performing feature throughout the research of text-based prominence prediction.

    It's all about the rhythm - A neurocognitive approach towards the Rhythm Rule in German and English

    The aim of the present doctoral thesis is to gain deeper insight into the cognitive processing of rhythmically irregular structures in the form of stress clashes and stress lapses in comparison to structures that follow the Rhythm Rule (RR). Although stress clashes and stress lapses are allowed and hence present in speech, they are nonetheless marked as rhythmically ill-formed. Hence, since rhythmically induced stress shifts appear often in languages like German, and especially English, it was decided to investigate how the brain reacts to structures that do not meet rhythmic expectations but are allowed in the investigated language. In this respect, this rhythmic phenomenon differs from the rhythmic deviation types that have been investigated to date. Four studies comprising five experiments using the ERP technique were conducted within the scope of the present thesis. In order to support and complement the findings of the ERP studies, an additional production and perception study and two reaction time studies were designed and undertaken on German rhythmic irregularities. Three ERP studies were conducted on the cognitive processing of rhythmic irregularities in German phrases and compounds. Due to the given task settings in the ERP studies, measured reaction times were not meaningful. Therefore, independent reaction time studies with the identical set of stimuli were performed and are reported with the corresponding ERP studies. Based on the findings of the first ERP experiment on German phrases, a follow-up study was conducted in which the sensitivity towards attentional and contextual influences was further tested by using modified task settings and adjusted stimulus presentation modalities. The study on German compounds consists of two experiments which tried to shed further light on the task-sensitivity of the ERP components found in the studies on German phrases. A further ERP study was set up in order to compare the influence of the RR on processing in German and English by using similar deviations in English. Therefore, English compounds were tested either obeying or deviating from this rule. Moreover, due to the aforementioned syntactic differences between stress shift targets in German and English, this study allowed for a combined yet disentangled investigation of rhythmical and lexical influences on speech processing. In previous research, the application of the RR in speech production was mainly investigated on English data and exclusively in compound structures in German. Therefore, an additional production and perception study was used as a pre-test for the planned ERP studies on German. Investigating the application and perception of the RR should deliver further insights into its importance in German not only on the word level (in compounds) but also on the phrasal level and therefore complement and extend the findings of previous studies.

    Identifying prosodic prominence patterns for English text-to-speech synthesis

    This thesis proposes to improve and enrich the expressiveness of English Text-to-Speech (TTS) synthesis by identifying and generating natural patterns of prosodic prominence. In most state-of-the-art TTS systems the prediction from text of prosodic prominence relations between words in an utterance relies on features that very loosely account for the combined effects of syntax, semantics, word informativeness and salience on prosodic prominence. To improve prosodic prominence prediction we first follow up the classic approach in which prosodic prominence patterns are flattened into binary sequences of pitch accented and pitch unaccented words. We propose and motivate statistical and syntactic-dependency-based features that are complementary to the most predictive features proposed in previous works on automatic pitch accent prediction and show their utility on both read and spontaneous speech. Different accentuation patterns can be associated with the same sentence. Such variability raises the question of how to evaluate pitch accent predictors when more than one pattern is allowed. We carry out a study on the variability of prosodic symbols in a speech corpus where different speakers read the same text and propose an information-theoretic definition of optionality of symbolic prosodic events that leads to a novel evaluation metric in which prosodic variability is incorporated as a factor affecting prediction accuracy. We additionally propose a method to take advantage of the optionality of prosodic events in unit-selection speech synthesis. To better account for the tight links between the prosodic prominence of a word and the discourse/sentence context, part of this thesis goes beyond the accent/no-accent dichotomy and is devoted to a novel task, the automatic detection of contrast, where contrast is meant as an Information Structure relation that ties two words that explicitly contrast with each other. This task is mainly motivated by the fact that contrastive words tend to be prosodically marked with particularly prominent pitch accents. The identification of contrastive word pairs is achieved by combining lexical information, syntactic information (which mainly aims to identify the syntactic parallelism that often activates contrast) and semantic information (mainly drawn from the WordNet semantic lexicon), within a Support Vector Machines classifier. Once we have identified patterns of prosodic prominence we propose methods to incorporate such information in TTS synthesis and test its impact on synthetic speech naturalness through some large scale perceptual experiments. The results of these experiments cast some doubts on the utility of a simple accent/no-accent distinction in Hidden Markov Model based speech synthesis while highlighting the importance of contrastive accents.