25 research outputs found

    Parameterization of F0 register and discontinuity to predict prosodic boundary strength in Hungarian spontaneous speech

    This study addresses the question of how to parameterize (1) aspects of fundamental frequency (F0) register, i.e. time-varying F0 level and range within prosodic phrases, and (2) F0 discontinuities at prosodic boundaries in order to predict perceived prosodic boundary strength in Hungarian spontaneous speech. For F0 register stylization we propose a new fitting procedure for base-, mid-, and toplines that does not require error-prone local peak and valley detection and is robust against disturbing influences of high pitch accents and boundary tones. From these linear stylizations we extracted features which reflect F0 boundary discontinuities and fitted stepwise linear regression and regression tree models to predict perceived boundary strength. In a ten-fold cross-validation the mean correlation between predictions and human judgments reaches up to 0.8.

    Comparing parameterizations of pitch register and its discontinuities at prosodic boundaries for Hungarian

    We examined how well prosodic boundary strength can be captured by two declination stylization methods as well as by four different representations of pitch register. In the stylization proposed by Liebermann et al. (1985), base- and topline are fitted to peaks and valleys of the pitch contour, whereas in Reichel & Mády (2013) these lines are fitted to medians below and above certain pitch percentiles. From each of the stylizations four feature pools were induced representing different aspects of register discontinuity at word boundaries: discontinuities related to the base-, mid-, and topline, as well as to the range between base- and topline. Concerning stylization, the median-based fitting approach turned out to be more robust with respect to declination line crossing errors and yielded base-, topline and range-related discontinuity characteristics with higher correlations to perceived boundary strength. Concerning register representation, for the peak/valley fitting approach the base- and topline patterns showed weaker correspondences to boundary strength than the other feature pools. We furthermore trained generalized linear regression models for boundary strength prediction on each feature pool. It turned out that neither the stylization method nor the register representation had a significant influence on the overall good prediction performance.

    Prosodic Phrasing in Spontaneous Swedish

    One of the most important functions of prosody is to divide the flow of speech into chunks. The chunking, or prosodic phrasing, of speech plays an important role in both the production and perception of speech. This study represents a move away from the laboratory speech examined in previous, related studies on prosodic phrasing in Swedish, since a spontaneous, Southern Swedish speech material is investigated. The study is, however, not primarily intended as a study of the Southern Swedish dialect; rather Southern Swedish is used as a convenient object on which to test various hypotheses about the phrasing function of prosody in spontaneous speech. The study comprises both analyses of production data and perception experiments, and both the phonetics and phonology of prosodic phrasing are dealt with. First, the distribution of prosodic phrase boundaries in spontaneous speech is examined by considering it as a reflection of optimality theoretic constraints that restrain the production and perception of speech. Secondly, the phonetic realization of prosodic phrase boundaries is investigated in a study on articulation rate changes within the prosodic phrase. Evidence of phrase-final lengthening, a reduction of the articulation rate in the final part of the prosodic phrase, is found. The tonal means used to signal coherence within the prosodic phrase are subsequently investigated. An attempt is made to test the two Lund intonation models’ capacities for describing spontaneous speech. The two approaches have different implications for the amount of preplanning needed, which makes them particularly interesting to compare by testing spontaneous data. The results indicate that no or little preplanning is needed to produce tonally coherent phrases. No evidence is found to suggest e.g. that speakers accommodate for the length of the upcoming phrase by starting longer phrases with a higher F0 than short phrases. An explanation is sought for variation in F0 starting points found in the data despite F0’s insensitivity to phrase length. It is concluded that F0 is used to signal coherence even across prosodic phrase boundaries. It is furthermore found that tonal coherence signals are used to override strong boundary signals in spontaneous speech, thereby making initially unplanned additions possible. Finally, the perception of boundary strength is examined in two perception experiments. Listeners are found to agree well in their perceptual judgments of boundary strength, and it is shown that the main correlate to perceived boundary strength in spontaneous speech is pause length. The useful distinction between weak, prosodic phrase boundaries and strong, prosodic utterance boundaries in descriptions of read speech is found to be inappropriate for descriptions of spontaneous speech. It fails to capture the conflicting local and global signals of boundary strength and coherence that arise when strong boundary signals are overridden by coherence signals. The possibility to use conflicting signals in this way is seen as an important asset to the speaker as it makes changes in the speech plan possible, and it is regarded as a characteristic of prosodic phrasing in spontaneous speech.

    Interaction features for prediction of perceptual segmentation: Effects of musicianship and experimental task

    As music unfolds in time, structure is recognised and understood by listeners, regardless of their level of musical expertise. A number of studies have found spectral and tonal changes to quite successfully model boundaries between structural sections. However, the effects of musical expertise and experimental task on computational modelling of structure are not yet well understood. These issues need to be addressed to better understand how listeners perceive the structure of music and to improve automatic segmentation algorithms. In this study, computational prediction of segmentation by listeners was investigated for six musical stimuli via a real-time task and an annotation (non real-time) task. The proposed approach involved computation of novelty curve interaction features and a prediction model of perceptual segmentation boundary density. We found that, compared to non-musicians’, musicians’ segmentation yielded lower prediction rates, and involved more features for prediction, particularly more interaction features; also non-musicians required a larger time shift for optimal segmentation modelling. Prediction of the annotation task exhibited higher rates, and involved more musical features than for the real-time task; in addition, the real-time task required time shifting of the segmentation data for its optimal modelling. We also found that annotation task models that were weighted according to boundary strength ratings exhibited improvements in segmentation prediction rates and involved more interaction features. In sum, musical training and experimental task seem to have an impact on prediction rates and on musical features involved in novelty-based segmentation models. Musical training is associated with higher presence of schematic knowledge, attention to more dimensions of musical change and more levels of the structural hierarchy, and higher speed of musical structure processing. Real-time segmentation is linked with higher response delays, fewer levels of structural hierarchy attended and higher data noisiness than annotation segmentation. In addition, boundary strength weighting of density was associated with more emphasis given to stark musical changes and to clearer representation of a hierarchy involving high-dimensional musical changes.

    Consistency in transcription and labelling of German intonation with GToBI

    A diverse set of speech data was labelled at three sites by 13 transcribers with differing levels of expertise, using GToBI, a consensus transcription system for German intonation. Overall inter-transcriber consistency suggests that, with training, labellers can acquire sufficient skill with GToBI for large-scale database labelling.

    The prosody of correction and contrast

    In the extensive literature on the prosodic expression of Information Structure (IS) the notion of contrast is typically coarse-grained and subsumed under relational dichotomies like theme-rheme or topic-focus, or treated as an inherent feature of focus, evoking a set of alternatives. This paper has two goals. First, we advocate for a more nuanced conception of contrast. This distinguishes between the “alternatives”-based meaning of contrast on the one hand and correction on the other, which is a more discourse-oriented meaning that encodes the speaker's assumptions about the hearer's beliefs. Second, we present experimental evidence that among the pragmatic types of contrast examined, only correction receives distinct prosodic marking, which cuts across the traditional IS topic-focus division and is realized in the same way in focus and topic constituents.

    The interaction of pitch and timing in the perception of prosodic grouping

    Speakers break their otherwise continuous speech stream into meaningful segments, the edges of which are marked by audible cues such as pauses, rate changes and pitch movement. Prosodic boundaries, as these segment edges and the cues marking them are known, play a role critical to language processing and spoken language acquisition. While great progress has been made in quantifying the complicated range of acoustic cues that mark boundaries, little is understood about the cognitive processes by which these cues guide linguistic interpretation. Further, while prosodic boundary measures typically treat critical cues from pitch and timing independently, evidence suggests that pitch and timing are perceptually interdependent. In fact, pitch factors may at times distort perceived duration. This dissertation presents 3 pairs of perception experiments investigating pitch-time interaction, including putative distortion of perceived duration from dynamic pitch and cross-silence pitch jumps (i.e., the kappa effect). Each pair uses the same set of stimuli, resynthesized with crossed continua of pitch and timing manipulations, in two different tasks: one psychoacoustic judgment of duration, and one of linguistic interpretation. Results suggest that perceptual interaction of major cues from timing (preboundary lengthening and pauses) and pitch (edge tones and reset) can be analyzed as reflecting gestalt-like grouping principles (proximity, similarity and continuity) that have been shown to play a role in perceptual grouping in other cognitive domains, including vision and non-speech auditory perception. In addition to these potentially more cognitive-general principles, a new role is introduced for learned and potentially language-specific patterns to prosodic grouping, in particular intonational schemas, i.e., recognizable cross-phrase pitch patterns. Beyond this, results also support the hypothesis that perceived grouping is the driving force behind several types of pitch-based auditory illusions, including the auditory kappa effect. This dissertation offers insights into why prosodic boundaries are expressed with the particular pitch and timing cues that are common cross-linguistically. While much language form is arbitrary, the expression of grouping by way of acoustic cues appears to be much less so. This research has potential to explain the perceptual foundations of boundary cues, and therefore the cross-linguistic similarities of prosodic grouping cues.

    Language-specificity in the perception of continuation intonation

    This paper addressed the question of how British English, German and Dutch listeners differ in their perception of continuation intonation both at the phonological level (Experiment 1) and at the level of phonetic implementation (Experiment 2). In Experiment 1, preference scores of pitch contours to signal continuation at the clause boundary were obtained from these listener groups. It was found that among contours with H%, British English listeners had a strong preference for H*L H%, as predicted. Unexpectedly, British English listeners rated H* H% noticeably more favourably than L*H H%; Dutch listeners largely rated H* H% more favourably than H*L H% and L*H H%; German listeners rated these contours similarly and seemed to have a slight preference for H*L H%. In Experiment 2, the degree to which a final rise was perceived to express continuation was established for each listener group in a made-up language. It was found that although all listener groups associated a higher end pitch with a higher degree of continuation likelihood, the perceived meaning difference for a given interval of end pitch heights varied with the contour shape of the utterance-final syllable. When it was comparable to H* H%, British English and Dutch listeners perceived a larger meaning difference than German listeners; when it was comparable to H*L H%, British English listeners perceived a larger difference than German and Dutch listeners. This shows that language-specificity in continuation intonation at the phonological level affects the perception of continuation intonation at the phonetic level.