
    Phonetic Dimensions of Intonational Categories - the case of L+H* and H*

    ToBI, in its conception, was an attempt to describe intonation in terms of phonological categories. One effect of ToBI's success in doing this has been to make it standard practice to characterise all phonological distinctions in intonation in terms of ToBI distinctions, i.e. the segmental alignment of pitch targets and pitch height as either High or Low. Here we report a series of experiments that attempted to do this, linking two supposed phonological categories, theme and rheme accents, to two controversial ToBI pitch accents, L+H* and H* respectively. Our results suggest that the dimensions of phonological intonational distinctions need to be reanalysed. We suggest that three layers affect the intonational contour (global extrinsic, local extrinsic and intrinsic), and that the theme-rheme distinction may lie in the local extrinsic layer. It is the similarity of the last two layers, both in their phonetic effects and in the semantic information they convey, that has led to confusion in results such as those reported here.

    Predicting focus through prominence structure

    Focus is central to our control of information flow in dialogue. Spoken language understanding systems therefore need to be able to detect focus automatically. It is well known that prominence is a key marker of focus in English; however, the relationship is not straightforward. We present focus prediction models built using the NXT Switchboard corpus. We claim that a focus is more likely if a word is more prominent than expected given its syntactic, semantic and discourse properties. Crucially, the perception of prominence arises not only from acoustic cues but also from a word's position in prosodic structure. Our focus prediction results, along with a study showing that the acoustic properties of focal accents vary by structural position, support our claims. As this is a largely novel task, these results are an important first step towards detecting focus in spoken language applications.
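The claim that focus is more likely when a word is more prominent than expected can be illustrated by scoring each word's residual prominence (observed minus expected). In this minimal sketch, the "expected" prominence is simply the mean prominence of words that share the same part of speech and information status, standing in for the richer syntactic, semantic and discourse models the abstract describes. The token format, feature names, prominence scores and the zero threshold are all hypothetical; nothing here is taken from the NXT Switchboard data or the authors' models.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical token format: a word, two illustrative "expectation" features
# (part of speech, information status) and a prominence score.


def expected_prominence(tokens):
    """Mean prominence for each (pos, info_status) class.

    A stand-in for a trained model of expected prominence.
    """
    groups = defaultdict(list)
    for tok in tokens:
        groups[(tok["pos"], tok["info_status"])].append(tok["prominence"])
    return {key: mean(values) for key, values in groups.items()}


def residual_prominence(tokens):
    """Observed minus expected prominence for every token."""
    expected = expected_prominence(tokens)
    return [tok["prominence"] - expected[(tok["pos"], tok["info_status"])] for tok in tokens]


def focus_candidates(tokens, threshold=0.0):
    """Words that are more prominent than their class would predict."""
    residuals = residual_prominence(tokens)
    return [tok["word"] for tok, r in zip(tokens, residuals) if r > threshold]


tokens = [
    {"word": "dog", "pos": "NN", "info_status": "new", "prominence": 0.9},
    {"word": "cat", "pos": "NN", "info_status": "new", "prominence": 0.5},
    {"word": "it", "pos": "PRP", "info_status": "old", "prominence": 0.1},
    {"word": "they", "pos": "PRP", "info_status": "old", "prominence": 0.1},
]
print(focus_candidates(tokens))  # → ['dog']
```

In a real model the residual would more plausibly feed a classifier as one feature among several rather than being thresholded directly.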

    It's the difference that matters: An argument for contextually-grounded acoustic intonational phonology.

    Standardly, the link between intonation and discourse meaning is described in terms of perceptual intonation categories, e.g. ToBI. We argue that this approach needs to be refined to recognise explicitly, firstly, that perception is affected by multiple acoustic cues, including duration and intensity as well as F0; and secondly, that the interpretation of these cues is directly linked to the phonetic and discourse context. Investigating the marking of topic status in a small game task corpus, we found that although topic status is not consistently marked by ToBI pitch accent, it is marked by the F0 mean, intensity and duration of the topic word. Using regression analysis, we found that once the F0 mean and intensity of key parts of the preceding discourse are factored out, intensity and duration become stronger predictors of topic status than F0.
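Factoring out the preceding discourse amounts to expressing a word's acoustic measures relative to its context before they enter a regression. The sketch below, a minimal illustration under stated assumptions, computes context-relative F0 and intensity for a topic word and passes duration through unchanged. The `Segment` structure, the simple difference-from-context-mean normalisation and the example values are hypothetical; which parts of the discourse count as "key" is left to the caller, and the regression step itself is not shown.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Segment:
    """Acoustic summary of a stretch of speech (hypothetical structure)."""
    f0_mean: float    # mean fundamental frequency, Hz
    intensity: float  # mean intensity, dB
    duration: float   # seconds


def relative_features(word: Segment, context: list[Segment]) -> dict[str, float]:
    """Express a word's F0 and intensity relative to its preceding context.

    `context` stands in for whichever parts of the preceding discourse an
    analysis treats as relevant; duration is returned unchanged.
    """
    return {
        "f0_diff": word.f0_mean - mean(s.f0_mean for s in context),
        "intensity_diff": word.intensity - mean(s.intensity for s in context),
        "duration": word.duration,
    }


topic_word = Segment(f0_mean=220.0, intensity=70.0, duration=0.40)
preceding = [Segment(180.0, 64.0, 0.30), Segment(200.0, 66.0, 0.25)]
print(relative_features(topic_word, preceding))
# → {'f0_diff': 30.0, 'intensity_diff': 5.0, 'duration': 0.4}
```

Differences in Hz are used only for simplicity; a log or semitone scale is a common alternative for F0.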

    Information structure and the prosodic structure of English: a probabilistic relationship

    This work concerns how information structure is signalled prosodically in English, that is, how prosodic prominence and phrasing are used to indicate the salience and organisation of information in relation to a discourse model. It has standardly been held that information structure is signalled primarily by the distribution of pitch accents within syntactic structure, as well as by the type of intonational event. However, we argue that these claims underestimate the importance and richness of metrical prosodic structure and its role in signalling information structure. We advance a new theory: that information structure is a strong constraint on the mapping of words onto metrical prosodic structure. We show that focus (kontrast) aligns with nuclear prominence, while other accents are not usually directly 'meaningful'. Information units (theme/rheme) try to align with prosodic phrases. This mapping is probabilistic, so it is also influenced by lexical and syntactic effects, as well as by rhythmical constraints and other features, including emphasis. Rather than being directly signalled by the prosody, the likelihood of each information structure interpretation is mediated by all these properties. We demonstrate that this theory resolves problematic facts about accent distribution in earlier accounts and makes syntactic focus projection rules unnecessary. Previous theories have claimed that contrastive accents are marked by an accent type that is categorically distinct from that of other focal accents (e.g. L+H* vs. H*). We show that this distinction in fact involves two separate semantic properties: contrastiveness and theme/rheme status. Contrastiveness is marked by increased prominence in general. Themes are distinguished from rhemes by relative prominence, i.e. the rheme kontrast aligns with nuclear prominence at the level of phrasing that includes both the theme and rheme units.
In a series of production and perception experiments, we directly test our theory against previous accounts, showing that the only consistent cue to the distinction between theme and rheme nuclear accents is relative pitch height. This height difference accords with our understanding of how nuclear prominence is marked: theme peaks are lower than rheme peaks only in rheme-theme order, consistent with post-nuclear lowering; in theme-rheme order, the later of two equal peaks is perceived as nuclear. The rest of the thesis analyses a portion of the Switchboard corpus that we have annotated with substantial new layers of semantic (kontrast) and prosodic features, which are described. This work is an essentially novel approach to testing discourse semantics theories in speech. Using multiple regression analysis, we demonstrate distributional properties of the corpus that are consistent with our claims. Plain and nuclear accents are best distinguished by phrasal features, showing the strong constraint that phrase structure places on the perception of prominence. Nuclear accents can be reliably predicted by semantic/syntactic features, particularly kontrast, while other accents cannot. Plain accents can be identified well only by acoustic features, showing that their appearance is linked to rhythmical and low-level semantic features. We further show that kontrast is more likely not only in nuclear position but also when a word is more structurally or acoustically prominent than expected given its syntactic and information status properties. Consistent with our claim that nuclear accents are distinctive, we show that pre-nuclear, nuclear and post-nuclear accents have different acoustic profiles, and that the acoustic correlates of increased prominence vary by accent type, i.e. pre-nuclear or nuclear. Finally, we demonstrate the efficacy of our theory compared with previous accounts using examples from the corpus.
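The two observations about relative pitch height (equal peaks: the later one is heard as nuclear; a clearly lower later peak reflects post-nuclear lowering) can be written as a toy rule over peak heights. This is an illustrative assumption, not a model from the thesis: it compares only F0 maxima, and the one-semitone tolerance for treating peaks as "equal" is an arbitrary choice.

```python
import math


def semitones(f_hz: float, ref_hz: float) -> float:
    """Interval from ref_hz to f_hz in semitones (negative if f_hz is lower)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def nuclear_index(peaks_hz: list[float], tolerance_st: float = 1.0) -> int:
    """Index of the peak taken to be nuclear under a toy relative-height rule.

    Peaks within `tolerance_st` semitones of the highest peak count as "equal";
    among those, the last one is returned. A later peak that falls clearly
    below an earlier one (post-nuclear lowering) is therefore not chosen.
    """
    top = max(peaks_hz)
    equal_to_top = [i for i, f in enumerate(peaks_hz) if semitones(f, top) >= -tolerance_st]
    return equal_to_top[-1]


print(nuclear_index([200.0, 200.0]))  # equal peaks: the later one → 1
print(nuclear_index([220.0, 180.0]))  # second peak about 3.5 st lower → 0
```

Measuring the gap in semitones rather than Hz keeps the tolerance comparable across speakers with different pitch ranges.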

    A framework for annotating information structure in discourse

    We present a framework for the integrated analysis of the textual and prosodic characteristics of information structure in the Switchboard corpus of conversational English. Information structure describes the availability, organisation and salience of entities in a discourse model. We present standards for annotating information status (old, mediated and new) and give guidelines for annotating information structure, i.e. theme/rheme and background/kontrast. We show that information structure in English can only be analysed concurrently with prosodic prominence and phrasing. Together with the existing annotations, which we have integrated using NXT technology, the corpus will be unique among conversational speech resources in its size and richness of annotation, both of which are vital for many NLP applications.
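The three label sets named above (information status, theme/rheme, background/kontrast) can be held together per word, alongside a prosodic field, so that questions such as "how often is kontrast accented?" become simple queries. This is a minimal sketch: the enum names, the `AnnotatedWord` layout, the `accented` field and the example labelling are hypothetical and do not reproduce the NXT data model or the actual Switchboard annotation files.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InfoStatus(Enum):
    OLD = "old"
    MEDIATED = "mediated"
    NEW = "new"


class InfoUnit(Enum):
    THEME = "theme"
    RHEME = "rheme"


class Focus(Enum):
    BACKGROUND = "background"
    KONTRAST = "kontrast"


@dataclass
class AnnotatedWord:
    """One word with information-structure labels and a prosodic flag (hypothetical layout)."""
    word: str
    info_unit: InfoUnit
    focus: Focus
    info_status: Optional[InfoStatus] = None  # assumption: set only on referring expressions
    accented: bool = False


def accented_share(words: list[AnnotatedWord], focus: Focus) -> float:
    """Fraction of words with the given background/kontrast label that are accented."""
    selected = [w for w in words if w.focus is focus]
    return sum(w.accented for w in selected) / len(selected) if selected else 0.0


utterance = [
    AnnotatedWord("I", InfoUnit.THEME, Focus.BACKGROUND, InfoStatus.OLD),
    AnnotatedWord("like", InfoUnit.RHEME, Focus.BACKGROUND),
    AnnotatedWord("cats", InfoUnit.RHEME, Focus.KONTRAST, InfoStatus.NEW, accented=True),
]
print(accented_share(utterance, Focus.KONTRAST))  # → 1.0
```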

    To memorize or to predict: Prominence labeling in conversational speech

    The immense prosodic variation of natural conversational speech makes it challenging to predict which words are prosodically prominent in this genre. In this paper, we examine a new feature, accent ratio, which captures how likely a word is to be realized as prominent. We compare this feature with traditional accent prediction features (based on part of speech and N-grams), as well as with several linguistically motivated and manually labeled information structure features, such as whether a word is given, new, or contrastive. Our results show that the linguistic features do not lead to significant improvements, while accent ratio alone can yield prediction performance almost as good as that of any combination of the other features. Moreover, this feature is useful even across genres: an accent-ratio classifier trained only on conversational speech predicts prominence with high accuracy in broadcast news. Our results suggest that carefully chosen lexicalized features can outperform less fine-grained features.
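The accent-ratio idea can be sketched as a per-word lookup learned from labelled training data: for each word, the fraction of its training occurrences that were realized as prominent. This is a simplified reading of the abstract, not the paper's implementation; the original feature definition may add conditions (for example a frequency or reliability check) that are not reproduced here, and the case-folding, the `min_count` filter and the 0.5 default and threshold for unseen words are arbitrary assumptions.

```python
from collections import Counter


def accent_ratios(corpus, min_count=1):
    """Per-word fraction of training occurrences labelled prominent.

    `corpus` is an iterable of (word, is_prominent) pairs; words seen fewer
    than `min_count` times are left out.
    """
    seen = Counter()
    accented = Counter()
    for word, prominent in corpus:
        key = word.lower()
        seen[key] += 1
        if prominent:
            accented[key] += 1
    return {w: accented[w] / n for w, n in seen.items() if n >= min_count}


def predict_prominent(word, ratios, default=0.5, threshold=0.5):
    """Predict prominence when the word's ratio exceeds `threshold`; unseen words get `default`."""
    return ratios.get(word.lower(), default) > threshold


training = [("the", False), ("the", False), ("the", True), ("Dog", True), ("dog", True)]
ratios = accent_ratios(training)
print(ratios)  # → {'the': 0.3333333333333333, 'dog': 1.0}
print(predict_prominent("dog", ratios), predict_prominent("cat", ratios))  # → True False
```

Because the feature is a lexical lookup, a table learned on one genre can be applied unchanged to text from another, which is consistent with the cross-genre use described above.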

    The past, present, and future of the Brain Imaging Data Structure (BIDS)

    The Brain Imaging Data Structure (BIDS) is a community-driven standard for the organization of data and metadata from a growing range of neuroscience modalities. This paper is intended as a history of how the standard has developed and grown over time. We outline the principles behind the project, the mechanisms by which it has been extended, and some of the challenges being addressed as it evolves. We also discuss the lessons learned through the project, with the aim of enabling researchers in other domains to learn from the success of BIDS.

    The nature of theme and rheme accents

    It has increasingly been recognised that appropriate intonation is essential to create believable voices for speech synthesis. This is particularly true in dialogue, where the link between intonation and meaning is especially important. Here we report two experiments, a production study and a perception study, which test an aspect of Steedman's (2000) theory relating information structure and intonation structure, with a view to specifying intonation in a speech synthesis system. He claims that themes and rhemes, the basic building blocks of information structure, are marked by distinctive pitch accents in English, which he identifies with L+H* and H* respectively in the ToBI system. After reviewing problems with the identification of these ToBI accents, we show that speakers do produce, and listeners do distinguish, different pitch accents in these discourse contexts, but that the ToBI labels may not help to characterise the distinction. The exact phonetic nature of theme and rheme accents remains unclear, but the alignment of the start of the rise, pitch height and the fall after the pitch peak all appear to be factors. Speakers also appear to be more sensitive to the distinction at the end of an utterance than utterance-medially.