Phonetic Dimensions of Intonational Categories - the case of L+H* and H*
ToBI, in its conception, was an attempt to describe intonation in terms of phonological categories. An effect of the success of ToBI in doing this has been to make it standard to try to characterise all intonational phonological distinctions in terms of ToBI distinctions, i.e. segmental alignment of pitch targets and pitch height as either High or Low. Here we report a series of experiments which attempted to do this, linking two supposed phonological categories, theme and rheme accents, to two controversial ToBI pitch accents L+H* and H* respectively. Our results suggest a reanalysis of the dimensions of phonological intonational distinctions. It is suggested that there are three layers affecting the intonational contour: global extrinsic, local extrinsic and intrinsic; and the theme-rheme distinction may lie in the local extrinsic layer. It is the similarity both of the phonetic effects and the semantic information conveyed by the last two layers that has led to the confusion in results such as those reported here.
Predicting focus through prominence structure
Focus is central to our control of information flow in dialogue. Spoken language understanding systems therefore need to be able to detect focus automatically. It is well known that prominence is a key marker of focus in English; however, the relationship is not straightforward. We present focus prediction models built using the NXT Switchboard corpus. We claim that a focus is more likely if a word is more prominent than expected given its syntactic, semantic and discourse properties. Crucially, the perception of prominence arises not only from acoustic cues, but also from the position in prosodic structure. Our focus prediction results, along with a study showing that the acoustic properties of focal accents vary by structural position, support our claims. As a largely novel task, these results are an important first step in detecting focus for spoken language applications.
Information structure and the prosodic structure of English : a probabilistic relationship
This work concerns how information structure is signalled prosodically in English, that is, how prosodic prominence and phrasing are used to indicate the salience and organisation of information in relation to a discourse model. It has been standardly held that information structure is primarily signalled by the distribution of pitch accents within syntactic structure, as well as intonation event type. However, we argue that these claims underestimate the importance, and richness, of metrical prosodic structure and its role in signalling information structure.
We advance a new theory, that information structure is a strong constraint on the mapping of words onto metrical prosodic structure. We show that focus (kontrast) aligns with nuclear prominence, while other accents are not usually directly 'meaningful'. Information units (theme/rheme) try to align with prosodic phrases. This mapping is probabilistic, so it is also influenced by lexical and syntactic effects, as well as rhythmical constraints and other features including emphasis. Rather than being directly signalled by the prosody, the likelihood of each information structure interpretation is mediated by all these properties. We demonstrate that this theory resolves problematic facts about accent distribution in earlier accounts and makes syntactic focus projection rules unnecessary.
Previous theories have claimed that contrastive accents are marked by a categorically distinct accent type to other focal accents (e.g. L+H* v H*). We show this distinction in fact involves two separate semantic properties: contrastiveness and theme/rheme status. Contrastiveness is marked by increased prominence in general. Themes are distinguished from rhemes by relative prominence, i.e. the rheme kontrast aligns with nuclear prominence at the level of phrasing that includes both theme and rheme units. In a series of production and perception experiments, we directly test our theory against previous accounts, showing that the only consistent cue to the distinction between theme and rheme nuclear accents is relative pitch height. This height difference accords with our understanding of the marking of nuclear prominence: theme peaks are only lower than rheme peaks in rheme-theme order, consistent with post-nuclear lowering; in theme-rheme order, the last of equal peaks is perceived as nuclear.
The rest of the thesis involves analysis of a portion of the Switchboard corpus which we have annotated with substantial new layers of semantic (kontrast) and prosodic features, which are described. This work is an essentially novel approach to testing discourse semantics theories in speech. Using multiple regression analysis, we demonstrate distributional properties of the corpus consistent with our claims. Plain and nuclear accents are best distinguished by phrasal features, showing the strong constraint of phrase structure on the perception of prominence. Nuclear accents can be reliably predicted by semantic/syntactic features, particularly kontrast, while other accents cannot. Plain accents can only be identified well by acoustic features, showing their appearance is linked to rhythmical and low-level semantic features. We further show that kontrast is not only more likely in nuclear position, but also if a word is more structurally or acoustically prominent than expected given its syntactic/information status properties. Consistent with our claim that nuclear accents are distinctive, we show that pre-, post- and nuclear accents have different acoustic profiles; and that the acoustic correlates of increased prominence vary by accent type, i.e. pre-nuclear or nuclear. Finally, we demonstrate the efficacy of our theory compared to previous accounts using examples from the corpus.
It's the difference that matters: An argument for contextually-grounded acoustic intonational phonology.
Standardly, the link between intonation and discourse meaning is described in terms of perceptual intonation categories, e.g. ToBI. We argue that this approach needs to be refined to explicitly recognise: firstly, that perception is affected by multiple acoustic cues, including duration and intensity, as well as F0; and secondly, that the interpretation of these cues is directly linked to the phonetic and discourse context. Investigating the marking of topic status in a small game task corpus, we found that although topic status is not consistently marked by ToBI pitch accent, it is marked by the F0 mean, intensity and duration of the topic word. Using regression analysis, we found that when factoring out the F0 mean and intensity of key parts of the preceding discourse, intensity and duration become stronger predictors of topic status than F0.
A framework for annotating information structure in discourse
We present a framework for the integrated analysis of the textual and prosodic characteristics of information structure in the Switchboard corpus of conversational English. Information structure describes the availability, organisation and salience of entities in a discourse model. We present standards for the annotation of information status (old, mediated and new), and give guidelines for annotating information structure, i.e. theme/rheme and background/kontrast. We show that information structure in English can only be analysed concurrently with prosodic prominence and phrasing. Along with existing annotations which we have integrated using NXT technology, the corpus will be unique in the field of conversational speech in terms of size and richness of annotation, vital for many NLP applications.
To memorize or to predict: Prominence labeling in conversational speech
The immense prosodic variation of natural conversational speech makes it challenging to predict which words are prosodically prominent in this genre. In this paper, we examine a new feature, accent ratio, which captures how likely it is that a word will be realized as prominent or not. We compare this feature with traditional accent prediction features (based on part of speech and N-grams) as well as with several linguistically motivated and manually labeled information structure features, such as whether a word is given, new, or contrastive. Our results show that the linguistic features do not lead to significant improvements, while accent ratio alone can yield prediction performance almost as good as the combination of any other subset of features. Moreover, this feature is useful even across genres; an accent-ratio classifier trained only on conversational speech predicts prominence with high accuracy in broadcast news. Our results suggest that carefully chosen lexicalized features can outperform less fine-grained features.
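As a rough illustration of the idea of a lexicalized accent-ratio feature, the sketch below estimates, for each word type, the proportion of its tokens that were labeled prominent in a hand-annotated corpus. The abstract does not give the exact definition, so everything here is an assumption made for illustration: the function names `accent_ratios` and `predict_prominent`, the `min_count` cut-off, the neutral default of 0.5 for rare or unseen words, and the 0.5 decision threshold.

```python
from collections import defaultdict


def accent_ratios(tokens, min_count=5, default=0.5):
    """Estimate a per-word accent ratio from (word, is_prominent) pairs.

    Assumption: words seen fewer than `min_count` times fall back to a
    neutral `default`, since a proportion over very few tokens is unreliable.
    """
    seen = defaultdict(int)
    accented = defaultdict(int)
    for word, is_prominent in tokens:
        key = word.lower()
        seen[key] += 1
        if is_prominent:
            accented[key] += 1
    return {
        word: (accented[word] / count if count >= min_count else default)
        for word, count in seen.items()
    }


def predict_prominent(word, ratios, default=0.5, threshold=0.5):
    """Predict prominence when the word's accent ratio exceeds `threshold`."""
    return ratios.get(word.lower(), default) > threshold


# Toy labeled data (hypothetical, not taken from any corpus).
toy = (
    [("the", False)] * 9
    + [("the", True)]
    + [("really", True)] * 4
    + [("really", False)]
    + [("kontrast", True)]
)
ratios = accent_ratios(toy)
```

With this toy data, a frequent function word such as "the" receives a low ratio and a frequent intensifier such as "really" a high one, while a word seen only once falls back to the neutral default. A classifier along these lines memorizes word-level tendencies, which is the contrast with prediction from syntactic or information-structure features that the abstract draws.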
The past, present, and future of the brain imaging data structure (BIDS)
The Brain Imaging Data Structure (BIDS) is a community-driven standard for the organization of data and metadata from a growing range of neuroscience modalities. This paper is meant as a history of how the standard has developed and grown over time. We outline the principles behind the project, the mechanisms by which it has been extended, and some of the challenges being addressed as it evolves. We also discuss the lessons learned through the project, with the aim of enabling researchers in other domains to learn from the success of BIDS.
Development of the BIDS Standard has been supported by the International Neuroinformatics Coordinating Facility, Laura and John Arnold Foundation, National Institutes of Health (R24MH114705, R24MH117179, R01MH126699, R24MH117295, P41EB019936, ZIAMH002977, R01MH109682, RF1MH126700, R01EB020740), National Science Foundation (OAC-1760950, BCS-1734853, CRCNS-1429999, CRCNS-1912266), Novo Nordisk Fonden (NNF20OC0063277), French National Research Agency (ANR-19-DATA-0023, ANR 19-DATA-0021), Digital Europe TEF-Health (101100700), EU H2020 Virtual Brain Cloud (826421), Human Brain Project (SGA2 785907, SGA3 945539), European Research Council (Consolidator 683049), German Research Foundation (SFB 1436/425899996, SFB 1315/327654276, SFB 936/178316478, SFB-TRR 295/424778381), SPP Computational Connectomics (RI 2073/6-1, RI 2073/10-2, RI 2073/9-1), European Innovation Council PHRASE Horizon (101058240), Berlin Institute of Health & Foundation Charité, Johanna Quandt Excellence Initiative, ERAPerMed Pattern-Cog, and the Virtual Research Environment at the Charité Berlin, a node of EBRAINS Health Data Cloud.