67 research outputs found

    Information structure and the prosodic structure of English : a probabilistic relationship

    This work concerns how information structure is signalled prosodically in English, that is, how prosodic prominence and phrasing are used to indicate the salience and organisation of information in relation to a discourse model. It has been standardly held that information structure is primarily signalled by the distribution of pitch accents within syntactic structure, as well as by intonation event type. However, we argue that these claims underestimate the importance, and richness, of metrical prosodic structure and its role in signalling information structure. We advance a new theory, that information structure is a strong constraint on the mapping of words onto metrical prosodic structure. We show that focus (kontrast) aligns with nuclear prominence, while other accents are not usually directly 'meaningful'. Information units (theme/rheme) try to align with prosodic phrases. This mapping is probabilistic, so it is also influenced by lexical and syntactic effects, as well as rhythmical constraints and other features including emphasis. Rather than being directly signalled by the prosody, the likelihood of each information structure interpretation is mediated by all these properties. We demonstrate that this theory resolves problematic facts about accent distribution in earlier accounts and makes syntactic focus projection rules unnecessary. Previous theories have claimed that contrastive accents are marked by an accent type categorically distinct from that of other focal accents (e.g. L+H* vs. H*). We show this distinction in fact involves two separate semantic properties: contrastiveness and theme/rheme status. Contrastiveness is marked by increased prominence in general. Themes are distinguished from rhemes by relative prominence, i.e. the rheme kontrast aligns with nuclear prominence at the level of phrasing that includes both theme and rheme units.
In a series of production and perception experiments, we directly test our theory against previous accounts, showing that the only consistent cue to the distinction between theme and rheme nuclear accents is relative pitch height. This height difference accords with our understanding of the marking of nuclear prominence: theme peaks are only lower than rheme peaks in rheme-theme order, consistent with post-nuclear lowering; in theme-rheme order, the last of equal peaks is perceived as nuclear. The rest of the thesis analyses a portion of the Switchboard corpus, which we have annotated with substantial new layers of semantic (kontrast) and prosodic features; these annotations are described. This work is an essentially novel approach to testing discourse semantics theories in speech. Using multiple regression analysis, we demonstrate distributional properties of the corpus consistent with our claims. Plain and nuclear accents are best distinguished by phrasal features, showing the strong constraint of phrase structure on the perception of prominence. Nuclear accents can be reliably predicted by semantic/syntactic features, particularly kontrast, while other accents cannot. Plain accents can only be identified well by acoustic features, showing their appearance is linked to rhythmical and low-level semantic features. We further show that kontrast is more likely not only in nuclear position, but also when a word is more structurally or acoustically prominent than expected given its syntactic/information status properties. Consistent with our claim that nuclear accents are distinctive, we show that pre-nuclear, post-nuclear and nuclear accents have different acoustic profiles, and that the acoustic correlates of increased prominence vary by accent type, i.e. pre-nuclear or nuclear. Finally, we demonstrate the efficacy of our theory compared to previous accounts using examples from the corpus.

    A Phonetic Account of the Terminology, Form, and Grammatical Classification of "Filled Pauses"

    The article processing charge was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 491192747 and the Open Access Publication Fund of Humboldt-Universität zu Berlin. The terms hesitation, planner, filler, and filled pause do not always refer to the same phonetic entities. This terminological conundrum is approached by investigating the observational, explanatory, and descriptive inadequacies of the terms in use. Concomitantly, the term filler particle is motivated and a definition is proposed that identifies its phonetic exponents and describes them within the linguistic category of particles. The definition of filler particles proposed here is grounded both theoretically and empirically and then applied to a corpus of spontaneous dialogues with 32 speakers of German, showing that in addition to the prototypical phonetic forms, there is a substantial proportion of non-prototypical forms (9.5%), comprising both glottal (e.g., [Ɂ]) and vocal forms (e.g., [ɛɸ], [j˜ɛvə]). The grammatical classification and the results regarding the phonetic forms are discussed with respect to their theoretical relevance in filler particle research and corpus studies. The phonetic approach taken here further suggests a continuum of phonetic forms of filler particles, ranging from singleton segments to multi-syllabic entities. Peer reviewed.

    Proceedings, MSVSCC 2018

    Proceedings of the 12th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 19, 2018 at VMASC in Suffolk, Virginia. 155 pp

    Automatic detection of disfluencies in a corpus of university lectures

    This dissertation focuses on the identification of disfluent sequences and their distinct structural regions. The reported experiments are based on audio segmentation and prosodic features, calculated from a corpus of university lectures in European Portuguese containing about 32 hours of speech, of which about 7.7% is disfluent. The set of features automatically extracted from the force-aligned corpus proved to discriminate between the regions involved in the production of a disfluency. The best results concern the detection of the interregnum, followed by the detection of the interruption point. Several machine learning methods were applied, but experiments show that Classification and Regression Trees usually outperform the other methods. The most informative features for cross-region identification are word duration ratios, word confidence scores, silence ratios, and pitch and energy slopes. Features such as the number of phones and syllables per word proved more useful for identifying the interregnum, whereas energy slopes were best suited to identifying the interruption point. We have also conducted initial experiments on automatically detecting filled pauses, the most frequent disfluency type. For now, only force-aligned transcripts were used, since the ASR system is not well adapted to this domain. This study is a step towards automatic detection of filled pauses for European Portuguese using prosodic features. Future work will extend this study to fully automatic transcripts and to other domains, and will explore extended sets of linguistic features.
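
As a rough illustration of two of the feature types named in this abstract (pitch slopes and word duration ratios), the sketch below computes them for a single word. The abstract does not give the exact definitions used in the dissertation, so the least-squares slope, the ratio against a neighbouring word, the function names, and the sample values are all assumptions.

```python
from statistics import mean


def slope(times, values):
    """Least-squares slope of values over times (e.g. Hz per second).

    Assumed definition; the dissertation may compute slopes differently.
    """
    t_bar, v_bar = mean(times), mean(values)
    denom = sum((t - t_bar) ** 2 for t in times)
    if denom == 0:
        return 0.0  # a single time point carries no slope information
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return num / denom


def duration_ratio(current, neighbour):
    """Duration of the current word divided by that of a neighbouring word."""
    return current / neighbour


# Hypothetical word: pitch samples falling from 200 Hz to 180 Hz over 0.2 s
times = [0.0, 0.1, 0.2]
f0 = [200.0, 190.0, 180.0]
pitch_slope = slope(times, f0)

# Hypothetical durations in seconds: current word vs. following word
ratio = duration_ratio(0.45, 0.30)
```

Values like these, computed per word, could then be passed to a decision-tree classifier of the kind the abstract reports as performing best.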

    The Phonetic Realization of Narrow Focus in English L1 and L2. Data from Production and Perception

    The typological differences between English and Italian are reflected in the strategies each language adopts to mark sentence-level prominence. While English marks focus by modulating prosodic parameters (namely pitch, duration and intensity), Italian normally resorts to word order strategies, benefitting from the freer word order admitted by its syntax. This study investigates the acquisition of the prosodic marking of narrow non-contrastive focus by Italian speakers of English L2. Its main aims were: (a) to determine and compare the prosodic cues used by English native speakers and Italian speakers of English L2 when marking narrow focus; (b) to verify whether the Italian speakers are able to acquire the English prosodic strategies for focus marking as a function of their competence in English, progressively abandoning the focus marking strategies that characterize their L1 in favor of more native-like solutions; (c) to investigate the phenomenon not only at the production level, but also from the point of view of perception. Consequently, this work is composed of a production study and a perception study. The production study consisted of the acoustic analysis of native and non-native productions. The speech data were collected using a semi-spontaneous method, in which speakers recorded a set of short sentences as replies to wh- questions, with the aim of eliciting sentences with narrow focus on the subject or on the verb. Three groups of speakers were recorded: English native speakers (NS), Italian native speakers with a higher competence in English L2 (NNS1), and Italian native speakers with a lower competence in English L2 (NNS2). A similar set of Italian L1 sentences was also elicited from the Italian speakers. The acoustic analysis was performed at sentence and word level, and was mainly based on the measurement of fundamental frequency and duration. The results confirmed that English native speakers mark narrow focus mainly by modulating pitch.
NNS1 showed progress towards the target model by making active use of pitch, although not perfectly matching the native pattern. Finally, NNS2 were not able to mark focus with prosodic parameters. The analysis of the Italian L1 data set suggested that in Italian narrow non-contrastive focus is not marked prosodically. Not even duration, which in Italian is the prosodic cue normally used to mark prominence at word level, seems to play a role in signaling prominence at sentence level. The perception study was designed to verify whether the differences shown by the acoustic measurements could also have an impact on listeners' perception. Two perception tests were designed, based on a two-alternative forced-choice paradigm, in which listeners were asked to identify narrow focus by guessing the wh- question that had triggered each sentence. Experiment 1 presented natural sentences to two groups of listeners: 22 British native speakers and 22 Italian native listeners. The Italian native listeners were also presented with an extra set of stimuli, consisting of the Italian L1 data set. The results of Experiment 1 showed that English native listeners could correctly identify narrow focus even without extra contextual information. This held for the productions by NS and NNS1, whereas the listeners could not recognize focus in the productions by NNS2. The Italian listeners could also detect focus well above chance level in the productions by NS. However, they failed to identify focus in the productions by NNS1 and NNS2. As for the Italian L1 data set, the Italian listeners failed to distinguish narrow focus, providing perceptual evidence for the hypothesis that Italians do not mark narrow focus by prosody. Experiment 2 was designed to investigate the effect of differences in pitch modulation on the correct detection of narrow focus by English native listeners. In this case, the productions of the speakers were acoustically manipulated.
The participants were 20 British English native speakers. In general, the results of Experiment 2 confirmed that pitch plays an important role in the recognition of narrow focus from the perceptual point of view as well. This is particularly true for NS productions, while the listeners could not successfully identify focus in the modified non-native productions. The results of the production study and the perception study converged in showing that in English pitch plays an important role in the production and perception of narrow non-contrastive focus. As for non-native productions, NNS1 could approach the native model to a certain extent by modulating F0. From the perceptual point of view, their productions were effective enough to be successfully understood by English native listeners. In contrast, NNS2 had not managed to adopt the strategies of English, showing a poor prosodic characterization of the constituent in focus. As a consequence, the listeners could not identify focus in the NNS2 productions. These findings are of interest not only for research in L2 phonetics, but also for their implications for language instruction, where prosody has only recently started to be studied and taught with renewed interest and momentum.
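
The abstract does not say how "above chance level" was assessed in the two-alternative forced-choice tests. One common option, sketched here purely as an assumption, is a one-sided exact binomial test against a guessing rate of 0.5. The function name and the trial counts are hypothetical and are not taken from the study.

```python
from math import comb


def p_above_chance(correct, trials, chance=0.5):
    """One-sided exact binomial p-value: P(X >= correct) under pure guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical listener: 16 correct identifications out of 20 trials
p = p_above_chance(16, 20)
```

With these made-up counts the p-value is about 0.0059, so such a listener would count as performing above chance at the conventional 0.05 level.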

    Augmented Reality

    Augmented Reality (AR) is a natural development from virtual reality (VR), which was developed several decades earlier. AR complements VR in many ways. Because the user can see real and virtual objects simultaneously, AR is far more intuitive, although it is still subject to human factors and other restrictions. AR applications also take less time and effort to build, because the entire virtual scene and environment do not have to be constructed. In this book, several new and emerging application areas of AR are presented and divided into three sections. The first section contains applications in outdoor and mobile AR, such as construction, restoration, security and surveillance. The second section deals with AR in medicine, biology, and the human body. The third and final section contains a number of new and useful applications in daily living and learning.

    Proceedings of the VIIth GSCP International Conference

    The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose Speech and Corpora as its main theme. The 235 authors, from 21 countries and 95 institutions, contributed papers on many different languages. The 89 papers in this volume reflect the themes of the conference: spoken corpora compilation and annotation, along with related technological fields; the relation between prosody and pragmatics; speech pathologies; and various papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to sound and video linked to papers (when downloaded).

    Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

    Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies

    Human factors in instructional augmented reality for intravehicular spaceflight activities and How gravity influences the setup of interfaces operated by direct object selection

    In human spaceflight, advanced user interfaces are attracting growing interest as a means of making human-machine interaction more effective and of ensuring that intravehicular operations are carried out in the correct sequence. Repeated efforts have been made to ease such operations with novel forms of human-computer interaction such as Augmented Reality (AR). The work presented in this thesis is directed towards a user-driven design for AR-assisted space operations, solving the issues arising from the problem space step by step in an iterative design process. This also requires considering the effect of altered gravity on the handling of such interfaces.