72 research outputs found

    What's in an accent? The impact of accented synthetic speech on lexical choice in human-machine dialogue

    The assumptions we make about a dialogue partner's knowledge and communicative ability (i.e. our partner models) can influence our language choices. Although similar processes may operate in human-machine dialogue, the role of design in shaping these models, and their subsequent effects on interaction, are not clearly understood. Focusing on synthesis design, we conduct a referential communication experiment to identify the impact of accented speech on lexical choice. In particular, we focus on whether accented speech may encourage the use of lexical alternatives that are relevant to a partner's accent, and how this may vary when in dialogue with a human or machine. We find that people are more likely to use American English terms when speaking with a US accented partner than an Irish accented partner in both human and machine conditions. This lends support to the proposal that synthesis design can influence perceptions of a partner's lexical knowledge, which in turn guide users' lexical choices. We discuss the findings in relation to the nature and dynamics of partner models in human-machine dialogue. Comment: In press, accepted at the 1st International Conference on Conversational User Interfaces (CUI 2019).

    Algerian intonational proficiency in English: An empirical study

    Rather than a thorough analysis, the present work should be regarded as a contribution to the study of intonation. More particularly, it concentrates on the intonational proficiency of a sample of Algerian speakers of English (ASE). The investigation consisted mainly of two experiments. The first was a Production Test, which aimed at gathering a speech sample of ASE as well as a sample of native speech to be used as a control. A test was therefore designed and submitted to twenty ASE (ten males and ten females) and five native speakers (two males and three females). The test consisted of ten units. The first four were highly controllable and were also analysed instrumentally, whereas the remaining ones were increasingly less controllable. The second experiment was a Perception Test, in which the data were evaluated by an audience of 160 native English listeners. Here, due to the large number of utterances, only the first four units were considered. From the outset, a number of questions were raised, the most important of which are as follows. How successful are ASE in manipulating intonation so as to convey specific meanings? What are the major errors and how can they be categorized? How do ASE make Halliday's three-dimensional decisions (i.e. 'tonality', 'tonicity' and 'tone')? Despite numerous errors, most Algerian utterances were correctly understood. ASE tend to divide their speech into far more intonation groups than natives do. 'Tonicity' and 'tone' errors also occurred. While attempting to answer those questions, further observations were made. The speech rhythm of the ASE tends to be syllable-timed. Rhythmic errors took place, e.g. failure to use weak forms. Short vowels tend to be lengthened. Another peculiar finding is the existence of falling and rising 'gestures' independent of nuclei. Equally peculiar is the existence of fall-level and rise-fall-level tones. Finally, the error causing the most serious communication breakdown is wrong placement of stress.

    Literacies of Bilingual Youth: A Profile of Bilingual Academic, Social, and TXT Literacies

    This dissertation identifies three types of language skills that urban Spanish/English bilingual youth possess (academic, social, and texting language), and reports on their relationship while documenting and analyzing the features of text messaging among this population. The participants in this study are Spanish-dominant bilingual young adults enrolled in a high school completion program in New York City. They are in the process of developing both Spanish and English academic literacy skills, and it is well known that they tend to perform below the grade they are enrolled in. For this reason, they are often referred to as being “language-less” (DeCapua & Marshall, 2011; Freeman, Freeman, & Mercuri, 2002) in an academic setting. Yet little was previously known about their linguistic skills in other language forms such as social and Txt. This research seeks to understand and document their abilities across language forms and modalities, painting a composite picture of non-traditional bilingual students’ linguistic skills. The aims of this dissertation are achieved through three different approaches. The first is a quantitative study of participants’ literacy skills through the use of assessments measuring academic literacy and social language awareness across written, aural, and digital modalities. The second is an in-depth analysis of the features participants use when texting (communicating via SMS and iMessage). Txt is a relatively new language form, and the analysis presented in this dissertation identifies the features and patterns that illustrate its systematic and constrained nature. The third approach is a case study focused on the texting behavior between two prolific texters. The theories developed based on the texting patterns of all participants (except those two texters) are applied to this one conversation for validation. 
This conversation constitutes more than half of the text messages that students contributed to the project, highlighting just how important this language form is in the daily life of young adults. A final component of this dissertation is the public availability of the text messages as an anonymized corpus along with the code and methods used to analyze the data. The text message corpus is available at www.byts.commons.gc.cuny.ed

    Information structure and the prosodic structure of English : a probabilistic relationship

    This work concerns how information structure is signalled prosodically in English, that is, how prosodic prominence and phrasing are used to indicate the salience and organisation of information in relation to a discourse model. It has been standardly held that information structure is primarily signalled by the distribution of pitch accents within syntax structure, as well as intonation event type. However, we argue that these claims underestimate the importance, and richness, of metrical prosodic structure and its role in signalling information structure. We advance a new theory, that information structure is a strong constraint on the mapping of words onto metrical prosodic structure. We show that focus (kontrast) aligns with nuclear prominence, while other accents are not usually directly 'meaningful'. Information units (theme/rheme) try to align with prosodic phrases. This mapping is probabilistic, so it is also influenced by lexical and syntactic effects, as well as rhythmical constraints and other features including emphasis. Rather than being directly signalled by the prosody, the likelihood of each information structure interpretation is mediated by all these properties. We demonstrate that this theory resolves problematic facts about accent distribution in earlier accounts and makes syntactic focus projection rules unnecessary. Previous theories have claimed that contrastive accents are marked by a categorically distinct accent type to other focal accents (e.g. L+H* v H*). We show this distinction in fact involves two separate semantic properties: contrastiveness and theme/rheme status. Contrastiveness is marked by increased prominence in general. Themes are distinguished from rhemes by relative prominence, i.e. the rheme kontrast aligns with nuclear prominence at the level of phrasing that includes both theme and rheme units. 
In a series of production and perception experiments, we directly test our theory against previous accounts, showing that the only consistent cue to the distinction between theme and rheme nuclear accents is relative pitch height. This height difference accords with our understanding of the marking of nuclear prominence: theme peaks are only lower than rheme peaks in rheme-theme order, consistent with post-nuclear lowering; in theme-rheme order, the last of equal peaks is perceived as nuclear. The rest of the thesis involves analysis of a portion of the Switchboard corpus which we have annotated with substantial new layers of semantic (kontrast) and prosodic features, which are described. This work is an essentially novel approach to testing discourse semantics theories in speech. Using multiple regression analysis, we demonstrate distributional properties of the corpus consistent with our claims. Plain and nuclear accents are best distinguished by phrasal features, showing the strong constraint of phrase structure on the perception of prominence. Nuclear accents can be reliably predicted by semantic/syntactic features, particularly kontrast, while other accents cannot. Plain accents can only be identified well by acoustic features, showing their appearance is linked to rhythmical and low-level semantic features. We further show that kontrast is not only more likely in nuclear position, but also if a word is more structurally or acoustically prominent than expected given its syntactic/information status properties. Consistent with our claim that nuclear accents are distinctive, we show that pre-, post- and nuclear accents have different acoustic profiles; and that the acoustic correlates of increased prominence vary by accent type, i.e. pre-nuclear or nuclear. Finally, we demonstrate the efficacy of our theory compared to previous accounts using examples from the corpus

    Voicing Kinship with Machines: Diffractive Empathetic Listening to Synthetic Voices in Performance.

    This thesis contributes to the field of voice studies by analyzing the design and production of synthetic voices in performance. The work explores six case studies, consisting of different performative experiences of the last decade (2010-2020) that featured synthetic voice design. It focusses on the political and social impact of synthetic voices, starting from yet challenging the concepts of voice in the machine and voice of the machine. The synthetic voices explored often play the role of simulated artificial intelligences; this thesis therefore expands its questions towards technology at large. The analysis of the case studies follows new materialist and posthumanist premises, yet it tries to confute the patriarchal and neoliberal approach towards technological development through feminist and de-colonial approaches, developing a taxonomy for synthetic voices in performance. Chapter 1 introduces terms and explains the taxonomy. Chapter 2 looks at familiar representations of fictional AI. Chapter 3 introduces headphone theatre, exploring immersive practices. Chapters 4 and 5 engage with chatbots. Chapter 6 explores Human and Artificial Intelligence interaction in depth, whereas chapter 7 moves slightly towards music production and live art. The body of the thesis includes the work of Pipeline Theatre, Rimini Protokoll, Annie Dorsen, Begüm Erciyas, and Holly Herndon. The analysis is informed by posthumanism, feminism, and performance studies, starting from my own practice as sound designer and singer, looking at aesthetics of reproduction, audience engagement, and voice composition. This thesis has been designed to inspire and provoke practitioners and scholars to explore synthetic voices further, question predominant biases of binarism and acknowledge their importance in redefining technology.

    Gesture generation by imitation : from human behavior to computer character animation

    This dissertation shows how to generate conversational gestures for an animated agent based on annotated text input. The central idea is to imitate the gestural behavior of human individuals. Using TV show recordings as empirical data, gestural key parameters are extracted for the generation of natural and individual gestures. A software tool was developed for each of the three tasks in the generation pipeline. The generic ANVIL annotation tool allows gesture and speech in the empirical data to be transcribed. The NOVALIS module uses the annotations to compute individual gesture profiles with statistical methods. The NOVA generator creates gestures based on these profiles and heuristic rules, and outputs them in a linear script. In all, this work presents a complete pipeline from collecting empirical data to obtaining an executable script, and provides the necessary software.
