55 research outputs found

    An End-to-End Conversational Style Matching Agent

    We present an end-to-end voice-based conversational agent that can engage in naturalistic multi-turn dialogue and align with the interlocutor's conversational style. The system uses a series of deep neural network components for speech recognition, dialogue generation, prosodic analysis and speech synthesis to generate language and prosodic expression with qualities that match those of the user. We conducted a user study (N=30) in which participants talked with the agent for 15 to 20 minutes, resulting in over 8 hours of natural interaction data. Users with high consideration conversational styles reported the agent to be more trustworthy when it matched their conversational style, whereas users with high involvement conversational styles were indifferent. Finally, we provide design guidelines for multi-turn dialogue interactions using conversational style adaptation.
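    The turn-processing loop the abstract describes (speech recognition, prosodic analysis, dialogue generation, style-matched synthesis) could be wired together roughly as below. This is a minimal sketch, not the authors' implementation: every class, function and parameter name here is a hypothetical placeholder standing in for the neural components.

```python
from dataclasses import dataclass

@dataclass
class ProsodicProfile:
    """Placeholder for the prosodic features extracted from a user turn."""
    pitch_mean: float   # e.g. average F0 in Hz
    speech_rate: float  # e.g. syllables per second

def recognize(audio: bytes) -> str:
    """Stub ASR: a neural speech recognizer would go here."""
    return "hello there"

def analyze_prosody(audio: bytes) -> ProsodicProfile:
    """Stub prosodic analysis of the user's audio."""
    return ProsodicProfile(pitch_mean=180.0, speech_rate=4.5)

def generate_reply(text: str) -> str:
    """Stub neural dialogue generation."""
    return f"You said: {text}"

def synthesize(text: str, style: ProsodicProfile) -> dict:
    """Stub TTS conditioned on the user's prosodic profile, so the
    agent's delivery matches the interlocutor's style."""
    return {"text": text, "pitch": style.pitch_mean, "rate": style.speech_rate}

def agent_turn(audio: bytes) -> dict:
    """One agent turn: recognize, analyze style, reply, synthesize in style."""
    user_text = recognize(audio)
    style = analyze_prosody(audio)
    reply = generate_reply(user_text)
    return synthesize(reply, style)
```

    The key design point the abstract implies is that prosodic analysis feeds the synthesis stage directly, so style matching happens at delivery time rather than in the generated words alone.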

    Building Embodied Conversational Agents: Observations on human nonverbal behaviour as a resource for the development of artificial characters

    "Wow this is so cool!" This is what I most probably yelled, back in the 90s, when my first computer program on our MSX computer turned out to do exactly what I wanted it to do. The program contained the following instruction: COLOR 10(1.1) After hitting enter, it would change the screen color from light blue to dark yellow. A few years after that experience, Microsoft Windows was introduced. Windows came with an intuitive graphical user interface that was designed to allow all people, so also those who would not consider themselves to be experienced computer addicts, to interact with the computer. This was a major step forward in human-computer interaction, as from that point forward no complex programming skills were required anymore to perform such actions as adapting the screen color. Changing the background was just a matter of pointing the mouse to the desired color on a color palette. "Wow this is so cool!". This is what I shouted, again, 20 years later. This time my new smartphone successfully skipped to the next song on Spotify because I literally told my smartphone, with my voice, to do so. Being able to operate your smartphone with natural language through voice-control can be extremely handy, for instance when listening to music while showering. Again, the option to handle a computer with voice instructions turned out to be a significant optimization in human-computer interaction. From now on, computers could be instructed without the use of a screen, mouse or keyboard, and instead could operate successfully simply by telling the machine what to do. In other words, I have personally witnessed how, within only a few decades, the way people interact with computers has changed drastically, starting as a rather technical and abstract enterprise to becoming something that was both natural and intuitive, and did not require any advanced computer background. 
Accordingly, while computers used to be machines that could only be operated by technically-oriented individuals, they have gradually changed into devices that are part of many people’s household, just as much as a television, a vacuum cleaner or a microwave oven. The introduction of voice control is a significant feature of the newer generation of interfaces in the sense that these have become more "anthropomorphic" and try to mimic the way people interact in daily life, where indeed the voice is a universally used device that humans exploit in their exchanges with others. The question then arises whether it would be possible to go even one step further, where people, like in science-fiction movies, interact with avatars or humanoid robots, whereby users can have a proper conversation with a computer-simulated human that is indistinguishable from a real human. An interaction with a human-like representation of a computer that behaves, talks and reacts like a real person would imply that the computer is able to not only produce and understand messages transmitted auditorily through the voice, but could also rely on the perception and generation of different forms of body language, such as facial expressions, gestures or body posture. At the time of writing, developments of this next step in human-computer interaction are in full swing, but such interactions are still rather constrained when compared to the way humans have their exchanges with other humans. It is interesting to reflect on what such future human-machine interactions may look like. When we consider other products that have been created in history, it sometimes is striking to see that some of these have been inspired by things that can be observed in our environment, yet at the same time do not have to be exact copies of those phenomena. For instance, an airplane has wings just as birds do, yet the wings of an airplane do not make those typical movements a bird would produce to fly. 
Moreover, an airplane has wheels, whereas a bird has legs. At the same time, the airplane has made it possible for humans to cover long distances in a fast and smooth manner in a way that was unthinkable before it was invented. The example of the airplane shows how new technologies can have "unnatural" properties, but can nonetheless be very beneficial and impactful for human beings. This dissertation centers on this practical question of how virtual humans can be programmed to act more human-like. The four studies presented in this dissertation all share the same underlying question of how parts of human behavior can be captured, such that computers can use them to become more human-like. Each study differs in method, perspective and specific questions, but they all aim to gain insights and directions that would help further push the computer developments of human-like behavior and investigate (the simulation of) human conversational behavior. The rest of this introductory chapter gives a general overview of virtual humans (also known as embodied conversational agents), their potential uses and the engineering challenges, followed by an overview of the four studies.

    Designing talk in social networks: What Facebook teaches about conversation

    The easy accessibility, ubiquity, and plurilingualism of popular SNSs such as Facebook have inspired many scholars and practitioners of second language teaching and learning to integrate networked forms of communication into educational contexts such as language classrooms and study abroad programs (e.g., Blattner & Fiori, 2011; Lamy & Zourou, 2013; Mills, 2011; Reinhardt & Ryu, 2013; Reinhardt & Zander, 2011). At the same time, the complex and dynamic patterns of interaction that emerge in these spaces quickly push back upon standard ways of describing conversational genres and communicative competence (Kern, 2014; Lotherington & Ronda, 2014). Drawing from an ecological interactional analysis (Goffman, 1964, 1981a, 1981b, 1986; Kramsch & Whiteside, 2008) of the Facebook communications of three German-speaking academics whose social and professional lives are largely led in English, the authors consider the kinds of symbolic maneuvers required to participate in the translingual conversational flows of SNS-mediated communication. Based on this analysis, this article argues that texts generated through SNS-mediated communication can provide classroom opportunities for critical, stylistically sensitive reflection on the nature of talk in line with multiliteracies approaches.

    Mining Behavior of Citizen Sensor Communities to Improve Cooperation with Organizational Actors

    Web 2.0 (social media) provides a natural platform for dynamic emergence of citizen (as) sensor communities, where the citizens generate content for sharing information and engaging in discussions. Such a citizen sensor community (CSC) has stated or implied goals that are helpful in the work of formal organizations, such as an emergency management unit, for prioritizing their response needs. This research addresses questions related to design of a cooperative system of organizations and citizens in CSC. Prior research by social scientists in a limited offline and online environment has provided a foundation for research on cooperative behavior challenges, including 'articulation' and 'awareness', but Web 2.0 supported CSC offers new challenges as well as opportunities. A CSC presents information overload for the organizational actors, especially in finding reliable information providers (for awareness), and finding actionable information from the data generated by citizens (for articulation). Also, we note three data level challenges: ambiguity in interpreting unconstrained natural language text, sparsity of user behaviors, and diversity of user demographics. Interdisciplinary research involving social and computer sciences is essential to address these socio-technical issues. I present a novel web information-processing framework, called the Identify-Match-Engage (IME) framework. IME allows operationalizing computation in design problems of awareness and articulation of the cooperative system between citizens and organizations, by addressing data problems of group engagement modeling and intent mining. The IME framework includes: a.) Identification of cooperation-assistive intent (seeking-offering) from short, unstructured messages using a classification model with declarative, social and contrast pattern knowledge, b.) Facilitation of coordination modeling using bipartite matching of complementary intent (seeking-offering), and c.) 
Identification of user groups to prioritize for engagement by defining a content-driven measure of 'group discussion divergence'. The use of prior knowledge and interplay of features of users, content, and network structures efficiently captures context for computing cooperation-assistive behavior (intent and engagement) from unstructured social data in the online socio-technical systems. Our evaluation of a use-case of the crisis response domain shows improvement in performance for both intent classification and group engagement prioritization. Real world applications of this work include use of the engagement interface tool during various recent crises including the 2014 Jammu and Kashmir floods, and intent classification as a service integrated by the crisis mapping pioneer Ushahidi's CrisisNET project for broader impact.
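The matching step (b) of the IME framework, pairing "seeking" messages with complementary "offering" messages, could be sketched as below. This is a hedged illustration only: the actual system uses learned intent classifiers and richer features, whereas here a simple keyword-overlap score and a greedy one-to-one match stand in for that model, and all function names and example messages are hypothetical.

```python
def overlap_score(seeking: str, offering: str) -> int:
    """Count content words shared by a request and an offer.
    A crude stand-in for a learned complementary-intent score."""
    stop = {"i", "in", "need", "have", "a", "an", "the", "to", "for", "we"}
    s = set(seeking.lower().split()) - stop
    o = set(offering.lower().split()) - stop
    return len(s & o)

def match_intents(seeking_msgs, offering_msgs):
    """Greedy one-to-one bipartite matching of seeking and offering
    messages, taking candidate pairs in order of descending score."""
    pairs = sorted(
        ((overlap_score(s, o), s, o)
         for s in seeking_msgs for o in offering_msgs),
        key=lambda t: -t[0],
    )
    matched, used_s, used_o = [], set(), set()
    for score, s, o in pairs:
        if score > 0 and s not in used_s and o not in used_o:
            matched.append((s, o))
            used_s.add(s)
            used_o.add(o)
    return matched
```

An optimal (rather than greedy) assignment would solve the same bipartite problem with, e.g., the Hungarian algorithm; the greedy version is shown only to keep the sketch self-contained.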

    Don’t interrupt me while I’m speaking: Interruption in Everyday and Institutional Settings in Chinese

    Interruption is a common phenomenon in conversation. Previous research on interruption has focused on three main aspects: the identification of interruption in relation to overlaps or overlapping speech, the categorisation of cooperative and disruptive interruptions, and the relationship between interruption and certain social factors, for instance, power asymmetry and gender differences. However, little attention has been paid to the degree of intrusiveness. Likewise, not much has been done to explore interactional factors that may intersect with interruptions. With these important research gaps in mind, I aim to explore the relationship between intrusiveness and interactional dimensions of interruptions in the Chinese context in this study. Two sets of conversational data were collected: telephone conversations and TV talk show conversations. The conversation analytic method was used to examine the fine-grained details of speakers’ conversational interaction (Haugh, 2012). Statistical methods were used to test the relationship between factors related to interruptions. Results from a linear regression model indicate that, in both settings, speakers tend to heed and boost the current information flow (e.g., supplementing further details) when expressing affiliative stances. More specifically, in the institutional conversation, speakers orient their interruption utterances towards their assigned institutional role and task (Goffman, 1981; Heritage & Greatbatch, 1991). In the telephone conversation, there are frequent early interruptions, affiliative interruptions, and unexpected cases where interrupters align their opinions with the other whilst disrupting the current information flow. Based on what emerged from these analyses, I argue that the Chinese speakers in the two corpora feature a high involvement (Tannen, 2005) conversational style, which means they prioritise relationship over the task in discussion. 
In other words, speakers tend to distinctively emphasise their enthusiasm and engagement with the other speaker, but pay less attention to the one-speaker-at-a-time turn-taking rule (Sacks, Schegloff, & Jefferson, 1974). The finding that Chinese talk-in-interaction is relationship-focused supports the argument that Chinese society largely adheres to the polychronic time orientation (Hall, 1984). This study contributes to CA methodology by combining rigorous quantification methods with close examination of the sequential organisation of interruptions. It is innovative in measuring intrusiveness by incorporating two aspects of interruptions: the interrupter’s stance-taking and the interrupter’s sequential alignment with the information flow of the prior utterance. In so doing, this study contributes to the understanding of interruption by demonstrating that intrusiveness is a gradient concept on a measurable continuum rather than a binary concept that is either cooperative or intrusive. This study contributes to the investigation into Chinese talk-in-interaction, particularly speakers’ conversational style, by proposing a novel perspective: interruption. Keywords: Interruption, intrusiveness, affiliation, information flow, interruption marker, interruption timing, Chinese talk-in-interaction.

    My watch begins : identification and procedural rhetoric on second screens and social networks.

    Digital rhetoric creates opportunities for examining rhetoric as it evolves daily. This evolution may be described in terms of network circulation and immediate opportunities for publishing and creating. This project analyzes mobile applications and live feeds used during television broadcasts, where rhetoric is closely tied to the work of identifying with another point of view. Producers and designers of dual-screen applications prompt us to answer how we would act if we assumed the role of protagonist and saw the world through her or his eyes. These questions support the idea that identification is not just a relative of empathy or a way to engage emotionally with the text but also a way to approach problems and sharpen observation. From this dissertation’s findings we may reconsider the work of seeing and perspectival shifting as part of a sophisticated procedure of reflexive role play and public intellectualism. In addition, the analysis provides information about how mobile devices and second screens work to support consensus and a preferred reading (viewing) of popular narratives and group performances, thereby calling into careful consideration how we use such devices to influence others. Finally, the dissertation’s work helps us understand new forms of viral communication and the velocity (Ridolfo and DeVoss) at which they are transmitted. Consequently, we may approach textual artifacts as “living documents” and consider how such “living” properties may change our perceptions of authorship and composing. In Chapter One, “My Watch Begins: Complex Narrative, Transmedia, and Point of View,” I begin by offering an overview of my methodological approach to these applications. I situate the work of identification on mobile devices within the larger conversation surrounding transmedia and how it encourages viewers to participate in contemporary television narratives. 
This section provides explanations of how the terms procedural rhetoric (as introduced first by Ian Bogost), prosopopoeia (from ancient rhetoric), and point of view (from narrative theory) will function in this project, with most of the attention given to procedural and rhetorical studies of the various programs and websites associated with audience writings. This chapter also calls attention to the difference between empathy and perspective shifting. An example from contemporary culture that helps illustrate this difference and provides space for conversation is the viral blog post “I Am Adam Lanza’s Mother.” This editorial, written in the aftermath of the Sandy Hook shooting in 2012, features identification techniques used as persuasive tools but does so in a problematic way that might be better handled with a nuanced and careful study of how identification operates in other settings. Central to this project are questions addressing how we discuss and document the acts of viewing/seeing/looking, and in what ways the process of seeing from multiple perspectives is currently being lauded in society and the academy. In Chapter Two, “If You See Something, Say Something: Syncing Audience Viewing and Response,” I present two opening examples that illustrate these premises: one from a Walking Dead advertisement that features the protagonist’s eye and one from a Department of Homeland Security ad, “If You See Something, Say Something.” These examples dovetail into a specific analysis of syncing devices, or dual-screen viewing experiences, and the actual rhetoric accompanying the requests to see from multiple perspectives (“If you were Rick, you would ___”). I also call attention to shows where the act of identifying with the protagonist raises questions about the limitations of perspectives. To be specific, I suggest that the white middle-class male is the paradigm of identification exercises for shows that encourage participation from viewers. 
Examples from television suggest that women and minorities are less likely to be the characters with whom we align our interests; therefore, I argue we should interrogate this trend and think reflexively about the act of identifying. In Chapter Three, “Choreographing Conversation through Tagging, Tokens, and Reblogs,” I argue that analysis of audience reactions via live feeds and blogging platforms shows that textual artifacts, through increased circulation, promote a certain form of identification through consensus. This consensus reveals the tendency of viewers to gravitate toward preferred readings (viewings) of narratives and to identify with characters closely resembling themselves. By constituting viewers in a rhetoric specific to each fictional world, producers encourage identification and help secure appropriate and largely positive viewer behaviors through conversations online. Specifically, digital activities like “checking in” to a show and writing with specific hashtags become markers of narrative involvement. Producers, in turn, engage in reciprocal action by promoting or displaying fan activity on their own feeds, thereby sponsoring the work of the audience. While such activity often leads to conformity, I argue that these moments of group consensus may act as springboards for future conversation about other perspectives and narrative outcomes. In Chapter Four, “Texts as Bodies, Bodies as Texts: Tumblr Role Play and the Rhetorical Practices of Identification,” the rhetorical analysis of these online sites and mobile applications then leads to questions of how we perceive embodiment during identification. In this section I look closely at the writing found on the microblogging site Tumblr, where viewers of television narrative engage in role playing their favorite protagonists and creating dialogue with fellow role players. This practice, operating outside the jurisdiction of producer-designed apps, reveals new patterns of the work of identification. 
With attention to the ideas of Katherine Hayles and Deleuze and Guattari, we may reconsider how text, once circulated, acts as an extension of and a replacement for the physical body. Still, the work of these bloggers demonstrates that identification is still a personal investment that refers to and gives credit to the person behind the computer screen. This chapter reveals a productive tension between the embodied author’s work and the nature of writing as it moves through networks. In my conclusion I explain how these applications and online tools have implications for the writing classroom. Students are frequently told that good writers and thinkers must see a problem or an issue from multiple perspectives. This project focuses intensely on the work of shifting perspectives and how those perspectives are represented in writing. Its implications for teaching productive source integration and research may be applied to the first-year writing classroom but also the graduate class curriculum, where novice scholars learn to extend, oppose, and ally themselves with the scholars who have come before them.