
    Voice interfaces in everyday life

    Voice User Interfaces (VUIs) are becoming ubiquitously available, embedded both into everyday mobility via smartphones and into the life of the home via ‘assistant’ devices. Yet exactly how users of such devices practically thread that use into their everyday social interactions remains underexplored. Drawing on audio data collected during month-long deployments of the Amazon Echo in participants’ homes, and informed by ethnomethodology and conversation analysis, our study documents the methodical practices of VUI users and how that use is accomplished in the complex social life of the home. The data we present show how the device is made accountable to, and embedded into, conversational settings like family dinners where various simultaneous activities are being achieved. We discuss how the VUI is finely coordinated with the sequential organisation of talk. Finally, we locate implications for the accountability of VUI interaction and for request and response design, and raise conceptual challenges to the notion of designing ‘conversational’ interfaces.

    The Science and Art of Voice Interfaces


    Speech Interfaces and Pilot Performance: A Meta-Analysis

    As the aviation industry modernizes, new technology and interfaces must support growing aircraft complexity without increasing pilot workload. Natural language processing offers just such a simple and intuitive interface, yet the performance implications of its use by pilots remain unknown. A meta-analysis was conducted to understand the performance effects of using speech and voice interfaces in a series of pilot task analogs. The inclusion criteria selected studies in which participants performed a demanding primary task, such as driving, while interacting with a vehicle system to enter numbers, dial radios, or enter a navigation destination. Compared to manual system interfaces, voice interfaces reduced subjective workload and inattention. Time to complete interactions with the interface appears to vary with the individual system, with systems that require prompts or provide confirmation taking the longest. Implications of introducing natural language systems in aviation are discussed and recommendations for future study are provided.
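    To make the pooling step of such a meta-analysis concrete, the sketch below combines per-study effects using standard inverse-variance fixed-effect weighting of standardized mean differences. The study effects and variances are invented placeholders, not data from the paper.

        # Illustrative fixed-effect meta-analysis of standardized mean
        # differences (Cohen's d). All numbers are hypothetical placeholders,
        # not results reported in the abstract above.
        import math

        # (d, variance of d) for three invented studies comparing voice vs.
        # manual interfaces; negative d = lower workload with the voice interface.
        studies = [(-0.42, 0.031), (-0.18, 0.045), (-0.55, 0.060)]

        weights = [1.0 / var for _, var in studies]   # inverse-variance weights
        pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
        se = math.sqrt(1.0 / sum(weights))            # standard error of pooled d

        print(f"pooled d = {pooled:.2f}, "
              f"95% CI [{pooled - 1.96 * se:.2f}, {pooled + 1.96 * se:.2f}]")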

    Emotion-aware voice interfaces based on speech signal processing

    Voice interfaces (VIs) will become increasingly widespread in daily life as AI techniques progress. VIs can be incorporated into smart devices such as smartphones, and integrated into cars, home automation systems, computer operating systems, and home appliances, among other things. Current speech interfaces, however, are unaware of users’ emotional states and hence cannot support natural communication. To overcome this limitation, emotional awareness must be implemented in future VIs. This thesis focuses on how speech signal processing (SSP) and speech emotion recognition (SER) can give VIs emotional awareness. Following an explanation of what emotion is and how neural networks are implemented, the thesis presents the results of several user studies and surveys. Emotions are complicated; they are typically characterized using categorical and dimensional models, and they can be expressed verbally or nonverbally. Although existing voice interfaces are unaware of users’ emotional states and cannot support natural conversations, future VIs could perceive users’ emotions from their speech using SSP. One section of this thesis, based on SSP, investigates mental restorative effects on humans and how they can be measured from speech signals. SSP is less intrusive and more accessible than traditional measures such as attention scales or response tests, and it can provide a reliable assessment of attention and mental restoration; it can be implemented in future VIs and used in future HCI user research. The thesis then presents a novel attention neural network based on sparse correlation features. Its accuracy in detecting emotions in continuous speech was demonstrated, with promising results, in a user study using recordings from a real classroom. In SER research, it is unknown whether existing emotion detection methods detect acted emotions or the genuine emotion of the speaker. Another section of this thesis is therefore concerned with humans’ ability to act out emotions. In a user study, participants were instructed to imitate five fundamental emotions; the results revealed that they struggled with this task, although certain emotions were easier to replicate than others. A further research question is how VIs should respond to users’ emotions once SER techniques implemented in VIs can recognize them. The thesis therefore includes research on ways of dealing with users’ emotions. In a user study, participants were instructed to make sad, angry, and terrified VI avatars happy, and were asked whether they would like to be treated the same way if the situation were reversed. The majority of participants tended to respond to these unpleasant emotions with a neutral emotion, though emotion selection differed between genders. For a human-centered design approach, it is important to understand users’ preferences for future VIs. A questionnaire-based survey on users’ attitudes towards and preferences for emotion-aware VIs was conducted in three distinct cultures. Almost no gender differences were found, and cluster analysis identified three fundamental user types present in all cultures: Enthusiasts, Pragmatists, and Sceptics. Future VI development should therefore consider these diverse sorts of users.
    In conclusion, future VI systems should be designed for various sorts of users and should be able to detect users’ disguised or genuine emotions using SER and SSP technologies. Furthermore, many other applications, such as restorative-effects assessment, can be included in VI systems.
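    As a concrete illustration of the SER idea discussed above, the sketch below extracts MFCC summary features from speech clips and trains a simple classifier. It is not the attention network from the thesis; the label set, file paths, and model choice are assumptions made for illustration only.

        # Minimal speech-emotion-recognition sketch: MFCC summary features plus
        # an off-the-shelf SVM. A deliberately simple stand-in for the far more
        # sophisticated attention network described in the thesis.
        import numpy as np
        import librosa
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC

        EMOTIONS = ["happy", "sad", "angry", "fearful", "neutral"]  # assumed label set

        def mfcc_features(path, sr=16000, n_mfcc=13):
            # Summarise a clip as the mean and std of its MFCC coefficients.
            signal, _ = librosa.load(path, sr=sr)
            mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
            return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

        def train_ser(clips):
            # `clips` is a hypothetical list of (wav_path, emotion_label) pairs.
            X = np.stack([mfcc_features(path) for path, _ in clips])
            y = np.array([EMOTIONS.index(label) for _, label in clips])
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)
            clf = SVC(kernel="rbf").fit(X_tr, y_tr)
            print("held-out accuracy:", clf.score(X_te, y_te))
            return clf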

    Practical Challenges of Virtual Assistants and Voice Interfaces in Industrial Applications

    Virtual assistant systems promise ubiquitous and simple access to information, applications, and physical appliances. Their foundation on intent-oriented queries and their support for natural language make them an ideal tool for human-centric applications. The general approach to building such systems, as well as the main building blocks, is well understood and available as off-the-shelf components. While there are prominent examples in the service sector, other sectors such as the manufacturing and process industries have nothing comparable. We investigate the practical challenges of building a virtual assistant using a representative and simplified case from the domain of knowledge retrieval. A qualitative study reveals two major obstacles: first, a high level of user expectations, and second, a disproportionate amount of effort to get all the details right and achieve a robust system. Overall, implementing a virtual assistant for an industrial application is technically feasible, yet it requires significant effort and understanding of the target audience.
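    To illustrate the intent-oriented core that such assistants build on, the sketch below matches a free-text query against example phrases per intent using TF-IDF similarity. The intents, example phrases, and threshold are hypothetical, not taken from the paper; a production system would add entity extraction, dialogue state, and domain vocabulary.

        # Minimal intent matcher for a knowledge-retrieval assistant.
        # Intents and example phrases are invented for illustration only.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        INTENTS = {
            "lookup_manual": ["show me the manual for the pump",
                              "where is the documentation for this machine"],
            "report_fault": ["the conveyor is making a grinding noise",
                             "log a fault on line three"],
        }

        # Flatten the intent dictionary into parallel phrase/label lists.
        phrases, labels = [], []
        for intent, examples in INTENTS.items():
            phrases.extend(examples)
            labels.extend([intent] * len(examples))

        vectorizer = TfidfVectorizer().fit(phrases)
        matrix = vectorizer.transform(phrases)

        def classify(query, threshold=0.2):
            # Return the best-matching intent, or None if similarity is too low.
            sims = cosine_similarity(vectorizer.transform([query]), matrix)[0]
            best = sims.argmax()
            return labels[best] if sims[best] >= threshold else None

        print(classify("open the pump documentation"))  # -> "lookup_manual"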