13,480 research outputs found
It's Good to Talk: A Comparison of Using Voice Versus Screen-Based Interactions for Agent-Assisted Tasks
Voice assistants have become hugely popular in the home as domestic and entertainment devices. Recently, there has been a move towards developing them for work settings. For example, Alexa for Business and IBM Watson for Business were designed to improve productivity, by assisting with various tasks, such as scheduling meetings and taking minutes. However, this kind of assistance is largely limited to planning and managing user's work. How might they be developed to do more by way of empowering people at work? Our research is concerned with achieving this by developing an agent with the role of a facilitator that assists users during an ongoing task. Specifically, we were interested in whether the modality in which the agent interacts with users makes a difference: How does a voice versus screen-based agent interaction affect user behavior? We hypothesized that voice would be more immediate and emotive, resulting in more fluid conversations and interactions. Here, we describe a user study that compared the benefits of using voice versus screen-based interactions when interacting with a system incorporating an agent, involving pairs of participants doing an exploratory data analysis task that required them to make sense of a series of data visualizations. The findings from the study show marked differences between the two conditions, with voice resulting in more turn-taking in discussions, questions asked, more interactions with the system and a tendency towards more immediate, faster-paced discussions following agent prompts. We discuss the possible reasons for why talking and being prompted by a voice assistant may be preferable and more effective at mediating human-human conversations and we translate some of the key insights of this research into design implications
Visualizations for an Explainable Planning Agent
In this paper, we report on the visualization capabilities of an Explainable
AI Planning (XAIP) agent that can support human in the loop decision making.
Imposing transparency and explainability requirements on such agents is
especially important in order to establish trust and common ground with the
end-to-end automated planning system. Visualizing the agent's internal
decision-making processes is a crucial step towards achieving this. This may
include externalizing the "brain" of the agent -- starting from its sensory
inputs, to progressively higher order decisions made by it in order to drive
its planning components. We also show how the planner can bootstrap on the
latest techniques in explainable planning to cast plan visualization as a plan
explanation problem, and thus provide concise model-based visualization of its
plans. We demonstrate these functionalities in the context of the automated
planning components of a smart assistant in an instrumented meeting space.Comment: PREVIOUSLY Mr. Jones -- Towards a Proactive Smart Room Orchestrator
(appeared in AAAI 2017 Fall Symposium on Human-Agent Groups
Customization of IBM Intuâs Voice by Connecting Text-to-Speech Services and a Voice Conversion Network
IBM has recently launched Project Intu, which extends the existing web-based cognitive service Watson with the Internet of Things to provide an intelligent personal assistant service. We propose a voice customization service that allows a user to directly customize the voice of Intu. The method for voice customization is based on IBM Watsonâs text-to-speech service and voice conversion model. A user can train the voice conversion model by providing a minimum of approximately 100 speech samples in the preferred voice (target voice). The output voice of Intu (source voice) is then converted into the target voice. Furthermore, the user does not need to offer parallel data for the target voice since the transcriptions of the source speech and target speech are the same. We also suggest methods to maximize the efficiency of voice conversion and determine the proper amount of target speech based on several experiments. When we measured the elapsed time for each process, we observed that feature extraction accounts for 59.7% of voice conversion time, which implies that fixing inefficiencies in feature extraction should be prioritized. We used the mel-cepstral distortion between the target speech and reconstructed speech as an index for conversion accuracy and found that, when the number of target speech samples for training is less than 100, the general performance of the model degrades
- âŠ