A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents
The burgeoning fields of robot learning and embodied AI have triggered an
increasing demand for large quantities of data. However, collecting sufficient
unbiased data from the target domain remains a challenge due to costly data
collection processes and stringent safety requirements. Consequently,
researchers often resort to data from easily accessible source domains, such as
simulation and laboratory environments, for cost-effective data acquisition and
rapid model iteration. Nevertheless, the environments and embodiments of these
source domains can be quite different from their target domain counterparts,
underscoring the need for effective cross-domain policy transfer approaches. In
this paper, we conduct a systematic review of existing cross-domain policy
transfer methods. Through a nuanced categorization of domain gaps, we
encapsulate the overarching insights and design considerations of each problem
setting. We also provide a high-level discussion about the key methodologies
used in cross-domain policy transfer problems. Lastly, we summarize the open
challenges that lie beyond the capabilities of current paradigms and discuss
potential future directions in this field.
Reinforcement learning with supervision beyond environmental rewards
Reinforcement Learning (RL) is an elegant approach to tackling sequential decision-making problems. In the standard setting, the task designer curates a reward function, and the RL agent's objective is to take actions in the environment such that the long-term cumulative reward is maximized. Deep RL algorithms, which combine RL principles with deep neural networks, have been successfully used to learn behaviors in complex environments but are generally quite sensitive to the nature of the reward function. For a given RL problem, the environmental rewards could be sparse, delayed, misspecified, or unavailable (i.e., impossible to define mathematically for the required behavior). These scenarios exacerbate the challenge of training a stable deep-RL agent in a sample-efficient manner.
In this thesis, we study methods that go beyond a direct reliance on the environmental rewards by generating additional information signals that the RL agent can incorporate to learn the desired skills. We start by investigating the performance bottlenecks in delayed-reward environments and propose to address them by learning surrogate rewards. We present two methods for computing these surrogate rewards from agent-environment interaction data. We then consider the imitation-learning (IL) setting, where we do not have access to any rewards but are instead provided with a dataset of expert demonstrations that the RL agent must learn to reliably reproduce. We propose IL algorithms for partially observable environments and for situations with discrepancies between the transition dynamics of the expert and the imitator. Next, we consider the benefits of learning an ensemble of RL agents with explicit diversity pressure. We show that diversity encourages exploration and facilitates the discovery of sparse environmental rewards. Finally, we analyze the concept of sharing knowledge between RL agents operating in different but related environments and show that this information transfer can accelerate learning.
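To make the surrogate-reward idea concrete, the sketch below redistributes a delayed, episode-level return into per-step rewards by fitting a simple per-step contribution model. This is a minimal illustration under assumed choices (a linear return model, ridge regression, and hand-rolled feature arrays), not the thesis's specific algorithms.

```python
import numpy as np

def surrogate_rewards(step_features, episode_returns, ridge=1e-3):
    """step_features: list of [T_i, d] arrays (one per episode);
    episode_returns: length-N array of delayed, episode-level returns."""
    # Model the episode return as a sum of per-step contributions w^T x_t,
    # so summing features over time gives one regression row per episode.
    X = np.stack([f.sum(axis=0) for f in step_features])       # [N, d]
    y = np.asarray(episode_returns, dtype=float)                # [N]
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)   # ridge regression
    # The per-step surrogate reward is that step's predicted contribution.
    return [f @ w for f in step_features]

# Toy usage: two episodes with random features and delayed returns.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(5, 3)), rng.normal(size=(7, 3))]
dense = surrogate_rewards(feats, episode_returns=[1.0, -0.5])
print([r.shape for r in dense])   # [(5,), (7,)] -- one reward per step
```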
Healthcare Voice AI Assistants: Factors Influencing Trust and Intention to Use
AI assistants such as Alexa, Google Assistant, and Siri, are making their way
into the healthcare sector, offering a convenient way for users to access
different healthcare services. Trust is a vital factor in the uptake of
healthcare services, but the factors affecting trust in voice assistants used
for healthcare are under-explored and this specialist domain introduces
additional requirements. This study explores the effects of different
functional, personal, and risk factors on trust in and adoption of healthcare
voice AI assistants (HVAs), generating a partial least squares structural model
from a survey of 300 voice assistant users. Our results indicate that trust in
HVAs can be significantly explained by functional factors (usefulness, content
credibility, quality of service relative to a healthcare professional),
together with security and privacy risks and personal stance toward technology. We
also discuss differences in terms of trust between HVAs and general-purpose
voice assistants, as well as implications that are unique to HVAs. (Preprint, 37 pages; accepted to the 27th ACM Conference on Computer-Supported Cooperative Work and Social Computing, CSCW '24.)
Tackling Credit Assignment Using Memory and Multilevel Optimization for Multiagent Reinforcement Learning
There is growing commercial interest in the use of multiagent systems in real-world applications. Some examples include inventory management in warehouses, smart homes, planetary exploration, search and rescue, air-traffic management, and autonomous transportation systems. However, multiagent coordination is an extremely challenging problem. First, the information relevant for coordination is often distributed across the team members and fragmented across the agents' observation histories (past states). Second, the coordination objective is often sparse and noisy from the perspective of an individual agent. Designing general mechanisms for generating agent-specific reward functions that incentivize an agent to collaborate towards the shared global objective is extremely difficult. From a learning perspective, both difficulties can be traced to the problem of credit assignment: the process of accurately associating rewards with actions.
The primary contribution of this dissertation is to tackle credit assignment in multiagent systems in order to enable better multiagent coordination. First, we leverage memory as a tool for better credit assignment by facilitating associations between rewards and actions separated across time. We achieve this by introducing Modular Memory Units (MMU), a memory-augmented neural architecture that can reliably retain and propagate information over an extended period of time. We then use MMU to augment individual agents' policies in solving dynamic tasks that require adaptive behavior from a distributed multiagent team. We also introduce Distributed MMU (DMMU), which uses memory as a shared knowledge base across a team of distributed agents to enable distributed one-shot decision making.
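The sketch below shows a minimal gated, memory-augmented cell of the general kind described above: a persistent memory vector with learned read/write gates unrolled over an observation sequence. It is illustrative only and does not reproduce the published MMU architecture; the layer names, dimensions, and gating choices are assumptions.

```python
import torch
import torch.nn as nn

class GatedMemoryCell(nn.Module):
    """Toy memory-augmented cell: NOT the published MMU, just the general idea."""
    def __init__(self, input_dim, mem_dim):
        super().__init__()
        self.write = nn.Linear(input_dim + mem_dim, mem_dim)        # candidate write
        self.write_gate = nn.Linear(input_dim + mem_dim, mem_dim)   # how much to overwrite
        self.read_gate = nn.Linear(input_dim + mem_dim, mem_dim)    # what to expose

    def forward(self, x, memory):
        joint = torch.cat([x, memory], dim=-1)
        cand = torch.tanh(self.write(joint))                 # content to store
        w = torch.sigmoid(self.write_gate(joint))            # selective update gate
        memory = (1 - w) * memory + w * cand                 # retain or overwrite
        r = torch.sigmoid(self.read_gate(joint))             # gated read-out
        return r * memory, memory                            # (output, new memory)

# Unroll over a sequence of observations; memory persists across steps.
cell = GatedMemoryCell(input_dim=8, mem_dim=16)
mem = torch.zeros(1, 16)
for t in range(10):
    obs = torch.randn(1, 8)
    out, mem = cell(obs, mem)
```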
Switching our attention from the agent to the learning algorithm, we then introduce Evolutionary Reinforcement Learning (ERL), a multilevel optimization framework that blends the strengths of policy gradients and evolutionary algorithms to improve learning. We further extend the ERL framework to introduce Collaborative ERL (CERL), which employs a portfolio of policy-gradient learners, each optimizing over a different resolution of the same underlying task. This leads to a diverse set of policies that reach diverse regions of the solution space. Results on a range of continuous control benchmarks demonstrate that ERL and CERL significantly outperform their composite learners while remaining more sample-efficient overall.
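A toy skeleton of the ERL-style hybrid loop is sketched below: an evolutionary population explores, a gradient learner exploits, and the gradient policy is periodically injected back into the population. The quadratic fitness function and plain gradient step are stand-ins (assumptions made for illustration) for episode returns and an off-policy RL learner trained from a shared replay buffer.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = np.array([2.0, -1.0, 0.5])
fitness = lambda theta: -np.sum((theta - TARGET) ** 2)    # stand-in for episode return

pop = [rng.normal(size=3) for _ in range(10)]             # evolutionary population
grad_theta = rng.normal(size=3)                           # "policy gradient" learner

for gen in range(200):
    # Gradient learner: one gradient-ascent step on the toy objective
    # (in ERL this would be an RL update from experience shared by the population).
    grad_theta += 0.05 * (-2.0) * (grad_theta - TARGET)

    # Evolution: rank by fitness, keep elites, mutate elites to refill the population.
    pop.sort(key=fitness, reverse=True)
    elites = pop[:3]
    pop = elites + [e + 0.1 * rng.normal(size=3) for e in elites
                    for _ in range(3)][:7]

    # Synchronization: periodically copy the gradient policy into the population,
    # replacing the weakest member.
    if gen % 10 == 0:
        pop[-1] = grad_theta.copy()

print("best fitness:", fitness(max(pop, key=fitness)))
```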
Finally, we introduce Multiagent ERL (MERL), a hybrid algorithm that leverages the multilevel optimization framework of ERL to enable improved multiagent coordination without requiring explicit alignment between local and global reward functions. MERL uses fast, policy-gradient-based learning for each agent by utilizing its dense local rewards. Concurrently, evolution is used to recruit agents into a team by directly optimizing the sparser global objective. Experiments on multiagent coordination benchmarks demonstrate that MERL's integrated approach significantly outperforms state-of-the-art multiagent policy-gradient algorithms.
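The split objective can be illustrated with the compact toy sketch below: each agent ascends its own dense local reward, while whole teams are selected by evolution on a sparse global objective. The reward definitions, goals, and population sizes here are illustrative assumptions, not the benchmarks or the exact procedure used in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)
N_AGENTS, DIM = 3, 2
GOALS = rng.normal(size=(N_AGENTS, DIM))                             # toy per-agent goals

local_reward = lambda team: -np.sum((team - GOALS) ** 2, axis=1)     # dense, per agent
global_reward = lambda team: float(np.all(local_reward(team) > -0.05))  # sparse, team-level

teams = [rng.normal(size=(N_AGENTS, DIM)) for _ in range(8)]         # population of teams

for gen in range(300):
    # Policy-gradient proxy: every agent in every team ascends its dense local reward.
    for team in teams:
        team += 0.05 * (-2.0) * (team - GOALS)
    # Evolution: select teams by the sparse global objective (random tie-breaking).
    teams.sort(key=lambda t: (global_reward(t), rng.random()), reverse=True)
    teams = teams[:4] + [t + 0.05 * rng.normal(size=(N_AGENTS, DIM)) for t in teams[:4]]

print("teams solving the global task:", sum(global_reward(t) for t in teams))
```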