82 research outputs found
An Optimal Online Method of Selecting Source Policies for Reinforcement Learning
Transfer learning significantly accelerates the reinforcement learning
process by exploiting relevant knowledge from previous experiences. The problem
of optimally selecting source policies during the learning process is of great
importance yet challenging. There has been little theoretical analysis of this
problem. In this paper, we develop an optimal online method to select source
policies for reinforcement learning. This method formulates online source
policy selection as a multi-armed bandit problem and augments Q-learning with
policy reuse. We provide theoretical guarantees of the optimal selection
process and convergence to the optimal policy. In addition, we conduct
experiments on a grid-based robot navigation domain to demonstrate its
efficiency and robustness by comparing to the state-of-the-art transfer
learning method.
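
The bandit view of source-policy selection described above can be illustrated with a small sketch. It uses the standard UCB1 rule as a stand-in; the abstract does not say which bandit algorithm the paper uses, so `ucb1_select`, `run_bandit`, and the reward model are hypothetical names and assumptions made for illustration only. Each arm stands for one source policy, and the reward stands for the return observed when that policy is reused for an episode.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Pick the arm with the highest UCB1 index at round t (1-based)."""
    # Play every arm once before trusting the confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(reward_fn, n_arms, rounds):
    """Run UCB1 for a fixed number of rounds; return pull counts and mean rewards."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = reward_fn(arm)  # assumed: episode return from reusing source policy `arm`
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.2, 0.5, 0.8]  # hypothetical usefulness of three source policies
    counts, _ = run_bandit(lambda a: float(rng.random() < true_means[a]), 3, 2000)
    print(counts)  # the most useful source policy should receive most of the pulls
```

In a full policy-reuse setup the selected source policy would guide exploration inside a Q-learning episode; that part is omitted here.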
Leveraging Hyperbolic Embeddings for Coarse-to-Fine Robot Design
Multi-cellular robot design aims to create robots comprised of numerous cells
that can be efficiently controlled to perform diverse tasks. Previous research
has demonstrated the ability to generate robots for various tasks, but these
approaches often optimize robots directly in the vast design space, resulting
in robots with complicated morphologies that are hard to control. In response,
this paper presents a novel coarse-to-fine method for designing multi-cellular
robots. Initially, this strategy seeks optimal coarse-grained robots and
progressively refines them. To mitigate the challenge of determining the
precise refinement juncture during the coarse-to-fine transition, we introduce
the Hyperbolic Embeddings for Robot Design (HERD) framework. HERD unifies
robots of various granularity within a shared hyperbolic space and leverages a
refined Cross-Entropy Method for optimization. This framework enables our
method to autonomously identify areas of exploration in hyperbolic space and
concentrate on regions demonstrating promise. Finally, the extensive empirical
studies on various challenging tasks sourced from EvoGym show our approach's
superior efficiency and generalization capability.
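
As a rough illustration of what a shared hyperbolic space looks like numerically, the sketch below computes the geodesic distance in the Poincaré ball model. The abstract does not state which model of hyperbolic space HERD uses, so the choice of the Poincaré ball and the function name `poincare_distance` are assumptions.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_norm_u = sum(c * c for c in u)
    sq_norm_v = sum(c * c for c in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)


if __name__ == "__main__":
    # The same Euclidean step costs more hyperbolic distance near the boundary.
    print(poincare_distance((0.0, 0.0), (0.1, 0.0)))
    print(poincare_distance((0.8, 0.0), (0.9, 0.0)))
```

Distances grow quickly toward the boundary of the ball, which is why such spaces are often used to embed hierarchies.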
On fairness in decision-making under uncertainty: Definitions, computation, and comparison
The utilitarian solution criterion, which has been extensively studied in multi-agent decision making under uncertainty, aims to maximize the sum of individual utilities. However, as the utilitarian solution often discriminates against some agents, it is not desirable for many practical applications where agents have their own interests and fairness is expected. To address this issue, this paper introduces egalitarian solution criteria for sequential decision-making under uncertainty, which are based on the maximin principle. Motivated by different application domains, we propose four maximin fairness criteria and develop corresponding algorithms for computing their optimal policies. Furthermore, we analyze the connections between these criteria and discuss and compare their characteristics.
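
The contrast between the utilitarian and the maximin principle can be shown on a toy example. The sketch below is not one of the four criteria from the paper; it only compares "maximize the sum" with "maximize the minimum" over a hypothetical set of candidate policies whose per-agent utilities are made up for illustration.

```python
def utilitarian_choice(candidates):
    """Return the name of the policy with the largest sum of agent utilities."""
    return max(candidates, key=lambda name: sum(candidates[name]))


def maximin_choice(candidates):
    """Return the name of the policy whose worst-off agent does best."""
    return max(candidates, key=lambda name: min(candidates[name]))


if __name__ == "__main__":
    # Hypothetical per-agent expected utilities for three candidate policies.
    candidates = {
        "A": (10.0, 0.0),  # best total, but the second agent gets nothing
        "B": (4.0, 4.0),   # lower total, equal treatment
        "C": (6.0, 3.0),
    }
    print(utilitarian_choice(candidates))
    print(maximin_choice(candidates))
```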
Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in terms of sample efficiency and generalization over novel
environments for learning environment models. We also demonstrate that learned
dynamics models enable efficient planning in unseen environments, comparable to
true environment models. In addition, MAOP learns semantically and visually
interpretable disentangled representations.
Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial
Intelligence (AAAI), 202
Fairness in Multi-Agent Sequential Decision-Making
We define a fairness solution criterion for multi-agent decision-making problems in which agents have local interests. This new criterion aims to maximize the worst performance among agents while taking overall performance into account. We develop a simple linear programming approach and a more scalable game-theoretic approach for computing an optimal fairness policy. The game-theoretic approach formulates fairness optimization as a two-player, zero-sum game and employs an iterative algorithm to find a Nash equilibrium, which corresponds to an optimal fairness policy. We scale up this approach by exploiting problem structure and value function approximation. Our experiments on resource allocation problems show that this fairness criterion provides a more favorable solution than the utilitarian criterion, and that our game-theoretic approach is significantly faster than linear programming.
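
The two-player, zero-sum formulation can be sketched on a small matrix game. The abstract does not specify the iterative algorithm, so the sketch below substitutes a generic scheme: an adversary runs multiplicative weights (Hedge) over agents while the policy player best-responds, and the empirical mixture of chosen policies approximates a maximin strategy. The function name `maximin_mixture`, the utility matrix, and the parameters `rounds` and `eta` are all assumptions, and the sketch ignores the "overall performance" part of the criterion.

```python
import math


def maximin_mixture(utilities, rounds=1000, eta=0.1):
    """Approximate a mixture over policies that maximizes the worst agent's utility.

    utilities[p][j] is the (assumed, non-negative) utility of agent j under policy p.
    """
    n_policies = len(utilities)
    n_agents = len(utilities[0])
    weights = [1.0 / n_agents] * n_agents  # adversary's distribution over agents
    picks = [0] * n_policies
    for _ in range(rounds):
        # Policy player: best response to the adversary's current distribution.
        best = max(
            range(n_policies),
            key=lambda p: sum(w * u for w, u in zip(weights, utilities[p])),
        )
        picks[best] += 1
        # Adversary: shift weight toward agents that did poorly under `best`.
        weights = [w * math.exp(-eta * utilities[best][j]) for j, w in enumerate(weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    mixture = [count / rounds for count in picks]
    worst = min(
        sum(mixture[p] * utilities[p][j] for p in range(n_policies))
        for j in range(n_agents)
    )
    return mixture, worst


if __name__ == "__main__":
    # Hypothetical game: policy i serves only agent i.
    mixture, worst = maximin_mixture([[1.0, 0.0], [0.0, 1.0]])
    print(mixture, worst)
```

On this toy game the fair solution mixes the two policies roughly evenly, so that each agent receives about half of the available utility.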
Symmetry-Aware Robot Design with Structured Subgroups
Robot design aims at learning to create robots that can be easily controlled
and perform tasks efficiently. Previous work on robot design has demonstrated the
ability to generate robots for various tasks. However, these works searched for
robots directly in the vast design space and ignored common structures,
resulting in abnormal robots and poor performance. To tackle this problem, we
propose a Symmetry-Aware Robot Design (SARD) framework that exploits the
structure of the design space by incorporating symmetry searching into the
robot design process. Specifically, we represent symmetries with the subgroups
of the dihedral group and search for the optimal symmetry in structured
subgroups. Then robots are designed under the searched symmetry. In this way,
SARD can design efficient symmetric robots while covering the original design
space, which is theoretically analyzed. We further empirically evaluate SARD on
various tasks, and the results show its superior efficiency and
generalizability.
Comment: The Fortieth International Conference on Machine Learning (ICML 2023
- …
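
To make the idea of representing symmetries with subgroups of the dihedral group concrete, the sketch below works with the dihedral group of the square (eight elements: four rotations and four reflections) acting on a grid-shaped design. The grid encoding and the names `rot90`, `flip_h`, `stabilizer`, and `enforce_mirror` are assumptions for illustration and are not taken from SARD.

```python
def rot90(grid):
    """Rotate a rectangular grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_h(grid):
    """Mirror a grid left-to-right."""
    return [list(row[::-1]) for row in grid]


def stabilizer(grid):
    """List the dihedral elements that leave the grid unchanged.

    "rk" is k clockwise quarter turns; "sk" is a left-right flip followed by
    k quarter turns. The returned set is always a subgroup of the dihedral group.
    """
    names = []
    for prefix, start in (("r", grid), ("s", flip_h(grid))):
        image = [list(row) for row in start]
        for k in range(4):
            if image == grid:
                names.append(f"{prefix}{k}")
            image = rot90(image)
    return names


def enforce_mirror(grid):
    """Copy the left half of each row onto the right half (left-right symmetry)."""
    return [[row[min(c, len(row) - 1 - c)] for c in range(len(row))] for row in grid]


if __name__ == "__main__":
    design = [[1, 0, 1], [2, 3, 2], [1, 0, 1]]  # hypothetical cell types
    print(stabilizer(design))
    print(enforce_mirror([[1, 2, 0], [3, 4, 0]]))
```

Searching over such subgroups first and then designing only the free cells shrinks the search space, while the trivial subgroup (identity only) still covers every design.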