Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in sample efficiency and in generalization to novel
environments when learning environment models. We also demonstrate that the
learned dynamics models enable efficient planning in unseen environments,
comparable to planning with the true environment models. In addition, MAOP
learns semantically and visually interpretable disentangled representations.
Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial
Intelligence (AAAI), 202
Size Effect in Non-equilibrium Molecular Dynamics
The direct method is commonly used to compute the thermal conductivity of a nanoscale material from a molecular dynamics simulation. It simply applies Fourier's law to obtain the thermal conductivity, which requires the heat flux, the cross-sectional area, and the temperature gradient. A typical structure includes one heat source, one heat sink, and a device region between them. Although the temperature gradient is usually assumed to be constant throughout the device region, the temperature profile of a nanoscale material is not linear, because the phonon mean free path is comparable to the size of the whole system. Furthermore, the bath length and the device length can influence the temperature profile. In this project, two methods of computing the temperature gradient, and the size effect on each, are discussed. Method 1 takes the temperature gradient from the center region of the device; method 2 takes it as the temperature difference between the hot bath and the cold bath divided by the device length. The thermal conductivity computed with the Green-Kubo method serves as the reference for testing the two calculation methods and the size effect. Argon (atomic weight 40) is used as the nanoscale material because of its moderate phonon mean free path. The results show that both method 1 and method 2 can recover the bulk-limit thermal conductivity, but they require different size conditions: method 1 requires a long device, and method 2 requires a long bath region.
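A minimal sketch of the two temperature-gradient estimates described above, assuming a steady-state temperature profile sampled at positions x along the transport direction (hot bath at x = 0), a known heat flow rate through the cross section, and consistent units. The function names, the central-region fraction `frac`, and the synthetic profile are illustrative assumptions, not the project's code or data.

```python
import numpy as np


def conductivity(heat_rate, area, grad):
    """Fourier's law: J = -k dT/dx, with heat flux J = heat_rate / area."""
    return -(heat_rate / area) / grad


def gradient_center(x, T, frac=0.5):
    """Method 1: least-squares slope of T(x) over the central `frac` of the device."""
    x = np.asarray(x, dtype=float)
    T = np.asarray(T, dtype=float)
    mid = 0.5 * (x.min() + x.max())
    half = 0.5 * frac * (x.max() - x.min())
    mask = np.abs(x - mid) <= half
    slope, _intercept = np.polyfit(x[mask], T[mask], 1)
    return slope


def gradient_bath(T_hot, T_cold, device_length):
    """Method 2: bath temperature difference divided by device length.

    Heat flows from hot (x = 0) to cold (x = L), so the gradient is negative.
    """
    return -(T_hot - T_cold) / device_length


# Synthetic, illustrative profile: linear interior from 60 to 40 over a device
# of length 10, with baths held at 70 and 30 (temperature jumps at the interfaces).
x = np.linspace(0.0, 10.0, 101)
T = 60.0 - 2.0 * x
k1 = conductivity(4.0, 1.0, gradient_center(x, T))            # method 1
k2 = conductivity(4.0, 1.0, gradient_bath(70.0, 30.0, 10.0))  # method 2
print(k1, k2)
```

On this synthetic profile, method 1 recovers k = 2 from the interior slope, while method 2 gives k = 1 because the temperature jumps at the bath interfaces inflate the apparent gradient; this is one way a nonlinear profile can bias method 2.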
Appeal: Allow Mislabeled Samples the Chance to be Rectified in Partial Label Learning
In partial label learning (PLL), each instance is associated with a set of
candidate labels, only one of which is the ground truth. Most existing
works focus on constructing robust classifiers to estimate the
labeling confidence of candidate labels in order to identify the correct one.
However, these methods usually struggle to identify and rectify mislabeled
samples. To let these mislabeled samples "appeal" for themselves and to help
existing PLL methods identify and rectify them, we propose the first
appeal-based PLL framework. Specifically, we introduce a
novel partner classifier and instantiate it based on the implicit fact
that the non-candidate labels of a sample should not be assigned to it, a fact
that is inherently accurate and has not been fully investigated in PLL. Furthermore, a
novel collaborative term is formulated to link the base classifier and the
partner one. During each stage of mutual supervision, both classifiers will
blur each other's predictions through a blurring mechanism to prevent
overconfidence in a specific label. Extensive experiments demonstrate that the
appeal and disambiguation abilities of several well-established stand-alone and
deep-learning-based PLL approaches can be significantly improved by coupling
them with this learning paradigm.
Comment: Under review. An extended version of the AAAI 2024 oral paper "Partial
Label Learning with a Partner"
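The prior behind the partner classifier, that a sample's non-candidate labels should not be assigned to it, can be expressed as a penalty on the probability mass a classifier places outside the candidate set. The sketch below is a generic complementary-label-style loss plus a candidate-restricted confidence step, chosen as an assumption for illustration; it is not the paper's partner classifier, collaborative term, or blurring mechanism, and the function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def non_candidate_loss(logits, candidate_mask, eps=1e-12):
    """Mean over samples of -sum_{j not in S_i} log(1 - p_ij).

    candidate_mask[i, j] = 1 if label j is a candidate for sample i, else 0.
    The loss approaches 0 as probability on non-candidate labels approaches 0.
    """
    p = softmax(np.asarray(logits, dtype=float))
    non_candidate = 1.0 - np.asarray(candidate_mask, dtype=float)
    per_sample = -(non_candidate * np.log(1.0 - p + eps)).sum(axis=1)
    return float(per_sample.mean())


def candidate_confidence(logits, candidate_mask):
    """Renormalize predicted probabilities over each sample's candidate set only."""
    p = softmax(np.asarray(logits, dtype=float)) * np.asarray(candidate_mask, dtype=float)
    return p / p.sum(axis=1, keepdims=True)


logits = np.zeros((1, 3))            # uniform prediction over 3 labels
mask = np.array([[1.0, 1.0, 0.0]])   # labels 0 and 1 are candidates
print(non_candidate_loss(logits, mask))    # about 0.405, i.e. -log(2/3)
print(candidate_confidence(logits, mask))  # probabilities 0.5, 0.5, 0.0
```

Raising the logit of the non-candidate label increases the loss, which is the direction of supervision the non-candidate prior implies.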
Symmetry-Aware Robot Design with Structured Subgroups
Robot design aims at learning to create robots that can be easily controlled
and perform tasks efficiently. Previous works on robot design have proven their
ability to generate robots for various tasks. However, these works searched for
robots directly in the vast design space and ignored common structures,
resulting in abnormal robots and poor performance. To tackle this problem, we
propose a Symmetry-Aware Robot Design (SARD) framework that exploits the
structure of the design space by incorporating symmetry searching into the
robot design process. Specifically, we represent symmetries with the subgroups
of the dihedral group and search for the optimal symmetry in structured
subgroups. Then robots are designed under the searched symmetry. In this way,
SARD can design efficient symmetric robots while covering the original design
space, a property we analyze theoretically. We further empirically evaluate SARD on
various tasks, and the results show its superior efficiency and
generalizability.
Comment: The Fortieth International Conference on Machine Learning (ICML 2023)
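The abstract represents symmetries as subgroups of the dihedral group. As a hedged illustration (not SARD's search procedure), the standard classification of the subgroups of D_n, the symmetry group of a regular n-gon with rotation r and reflection s, can be enumerated directly: one cyclic subgroup <r^d> for each divisor d of n, and one dihedral subgroup <r^d, r^i s> for each divisor d and each 0 <= i < d, for a total of (number of divisors of n) + (sum of divisors of n) subgroups. The (k, f) encoding of r^k s^f and the function names are assumptions; how SARD maps a subgroup onto a robot morphology is not specified here.

```python
from typing import FrozenSet, List, Tuple

# Element (k, f) encodes r^k s^f in D_n: rotation by k steps, then a reflection if f == 1.
Element = Tuple[int, int]


def compose(a: Element, b: Element, n: int) -> Element:
    """Group product (r^k1 s^f1)(r^k2 s^f2), using the relation s r = r^-1 s."""
    (k1, f1), (k2, f2) = a, b
    k = (k1 - k2) % n if f1 else (k1 + k2) % n
    return (k, (f1 + f2) % 2)


def divisors(n: int) -> List[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


def dihedral_subgroups(n: int) -> List[FrozenSet[Element]]:
    """All subgroups of D_n (order 2n): <r^d> and <r^d, r^i s> for d | n, 0 <= i < d."""
    subgroups = []
    for d in divisors(n):
        rotations = {(j, 0) for j in range(0, n, d)}
        subgroups.append(frozenset(rotations))  # cyclic: rotational symmetry only
        for i in range(d):
            reflections = {(j, 1) for j in range(i, n, d)}
            subgroups.append(frozenset(rotations | reflections))  # rotations plus mirrors
    return subgroups


subs = dihedral_subgroups(4)
print(len(subs))                        # 10
print(sorted(len(h) for h in subs))     # [1, 2, 2, 2, 2, 2, 4, 4, 4, 8]
```

For the square (n = 4) this gives the trivial group, five subgroups of order 2, three of order 4, and the full group of order 8; a search over "structured subgroups" could restrict itself to such a list instead of the raw design space.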
Low-Rank Modular Reinforcement Learning via Muscle Synergy
Modular Reinforcement Learning (RL) decentralizes the control of multi-joint
robots by learning policies for each actuator. Previous work on modular RL has
proven its ability to control morphologically different agents with a shared
actuator policy. However, as the number of Degrees of Freedom (DoF) of
robots increases, training a morphology-generalizable modular controller becomes
exponentially more difficult. Motivated by the way the human central nervous system
controls numerous muscles, we propose a Synergy-Oriented LeARning (SOLAR)
framework that exploits the redundant nature of DoF in robot control. Actuators
are grouped into synergies by an unsupervised learning method, and a synergy
action is learned to control multiple actuators in synchrony. In this way, we
achieve low-rank control at the synergy level. We extensively evaluate our
method on a variety of robot morphologies, and the results show its superior
efficiency and generalizability, especially on robots with a large DoF like
Humanoids++ and UNIMALs.
Comment: 36th Conference on Neural Information Processing Systems (NeurIPS
2022)
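A sketch of why synergy-level control is low-rank, assuming the actuator-to-synergy grouping has already been produced by some unsupervised method (not reproduced here) and that each actuator simply follows its synergy's action with a fixed scale. The names `synergy_matrix` and `actuator_actions`, the per-actuator scale, and the example grouping are hypothetical.

```python
import numpy as np


def synergy_matrix(assignment, n_synergies, scales=None):
    """Return an (n_actuators, n_synergies) matrix M with M[i, assignment[i]] = scale_i.

    assignment[i] is the synergy index of actuator i, assumed to come from an
    unsupervised grouping step that is not reproduced here.
    """
    assignment = np.asarray(assignment, dtype=int)
    if assignment.min() < 0 or assignment.max() >= n_synergies:
        raise ValueError("synergy index out of range")
    n = assignment.shape[0]
    M = np.zeros((n, n_synergies))
    M[np.arange(n), assignment] = 1.0 if scales is None else np.asarray(scales, dtype=float)
    return M


def actuator_actions(M, synergy_actions):
    """Map synergy actions, shape (k,) or (T, k), to actuator actions, shape (n,) or (T, n)."""
    return np.asarray(synergy_actions, dtype=float) @ M.T


# Six actuators grouped pairwise into three synergies.
M = synergy_matrix([0, 0, 1, 1, 2, 2], n_synergies=3)
print(actuator_actions(M, [0.5, -1.0, 0.2]))  # each synergy action drives its two actuators
batch = actuator_actions(M, np.random.default_rng(0).normal(size=(50, 3)))
print(np.linalg.matrix_rank(batch))  # at most 3, the number of synergies
```

Because every actuator command is a linear function of k synergy actions, any batch of commands has rank at most k, which is the sense in which control is low-rank here; the policy only has to output k values instead of one per actuator.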
Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
Recent offline meta-reinforcement learning (meta-RL) methods typically
utilize task-dependent behavior policies (e.g., training RL agents on each
individual task) to collect a multi-task dataset. However, these methods always
require extra information for fast adaptation, such as offline context for
testing tasks. To address this problem, we first formally characterize a unique
challenge in offline meta-RL: transition-reward distribution shift between
offline datasets and online adaptation. Our theory shows that
out-of-distribution adaptation episodes may lead to unreliable policy
evaluation and that online adaptation with in-distribution episodes can
guarantee adaptation performance. Based on these theoretical insights, we
propose a novel adaptation framework, called In-Distribution online Adaptation
with uncertainty Quantification (IDAQ), which generates in-distribution context
using a given uncertainty quantification and performs effective task belief
inference to address new tasks. We find a return-based uncertainty
quantification for IDAQ that performs effectively. Experiments show that IDAQ
achieves state-of-the-art performance on the Meta-World ML1 benchmark compared
to baselines with and without offline adaptation.
Self-Organized Polynomial-Time Coordination Graphs
Coordination graphs are a promising approach to modeling agent collaboration in
multi-agent reinforcement learning. They perform a graph-based value
factorization and induce explicit coordination among agents to complete
complicated tasks. However, one critical challenge in this paradigm is the
complexity of greedy action selection with respect to the factorized values.
This selection is a decentralized constraint optimization problem (DCOP), and
both the DCOP and its constant-ratio approximation are NP-hard. To bypass this
systematic hardness, this paper proposes a novel method, named Self-Organized
Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph
classes to guarantee the accuracy and the computational efficiency of
collaborative action selection. SOP-CG employs a dynamic graph topology to ensure
sufficient value function expressiveness. The graph selection is unified into
an end-to-end learning paradigm. In experiments, we show that our approach
learns succinct and well-adapted graph topologies, induces effective
coordination, and improves performance across a variety of cooperative
multi-agent tasks.
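The abstract does not name the structured graph classes SOP-CG uses. As an illustrative assumption, a chain (path) coordination graph is one class for which greedy action selection is exact and polynomial: max-sum dynamic programming along the chain costs O(n |A|^2) instead of enumerating all |A|^n joint actions. The utility layout (`unary`, `pairwise`) and the function names are hypothetical, and the brute-force check is only for tiny examples.

```python
import itertools

import numpy as np


def chain_argmax(unary, pairwise):
    """Exact argmax of sum_i unary[i][a_i] + sum_i pairwise[i][a_i, a_{i+1}] on a chain.

    unary: n arrays of shape (|A|,); pairwise: n - 1 arrays of shape (|A|, |A|).
    Runs in O(n |A|^2) time via dynamic programming (max-sum on a path).
    """
    value = np.asarray(unary[0], dtype=float)  # best prefix score ending in each a_0
    backpointers = []
    for i in range(1, len(unary)):
        # scores[p, c]: best prefix ending in a_{i-1} = p, extended with a_i = c
        scores = (value[:, None]
                  + np.asarray(pairwise[i - 1], dtype=float)
                  + np.asarray(unary[i], dtype=float)[None, :])
        backpointers.append(scores.argmax(axis=0))
        value = scores.max(axis=0)
    actions = [int(value.argmax())]
    for bp in reversed(backpointers):  # walk back from the last agent to the first
        actions.append(int(bp[actions[-1]]))
    actions.reverse()
    return actions, float(value.max())


def joint_value(actions, unary, pairwise):
    """Factorized value of one joint action."""
    v = sum(unary[i][a] for i, a in enumerate(actions))
    v += sum(pairwise[i][actions[i], actions[i + 1]] for i in range(len(actions) - 1))
    return float(v)


def brute_force_argmax(unary, pairwise):
    """Enumerates all |A|^n joint actions; only feasible for tiny examples."""
    n_actions = len(unary[0])
    return max(itertools.product(range(n_actions), repeat=len(unary)),
               key=lambda a: joint_value(a, unary, pairwise))


rng = np.random.default_rng(0)
unary = [rng.normal(size=3) for _ in range(5)]          # 5 agents, 3 actions each
pairwise = [rng.normal(size=(3, 3)) for _ in range(4)]  # edges (0,1), ..., (3,4)
actions, best = chain_argmax(unary, pairwise)
print(actions, best, joint_value(brute_force_argmax(unary, pairwise), unary, pairwise))
```

The dynamic program and the exhaustive search agree on the optimal value, but only the former scales linearly in the number of agents, which is the kind of guarantee a restricted graph class buys over a general coordination graph.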