Learning Stable Koopman Models for Identification and Control of Dynamical Systems
Learning models of dynamical systems from data is a widely-studied problem in control theory and machine learning. One recent approach for modelling nonlinear systems considers the class of Koopman models, which embed the nonlinear dynamics in a higher-dimensional space on which they evolve linearly. Learning a Koopman embedding would allow for the analysis and control of nonlinear systems using tools from linear systems theory. Many recent methods have been proposed for data-driven learning of such Koopman embeddings, but most of these methods do not consider the stability of the Koopman model.
Stability is an important and desirable property for models of dynamical systems. Unstable models tend to be non-robust to input perturbations and can produce unbounded outputs, which are both undesirable when the model is used for prediction and control. In addition, recent work has shown that stability guarantees may act as a regularizer for model fitting. As such, a natural direction would be to construct Koopman models with inherent stability guarantees.
Two new classes of Koopman models are proposed that bridge the gap between Koopman-based methods and learning stable nonlinear models. The first model class is guaranteed to be stable, while the second is guaranteed to be stabilizable with an explicit stabilizing controller that renders the model stable in closed-loop. Furthermore, these models are unconstrained in their parameter sets, thereby enabling efficient optimization via gradient-based methods. Theoretical connections between the stability of Koopman models and forms of nonlinear stability such as contraction are established. To demonstrate the effect of the stability guarantees, the stable Koopman model is applied to a system identification problem, while the stabilizable model is applied to an imitation learning problem. Experimental results show empirically that the proposed models achieve better performance than prior methods without stability guarantees.
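The unconstrained-parameterization idea can be illustrated with a simple construction (not the paper's actual model class): any matrix built as U diag(σ) Vᵀ with orthogonal U, V and singular values σ squashed into (0, 1) has spectral norm below 1, hence spectral radius below 1, so the linear Koopman dynamics z⁺ = Az are Schur-stable for every value of the raw parameters.

```python
import numpy as np

def stable_koopman_operator(P, Q, s, eps=1e-3):
    """Build a Schur-stable matrix A from unconstrained parameters.

    U and V are orthogonal factors from QR decompositions of the raw
    matrices P and Q; the singular values are squashed into (0, 1 - eps)
    with a sigmoid, so ||A||_2 < 1 and the spectral radius of A is
    below 1. Illustrative parameterization, not the one from the paper.
    """
    U, _ = np.linalg.qr(P)
    V, _ = np.linalg.qr(Q)
    sigma = (1.0 - eps) / (1.0 + np.exp(-s))  # singular values in (0, 1 - eps)
    return U @ np.diag(sigma) @ V.T

rng = np.random.default_rng(0)
n = 4
A = stable_koopman_operator(rng.normal(size=(n, n)),
                            rng.normal(size=(n, n)),
                            rng.normal(size=n))
rho = max(abs(np.linalg.eigvals(A)))
print(rho < 1.0)  # stability holds for any raw parameter values
```

Because P, Q, and s range over all of Euclidean space, gradient-based optimizers can be applied directly, with stability guaranteed by construction rather than enforced via constraints.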
Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding
A prominent challenge of offline reinforcement learning (RL) is the issue of
hidden confounding: unobserved variables may influence both the actions taken
by the agent and the observed outcomes. Hidden confounding can compromise the
validity of any causal conclusion drawn from data and presents a major obstacle
to effective offline RL. In the present paper, we tackle the problem of hidden
confounding in the nonidentifiable setting. We propose a definition of
uncertainty due to hidden confounding bias, termed delphic uncertainty, which
uses variation over world models compatible with the observations, and
differentiate it from the well-known epistemic and aleatoric uncertainties. We
derive a practical method for estimating the three types of uncertainties, and
construct a pessimistic offline RL algorithm to account for them. Our method
does not assume identifiability of the unobserved confounders, and attempts to
reduce the amount of confounding bias. We demonstrate through extensive
experiments and ablations the efficacy of our approach on a sepsis management
benchmark, as well as on electronic health records. Our results suggest that
nonidentifiable hidden confounding bias can be mitigated to improve offline RL
solutions in practice.
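One illustrative reading of the delphic-uncertainty idea (a toy sketch, not the paper's estimator): take an ensemble of world models that all fit the observed data but differ in how they fill in the unobserved confounder, treat their disagreement as the confounding-induced uncertainty, and penalize it pessimistically when ranking actions.

```python
import numpy as np

def delphic_penalized_value(values, kappa=1.0):
    """Pessimistic value: mean across compatible world models minus a
    disagreement penalty, so actions whose outcome hinges on the hidden
    confounder are down-weighted. Illustrative sketch only."""
    values = np.asarray(values, dtype=float)
    return values.mean() - kappa * values.std()

# Two candidate actions, each scored by three world models that are all
# compatible with the observed data. Action 1 looks better on average,
# but the models disagree strongly about it.
action_0 = [0.50, 0.52, 0.48]   # models agree: low delphic uncertainty
action_1 = [1.40, 0.10, 0.20]   # models disagree: confounding-sensitive
v0 = delphic_penalized_value(action_0)
v1 = delphic_penalized_value(action_1)
print(v0 > v1)  # pessimism prefers the confounding-robust action
```

The ensemble spread here plays the role of delphic uncertainty; epistemic uncertainty would shrink with more data, and aleatoric uncertainty is irreducible outcome noise, which is why the three need to be estimated separately.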
TACO: Learning Task Decomposition via Temporal Alignment for Control
Many advanced Learning from Demonstration (LfD) methods consider the
decomposition of complex, real-world tasks into simpler sub-tasks. By reusing
the corresponding sub-policies within and between tasks, they provide training
data for each policy from different high-level tasks and compose them to
perform novel ones. Existing approaches to modular LfD focus either on learning
a single high-level task or depend on domain knowledge and temporal
segmentation. In contrast, we propose a weakly supervised, domain-agnostic
approach based on task sketches, which include only the sequence of sub-tasks
performed in each demonstration. Our approach simultaneously aligns the
sketches with the observed demonstrations and learns the required sub-policies.
This improves generalisation in comparison to separate optimisation procedures.
We evaluate the approach on multiple domains, including a simulated 3D robot
arm control task using purely image-based observations. The results show that
our approach performs commensurately with fully supervised approaches, while
requiring significantly less annotation effort. Comment: 12 pages. Published at ICML 2018.
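The core alignment step can be sketched with a small dynamic program (an illustrative stand-in for TACO's CTC-style objective, not the paper's exact formulation): given per-frame log-likelihoods for each sub-task, find the best monotonic segmentation of the demonstration into the ordered sub-tasks of the sketch.

```python
import math

def best_alignment(log_probs, sketch):
    """Align a task sketch (ordered sub-task labels) to T frames.

    log_probs[t][k] is the per-frame log-likelihood that frame t belongs
    to sub-task k. Each sketch entry must cover a contiguous, non-empty
    run of frames, in order. Returns the maximum total log-likelihood.
    """
    T, S = len(log_probs), len(sketch)
    NEG = -math.inf
    # dp[s][t]: best score using the first s sketch entries on the first t frames
    dp = [[NEG] * (T + 1) for _ in range(S + 1)]
    dp[0][0] = 0.0
    for s in range(1, S + 1):
        k = sketch[s - 1]
        for t in range(1, T + 1):
            # frame t-1 is assigned to sub-task k: either it extends the
            # current segment, or it starts a new segment after entry s-1
            best = max(dp[s][t - 1], dp[s - 1][t - 1])
            if best > NEG:
                dp[s][t] = best + log_probs[t - 1][k]
    return dp[S][T]

# 4 frames, sketch [0, 1]: frames 0-1 look like sub-task 0, frames 2-3
# like sub-task 1, so the best segmentation matches every frame well.
lp = [[-0.1, -2.0], [-0.1, -2.0], [-2.0, -0.1], [-2.0, -0.1]]
score = best_alignment(lp, [0, 1])
print(round(score, 1))  # -0.4: all four frames matched to the right sub-task
```

In the weakly supervised setting this alignment and the sub-policies are optimized jointly, since the per-frame likelihoods themselves come from the sub-policies being learned.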
Learning multi-stage tasks with one demonstration via self-replay
In this work, we introduce a novel method to learn everyday-like multi-stage tasks from a single human demonstration, without requiring any prior object knowledge. Inspired by the recent Coarse-to-Fine Imitation Learning method, we model imitation learning as a learned object reaching phase followed by an open-loop replay of the demonstrator's actions. We build upon this for multi-stage tasks where, following the human demonstration, the robot can autonomously collect image data for the entire multi-stage task, by reaching the next object in the sequence and then replaying the demonstration, and then repeating in a loop for all stages of the task. We evaluate with real-world experiments on a set of everyday-like multi-stage tasks, which we show our method can solve from a single demonstration. Videos and supplementary material can be found at this webpage.
Rethinking Individual Global Max in Cooperative Multi-Agent Reinforcement Learning
In cooperative multi-agent reinforcement learning, centralized training and
decentralized execution (CTDE) has achieved remarkable success. Individual
Global Max (IGM) decomposition, which is an important element of CTDE, measures
the consistency between local and joint policies. The majority of IGM-based
research focuses on how to establish this consistent relationship, but little
attention has been paid to examining IGM's potential flaws. In this work, we
reveal that the IGM condition is a lossy decomposition, and that the error of this
lossy decomposition accumulates in hypernetwork-based methods. To address the
above issue, we propose to adopt an imitation learning strategy to separate the
lossy decomposition from Bellman iterations, thereby avoiding error
accumulation. The proposed strategy is theoretically proved and empirically
verified on the StarCraft Multi-Agent Challenge benchmark problem with zero
sight view. The results also confirm that the proposed method outperforms
state-of-the-art IGM-based approaches. Comment: Accepted at NeurIPS 2022.
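The lossiness of value decomposition can be seen in a classic matrix-game toy (widely used in the value-factorization literature; an illustration, not the paper's experiment): the optimal joint action pays 8, but the best additive fit Q_tot(a1, a2) ≈ q1[a1] + q2[a2] steers both agents' greedy choices away from it, so the decentralized greedy joint action violates consistency with the true joint optimum.

```python
import itertools
import numpy as np

# Payoff matrix whose optimum (0, 0) requires coordinated agreement.
q_joint = np.array([[  8.0, -12.0, -12.0],
                    [-12.0,   0.0,   0.0],
                    [-12.0,   0.0,   0.0]])

# Least-squares additive fit: q_joint(a1, a2) ~ q1[a1] + q2[a2].
A, b = [], []
for a1, a2 in itertools.product(range(3), range(3)):
    row = [0.0] * 6
    row[a1] = 1.0        # indicator for agent 1's action
    row[3 + a2] = 1.0    # indicator for agent 2's action
    A.append(row)
    b.append(q_joint[a1, a2])
theta, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
q1, q2 = theta[:3], theta[3:]

greedy_local = (int(np.argmax(q1)), int(np.argmax(q2)))  # what CTDE executes
greedy_joint = np.unravel_index(int(np.argmax(q_joint)), q_joint.shape)
print(q_joint[greedy_local] < q_joint[greedy_joint])  # decomposition is lossy
```

Here the additive fit prefers the "safe" actions (payoff 0) over the true optimum (payoff 8); when such a lossy factorization is bootstrapped through Bellman iterations, the error compounds, which is the failure mode the imitation-learning separation is meant to avoid.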
Information-Theoretic Policy Extraction from Partial Observations
We investigate the problem of extracting a control policy from a single or
multiple partial observation sequences. To this end, we cast the problem as a
Controlled Hidden Markov Model. We then sketch two information-theoretic
approaches to extract a policy, which we refer to as A Posteriori Control
Distributions. The performance of both methods is investigated and compared
empirically on a linear tracking problem.
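The general controlled-HMM setup can be sketched as follows (a minimal illustration of belief filtering plus posterior-based action selection, not the paper's A Posteriori Control Distributions; the transition, observation, and reward tables are invented for the example): maintain a posterior over the hidden state via Bayes filtering, then act greedily on that posterior.

```python
import numpy as np

# Minimal controlled HMM with 2 hidden states, 2 actions, 2 observations.
T = np.array([  # T[a][s, s']: transition probability under action a
    [[0.9, 0.1], [0.2, 0.8]],   # action 0: sticky dynamics
    [[0.5, 0.5], [0.5, 0.5]],   # action 1: mixing dynamics
])
O = np.array([[0.8, 0.2],       # O[s, o]: observation likelihood
              [0.3, 0.7]])
R = np.array([[ 1.0, -1.0],     # R[s, a]: reward of action a in state s
              [-1.0,  1.0]])

def filter_belief(belief, action, obs):
    """Bayes update: predict through T[action], correct with O[:, obs]."""
    pred = belief @ T[action]
    post = pred * O[:, obs]
    return post / post.sum()

def greedy_action(belief):
    """Extract an action from the posterior: maximize expected reward."""
    return int(np.argmax(belief @ R))

belief = np.array([0.5, 0.5])
for obs in [0, 0, 0]:            # a run of observations favouring state 0
    a = greedy_action(belief)
    belief = filter_belief(belief, a, obs)
print(belief[0] > 0.5, greedy_action(belief))
```

After three observations consistent with state 0, the posterior concentrates on that state and the extracted policy selects the action matched to it; information-theoretic variants replace the greedy rule with objectives defined on these posterior distributions.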