Learning Pneumatic Non-Prehensile Manipulation with a Mobile Blower
We investigate pneumatic non-prehensile manipulation (i.e., blowing) as a
means of efficiently moving scattered objects into a target receptacle. Due to
the chaotic nature of aerodynamic forces, a blowing controller must (i)
continually adapt to unexpected changes from its actions, (ii) maintain
fine-grained control, since the slightest misstep can result in large
unintended consequences (e.g., scattering objects already in a pile), and (iii)
infer long-range plans (e.g., move the robot to strategic blowing locations).
We tackle these challenges in the context of deep reinforcement learning,
introducing a multi-frequency version of the spatial action maps framework.
This allows for efficient learning of vision-based policies that effectively
combine high-level planning and low-level closed-loop control for dynamic
mobile manipulation. Experiments show that our system learns efficient
behaviors for the task, demonstrating in particular that blowing achieves
better downstream performance than pushing, and that our policies improve
performance over baselines. Moreover, we show that our system naturally
encourages emergent specialization between the different subpolicies spanning
low-level fine-grained control and high-level planning. On a real mobile robot
equipped with a miniature air blower, we show that our simulation-trained
policies transfer well to a real environment and can generalize to novel
objects. Comment: Project page: https://learning-dynamic-manipulation.cs.princeton.ed
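In the spatial action maps framework, each pixel of a dense, image-aligned value map stands for one candidate action. The sketch below shows only that basic idea: greedy selection of the highest-valued pixel and its conversion into a target offset. The robot-centric map, the `meters_per_pixel` resolution, the axis convention, and the function name are hypothetical assumptions, and the multi-frequency combination of subpolicies described above is not modelled.

```python
from typing import List, Tuple


def select_spatial_action(
    q_map: List[List[float]], meters_per_pixel: float
) -> Tuple[Tuple[int, int], Tuple[float, float]]:
    """Greedy action selection from a dense, robot-centric Q-value map (hypothetical helper).

    Each pixel is a candidate action ("act at this location"); the chosen action
    is the pixel with the highest Q-value, converted into an (x, y) offset in
    metres relative to the robot, which is assumed to sit at the map centre.
    """
    best, best_rc = float("-inf"), (0, 0)
    for r, row in enumerate(q_map):
        for c, q in enumerate(row):
            if q > best:
                best, best_rc = q, (r, c)
    rows, cols = len(q_map), len(q_map[0])
    r, c = best_rc
    # Assumed convention: forward (+x) points toward row 0 and +y points toward column 0.
    dx = ((rows - 1) / 2 - r) * meters_per_pixel
    dy = ((cols - 1) / 2 - c) * meters_per_pixel
    return best_rc, (dx, dy)


pixel, offset = select_spatial_action([[0.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]], 0.1)
```

Because the action space is the image grid itself, a fully convolutional network can score every location in one pass, which is what makes this parameterization attractive for vision-based mobile manipulation.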
Projections for Approximate Policy Iteration Algorithms
Approximate policy iteration is a class of reinforcement learning (RL) algorithms where the policy is encoded using a function approximator and which has been especially prominent in RL with continuous action spaces. In this class of RL algorithms, ensuring that the policy return increases during the policy update often requires constraining the change in the action distribution. Several approximations exist in the literature to solve this constrained policy update problem. In this paper, we propose to improve on such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one, which is then solved by standard gradient descent. Using these projections, we empirically demonstrate that our approach can improve the policy update solution and the control over exploration of existing approximate policy iteration algorithms.
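To illustrate how a projection can turn a constrained update into an unconstrained one, here is a minimal sketch assuming a diagonal Gaussian policy whose covariance stays fixed, so that only the mean is constrained by a KL trust region of size `eps`. This is one simple projection chosen for illustration, not necessarily one of the projections proposed in the paper; the function names are hypothetical. Any unconstrained mean produced by a gradient step is mapped back onto the feasible set, so the optimizer never has to handle the constraint explicitly.

```python
import math
from typing import List


def gaussian_mean_kl(mu_new: List[float], mu_old: List[float], var: List[float]) -> float:
    """KL between two diagonal Gaussians that share the variance vector `var`.

    With equal covariances the KL reduces to half the squared Mahalanobis
    distance between the two means.
    """
    return 0.5 * sum((a - b) ** 2 / v for a, b, v in zip(mu_new, mu_old, var))


def project_mean(mu_new: List[float], mu_old: List[float], var: List[float], eps: float) -> List[float]:
    """Map an unconstrained mean back into the trust region {KL <= eps}.

    Means that already satisfy the bound are returned unchanged; otherwise the
    step away from `mu_old` is shrunk so that the KL equals `eps` exactly.
    """
    kl = gaussian_mean_kl(mu_new, mu_old, var)
    if kl <= eps:
        return list(mu_new)
    # The KL is quadratic in the step length, so scaling the step by sqrt(eps / kl) gives KL == eps.
    scale = math.sqrt(eps / kl)
    return [b + scale * (a - b) for a, b in zip(mu_new, mu_old)]


projected = project_mean([2.0, 0.0], [0.0, 0.0], [1.0, 1.0], eps=0.5)  # step shrunk to [1.0, 0.0]
```

Composing the policy parameterization with such a projection yields a differentiable map from unconstrained parameters to feasible policies, which is what allows plain gradient descent on the surrogate objective.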
Generalization and Transferability in Reinforcement Learning
Reinforcement learning has proven capable of extending the applicability of machine learning to domains in which
knowledge cannot be acquired from labeled examples but only via trial and error. Being able to solve problems with such
characteristics is a crucial requirement for autonomous agents that can accomplish tasks without human intervention.
However, most reinforcement learning algorithms are designed to solve exactly one task and offer no means to systematically
reuse previous knowledge acquired in other problems. Motivated by insights from homotopic continuation methods,
in this work we investigate approaches based on optimization- and concurrent systems theory to gain an understanding
of conceptual and technical challenges of knowledge transfer in reinforcement learning domains. Building upon these
findings, we present an algorithm based on contextual relative entropy policy search that allows an agent to generate
a structured sequence of learning tasks that guide its learning towards a target distribution of tasks by giving it control
over an otherwise hidden context distribution. The presented algorithm is evaluated on a number of robotic tasks, in
which a desired system state needs to be reached, demonstrating that the proposed learning scheme helps to increase
and stabilize learning performance.
A Probabilistic Interpretation of Self-Paced Learning with Applications to Reinforcement Learning
Across machine learning, the use of curricula has shown strong empirical
potential to improve learning from data by avoiding local optima of training
objectives. For reinforcement learning (RL), curricula are especially
interesting, as the underlying optimization has a strong tendency to get stuck
in local optima due to the exploration-exploitation trade-off. Recently, a
number of approaches for automatic curriculum generation in RL have been
shown to increase performance while requiring less expert knowledge compared to
manually designed curricula. However, these approaches are seldom
investigated from a theoretical perspective, preventing a deeper understanding
of their mechanics. In this paper, we present an approach for automated
curriculum generation in RL with a clear theoretical underpinning. More
precisely, we formalize the well-known self-paced learning paradigm as inducing
a distribution over training tasks, which trades off between task complexity
and the objective to match a desired task distribution. Experiments show that
training on this induced distribution helps to avoid poor local optima across
RL algorithms in different tasks with uninformative rewards and challenging
exploration requirements.
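The trade-off between "easy" tasks and matching a desired task distribution can be made concrete for a finite set of contexts. The sketch below assumes the simplified objective E_p[J(c)] - alpha * KL(p || target), where J(c) is the agent's current performance on context c; its maximizer has the closed form p(c) ∝ target(c) * exp(J(c) / alpha). This is an illustrative simplification under that assumption, not the exact objective or update used in the paper, and the function name is hypothetical.

```python
import math
from typing import Dict


def self_paced_distribution(
    returns: Dict[str, float], target: Dict[str, float], alpha: float
) -> Dict[str, float]:
    """Context distribution maximizing E_p[J(c)] - alpha * KL(p || target) (hypothetical helper).

    The closed-form solution is p(c) ∝ target(c) * exp(J(c) / alpha): a large
    alpha pulls p toward the target distribution, while a small alpha favours
    contexts (tasks) on which the agent currently performs well.
    """
    # Subtract the largest return before exponentiating for numerical stability.
    j_max = max(returns.values())
    unnorm = {c: target[c] * math.exp((returns[c] - j_max) / alpha) for c in target}
    z = sum(unnorm.values())
    return {c: w / z for c, w in unnorm.items()}


early = self_paced_distribution({"easy": 1.0, "hard": 0.0}, {"easy": 0.5, "hard": 0.5}, alpha=0.1)
late = self_paced_distribution({"easy": 1.0, "hard": 0.0}, {"easy": 0.5, "hard": 0.5}, alpha=1e6)
```

Gradually increasing alpha over training moves the sampled tasks from those the agent already solves toward the desired task distribution, which is the curriculum effect the abstract describes.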
Statistical Machine Learning for Modeling and Control of Stochastic Structured Systems
Machine learning and its various applications have driven innovation in robotics, synthetic perception, and data analytics. The last decade in particular has seen an explosion of interest in the research and development of artificial intelligence, with successful adoption and deployment in some domains. A significant force behind these advances has been an abundance of data and the evolution of simple computational models and tools with the capacity to scale up to massive learning automata. Monolithic neural networks with billions of parameters that rely on automatic differentiation are a prime example of the significant role efficient computation has played in supercharging the ability of well-established representations to extract intelligent patterns from unstructured data.
Nonetheless, despite the strides taken in the digital domains of vision and natural language processing, applications in optimal control and robotics lag significantly behind and have not been able to capitalize as much on the latest trends in machine learning. This discrepancy can be explained by the limited transferability of learning concepts that rely on full differentiability to heavily structured physical and human-interaction environments, not to mention the substantial cost of generating data on real physical systems. Together, these factors severely limit the scope for applying loosely structured, over-parameterized data-crunching machines in the mechanical realm of robot learning and control.
This thesis investigates modeling paradigms of hierarchical and switching systems to tackle some of the previously highlighted issues. This research direction is motivated by insights into universal function approximation via local cooperating units and the promise of inherently regularized representations through explicit structural design. Moreover, we explore ideas from robust optimization that address model mismatch issues in statistical models and outline how related methods may be used to improve the tractability of state filtering in stochastic hybrid systems.
In Chapter 2, we consider hierarchical modeling for general regression problems. The presented approach is a generative probabilistic interpretation of local regression techniques that approximate nonlinear functions through a set of local linear or polynomial units. The number of available units is crucial in such models, as it directly balances representational power against parametric complexity. This trade-off is addressed by using principles from Bayesian nonparametrics to formulate flexible models that adapt their complexity to the data and can potentially encompass an infinite number of components. To learn these representations, we present two efficient variational inference techniques that scale well with data and highlight the advantages of hierarchical infinite local regression models, such as dealing with non-smooth functions, mitigating catastrophic forgetting, and enabling parameter sharing and fast predictions. Finally, we validate this approach on a set of large inverse dynamics datasets and test the learned models in real-world control scenarios.
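The basic mechanism of local regression with local linear units can be sketched in one dimension. This is a deliberately simplified, non-Bayesian illustration: the number of units, their centres, and the shared Gaussian `width` are fixed by hand, whereas the chapter infers model complexity with Bayesian nonparametrics and variational inference. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Unit = Tuple[float, float, float]  # (centre c, intercept a, slope b) of one local model


def fit_local_linear(xs: List[float], ys: List[float], centers: List[float], width: float) -> List[Unit]:
    """Fit one weighted linear model y ≈ a + b * (x - c) for each unit centre c."""
    models = []
    for c in centers:
        # Gaussian receptive field: data near the centre dominate this unit's fit.
        w = [math.exp(-0.5 * ((x - c) / width) ** 2) for x in xs]
        sw = sum(w)
        # Weighted means of the centred input and of the target.
        mx = sum(wi * (x - c) for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * ((x - c) - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * ((x - c) - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        b = sxy / sxx if sxx > 1e-12 else 0.0
        a = my - b * mx
        models.append((c, a, b))
    return models


def predict_local_linear(models: List[Unit], width: float, x: float) -> float:
    """Blend the local linear predictions using normalised Gaussian activations."""
    acts = [math.exp(-0.5 * ((x - c) / width) ** 2) for c, _, _ in models]
    total = sum(acts)
    return sum(act * (a + b * (x - c)) for act, (c, a, b) in zip(acts, models)) / total


xs = [i / 10 for i in range(-20, 21)]
models = fit_local_linear(xs, [abs(x) for x in xs], centers=[-1.0, 0.0, 1.0], width=0.5)
```

Queries should stay within the region covered by the unit centres; far from every centre, all activations underflow and the normalization breaks down. The sketch also makes the abstract's point about complexity visible: too few centres under-fit a non-smooth target such as `abs(x)`, while too many over-fit sparse data.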
Chapter 3 addresses discrete-continuous hybrid modeling and control for stochastic dynamical systems, which implies dealing with time-series data. In this scenario, we develop an automatic system identification technique that decomposes nonlinear systems into hybrid automata and leverages the resulting structure to learn switching feedback control via hierarchical reinforcement learning. In the process, we rely on an augmented closed-loop hidden Markov model architecture that captures time correlations over long horizons and provides a principled Bayesian inference framework for learning hybrid representations and filtering the hidden discrete states to apply control accordingly. Finally, we embed this structure explicitly into a novel hybrid relative entropy policy search algorithm that optimizes a set of local polynomial feedback controllers and value functions. We validate the overall switching-system perspective by benchmarking the open-loop predictive performance against popular black-box representations. We also provide qualitative empirical results for hybrid reinforcement learning on common nonlinear control tasks.
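Filtering the hidden discrete state of a switching system follows the standard hidden Markov model forward recursion: predict with the mode-transition probabilities, then reweight by the observation likelihood. The sketch below assumes a fixed transition matrix and precomputed per-step likelihoods. In the closed-loop model described above, both would instead depend on the continuous state and control at each step. The function name is hypothetical.

```python
from typing import List


def forward_filter(
    prior: List[float], transition: List[List[float]], likelihoods: List[List[float]]
) -> List[List[float]]:
    """Recursive Bayesian filter for a hidden discrete mode z_t (hypothetical helper).

    prior[i]          = p(z_0 = i), before observing y_0
    transition[i][j]  = p(z_t = j | z_{t-1} = i)
    likelihoods[t][j] = p(y_t | z_t = j)
    Returns the filtered beliefs p(z_t | y_0, ..., y_t) for every t.
    """
    n = len(prior)
    beliefs = []
    belief = list(prior)
    for t, lik in enumerate(likelihoods):
        if t > 0:
            # Predict: propagate the previous belief through the mode-transition matrix.
            belief = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
        # Update: weight each mode by the observation likelihood, then renormalise.
        belief = [b * l for b, l in zip(belief, lik)]
        z = sum(belief)
        belief = [b / z for b in belief]
        beliefs.append(belief)
    return beliefs


beliefs = forward_filter([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.9, 0.1]])
```

A switching controller can then apply the feedback law of the most probable mode, or mix the local controllers according to the filtered belief.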
In Chapter 4, we attend to a general and fundamental problem in learning for control, namely robustness in data-driven stochastic optimization. Sensitivity to model error is a pressing concern, given the rising popularity of embedding statistical models into stochastic control frameworks. However, data from dynamical, especially mechanical, systems are often scarce due to a high extraction cost and limited coverage of the state-action space. The result is usually poor models with narrow validity and brittle control laws, particularly in ill-posed, over-parameterized learning settings. We propose to robustify stochastic control by finding the worst-case distribution over the dynamics and optimizing a corresponding robust policy that minimizes the probability of catastrophic failures. We achieve this goal by formulating a two-stage iterative minimax optimization problem that finds the most pessimistic adversary in a trust region around a nominal model and uses it to optimize a robust optimal controller. We test this approach on a set of linear and nonlinear stochastic systems and supply empirical evidence of its practicality. Finally, we provide an outlook on how similar multi-stage distributional optimization techniques can be applied to approximate filtering in stochastic switching systems in order to tackle the exponential growth in the number of state mixture components.
In summary, the individual contributions of this thesis are a collection of interconnected principles for structured and robust learning for control. Although many challenges remain ahead, this research lays a foundation for reflecting on future structured learning questions that strive to combine optimal control and statistical machine learning perspectives for the automatic decomposition and optimization of hierarchical models.
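The inner step of the minimax problem, finding the most pessimistic distribution within a trust region around a nominal model, has a simple form when the trust region is a KL ball and outcomes are discrete. The adversary maximizes expected cost subject to KL(p || nominal) <= eps. The solution is an exponential tilting p ∝ nominal * exp(cost / eta), with the temperature eta chosen so that the constraint is tight. The sketch below finds eta by bisection. It is an illustration under these assumptions (discrete outcomes, KL trust region, fixed costs) and omits the outer robust-controller optimization and the continuous dynamics treated in the chapter. All names are hypothetical.

```python
import math
from typing import List


def tilted(nominal: List[float], costs: List[float], eta: float) -> List[float]:
    """Exponentially tilt the nominal distribution toward high-cost outcomes."""
    m = max(costs)  # shift for numerical stability; cancels after normalisation
    w = [q * math.exp((c - m) / eta) for q, c in zip(nominal, costs)]
    z = sum(w)
    return [wi / z for wi in w]


def kl(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions (terms with p_i = 0 contribute zero)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def worst_case_distribution(nominal: List[float], costs: List[float], eps: float, iters: int = 200) -> List[float]:
    """Most pessimistic p with KL(p || nominal) <= eps for the expected cost.

    KL(tilted || nominal) decreases monotonically in eta, so bisecting eta on a
    log scale finds the temperature at which the trust-region constraint is tight.
    """
    lo, hi = 1e-6, 1e6  # small eta: aggressive adversary; large eta: close to nominal
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if kl(tilted(nominal, costs, mid), nominal) > eps:
            lo = mid  # adversary left the trust region: raise the temperature
        else:
            hi = mid  # constraint satisfied: try a more aggressive adversary
    return tilted(nominal, costs, hi)  # `hi` always satisfies the KL bound


adversary = worst_case_distribution([0.5, 0.5], [0.0, 1.0], eps=0.1)
```

If eps exceeds the largest achievable KL, the result degenerates toward a point mass on the highest-cost outcome, which corresponds to the trust region no longer being binding.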