336 research outputs found

    Kernelizing LSPE(λ)

    We propose the use of kernel-based methods as the underlying function approximator in the least-squares-based policy evaluation frameworks of LSPE(λ) and LSTD(λ). In particular, we present the 'kernelization' of model-free LSPE(λ). The 'kernelization' is made computationally feasible by the subset of regressors approximation, which approximates the kernel using a vastly reduced number of basis functions. The core of our proposed solution is an efficient recursive implementation with automatic supervised selection of the relevant basis functions. The LSPE method is well suited for optimistic policy iteration and can thus be used in the context of online reinforcement learning. We use the high-dimensional Octopus benchmark to demonstrate this.

    Deterministic Policy Gradient Algorithms

    In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
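
    For reference, the "expected gradient of the action-value function" mentioned in this abstract is commonly written as below. The notation is assumed here and does not appear in the abstract itself: a deterministic policy \mu_\theta with parameters \theta, a performance objective J, the discounted state distribution \rho^{\mu}, and the action-value function Q^{\mu}.

```latex
\nabla_\theta J(\mu_\theta)
  = \mathbb{E}_{s \sim \rho^{\mu}}\!\left[
      \nabla_\theta \mu_\theta(s)\,
      \nabla_a Q^{\mu}(s, a)\big|_{a = \mu_\theta(s)}
    \right]
```

    Because the expectation runs over states only, and not over actions as well, fewer samples are typically needed to estimate it, which is the efficiency argument the abstract makes.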

    Data-Driven Methods Applied to Soft Robot Modeling and Control: A Review

    Soft robots are compliant and have infinite degrees of freedom. Thanks to these properties, such robots can be leveraged for surgery, rehabilitation, biomimetics, exploration of unstructured environments, and industrial grippers. As a result, they attract scholars from a variety of areas. However, nonlinearity and hysteresis effects also complicate robot modeling. Moreover, because of their flexibility and adaptation, soft robot control is more challenging than rigid robot control. To model and control soft robots, a large number of data-driven methods are utilized in pairs or separately. This review first briefly introduces two foundations for data-driven approaches, physical models and the Jacobian matrix, and then summarizes three kinds of data-driven approaches: statistical methods, neural networks, and reinforcement learning. This review compares the modeling and controller features, e.g., model dynamics, data requirement, and target task, within and among these categories. Finally, we summarize the features of each method. A discussion about the advantages and limitations of the existing modeling and control approaches is presented, and we forecast the future of data-driven approaches in soft robots. A website (https://sites.google.com/view/23zcb) is built for this review and will be updated frequently.

    Note to Practitioners: This work is motivated by the need for a review introducing soft robot modeling and control methods in parallel. Modeling and control play significant roles in robot research, and they are challenging especially for soft robots. The nonlinear and complex deformation of such robots necessitates specific modeling and control approaches. We introduce the state-of-the-art data-driven methods and survey three approaches widely utilized. This review also compares the performance of these methods, considering some important features like data amount requirement, control frequency, and target task. The features of each approach are summarized, and we discuss the possible future of this area.

    Novelty-assisted Interactive Evolution Of Control Behaviors

    The field of evolutionary computation is inspired by the achievements of natural evolution, in which there is no final objective. Yet the pursuit of objectives is ubiquitous in simulated evolution because evolutionary algorithms that can consistently achieve established benchmarks are lauded as successful, thus reinforcing this paradigm. A significant problem is that such objective approaches assume that intermediate stepping stones will increasingly resemble the final objective when in fact they often do not. The consequence is that while solutions may exist, searching for such objectives may not discover them. This problem with objectives is demonstrated through an experiment in this dissertation showing that images discovered serendipitously during interactive evolution in an online system called Picbreeder cannot be rediscovered when they become the final objective of the very same algorithm that originally evolved them. This negative result demonstrates that pursuing an objective limits evolution by selecting offspring only based on the final objective. Furthermore, even when high fitness is achieved, the experimental results suggest that the resulting solutions are typically brittle, piecewise representations that only perform well by exploiting idiosyncratic features in the target.

    In response to this problem, the dissertation next highlights the importance of leveraging human insight during search as an alternative to articulating explicit objectives. In particular, a new approach called novelty-assisted interactive evolutionary computation (NA-IEC) combines human intuition with a method called novelty search for the first time to facilitate the serendipitous discovery of agent behaviors. In this approach, the human user directs evolution by selecting what is interesting from the on-screen population of behaviors. However, unlike in typical IEC, the user can then request that the next generation be filled with novel descendants, as opposed to only the direct descendants of typical IEC. The result of such an approach, unconstrained by a priori objectives, is that it traverses key stepping stones that ultimately accumulate meaningful domain knowledge.

    To establish this new evolutionary approach based on the serendipitous discovery of key stepping stones during evolution, this dissertation consists of four key contributions: (1) The first establishes the deleterious effects of a priori objectives on evolution. (2) The second introduces the NA-IEC approach as an alternative to traditional objective-based approaches. (3) The third is a proof of concept that demonstrates how combining human insight with novelty search finds solutions significantly faster and at lower genomic complexities than fully automated processes, including pure novelty search, suggesting an important role for human users in the search for solutions. (4) Finally, the NA-IEC approach is applied in a challenge domain wherein leveraging human intuition and domain knowledge accelerates the evolution of solutions for the nontrivial octopus-arm control task. The culmination of these contributions demonstrates the importance of incorporating human insights into simulated evolution as a means of discovering better solutions more rapidly than traditional approaches.
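
    To make the idea of filling a generation with "novel descendants" concrete, here is a minimal sketch of the novelty measure commonly used in the novelty search literature: a behavior's score is its mean distance to its k nearest neighbors among the current population and an archive of past behaviors. This is an assumption about the general technique, not the dissertation's own implementation; the function names, the value of k, and the representation of behaviors as coordinate tuples are all hypothetical.

```python
import math


def novelty_score(behavior, others, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest neighbours.

    `behavior` and every item of `others` are equal-length numeric tuples
    (hypothetical behavior descriptors, e.g. an agent's final position).
    """
    if not others:
        raise ValueError("need at least one other behavior to compare against")
    distances = sorted(math.dist(behavior, other) for other in others)
    nearest = distances[:k]  # fewer than k neighbours is allowed
    return sum(nearest) / len(nearest)


def most_novel(candidates, archive, n, k=3):
    """Return the n candidates with the highest novelty scores.

    Each candidate is compared against all other candidates plus the archive.
    """
    scored = []
    for i, candidate in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i] + list(archive)
        scored.append((novelty_score(candidate, others, k), candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored[:n]]
```

    In an interactive setting such as the one described above, a routine like the hypothetical `most_novel` could rank the descendants of the user's selection so that the next on-screen generation shows the behaviors that differ most from what has already been seen.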