A constrained control-planning strategy for redundant manipulators
This paper presents an interconnected control-planning strategy for redundant
manipulators, subject to system and environmental constraints. The method
incorporates low-level control characteristics and high-level planning
components into a robust strategy for manipulators acting in complex
environments, subject to joint limits. This strategy is formulated using an
adaptive control rule, the estimated dynamic model of the robotic system and
the nullspace of the linearized constraints. A path is generated that takes
into account the capabilities of the platform. The proposed method is
computationally efficient, enabling its implementation on a real multi-body
robotic system. Through experiments with a 7-DOF manipulator, we demonstrate
the performance of the method in real-world scenarios.
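The nullspace-based redundancy resolution that the abstract alludes to can be sketched with the standard pseudoinverse construction below. This is a generic textbook illustration with hypothetical numbers, not the authors' adaptive control rule:

```python
# Illustrative sketch (not the paper's implementation): a secondary joint
# motion is projected into the nullspace of a linearized task constraint
# J(q) * dq = dx, so it cannot disturb the primary task.
import numpy as np

def nullspace_projected_velocity(J, dx, dq_secondary):
    """Primary task via pseudoinverse; secondary motion via nullspace projection."""
    J_pinv = np.linalg.pinv(J)              # Moore-Penrose pseudoinverse
    N = np.eye(J.shape[1]) - J_pinv @ J     # nullspace projector of J
    return J_pinv @ dx + N @ dq_secondary   # secondary term leaves the task intact

# 2-dimensional task, 4 joints: a redundant system with a 2-dimensional nullspace
J = np.array([[1.0, 0.5, 0.0, 0.2],
              [0.0, 1.0, 0.3, 0.1]])
dq = nullspace_projected_velocity(J, np.array([0.1, -0.2]), np.ones(4))
# J @ dq reproduces the task velocity despite the added secondary motion
```

The projector `N` is what lets constraints such as joint limits be handled in the redundant directions without degrading the end-effector task.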
Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration
Testing in Continuous Integration (CI) involves test case prioritization,
selection, and execution at each cycle. Selecting the most promising test cases
to detect bugs is hard if there are uncertainties about the impact of committed
code changes, or if traceability links between code and tests are not
available. This paper introduces Retecs, a new method for automatically
learning test case selection and prioritization in CI with the goal of minimizing
the round-trip time between code commits and developer feedback on failed test
cases. The Retecs method uses reinforcement learning to select and prioritize
test cases according to their duration, time since last execution, and failure
history. In a constantly changing environment, where new test cases are created
and obsolete test cases are deleted, the Retecs method learns to prioritize
error-prone test cases higher under guidance of a reward function and by
observing previous CI cycles. By applying Retecs on data extracted from three
industrial case studies, we show for the first time that reinforcement learning
enables fruitful automatic adaptive test case selection and prioritization in
CI and regression testing.
Comment: Spieker, H., Gotlieb, A., Marijan, D., & Mossige, M. (2017).
Reinforcement Learning for Automatic Test Case Prioritization and Selection
in Continuous Integration. In Proceedings of the 26th International Symposium
on Software Testing and Analysis (ISSTA'17) (pp. 12-22). ACM.
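A heavily simplified, reward-driven prioritizer over the three features the abstract names (duration, time since last execution, failure history) can be sketched as follows. This is an illustration of the idea, not the Retecs agent; the feature encoding and linear update are assumptions:

```python
# Hedged sketch of reward-guided test prioritization: rank test cases by a
# linear score over failure rate, staleness, and inverse duration, nudging the
# weights after each CI cycle based on which tests actually failed.
import numpy as np

def features(tc):
    # failure rate, cycles since last execution, inverse duration
    return np.array([tc["failures"] / max(tc["runs"], 1),
                     tc["cycles_since_run"],
                     1.0 / tc["duration"]])

def prioritize(test_cases, w):
    return sorted(test_cases, key=lambda tc: -float(w @ features(tc)))

def update(w, test_cases, failed_ids, lr=0.1):
    # reward +1 for a failing test, -1 for a passing one
    for tc in test_cases:
        reward = 1.0 if tc["id"] in failed_ids else -1.0
        w = w + lr * reward * features(tc)
    return w

suite = [
    {"id": "t1", "failures": 5, "runs": 10, "cycles_since_run": 1, "duration": 2.0},
    {"id": "t2", "failures": 0, "runs": 10, "cycles_since_run": 3, "duration": 1.0},
]
w = np.zeros(3)
for _ in range(20):                 # simulate CI cycles in which t1 keeps failing
    w = update(w, suite, failed_ids={"t1"})
order = [tc["id"] for tc in prioritize(suite, w)]
```

After a few simulated cycles the error-prone test case is ranked first, mirroring the adaptive behavior the paper demonstrates with reinforcement learning.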
Reinforcement Learning
Reinforcement learning (RL) is a general framework for adaptive control,
which has proven to be efficient in many domains, e.g., board games, video
games or autonomous vehicles. In such problems, an agent faces a sequential
decision-making problem where, at every time step, it observes its state,
performs an action, receives a reward and moves to a new state. An RL agent
learns by trial and error a good policy (or controller) based on observations
and numeric reward feedback on the previously performed action. In this
chapter, we present the basic framework of RL and recall the two main families
of approaches that have been developed to learn a good policy. The first one,
which is value-based, consists of estimating the value of an optimal policy,
from which a policy can be recovered, while the other, called policy search,
works directly in a policy space. Actor-critic methods can be seen as a
policy search technique where the policy value that is learned guides the
policy improvement. Besides, we give an overview of some extensions of the
standard RL framework, notably when risk-averse behavior needs to be taken into
account or when rewards are not available or not known.
Comment: Chapter in "A Guided Tour of Artificial Intelligence Research",
Springer.
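The value-based family the chapter describes can be illustrated with tabular Q-learning on a toy chain environment. Everything here (the environment, hyperparameters, and optimistic initialization) is an illustrative assumption, not material from the chapter:

```python
# Minimal tabular Q-learning sketch: learn action values from
# (state, action, reward, next state) transitions by trial and error,
# then read the greedy policy off the learned values.
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = np.ones((n_states, n_actions))     # optimistic initialization drives exploration
rng = np.random.default_rng(0)

def step(s, a):
    # 5-state chain: action 1 moves right, action 0 moves left
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)   # reward at the right end

for _ in range(500):                   # episodes
    s = 0
    for _ in range(20):                # steps per episode
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # temporal-difference update toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

policy = Q.argmax(axis=1)              # greedy policy recovered from the values
```

Recovering the policy by taking the argmax over learned values is exactly the value-based route; a policy-search method would instead parameterize and optimize the policy directly.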
Nullspace Structure in Model Predictive Control
Robotic tasks can be accomplished by exploiting different forms of
redundancies. This work focuses on planning redundancy within Model Predictive
Control (MPC) in which several paths can be considered within the MPC time
horizon. We present the nullspace structure in MPC with a quadratic
approximation of the cost and a linearization of the dynamics. We exploit the
low rank structure of the precision matrices used in MPC (encapsulating
spatiotemporal information) to perform hierarchical task planning, and show how
nullspace computation can be treated as a fusion problem (computed with a
product of Gaussian experts). We illustrate the approach using proof-of-concept
examples with point mass objects and simulated robotics applications.
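The fusion-as-a-product-of-Gaussian-experts idea the abstract mentions can be sketched in a few lines: each expert contributes a mean and a precision matrix, and precisions add under the product. The numbers below are illustrative, not from the paper:

```python
# Hedged sketch of fusing two Gaussian experts over the same decision variable.
# Low-precision ("don't care") directions of one expert defer to the other,
# which is how nullspace-like task hierarchies emerge from the fusion.
import numpy as np

def fuse_gaussians(mus, precisions):
    """Product of Gaussian experts: returns fused mean and fused precision."""
    Lam = sum(precisions)                              # precisions add
    eta = sum(L @ m for m, L in zip(mus, precisions))  # information vectors add
    return np.linalg.solve(Lam, eta), Lam

# Expert 1 constrains only the x-coordinate (near-zero precision in y)
L1, m1 = np.diag([100.0, 1e-6]), np.array([1.0, 0.0])
# Expert 2 constrains only the y-coordinate
L2, m2 = np.diag([1e-6, 100.0]), np.array([0.0, 2.0])
mu, Lam = fuse_gaussians([m1, m2], [L1, L2])
# fused mean takes x from expert 1 and y from expert 2
```

Because each expert is nearly flat along its unconstrained direction, the fused mean satisfies both tasks, which is the behavior the paper exploits for hierarchical planning within MPC.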
ControlIt! - A Software Framework for Whole-Body Operational Space Control
Whole Body Operational Space Control (WBOSC) is a pioneering algorithm in the
field of human-centered Whole-Body Control (WBC). It enables floating-base
highly-redundant robots to achieve unified motion/force control of one or more
operational space objectives while adhering to physical constraints. Limited
studies exist on the software architecture and APIs that enable WBOSC to
perform and be integrated into a larger system. In this paper we address this
by presenting ControlIt!, a new open-source software framework for WBOSC.
Unlike previous implementations, ControlIt! is multi-threaded to increase servo
frequencies on standard PC hardware. A new parameter binding mechanism enables
tight integration between ControlIt! and external processes via an extensible
set of transport protocols. To support a new robot, only two plugins and a URDF
model need to be provided; the rest of ControlIt! remains unchanged. New
WBC primitives can be added by writing a Task or Constraint plugin.
ControlIt!'s capabilities are demonstrated on Dreamer, a 16-DOF torque
controlled humanoid upper body robot containing both series elastic and
co-actuated joints, which we use to perform a product disassembly task. Using
this testbed, we show that ControlIt! can achieve average servo latencies of
about 0.5ms when configured with two Cartesian position tasks, two orientation
tasks, and a lower priority posture task. This latency is significantly lower
than the 5ms achieved by UTA-WBC, the prototype implementation of WBOSC that
is both application- and platform-specific. Variations in the product's position
are handled by updating the goal of the Cartesian position task. ControlIt!'s
source code is released under an LGPL license and we hope it will be adopted
and maintained by the WBC community for the long term as a platform for WBC
development and integration.
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes must make
optimized decisions from a set of available strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
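The MDP machinery the survey reviews can be made concrete with a tiny value-iteration example. The sensor-node states, actions, and numbers below are invented for illustration and are not taken from the survey:

```python
# Minimal value iteration for a toy WSN-flavored MDP: a node chooses between
# sleeping (saves energy, may recharge) and sensing (earns reward, drains
# battery). All transition probabilities and rewards are made-up placeholders.
import numpy as np

# states: 0 = low battery, 1 = high battery; actions: 0 = sleep, 1 = sense
P = np.array([  # P[a, s, s'] transition probabilities
    [[0.2, 0.8],    # sleep from low: likely recharges to high
     [0.0, 1.0]],   # sleep from high: stays high
    [[1.0, 0.0],    # sense from low: stays low
     [0.7, 0.3]],   # sense from high: often drains to low
])
R = np.array([  # R[a, s] expected immediate reward
    [0.0, 0.0],     # sleeping detects nothing
    [0.5, 2.0],     # sensing pays off, less so on low battery
])
gamma = 0.9

V = np.zeros(2)
for _ in range(500):                  # iterate the Bellman optimality operator
    Q = R + gamma * (P @ V)           # Q[a, s] = R[a, s] + gamma * E[V(s')]
    V = Q.max(axis=0)
policy = Q.argmax(axis=0)             # greedy policy: best action per state
```

Here the converged policy sleeps on low battery to recharge and senses on high battery, the kind of adaptive energy/sensing tradeoff the surveyed works formalize with MDPs.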
Deep learning for video game playing
In this article, we review recent Deep Learning advances in the context of
how they have been applied to play different types of video games such as
first-person shooters, arcade games, and real-time strategy games. We analyze
the unique requirements that different game genres pose to a deep learning
system and highlight important open challenges in the context of applying these
machine learning methods to video games, such as general game playing, dealing
with extremely large decision spaces and sparse rewards.
Ergodic Exploration of Distributed Information
This paper presents an active search trajectory synthesis technique for
autonomous mobile robots with nonlinear measurements and dynamics. The
presented approach uses the ergodicity of a planned trajectory with respect to
an expected information density map to close the loop during search. The
ergodic control algorithm does not rely on discretization of the search or
action spaces, and is well posed for coverage with respect to the expected
information density whether the information is diffuse or localized, thus
trading off between exploration and exploitation in a single objective
function. As a demonstration, we use a robotic electrolocation platform to
estimate location and size parameters describing static targets in an
underwater environment. Our results demonstrate that the ergodic exploration of
distributed information (EEDI) algorithm outperforms commonly used
information-oriented controllers, particularly when distractions are present.
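The ergodicity-with-respect-to-a-density notion at the heart of this approach can be illustrated in one dimension using the spectral ergodic metric of Mathew and Mezic: compare Fourier coefficients of a trajectory's time averages against those of the target information density. The basis, weights, and data below are a simplified sketch, not the paper's controller:

```python
# Illustrative 1-D ergodic metric: a trajectory matched to a localized
# information density should score lower (better) than one that ignores it.
import numpy as np

def fourier_coeffs(samples, weights, K, L=1.0):
    # k-th coefficient: weighted average of the normalized cosine basis on [0, L]
    hk = np.where(np.arange(K) == 0, 1.0, np.sqrt(0.5))
    return np.array([np.average(np.cos(k * np.pi * samples / L) / hk[k],
                                weights=weights) for k in range(K)])

def ergodic_metric(traj, density_x, density_p, K=15):
    ck = fourier_coeffs(traj, np.ones_like(traj), K)   # trajectory time averages
    phik = fourier_coeffs(density_x, density_p, K)     # target density coefficients
    lam = 1.0 / (1.0 + np.arange(K) ** 2)              # penalize low frequencies most
    return float(np.sum(lam * (ck - phik) ** 2))

x = np.linspace(0.0, 1.0, 400)
p = np.exp(-0.5 * ((x - 0.3) / 0.05) ** 2)
p /= p.sum()                                           # localized information density
rng = np.random.default_rng(0)
matched = rng.normal(0.3, 0.05, 2000)                  # trajectory covering the peak
uniform = rng.uniform(0.0, 1.0, 2000)                  # trajectory ignoring the peak
m_matched = ergodic_metric(matched, x, p)
m_uniform = ergodic_metric(uniform, x, p)
```

Driving this metric down over a planning horizon is what makes the resulting search trajectories spend time in proportion to expected information, handling both diffuse and localized densities in one objective.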
Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework
We approach the continuous-time mean-variance (MV) portfolio selection with
reinforcement learning (RL). The problem is to achieve the best tradeoff
between exploration and exploitation, and is formulated as an
entropy-regularized, relaxed stochastic control problem. We prove that the
optimal feedback policy for this problem must be Gaussian, with time-decaying
variance. We then establish connections between the entropy-regularized MV and
the classical MV, including the solvability equivalence and the convergence as
exploration weighting parameter decays to zero. Finally, we prove a policy
improvement theorem, based on which we devise an implementable RL algorithm. We
find that our algorithm outperforms both an adaptive control based method and a
deep neural network based algorithm by a large margin in our simulations.
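The paper's qualitative finding, a Gaussian exploratory policy whose variance decays over time, can be illustrated as below. The linear decay schedule and all numbers are placeholders; the paper derives a specific closed-form variance that this sketch does not reproduce:

```python
# Hedged illustration: sample exploratory portfolio allocations from a Gaussian
# whose variance shrinks as time t approaches the horizon T, so the agent
# explores early and commits to the mean allocation late.
import numpy as np

def exploratory_allocation(mean_alloc, t, T, sigma0, rng):
    var = sigma0 ** 2 * (T - t) / T      # placeholder time-decaying variance
    return rng.normal(mean_alloc, np.sqrt(var))

rng = np.random.default_rng(0)
T = 1.0
early = [exploratory_allocation(0.5, 0.0, T, 0.3, rng) for _ in range(2000)]
late = [exploratory_allocation(0.5, 0.9, T, 0.3, rng) for _ in range(2000)]
# early samples spread widely; late samples concentrate near the mean
```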