4,940 research outputs found

    A constrained control-planning strategy for redundant manipulators

    This paper presents an interconnected control-planning strategy for redundant manipulators subject to system and environmental constraints. The method incorporates low-level control characteristics and high-level planning components into a robust strategy for manipulators acting in complex environments under joint limits. The strategy is formulated using an adaptive control rule, the estimated dynamic model of the robotic system, and the nullspace of the linearized constraints. A path is generated that takes into account the capabilities of the platform. The proposed method is computationally efficient, enabling its implementation on a real multi-body robotic system. Through experiments with a 7-DOF manipulator, we demonstrate the performance of the method in real-world scenarios.
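
    The central mechanism here, projecting a secondary objective into the nullspace of the task constraints, can be sketched compactly. The snippet below is a minimal illustration (not the paper's method), assuming a known task Jacobian and using joint-limit avoidance as the secondary objective; all names are invented for the example.

```python
import numpy as np

def redundant_joint_velocities(J, x_dot, q, q_mid, k0=1.0):
    """Resolved-rate control with a joint-limit-avoidance term projected
    into the nullspace of the task Jacobian J (m x n, n > m):
        qdot = J^+ xdot + (I - J^+ J) qdot_0
    The projector (I - J^+ J) guarantees qdot_0 does not disturb the task.
    """
    J_pinv = np.linalg.pinv(J)
    q_dot0 = -k0 * (q - q_mid)            # push joints toward mid-range
    N = np.eye(J.shape[1]) - J_pinv @ J   # nullspace projector
    return J_pinv @ x_dot + N @ q_dot0

# 3 task DOF, 7 joints (matching the paper's 7-DOF experiments).
J = np.random.randn(3, 7)
q_dot = redundant_joint_velocities(J, np.array([0.1, 0.0, 0.0]),
                                   np.random.randn(7), np.zeros(7))
print(J @ q_dot)  # approx [0.1, 0, 0]: the task velocity is preserved
```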

    Control-Theoretic, Mission-Driven, Optimization Techniques for Wireless Sensor Networks


    Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration

    Testing in Continuous Integration (CI) involves test case prioritization, selection, and execution at each cycle. Selecting the most promising test cases to detect bugs is hard if there are uncertainties about the impact of committed code changes or if traceability links between code and tests are not available. This paper introduces Retecs, a new method for automatically learning test case selection and prioritization in CI, with the goal of minimizing the round-trip time between code commits and developer feedback on failed test cases. The Retecs method uses reinforcement learning to select and prioritize test cases according to their duration, last execution time, and failure history. In a constantly changing environment, where new test cases are created and obsolete test cases are deleted, Retecs learns to prioritize error-prone test cases higher, guided by a reward function and by observing previous CI cycles. By applying Retecs to data extracted from three industrial case studies, we show for the first time that reinforcement learning enables fruitful automatic adaptive test case selection and prioritization in CI and regression testing.
    Comment: Spieker, H., Gotlieb, A., Marijan, D., & Mossige, M. (2017). Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration. In Proceedings of the 26th International Symposium on Software Testing and Analysis (ISSTA'17) (pp. 12-22). ACM
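
    As a hedged sketch of the idea (not the authors' implementation), one can score each test case from simple features and nudge the scoring weights with a reward after each CI cycle; the feature names and update rule below are illustrative assumptions.

```python
import numpy as np

# Weights over [duration, time since last run, recent failure rate].
w = np.zeros(3)

def features(t):
    return np.array([t["duration"], t["staleness"], t["fail_rate"]])

def prioritize(tests):
    """Order tests for the next CI cycle; higher score runs earlier."""
    return sorted(tests, key=lambda t: float(w @ features(t)), reverse=True)

def update(tests, failed_ids, lr=0.1):
    """Reward-driven update: reinforce features of failing tests, with a
    reward that shrinks the later they were scheduled in the cycle."""
    global w
    ranked = prioritize(tests)
    for rank, t in enumerate(ranked):
        if t["id"] in failed_ids:
            reward = 1.0 - rank / len(ranked)
            w += lr * reward * features(t)

tests = [{"id": i, "duration": 1.0, "staleness": 1.0,
          "fail_rate": 1.0 if i in (1, 3) else 0.0} for i in range(4)]
update(tests, failed_ids={1, 3})
print([t["id"] for t in prioritize(tests)])  # failing tests move to the front
```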

    Reinforcement Learning

    Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains, e.g., board games, video games, or autonomous vehicles. In such problems, an agent faces a sequential decision-making problem where, at every time step, it observes its state, performs an action, receives a reward, and moves to a new state. An RL agent learns a good policy (or controller) by trial and error, based on observations and numeric reward feedback on the previously performed action. In this chapter, we present the basic framework of RL and recall the two main families of approaches that have been developed to learn a good policy. The first, value-based, family consists in estimating the value of an optimal policy, from which a policy can be recovered, while the other, called policy search, works directly in a policy space. Actor-critic methods can be seen as a policy search technique in which the learned policy value guides the policy improvement. Besides, we give an overview of some extensions of the standard RL framework, notably when risk-averse behavior needs to be taken into account or when rewards are not available or not known.
    Comment: Chapter in "A Guided Tour of Artificial Intelligence Research", Springer
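
    The value-based family described above can be illustrated with tabular Q-learning on a toy chain environment (invented for this example): estimate the optimal action-value function, then recover a greedy policy from it.

```python
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))   # action-value estimates
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Deterministic chain: reward 1 for reaching the right end."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == n_states - 1)

for episode in range(500):
    s = 0
    for _ in range(20):
        # Epsilon-greedy exploration over the current value estimates.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s2, r = step(s, a)
        # TD update toward the one-step bootstrapped target.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1))  # greedy policy recovered from values: all 1s (right)
```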

    Nullspace Structure in Model Predictive Control

    Robotic tasks can be accomplished by exploiting different forms of redundancy. This work focuses on planning redundancy within Model Predictive Control (MPC), in which several paths can be considered within the MPC time horizon. We present the nullspace structure in MPC with a quadratic approximation of the cost and a linearization of the dynamics. We exploit the low-rank structure of the precision matrices used in MPC (encapsulating spatiotemporal information) to perform hierarchical task planning, and show how nullspace computation can be treated as a fusion problem (computed with a product of Gaussian experts). We illustrate the approach using proof-of-concept examples with point-mass objects and simulated robotics applications.
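
    The fusion view mentioned here has a compact form: the product of Gaussian experts N(mu_i, Lambda_i^-1) is again Gaussian, with precision sum(Lambda_i) and mean (sum Lambda_i)^-1 (sum Lambda_i mu_i). A rank-deficient precision for the primary expert leaves directions unconstrained for secondary experts, which is the nullspace effect. A small sketch under these textbook identities (not the paper's code):

```python
import numpy as np

def fuse(mus, precisions):
    """Product of Gaussian experts: returns (mean, precision)."""
    L = sum(precisions)
    b = sum(Li @ mi for Li, mi in zip(precisions, mus))
    return np.linalg.solve(L, b), L

# Primary expert constrains only x (rank-1, high precision); the weak
# secondary expert is free to act in the remaining direction y.
L1 = np.diag([1e6, 0.0])
L2 = np.eye(2)
mu, _ = fuse([np.array([2.0, 0.0]), np.array([0.0, 3.0])], [L1, L2])
print(mu)  # approx [2, 3]: x from the primary task, y from the secondary
```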

    ControlIt! - A Software Framework for Whole-Body Operational Space Control

    Whole Body Operational Space Control (WBOSC) is a pioneering algorithm in the field of human-centered Whole-Body Control (WBC). It enables floating-base, highly redundant robots to achieve unified motion/force control of one or more operational-space objectives while adhering to physical constraints. Limited studies exist on the software architecture and APIs that enable WBOSC to perform and be integrated into a larger system. In this paper we address this by presenting ControlIt!, a new open-source software framework for WBOSC. Unlike previous implementations, ControlIt! is multi-threaded to increase servo frequencies on standard PC hardware. A new parameter-binding mechanism enables tight integration between ControlIt! and external processes via an extensible set of transport protocols. To support a new robot, only two plugins and a URDF model need to be provided; the rest of ControlIt! remains unchanged. New WBC primitives can be added by writing a Task or Constraint plugin. ControlIt!'s capabilities are demonstrated on Dreamer, a 16-DOF torque-controlled humanoid upper-body robot containing both series-elastic and co-actuated joints, which performs a product disassembly task. Using this testbed, we show that ControlIt! can achieve average servo latencies of about 0.5 ms when configured with two Cartesian position tasks, two orientation tasks, and a lower-priority posture task. This is significantly lower than the 5 ms achieved by UTA-WBC, the prototype implementation of WBOSC that is both application- and platform-specific. Variations in the product's position are handled by updating the goal of the Cartesian position task. ControlIt!'s source code is released under an LGPL license, and we hope it will be adopted and maintained by the WBC community for the long term as a platform for WBC development and integration.
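
    To make the plugin idea concrete, here is an illustrative sketch of what a WBC task primitive can look like. This is not ControlIt!'s actual C++ API; the class and method names are hypothetical, and a real Task plugin would query the robot model built from the URDF rather than the toy kinematics used here.

```python
import numpy as np

class Task:
    """Hypothetical WBC primitive: supplies a task Jacobian and a desired
    task-space command that a whole-body controller composes by priority."""
    def jacobian(self, q):
        raise NotImplementedError
    def command(self, q, dq):
        raise NotImplementedError

class CartesianPositionTask(Task):
    def __init__(self, goal, kp=50.0, kd=10.0):
        self.goal, self.kp, self.kd = np.asarray(goal), kp, kd

    def jacobian(self, q):
        # Placeholder: a real plugin computes this from the robot model.
        return np.eye(3, len(q))

    def command(self, q, dq):
        x, xd = q[:3], dq[:3]          # toy kinematics for illustration
        return self.kp * (self.goal - x) - self.kd * xd   # task-space PD

task = CartesianPositionTask(goal=[0.3, 0.0, 0.5])
print(task.command(np.zeros(7), np.zeros(7)))  # PD command toward the goal
```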

    Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

    Wireless sensor networks (WSNs) consist of autonomous, resource-limited devices that cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. For long service time and low maintenance cost, WSNs require adaptive and robust methods to address data exchange, topology formation, resource and power optimization, sensing coverage and object detection, and security challenges. In these problems, sensor nodes must make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool for developing adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs.
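
    A worked micro-example of the MDP machinery the survey builds on: value iteration for a two-state, two-action problem. The transition and reward numbers below are invented, loosely styled as a sensor node choosing to sleep or transmit.

```python
import numpy as np

# P[a][s, s2]: transition probabilities under each action.
P = {0: np.array([[0.9, 0.1], [0.5, 0.5]]),   # action 0: sleep
     1: np.array([[0.2, 0.8], [0.1, 0.9]])}   # action 1: transmit
R = np.array([[0.0, 1.0],                      # R[s, a]: immediate reward
              [0.2, 0.5]])
gamma = 0.9
V = np.zeros(2)

# Bellman optimality iteration: V(s) <- max_a R(s, a) + gamma * E[V(s')]
for _ in range(200):
    V = np.max([R[:, a] + gamma * P[a] @ V for a in P], axis=0)

policy = np.argmax([R[:, a] + gamma * P[a] @ V for a in P], axis=0)
print(V, policy)   # optimal value per state and the maximizing action
```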

    Deep learning for video game playing

    In this article, we review recent deep learning advances in the context of how they have been applied to play different types of video games, such as first-person shooters, arcade games, and real-time strategy games. We analyze the unique requirements that different game genres pose to a deep learning system and highlight important open challenges in applying these machine learning methods to video games, such as general game playing, dealing with extremely large decision spaces, and sparse rewards.

    Ergodic Exploration of Distributed Information

    This paper presents an active-search trajectory synthesis technique for autonomous mobile robots with nonlinear measurements and dynamics. The presented approach uses the ergodicity of a planned trajectory with respect to an expected information density map to close the loop during search. The ergodic control algorithm does not rely on discretization of the search or action spaces, and is well posed for coverage with respect to the expected information density whether the information is diffuse or localized, thus trading off between exploration and exploitation in a single objective function. As a demonstration, we use a robotic electrolocation platform to estimate location and size parameters describing static targets in an underwater environment. Our results demonstrate that the ergodic exploration of distributed information (EEDI) algorithm outperforms commonly used information-oriented controllers, particularly when distractions are present.
    Comment: 17 pages
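
    The ergodicity objective can be made concrete with the standard spectral ergodic metric: compare a trajectory's time-averaged basis statistics against the information density's coefficients, under Sobolev-type weights that decay with frequency. A 1-D sketch using those textbook definitions (not the paper's code; the domain, density, and trajectories are invented):

```python
import numpy as np

Lx, K = 1.0, 10                       # domain [0, Lx], number of modes
ks = np.arange(K)

def basis(x, k):
    """Normalized cosine basis on [0, Lx]."""
    hk = np.sqrt(Lx) if k == 0 else np.sqrt(Lx / 2)
    return np.cos(k * np.pi * x / Lx) / hk

def ergodic_metric(traj, phi, xs):
    dx = xs[1] - xs[0]
    c = np.array([basis(traj, k).mean() for k in ks])            # trajectory coeffs
    p = np.array([(phi * basis(xs, k)).sum() * dx for k in ks])  # density coeffs
    lam = 1.0 / (1.0 + ks.astype(float) ** 2)                    # weights decay in k
    return float(np.sum(lam * (c - p) ** 2))

xs = np.linspace(0.0, Lx, 400)
phi = np.exp(-((xs - 0.7) ** 2) / 0.01)
phi /= phi.sum() * (xs[1] - xs[0])                 # normalize the density
t = np.linspace(0.0, 20.0, 1000)
near_peak = 0.7 + 0.05 * np.sin(t)                 # dwells where info is dense
sweep = np.linspace(0.0, Lx, 1000)                 # uniform coverage
print(ergodic_metric(near_peak, phi, xs) < ergodic_metric(sweep, phi, xs))  # True
```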

    Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework

    We approach continuous-time mean-variance (MV) portfolio selection with reinforcement learning (RL). The problem is to achieve the best tradeoff between exploration and exploitation, and is formulated as an entropy-regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time-decaying variance. We then establish connections between the entropy-regularized MV problem and the classical MV problem, including the solvability equivalence and the convergence as the exploration weighting parameter decays to zero. Finally, we prove a policy improvement theorem, based on which we devise an implementable RL algorithm. We find that our algorithm outperforms both an adaptive control-based method and a deep neural network-based algorithm by a large margin in our simulations.
    Comment: 39 pages, 5 figures
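
    The structural result, a Gaussian feedback policy whose exploration variance shrinks toward the horizon, is easy to schematize. The feedback mean and decay rate below are placeholders, not the paper's closed-form coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
T, lam, c = 1.0, 0.1, 2.0        # horizon, exploration weight, decay rate

def exploratory_action(t, wealth, target=1.0):
    """Sample from a Gaussian policy with time-decaying variance."""
    mean = -(wealth - target)            # placeholder linear feedback
    var = lam * np.exp(c * (T - t))      # variance shrinks as t -> T
    return rng.normal(mean, np.sqrt(var))

for t in (0.0, 0.5, 0.99):
    draws = [exploratory_action(t, wealth=0.8) for _ in range(2000)]
    print(t, round(float(np.std(draws)), 3))   # sampled spread narrows in t
```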