A constrained control-planning strategy for redundant manipulators
This paper presents an interconnected control-planning strategy for redundant
manipulators, subject to system and environmental constraints. The method
incorporates low-level control characteristics and high-level planning
components into a robust strategy for manipulators acting in complex
environments, subject to joint limits. This strategy is formulated using an
adaptive control rule, the estimated dynamic model of the robotic system and
the nullspace of the linearized constraints. A path is generated that takes
into account the capabilities of the platform. The proposed method is
computationally efficient, enabling its implementation on a real multi-body
robotic system. Through experiments with a 7-DOF manipulator, we demonstrate
the performance of the method in real-world scenarios.
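The nullspace-based redundancy resolution that the abstract alludes to can be sketched with the standard pseudoinverse construction below. This is a generic textbook illustration with hypothetical numbers, not the authors' adaptive control rule:

```python
# Illustrative sketch (not the paper's implementation): a secondary joint
# motion is projected into the nullspace of a linearized task constraint
# J(q) * dq = dx, so it cannot disturb the primary task.
import numpy as np

def nullspace_projected_velocity(J, dx, dq_secondary):
    """Primary task via pseudoinverse; secondary motion via nullspace projection."""
    J_pinv = np.linalg.pinv(J)              # Moore-Penrose pseudoinverse
    N = np.eye(J.shape[1]) - J_pinv @ J     # nullspace projector of J
    return J_pinv @ dx + N @ dq_secondary   # secondary term leaves the task intact

# 2-dimensional task, 4 joints: a redundant system with a 2-dimensional nullspace
J = np.array([[1.0, 0.5, 0.0, 0.2],
              [0.0, 1.0, 0.3, 0.1]])
dq = nullspace_projected_velocity(J, np.array([0.1, -0.2]), np.ones(4))
# J @ dq reproduces the task velocity despite the added secondary motion
```

The projector `N` is what lets constraints such as joint limits be handled in the redundant directions without degrading the end-effector task.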
Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration
Testing in Continuous Integration (CI) involves test case prioritization,
selection, and execution at each cycle. Selecting the most promising test cases
to detect bugs is hard if there are uncertainties about the impact of committed
code changes, or if traceability links between code and tests are not
available. This paper introduces Retecs, a new method for automatically
learning test case selection and prioritization in CI with the goal of minimizing
the round-trip time between code commits and developer feedback on failed test
cases. The Retecs method uses reinforcement learning to select and prioritize
test cases according to their duration, time since last execution, and failure
history. In a constantly changing environment, where new test cases are created
and obsolete test cases are deleted, the Retecs method learns to prioritize
error-prone test cases higher under guidance of a reward function and by
observing previous CI cycles. By applying Retecs on data extracted from three
industrial case studies, we show for the first time that reinforcement learning
enables fruitful automatic adaptive test case selection and prioritization in
CI and regression testing.
Comment: Spieker, H., Gotlieb, A., Marijan, D., & Mossige, M. (2017).
Reinforcement Learning for Automatic Test Case Prioritization and Selection
in Continuous Integration. In Proceedings of the 26th International Symposium
on Software Testing and Analysis (ISSTA'17) (pp. 12-22). ACM.
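A heavily simplified, reward-driven prioritizer over the three features the abstract names (duration, time since last execution, failure history) can be sketched as follows. This is an illustration of the idea, not the Retecs agent; the feature encoding and linear update are assumptions:

```python
# Hedged sketch of reward-guided test prioritization: rank test cases by a
# linear score over failure rate, staleness, and inverse duration, nudging the
# weights after each CI cycle based on which tests actually failed.
import numpy as np

def features(tc):
    # failure rate, cycles since last execution, inverse duration
    return np.array([tc["failures"] / max(tc["runs"], 1),
                     tc["cycles_since_run"],
                     1.0 / tc["duration"]])

def prioritize(test_cases, w):
    return sorted(test_cases, key=lambda tc: -float(w @ features(tc)))

def update(w, test_cases, failed_ids, lr=0.1):
    # reward +1 for a failing test, -1 for a passing one
    for tc in test_cases:
        reward = 1.0 if tc["id"] in failed_ids else -1.0
        w = w + lr * reward * features(tc)
    return w

suite = [
    {"id": "t1", "failures": 5, "runs": 10, "cycles_since_run": 1, "duration": 2.0},
    {"id": "t2", "failures": 0, "runs": 10, "cycles_since_run": 3, "duration": 1.0},
]
w = np.zeros(3)
for _ in range(20):                 # simulate CI cycles in which t1 keeps failing
    w = update(w, suite, failed_ids={"t1"})
order = [tc["id"] for tc in prioritize(suite, w)]
```

After a few simulated cycles the error-prone test case is ranked first, mirroring the adaptive behavior the paper demonstrates with reinforcement learning.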
Reinforcement Learning
Reinforcement learning (RL) is a general framework for adaptive control,
which has proven to be efficient in many domains, e.g., board games, video
games or autonomous vehicles. In such problems, an agent faces a sequential
decision-making problem where, at every time step, it observes its state,
performs an action, receives a reward and moves to a new state. An RL agent
learns by trial and error a good policy (or controller) based on observations
and numeric reward feedback on the previously performed action. In this
chapter, we present the basic framework of RL and recall the two main families
of approaches that have been developed to learn a good policy. The first one,
which is value-based, consists of estimating the value of an optimal policy,
from which a policy can be recovered, while the other, called policy search,
works directly in a policy space. Actor-critic methods can be seen as a
policy search technique where the policy value that is learned guides the
policy improvement. Besides, we give an overview of some extensions of the
standard RL framework, notably when risk-averse behavior needs to be taken into
account or when rewards are not available or not known.
Comment: Chapter in "A Guided Tour of Artificial Intelligence Research",
Springer.
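The value-based family the chapter describes can be illustrated with tabular Q-learning on a toy chain environment. Everything here (the environment, hyperparameters, and optimistic initialization) is an illustrative assumption, not material from the chapter:

```python
# Minimal tabular Q-learning sketch: learn action values from
# (state, action, reward, next state) transitions by trial and error,
# then read the greedy policy off the learned values.
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = np.ones((n_states, n_actions))     # optimistic initialization drives exploration
rng = np.random.default_rng(0)

def step(s, a):
    # 5-state chain: action 1 moves right, action 0 moves left
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)   # reward at the right end

for _ in range(500):                   # episodes
    s = 0
    for _ in range(20):                # steps per episode
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # temporal-difference update toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

policy = Q.argmax(axis=1)              # greedy policy recovered from the values
```

Recovering the policy by taking the argmax over learned values is exactly the value-based route; a policy-search method would instead parameterize and optimize the policy directly.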
Nullspace Structure in Model Predictive Control
Robotic tasks can be accomplished by exploiting different forms of
redundancies. This work focuses on planning redundancy within Model Predictive
Control (MPC) in which several paths can be considered within the MPC time
horizon. We present the nullspace structure in MPC with a quadratic
approximation of the cost and a linearization of the dynamics. We exploit the
low rank structure of the precision matrices used in MPC (encapsulating
spatiotemporal information) to perform hierarchical task planning, and show how
nullspace computation can be treated as a fusion problem (computed with a
product of Gaussian experts). We illustrate the approach using proof-of-concept
examples with point mass objects and simulated robotics applications.
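The fusion-as-a-product-of-Gaussian-experts idea the abstract mentions can be sketched in a few lines: each expert contributes a mean and a precision matrix, and precisions add under the product. The numbers below are illustrative, not from the paper:

```python
# Hedged sketch of fusing two Gaussian experts over the same decision variable.
# Low-precision ("don't care") directions of one expert defer to the other,
# which is how nullspace-like task hierarchies emerge from the fusion.
import numpy as np

def fuse_gaussians(mus, precisions):
    """Product of Gaussian experts: returns fused mean and fused precision."""
    Lam = sum(precisions)                              # precisions add
    eta = sum(L @ m for m, L in zip(mus, precisions))  # information vectors add
    return np.linalg.solve(Lam, eta), Lam

# Expert 1 constrains only the x-coordinate (near-zero precision in y)
L1, m1 = np.diag([100.0, 1e-6]), np.array([1.0, 0.0])
# Expert 2 constrains only the y-coordinate
L2, m2 = np.diag([1e-6, 100.0]), np.array([0.0, 2.0])
mu, Lam = fuse_gaussians([m1, m2], [L1, L2])
# fused mean takes x from expert 1 and y from expert 2
```

Because each expert is nearly flat along its unconstrained direction, the fused mean satisfies both tasks, which is the behavior the paper exploits for hierarchical planning within MPC.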
ControlIt! - A Software Framework for Whole-Body Operational Space Control
Whole Body Operational Space Control (WBOSC) is a pioneering algorithm in the
field of human-centered Whole-Body Control (WBC). It enables floating-base
highly-redundant robots to achieve unified motion/force control of one or more
operational space objectives while adhering to physical constraints. Limited
studies exist on the software architecture and APIs that enable WBOSC to
perform and be integrated into a larger system. In this paper we address this
by presenting ControlIt!, a new open-source software framework for WBOSC.
Unlike previous implementations, ControlIt! is multi-threaded to increase servo
frequencies on standard PC hardware. A new parameter binding mechanism enables
tight integration between ControlIt! and external processes via an extensible
set of transport protocols. To support a new robot, only two plugins and a URDF
model need to be provided; the rest of ControlIt! remains unchanged. New
WBC primitives can be added by writing a Task or Constraint plugin.
ControlIt!'s capabilities are demonstrated on Dreamer, a 16-DOF torque
controlled humanoid upper body robot containing both series elastic and
co-actuated joints, which we use to perform a product disassembly task. Using
this testbed, we show that ControlIt! can achieve average servo latencies of
about 0.5ms when configured with two Cartesian position tasks, two orientation
tasks, and a lower priority posture task. This latency is significantly lower
than the 5ms achieved by UTA-WBC, the prototype implementation of WBOSC that
is both application- and platform-specific. Variations in the product's position
are handled by updating the goal of the Cartesian position task. ControlIt!'s
source code is released under an LGPL license and we hope it will be adopted
and maintained by the WBC community for the long term as a platform for WBC
development and integration.
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes must make
optimized decisions from a set of available strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
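The MDP machinery the survey reviews can be made concrete with a tiny value-iteration example. The sensor-node states, actions, and numbers below are invented for illustration and are not taken from the survey:

```python
# Minimal value iteration for a toy WSN-flavored MDP: a node chooses between
# sleeping (saves energy, may recharge) and sensing (earns reward, drains
# battery). All transition probabilities and rewards are made-up placeholders.
import numpy as np

# states: 0 = low battery, 1 = high battery; actions: 0 = sleep, 1 = sense
P = np.array([  # P[a, s, s'] transition probabilities
    [[0.2, 0.8],    # sleep from low: likely recharges to high
     [0.0, 1.0]],   # sleep from high: stays high
    [[1.0, 0.0],    # sense from low: stays low
     [0.7, 0.3]],   # sense from high: often drains to low
])
R = np.array([  # R[a, s] expected immediate reward
    [0.0, 0.0],     # sleeping detects nothing
    [0.5, 2.0],     # sensing pays off, less so on low battery
])
gamma = 0.9

V = np.zeros(2)
for _ in range(500):                  # iterate the Bellman optimality operator
    Q = R + gamma * (P @ V)           # Q[a, s] = R[a, s] + gamma * E[V(s')]
    V = Q.max(axis=0)
policy = Q.argmax(axis=0)             # greedy policy: best action per state
```

Here the converged policy sleeps on low battery to recharge and senses on high battery, the kind of adaptive energy/sensing tradeoff the surveyed works formalize with MDPs.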
Deep learning for video game playing
In this article, we review recent Deep Learning advances in the context of
how they have been applied to play different types of video games such as
first-person shooters, arcade games, and real-time strategy games. We analyze
the unique requirements that different game genres pose to a deep learning
system and highlight important open challenges in the context of applying these
machine learning methods to video games, such as general game playing, dealing
with extremely large decision spaces and sparse rewards.
Ergodic Exploration of Distributed Information
This paper presents an active search trajectory synthesis technique for
autonomous mobile robots with nonlinear measurements and dynamics. The
presented approach uses the ergodicity of a planned trajectory with respect to
an expected information density map to close the loop during search. The
ergodic control algorithm does not rely on discretization of the search or
action spaces, and is well posed for coverage with respect to the expected
information density whether the information is diffuse or localized, thus
trading off between exploration and exploitation in a single objective
function. As a demonstration, we use a robotic electrolocation platform to
estimate location and size parameters describing static targets in an
underwater environment. Our results demonstrate that the ergodic exploration of
distributed information (EEDI) algorithm outperforms commonly used
information-oriented controllers, particularly when distractions are present.
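The ergodicity-with-respect-to-a-density notion at the heart of this approach can be illustrated in one dimension using the spectral ergodic metric of Mathew and Mezic: compare Fourier coefficients of a trajectory's time averages against those of the target information density. The basis, weights, and data below are a simplified sketch, not the paper's controller:

```python
# Illustrative 1-D ergodic metric: a trajectory matched to a localized
# information density should score lower (better) than one that ignores it.
import numpy as np

def fourier_coeffs(samples, weights, K, L=1.0):
    # k-th coefficient: weighted average of the normalized cosine basis on [0, L]
    hk = np.where(np.arange(K) == 0, 1.0, np.sqrt(0.5))
    return np.array([np.average(np.cos(k * np.pi * samples / L) / hk[k],
                                weights=weights) for k in range(K)])

def ergodic_metric(traj, density_x, density_p, K=15):
    ck = fourier_coeffs(traj, np.ones_like(traj), K)   # trajectory time averages
    phik = fourier_coeffs(density_x, density_p, K)     # target density coefficients
    lam = 1.0 / (1.0 + np.arange(K) ** 2)              # penalize low frequencies most
    return float(np.sum(lam * (ck - phik) ** 2))

x = np.linspace(0.0, 1.0, 400)
p = np.exp(-0.5 * ((x - 0.3) / 0.05) ** 2)
p /= p.sum()                                           # localized information density
rng = np.random.default_rng(0)
matched = rng.normal(0.3, 0.05, 2000)                  # trajectory covering the peak
uniform = rng.uniform(0.0, 1.0, 2000)                  # trajectory ignoring the peak
m_matched = ergodic_metric(matched, x, p)
m_uniform = ergodic_metric(uniform, x, p)
```

Driving this metric down over a planning horizon is what makes the resulting search trajectories spend time in proportion to expected information, handling both diffuse and localized densities in one objective.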
Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework
We approach the continuous-time mean-variance (MV) portfolio selection with
reinforcement learning (RL). The problem is to achieve the best tradeoff
between exploration and exploitation, and is formulated as an
entropy-regularized, relaxed stochastic control problem. We prove that the
optimal feedback policy for this problem must be Gaussian, with time-decaying
variance. We then establish connections between the entropy-regularized MV and
the classical MV, including the solvability equivalence and the convergence as
exploration weighting parameter decays to zero. Finally, we prove a policy
improvement theorem, based on which we devise an implementable RL algorithm. We
find that our algorithm outperforms both an adaptive control based method and a
deep neural network based algorithm by a large margin in our simulations.
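The paper's qualitative finding, a Gaussian exploratory policy whose variance decays over time, can be illustrated as below. The linear decay schedule and all numbers are placeholders; the paper derives a specific closed-form variance that this sketch does not reproduce:

```python
# Hedged illustration: sample exploratory portfolio allocations from a Gaussian
# whose variance shrinks as time t approaches the horizon T, so the agent
# explores early and commits to the mean allocation late.
import numpy as np

def exploratory_allocation(mean_alloc, t, T, sigma0, rng):
    var = sigma0 ** 2 * (T - t) / T      # placeholder time-decaying variance
    return rng.normal(mean_alloc, np.sqrt(var))

rng = np.random.default_rng(0)
T = 1.0
early = [exploratory_allocation(0.5, 0.0, T, 0.3, rng) for _ in range(2000)]
late = [exploratory_allocation(0.5, 0.9, T, 0.3, rng) for _ in range(2000)]
# early samples spread widely; late samples concentrate near the mean
```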