Computing the Value of Computation for Planning
An intelligent agent performs actions in order to achieve its goals. Such
actions can either be externally directed, such as opening a door, or
internally directed, such as writing data to a memory location or strengthening
a synaptic connection. Some internal actions, to which we refer as
computations, potentially help the agent choose better actions. Considering
that (external) actions and computations might draw upon the same resources,
such as time and energy, deciding when to act or compute, as well as what to
compute, is critical to the performance of an agent.
In an environment that provides rewards depending on an agent's behavior, an
action's value is typically defined as the sum of expected long-term rewards
succeeding the action (itself a complex quantity that depends on what the agent
goes on to do after the action in question). However, defining the value of a
computation is not as straightforward, because computations are valuable only
indirectly, through the way they alter the agent's subsequent actions.
This thesis offers a principled way of computing the value of a computation
in a planning setting formalized as a Markov decision process. We present two
different definitions of computation values: static and dynamic. They address
two extreme cases of the computation budget: affording calculation of zero or
infinitely many steps in the future. We show that these values have desirable
properties, such as temporal consistency and asymptotic convergence.
Furthermore, we propose methods for efficiently computing and approximating
the static and dynamic computation values. We describe a sense in which the
policies that greedily maximize these values can be optimal. We utilize these
principles to construct Monte Carlo tree search algorithms that outperform most
state-of-the-art methods at finding higher-quality actions given the same
simulation resources.
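To make the static notion concrete, here is a minimal sketch (illustrative, not the thesis's exact formulation) that estimates a value of computation for a one-shot decision: assuming Gaussian posteriors over action values, it compares acting immediately on the posterior means against acting after an idealized computation that resolves all uncertainty. Computing is worthwhile only while this gap exceeds the computation's cost.

import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, 0.9, 0.2])      # posterior means of the action values
sigma = np.array([0.5, 0.6, 0.1])   # posterior standard deviations

# Value of acting now: commit to the action with the best posterior mean.
value_act_now = mu.max()

# Value after an idealized computation that resolves all uncertainty:
# sample possible "true" value vectors and always pick the best action.
samples = rng.normal(mu, sigma, size=(100_000, len(mu)))
value_after_compute = samples.max(axis=1).mean()

# Value of computation: expected improvement from computing before acting.
voc = value_after_compute - value_act_now
print(f"VOC estimate: {voc:.3f}")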
Social Navigation Planning Based on People's Awareness of Robots
When mobile robots maneuver near people, they run the risk of rudely blocking
their paths, but not all people behave the same way around robots. People who
have not noticed the robot are the most difficult to predict. This paper
investigates how mobile robots can generate acceptable paths in dynamic
environments by predicting human behavior. Here, human behavior includes both
physical and mental aspects; we focus on the latter. We introduce a simple
safe-interaction model: when a human appears unaware of the robot, the robot
should keep a greater distance. In this study, people around the robot are detected
and tracked using sensor fusion and filtering techniques. To handle
uncertainties in the dynamic environment, a partially observable Markov
decision process (POMDP) is used to formulate the navigation planning
problem in the shared environment. People's awareness of robots is inferred and
included as a state and reward model in the POMDP. The proposed planner enables
a robot to change its navigation plan based on its perception of each person's
robot-awareness. To the best of our knowledge, this is a novel capability. We
conduct simulations and experiments using the Toyota Human Support Robot (HSR) to
validate our approach. We demonstrate that the proposed framework is capable of
running in real time.
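The sketch below shows one plausible way (assumed for illustration, not taken from the paper) that an inferred awareness belief could enter a POMDP reward model: the planner pays a larger proximity penalty around people it believes have not noticed the robot.

import numpy as np

def proximity_penalty(robot_xy, person_xy, p_aware,
                      d_safe_aware=0.5, d_safe_unaware=1.2, w=10.0):
    # Expected penalty, marginalizing over the awareness belief p_aware:
    # unaware people induce a wider keep-out radius than aware ones.
    d = np.linalg.norm(np.asarray(robot_xy) - np.asarray(person_xy))
    pen_aware = w * max(0.0, d_safe_aware - d)
    pen_unaware = w * max(0.0, d_safe_unaware - d)
    return p_aware * pen_aware + (1.0 - p_aware) * pen_unaware

# The same distance costs more when the person seems unaware (p_aware = 0.1)
# than when the person has clearly noticed the robot (p_aware = 0.9).
print(proximity_penalty((0.0, 0.0), (0.8, 0.0), p_aware=0.1))
print(proximity_penalty((0.0, 0.0), (0.8, 0.0), p_aware=0.9))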
Robust Adversarial Reinforcement Learning
Deep neural networks coupled with fast simulation and improved computation
have led to recent successes in the field of reinforcement learning (RL).
However, most current RL-based approaches fail to generalize, since (a) the gap
between simulation and real world is so large that policy-learning approaches
fail to transfer; (b) even if policy learning is done in real world, the data
scarcity leads to failed generalization from training to test scenarios (e.g.,
due to different friction or object masses). Inspired by H-infinity control
methods, we note that both modeling errors and differences in training and test
scenarios can be viewed as extra forces/disturbances in the system. This paper
proposes the idea of robust adversarial reinforcement learning (RARL), where we
train an agent to operate in the presence of a destabilizing adversary that
applies disturbance forces to the system. The jointly trained adversary is
reinforced -- that is, it learns an optimal destabilization policy. We
formulate the policy learning as a zero-sum, minimax objective function.
Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah,
Swimmer, Hopper and Walker2d) conclusively demonstrate that our method (a)
improves training stability; (b) is robust to differences in training/test
conditions; and (c) outperforms the baseline even in the absence of the
adversary.
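The alternating zero-sum scheme can be illustrated on a toy saddle-point problem. The analytic reward and parameter vectors below are stand-ins for the protagonist and adversary policies that the paper trains with policy gradients; only the loop structure carries over.

import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=2)   # protagonist (policy) parameters
phi = rng.normal(size=2)     # adversary (disturbance) parameters

def reward(th, ph):
    # Zero-sum reward: the protagonist maximizes it, the adversary minimizes
    # it. Concave in th and convex in ph, so alternating updates converge.
    return -th @ th + 0.5 * th @ ph + 0.1 * ph @ ph

lr = 0.05
for _ in range(500):
    # Protagonist step: gradient ascent with the adversary held fixed.
    theta += lr * (-2.0 * theta + 0.5 * phi)
    # Adversary step: gradient descent on the same reward, theta held fixed.
    phi -= lr * (0.5 * theta + 0.2 * phi)

print("saddle point:", theta, phi, "reward:", reward(theta, phi))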
Computer Algebra Methods in Control Systems
As dynamic and control systems become more complex, relying purely on
numerical computations for systems analysis and design might become extremely
expensive or totally infeasible. Computer algebra can act as an enabler for
analysis and design of such complex systems. It also provides means for
characterization of all solutions and studying them before realizing a
particular solution. This note provides a brief survey on some of the
applications of symbolic computations in control systems analysis and design.
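As a small illustration of the kind of analysis surveyed (with a plant and compensator assumed purely for this sketch), symbolic computation can characterize all stabilizing gains in closed form, where a numerical sweep could only sample them:

import sympy as sp

s, k = sp.symbols('s k', real=True)
G = 1 / (s * (s - 1))   # unstable plant, assumed for illustration
C = k * (s + 2)         # compensator with symbolic gain k

# Closed-loop characteristic polynomial from 1 + C*G = 0.
char_poly = sp.expand(sp.numer(sp.together(1 + C * G)))
a1, a0 = char_poly.coeff(s, 1), char_poly.coeff(s, 0)

# For s**2 + a1*s + a0, Routh-Hurwitz reduces to a1 > 0 and a0 > 0.
stabilizing = sp.reduce_inequalities([a1 > 0, a0 > 0], [k])
print(char_poly)     # expanded closed-loop polynomial in s and k
print(stabilizing)   # the set of all stabilizing gains (here: k > 1)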
Learning and Reasoning with Action-Related Places for Robust Mobile Manipulation
We propose the concept of Action-Related Place (ARPlace) as a powerful and
flexible representation of task-related place in the context of mobile
manipulation. ARPlace represents robot base locations not as a single position,
but rather as a collection of positions, each with an associated probability
that the manipulation action will succeed when located there. ARPlaces are
generated using a predictive model that is acquired through experience-based
learning, and take into account the uncertainty the robot has about its own
location and the location of the object to be manipulated.
When executing the task, rather than choosing one specific goal position
based only on the initial knowledge about the task context, the robot
instantiates an ARPlace, and bases its decisions on this ARPlace, which is
updated as new information about the task becomes available. To show the
advantages of this least-commitment approach, we present a transformational
planner that reasons about ARPlaces in order to optimize symbolic plans. Our
empirical evaluation demonstrates that using ARPlaces leads to more robust and
efficient mobile manipulation in the face of state estimation uncertainty on
our simulated robot.
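A schematic sketch of the idea (the predictive model and all numbers are invented for illustration, not the learned model from the paper): every candidate base position carries a success probability obtained by marginalizing a predictive model over pose uncertainty, and the whole distribution is re-evaluated when new evidence arrives, instead of committing to a single goal position up front.

import numpy as np

rng = np.random.default_rng(0)
candidates = rng.uniform(-1.0, 1.0, size=(50, 2))   # candidate base (x, y)

def success_prob(base_xy, object_xy, reach=0.9, sharp=8.0):
    # Toy predictive model: success decays as the base-to-object distance
    # departs from the arm's ideal reach.
    d = np.linalg.norm(base_xy - object_xy, axis=-1)
    return np.exp(-sharp * (d - reach) ** 2)

def arplace(object_samples):
    # Marginalize the success model over samples of the object's pose.
    return success_prob(candidates[:, None, :], object_samples).mean(axis=1)

# Initial ARPlace under broad uncertainty about the object's location.
p = arplace(rng.normal([1.5, 0.0], 0.15, size=(200, 2)))

# A new observation narrows the object estimate; the ARPlace is simply
# re-evaluated, and the preferred base position is free to change.
p = arplace(rng.normal([1.4, 0.1], 0.05, size=(200, 2)))
print("best base:", candidates[np.argmax(p)], "P(success):", p.max())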
Thompson Sampling for Dynamic Pricing
In this paper we apply active learning algorithms to dynamic pricing on a
prominent e-commerce website. Dynamic pricing involves changing the price of
items on a regular basis, and uses the feedback from the pricing decisions to
update prices of the items. Most popular approaches to dynamic pricing use a
passive learning approach, where the algorithm uses historical data to learn
various parameters of the pricing problem, and uses the updated parameters to
generate a new set of prices. We show that one can use active learning
algorithms such as Thompson sampling to more efficiently learn the underlying
parameters in a pricing problem. We apply our algorithms to a real e-commerce
system and show that the algorithms indeed improve revenue compared to pricing
algorithms that use passive learning.
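A minimal Thompson-sampling sketch for pricing (an assumed Beta-Bernoulli demand model, not the paper's production system): each candidate price has an unknown purchase probability with a Beta posterior; each round we draw one sample per price, post the price with the highest sampled expected revenue, and update the posterior with the observed buy/no-buy outcome.

import numpy as np

rng = np.random.default_rng(42)
prices = np.array([5.0, 7.5, 10.0, 12.5])
true_buy_prob = np.array([0.60, 0.45, 0.30, 0.15])   # hidden demand curve

alpha = np.ones(len(prices))   # Beta posterior: 1 + observed purchases
beta = np.ones(len(prices))    # Beta posterior: 1 + observed refusals
revenue = 0.0

for _ in range(10_000):
    theta = rng.beta(alpha, beta)             # one posterior draw per price
    i = np.argmax(prices * theta)             # maximize sampled revenue
    bought = rng.random() < true_buy_prob[i]  # customer response
    alpha[i] += bought
    beta[i] += 1 - bought
    revenue += prices[i] * bought

print("revenue:", revenue, "posterior means:", alpha / (alpha + beta))

Exploration happens automatically: prices with uncertain posteriors occasionally produce large sampled revenues and get tried, while clearly inferior prices are posted less and less often.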
Accelerated Magnetic Resonance Thermometry in Presence of Uncertainties
An accelerated model-based information theoretic approach is presented to
perform the task of Magnetic Resonance (MR) thermal image reconstruction from a
limited number of observed samples in k-space. The key idea of the proposed
approach is to utilize information theoretic techniques to optimally detect
samples of k-space that are information rich with respect to a model of the
thermal data acquisition. These highly informative k-space samples are then
used to refine the mathematical model and reconstruct the image. The
information theoretic reconstruction is demonstrated retrospectively in data
acquired during MR guided Laser Induced Thermal Therapy (MRgLITT) procedures.
The approach demonstrates that locations of high-information content with
respect to a model-based reconstruction of MR thermometry may be quantitatively
identified. The predicted locations of high-information content are sorted and
retrospectively extracted from the fully sampled k-space measurement data set.
The effect of iteratively increasing the predicted number of data points used
in the subsampled reconstruction is quantified using the L2-norm of the
distance between the subsampled and fully sampled reconstruction. Performance
of the proposed approach is also compared with clinically available subsampling
techniques (rectilinear subsampling and variable-density Poisson disk
undersampling). It is shown that the proposed subsampling scheme results in
accurate reconstructions using a small fraction of the k-space points,
suggesting that the reconstruction technique may be useful for improving the
temporal resolution of thermometry data.
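The sketch below conveys the flavor under strong simplifying assumptions (a stand-in ensemble of model predictions and a variance-based information score, not the paper's information-theoretic criterion): k-space locations whose coefficients vary most across the model predictions are measured, and the model fills in everything else.

import numpy as np

rng = np.random.default_rng(0)
N = 64
xx, yy = np.meshgrid(np.linspace(-1, 1, N), np.linspace(-1, 1, N))

def thermal_map(dx):
    # Stand-in thermal model: a hot spot whose position dx is uncertain.
    return np.exp(-((xx - dx) ** 2 + yy ** 2) / (2 * 0.1 ** 2))

# Ensemble of model predictions encodes the acquisition uncertainty.
ensemble = np.stack([thermal_map(dx) for dx in rng.normal(0.0, 0.05, 16)])
kspace = np.fft.fft2(ensemble, axes=(-2, -1))

# Information score: predictive variability of each k-space coefficient.
score = kspace.var(axis=0)
mask = score >= np.quantile(score, 0.90)   # measure the top 10% only

# "Acquire" the informative samples from the true image, and let the model
# (here: the ensemble mean) fill in every unmeasured coefficient.
truth = thermal_map(0.07)
model_k = np.fft.fft2(ensemble.mean(axis=0))
recon = np.fft.ifft2(np.where(mask, np.fft.fft2(truth), model_k)).real

rel = lambda a: np.linalg.norm(a - truth) / np.linalg.norm(truth)
print(f"model alone: {rel(np.fft.ifft2(model_k).real):.3f}, "
      f"model + 10% of k-space: {rel(recon):.3f}")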
Stochastic Multi-objective Optimization on a Budget: Application to multi-pass wire drawing with quantified uncertainties
Design optimization of engineering systems with multiple competing objectives
is a painstaking process, especially when the objective functions are
expensive-to-evaluate computer codes with parametric uncertainties. The
effectiveness of the state-of-the-art techniques is greatly diminished because
they require a large number of objective evaluations, which makes them
impractical for problems of this kind. Bayesian global optimization (BGO)
has managed to deal with these challenges in solving single-objective
optimization problems and has recently been extended to multi-objective
optimization (MOO). BGO models the objectives via probabilistic surrogates and
uses the epistemic uncertainty to define an information acquisition function
(IAF) that quantifies the merit of evaluating the objective at new designs.
This iterative data acquisition process continues until a stopping criterion is
met. The most commonly used IAF for MOO is the expected improvement over the
dominated hypervolume (EIHV), which in its original form is unable to deal with
parametric uncertainties or measurement noise. In this work, we provide a
systematic reformulation of EIHV to deal with stochastic MOO problems. The
primary contribution of this paper lies in being able to filter out the noise
and reformulate the EIHV without having to observe or estimate the stochastic
parameters. A further benefit of the probabilistic nature of our methodology is that
it enables us to characterize our confidence about the predicted Pareto front.
We verify and validate the proposed methodology by applying it to synthetic
test problems with known solutions. We demonstrate our approach on an
industrial problem of die pass design for a steel wire drawing process.
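For intuition, here is a toy Monte Carlo estimate of the (noise-free) expected hypervolume improvement for a two-objective minimization problem; the Gaussian predictive distribution stands in for the GP surrogate, and every number is illustrative. The stochastic reformulation in the paper goes further by handling noise without observing the stochastic parameters.

import numpy as np

REF = np.array([2.0, 2.0])   # reference point bounding the hypervolume

def hypervolume_2d(front, ref=REF):
    # Dominated hypervolume of a 2-D front (minimization); assumes all
    # points lie inside the reference box. Sweep in increasing f1.
    pts = front[np.argsort(front[:, 0])]
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:   # skip dominated points
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv

front = np.array([[0.2, 1.5], [0.8, 0.7], [1.4, 0.3]])
hv_now = hypervolume_2d(front)

# EIHV of one candidate design: average hypervolume gain over draws from
# the surrogate's predictive distribution at that design.
rng = np.random.default_rng(0)
mu, sigma = np.array([0.5, 0.5]), np.array([0.2, 0.2])
draws = rng.normal(mu, sigma, size=(2_000, 2))
eihv = np.mean([hypervolume_2d(np.vstack([front, y])) - hv_now
                for y in draws])
print(f"EIHV estimate: {eihv:.4f}")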
Optimization under Uncertainty in the Era of Big Data and Deep Learning: When Machine Learning Meets Mathematical Programming
This paper reviews recent advances in the field of optimization under
uncertainty via a modern data lens, highlights key research challenges and
promise of data-driven optimization that organically integrates machine
learning and mathematical programming for decision-making under uncertainty,
and identifies potential research opportunities. A brief review of classical
mathematical programming techniques for hedging against uncertainty is first
presented, along with their wide spectrum of applications in Process Systems
Engineering. A comprehensive review and classification of the relevant
publications on data-driven distributionally robust optimization, data-driven
chance-constrained programming, data-driven robust optimization, and data-driven
scenario-based optimization is then presented. This paper also identifies
fertile avenues for future research that focuses on a closed-loop data-driven
optimization framework, which allows the feedback from mathematical programming
to machine learning, as well as scenario-based optimization leveraging the
power of deep learning techniques. Perspectives on online learning-based,
data-driven multistage optimization with a learning-while-optimizing scheme
are also presented.
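As a minimal illustration of the scenario-based idea (the problem and all numbers are invented for this sketch), an uncertain constraint is replaced by one deterministic copy per observed data scenario, so the data directly shape the feasible region of the resulting linear program:

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(7)

# Maximize x1 + x2 (linprog minimizes, so negate the objective).
c = np.array([-1.0, -1.0])

# Uncertain resource constraint a(xi) @ x <= 1, with the coefficient
# vector a(xi) observed in 100 historical data scenarios.
scenarios = rng.normal([0.5, 0.8], 0.1, size=(100, 2))

res = linprog(c, A_ub=scenarios, b_ub=np.ones(len(scenarios)),
              bounds=[(0, None), (0, None)])
print("scenario-robust solution:", res.x)

With more scenarios the feasible set shrinks toward the robust one, and scenario-approach theory bounds the probability that a fresh, unseen scenario violates the resulting solution.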
Locomotion Planning through a Hybrid Bayesian Trajectory Optimization
Locomotion planning for legged systems requires reasoning about suitable
contact schedules. The contact sequence and timings constitute a hybrid
dynamical system and prescribe a subset of achievable motions. State-of-the-art
approaches cast motion planning as an optimal control problem. In order to
decrease computational complexity, one common strategy separates footstep
planning from motion optimization and plans contacts using heuristics. In this
paper, we propose to learn contact schedule selection from high-level task
descriptors using Bayesian optimization. A bi-level optimization is defined in
which a Gaussian process model predicts the performance of trajectories
generated by a motion planning nonlinear program. The agent, therefore, retains
the ability to reason about suitable contact schedules, while explicit
computation of the corresponding gradients is avoided. We delineate the
algorithm in its general form and provide results for planning single-legged
hopping. Our method is capable of learning contact schedule transitions that
align with human intuition. It performs competitively against a heuristic
baseline in predicting task-appropriate contact schedules.
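A toy sketch of this bi-level structure (a stand-in inner cost and a tiny hand-rolled GP, purely illustrative): the outer loop proposes a contact timing, the inner "motion optimizer" returns the cost of the trajectory it finds, and a GP fit to the evaluations picks the next timing through a lower-confidence bound, so no gradients are ever taken through the inner program.

import numpy as np

def inner_cost(t):
    # Stand-in for the motion-planning NLP: cost of the best trajectory
    # found for contact timing t.
    return (t - 0.3) ** 2 + 0.05 * np.sin(10 * t)

def rbf(a, b, ell=0.15):
    # Squared-exponential kernel between two 1-D sets of timings.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

X = np.array([0.1, 0.5, 0.9])        # initial contact-timing guesses
y = inner_cost(X)
grid = np.linspace(0.0, 1.0, 200)    # candidate timings

for _ in range(10):
    K = rbf(X, X) + 1e-6 * np.eye(len(X))
    Ks = rbf(grid, X)
    mu = Ks @ np.linalg.solve(K, y)                    # GP posterior mean
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    lcb = mu - 2.0 * np.sqrt(np.maximum(var, 0.0))     # optimistic bound
    t_next = grid[np.argmin(lcb)]                      # next proposal
    X, y = np.append(X, t_next), np.append(y, inner_cost(t_next))

print("best contact timing:", X[np.argmin(y)], "cost:", y.min())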