
    Computing the Value of Computation for Planning

    An intelligent agent performs actions in order to achieve its goals. Such actions can either be externally directed, such as opening a door, or internally directed, such as writing data to a memory location or strengthening a synaptic connection. Some internal actions, which we refer to as computations, potentially help the agent choose better actions. Considering that (external) actions and computations might draw upon the same resources, such as time and energy, deciding when to act or compute, as well as what to compute, is critical to the performance of an agent. In an environment that provides rewards depending on an agent's behavior, an action's value is typically defined as the sum of expected long-term rewards succeeding the action (itself a complex quantity that depends on what the agent goes on to do after the action in question). However, defining the value of a computation is not as straightforward, as computations are only valuable in a higher-order way, through the alteration of actions. This thesis offers a principled way of computing the value of a computation in a planning setting formalized as a Markov decision process. We present two different definitions of computation values: static and dynamic. They address two extreme cases of the computation budget: affording calculation of zero or infinitely many steps in the future. We show that these values have desirable properties, such as temporal consistency and asymptotic convergence. Furthermore, we propose methods for efficiently computing and approximating the static and dynamic computation values. We describe a sense in which the policies that greedily maximize these values can be optimal. We utilize these principles to construct Monte Carlo tree search algorithms that outperform most state-of-the-art methods in terms of finding higher-quality actions given the same simulation resources.
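
    A toy sketch of the flavor of this idea (illustrative only, not the thesis's actual definitions): under independent Gaussian beliefs about action values, the myopic value of simulating an action is the expected gain in the value of the subsequent greedy choice. The numbers and the crude belief update below are assumptions.

        # Toy value-of-computation sketch under independent Gaussian beliefs.
        import numpy as np

        rng = np.random.default_rng(0)

        mu = np.array([1.0, 0.9, 0.2])     # posterior means of action values
        sigma = np.array([0.1, 0.4, 0.3])  # posterior std devs (uncertainty)

        def voc_monte_carlo(mu, sigma, i, n=100_000):
            """Expected gain of the greedy choice from simulating action i."""
            best_now = mu.max()
            # Hypothetical simulation outcomes drawn from the current belief.
            samples = rng.normal(mu[i], sigma[i], size=n)
            mu_post = np.tile(mu, (n, 1))
            mu_post[:, i] = samples        # crude update: posterior mean <- sample
            best_later = mu_post.max(axis=1)
            return (best_later - best_now).mean()

        for i in range(len(mu)):
            print(f"action {i}: VOC ~ {voc_monte_carlo(mu, sigma, i):.4f}")
        # Simulating the uncertain runner-up (action 1) tends to be most
        # valuable, since only it can overturn the current greedy choice.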

    Social Navigation Planning Based on People's Awareness of Robots

    When mobile robots maneuver near people, they run the risk of rudely blocking their paths; but not all people behave the same around robots. People who have not noticed the robot are the most difficult to predict. This paper investigates how mobile robots can generate acceptable paths in dynamic environments by predicting human behavior. Human behavior here includes both physical and mental behavior; we focus on the latter. We introduce a simple model of safe interaction: when a person seems unaware of the robot, the robot should avoid approaching too closely. In this study, people around the robot are detected and tracked using sensor fusion and filtering techniques. To handle uncertainties in the dynamic environment, a Partially Observable Markov Decision Process (POMDP) is used to formulate the navigation planning problem in the shared environment. People's awareness of the robot is inferred and incorporated into the POMDP's state and reward models. The proposed planner enables a robot to change its navigation plan based on its perception of each person's robot-awareness; to the best of our knowledge, this is a new capability. We conduct simulations and experiments using the Toyota Human Support Robot (HSR) to validate our approach, and demonstrate that the proposed framework is capable of running in real time. Comment: 8 pages, 7 figures.
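
    A minimal sketch of the awareness-inference ingredient (hypothetical numbers and observation model, not the paper's): a Bayes filter over a binary aware/unaware state, and a reward that penalizes approaching a person believed to be unaware.

        # Bayes filter over "aware/unaware" plus an awareness-dependent reward.
        import numpy as np

        # P(observe "gaze toward robot" | state); states: 0 = unaware, 1 = aware
        p_gaze = np.array([0.1, 0.8])
        # State transitions: an unaware person may notice the robot over time.
        T = np.array([[0.9, 0.1],   # unaware -> {unaware, aware}
                      [0.0, 1.0]])  # aware stays aware

        def belief_update(b, observed_gaze):
            b = b @ T                                 # predict
            like = p_gaze if observed_gaze else 1 - p_gaze
            b = b * like                              # correct
            return b / b.sum()

        def proximity_reward(b, distance, safe_dist=1.5):
            # Penalty scales with belief in "unaware" inside the safe radius.
            return -b[0] * max(0.0, safe_dist - distance)

        b = np.array([0.5, 0.5])
        for gaze in [False, False, True]:
            b = belief_update(b, gaze)
            print(f"belief(aware) = {b[1]:.3f}, "
                  f"reward at 1.0 m = {proximity_reward(b, 1.0):.3f}")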

    Robust Adversarial Reinforcement Learning

    Deep neural networks coupled with fast simulation and improved computation have led to recent successes in the field of reinforcement learning (RL). However, most current RL-based approaches fail to generalize since: (a) the gap between simulation and the real world is so large that policy-learning approaches fail to transfer; and (b) even if policy learning is done in the real world, data scarcity leads to failed generalization from training to test scenarios (e.g., due to different friction or object masses). Inspired by H-infinity control methods, we note that both modeling errors and differences between training and test scenarios can be viewed as extra forces/disturbances in the system. This paper proposes robust adversarial reinforcement learning (RARL), in which we train an agent to operate in the presence of a destabilizing adversary that applies disturbance forces to the system. The jointly trained adversary is reinforced -- that is, it learns an optimal destabilization policy. We formulate policy learning as a zero-sum, minimax objective function. Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah, Swimmer, Hopper, and Walker2d) conclusively demonstrate that our method (a) improves training stability; (b) is robust to differences in training/test conditions; and (c) outperforms the baseline even in the absence of the adversary. Comment: 10 pages.
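
    The training scheme boils down to alternating updates in a zero-sum game. A self-contained toy version (a quadratic game standing in for the RL environments; all numbers are assumptions, not the paper's setup):

        # Alternating protagonist/adversary updates on a toy zero-sum game.
        def f(u, v):
            # Toy cost: the adversary's disturbance v pushes u off target,
            # but v pays a quadratic price (keeps the game well-posed).
            return (u + 0.5 * v) ** 2 - 0.5 * v ** 2

        def grad_u(u, v): return 2 * (u + 0.5 * v)
        def grad_v(u, v): return (u + 0.5 * v) - v

        u, v, lr = 2.0, 0.0, 0.1
        for it in range(200):
            u -= lr * grad_u(u, v)   # protagonist: gradient descent on f
            v += lr * grad_v(u, v)   # adversary: gradient ascent on f
        print(f"u = {u:.4f}, v = {v:.4f}, f = {f(u, v):.4f}")
        # The protagonist settles at a choice of u that is robust to the
        # worst-case disturbance, which is the point of the joint training.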

    Computer Algebra Methods in Control Systems

    As dynamic and control systems become more complex, relying purely on numerical computation for systems analysis and design can become extremely expensive or totally infeasible. Computer algebra can act as an enabler for the analysis and design of such complex systems. It also provides the means to characterize all solutions and study them before realizing a particular one. This note provides a brief survey of some applications of symbolic computation in control systems analysis and design. Comment: 10 pages.
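
    One small illustration of that "characterize all solutions" point (an assumed textbook example, not taken from the note): symbolic computation can return the complete range of stabilizing gains at once, where a numeric solver certifies only one gain at a time.

        # Routh-Hurwitz via symbolic computation with sympy.
        import sympy as sp

        s, K = sp.symbols('s K', real=True)
        char_poly = sp.Poly(s**3 + 3*s**2 + 2*s + K, s)

        # Routh-Hurwitz for a cubic a3 s^3 + a2 s^2 + a1 s + a0:
        # all coefficients positive and a2*a1 > a3*a0.
        a3, a2, a1, a0 = char_poly.all_coeffs()
        conds = [a2 > 0, a1 > 0, a0 > 0, a2*a1 - a3*a0 > 0]
        conds = [c for c in conds if c is not sp.true]  # drop trivially true ones
        print(sp.reduce_inequalities(conds, K))         # (0 < K) & (K < 6)

        # A numeric root solver, by contrast, handles one gain at a time:
        print(sp.nroots(char_poly.as_expr().subs(K, 1)))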

    Learning and Reasoning with Action-Related Places for Robust Mobile Manipulation

    We propose the concept of Action-Related Place (ARPlace) as a powerful and flexible representation of task-related place in the context of mobile manipulation. ARPlace represents robot base locations not as a single position, but rather as a collection of positions, each with an associated probability that the manipulation action will succeed when executed from there. ARPlaces are generated using a predictive model acquired through experience-based learning, and take into account the uncertainty the robot has about its own location and about the location of the object to be manipulated. When executing the task, rather than choosing one specific goal position based only on the initial knowledge about the task context, the robot instantiates an ARPlace and bases its decisions on it, updating it as new information about the task becomes available. To show the advantages of this least-commitment approach, we present a transformational planner that reasons about ARPlaces in order to optimize symbolic plans. Our empirical evaluation demonstrates that using ARPlaces leads to more robust and efficient mobile manipulation in the face of state-estimation uncertainty on our simulated robot.
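
    A crude caricature of the representation (the stand-in success model below is entirely made up, not the paper's learned predictor): a grid of candidate base positions with success probabilities that sharpen as the object estimate improves, with commitment deferred until the end.

        # ARPlace-style grid of success probabilities over base positions.
        import numpy as np

        xs = np.linspace(-1.0, 1.0, 21)    # candidate base x-positions (m)

        def success_prob(xs, obj_x, obj_std):
            # Stand-in predictive model: grasping succeeds when the base
            # lands near the object; object-pose uncertainty flattens and
            # deflates the whole profile.
            reach = 0.4
            shape = np.clip(1.0 - (np.abs(xs - obj_x) / reach) ** 2, 0, 1)
            return shape * np.exp(-obj_std)

        arplace = success_prob(xs, obj_x=0.2, obj_std=0.5)   # initial belief
        print("early best base position:", xs[arplace.argmax()])

        # A new observation localizes the object better; update, don't
        # commit to the early answer.
        arplace = success_prob(xs, obj_x=0.32, obj_std=0.1)
        print("final best base position:", xs[arplace.argmax()])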

    Thompson Sampling for Dynamic Pricing

    In this paper, we apply active learning algorithms to dynamic pricing on a prominent e-commerce website. Dynamic pricing involves changing the price of items on a regular basis and using the feedback from those pricing decisions to update the items' prices. The most popular approaches to dynamic pricing use passive learning, where the algorithm uses historical data to learn the various parameters of the pricing problem and then uses the updated parameters to generate a new set of prices. We show that active learning algorithms such as Thompson sampling can learn the underlying parameters of a pricing problem more efficiently. We apply our algorithms to a real e-commerce system and show that they indeed improve revenue compared to pricing algorithms that use passive learning.
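
    A minimal sketch of Thompson sampling for pricing (illustrative prices and conversion rates, not the paper's production system): keep Beta posteriors over the purchase probability at each candidate price, draw one sample per round, and post the price with the highest sampled expected revenue.

        # Thompson sampling over a small set of candidate prices.
        import numpy as np

        rng = np.random.default_rng(0)
        prices = np.array([8.0, 10.0, 12.0])
        true_conv = np.array([0.30, 0.20, 0.10])  # unknown to the seller
        alpha = np.ones(3)                        # Beta(1,1) priors on
        beta = np.ones(3)                         # conversion at each price

        revenue = 0.0
        for t in range(10_000):
            theta = rng.beta(alpha, beta)         # one posterior sample each
            i = np.argmax(prices * theta)         # maximize sampled revenue
            sale = rng.random() < true_conv[i]    # customer feedback
            alpha[i] += sale                      # conjugate posterior update
            beta[i] += 1 - sale
            revenue += prices[i] * sale
        print(f"avg revenue/round: {revenue / 10_000:.3f}")
        # Exploration is automatic: uncertain prices get sampled
        # optimistically often enough to be ruled in or out, unlike a
        # passive learner stuck with whatever historical prices exist.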

    Accelerated Magnetic Resonance Thermometry in Presence of Uncertainties

    An accelerated, model-based, information-theoretic approach is presented for the task of Magnetic Resonance (MR) thermal image reconstruction from a limited number of observed samples in k-space. The key idea of the proposed approach is to use information-theoretic techniques to optimally detect samples of k-space that are information-rich with respect to a model of the thermal data acquisition. These highly informative k-space samples are then used to refine the mathematical model and reconstruct the image. The information-theoretic reconstruction is demonstrated retrospectively on data acquired during MR-guided Laser Induced Thermal Therapy (MRgLITT) procedures. The approach demonstrates that locations of high information content with respect to a model-based reconstruction of MR thermometry can be quantitatively identified. The predicted locations of high information content are sorted and retrospectively extracted from the fully sampled k-space measurement data set. The effect of iteratively increasing the predicted number of data points used in the subsampled reconstruction is quantified using the L2-norm of the distance between the subsampled and fully sampled reconstructions. The performance of the proposed approach is also compared with clinically available subsampling techniques (rectilinear subsampling and variable-density Poisson disk undersampling). It is shown that the proposed subsampling scheme yields accurate reconstructions using a small fraction of k-space points, suggesting that the technique may be useful in improving the temporal resolution of thermometry data. Comment: 29 pages, 25 figures.
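
    A rough 1-D analogue of the selection idea (entirely illustrative; the stand-in "information score" below is an assumption, not the paper's information-theoretic criterion): score k-space locations by how much they vary across an ensemble of model perturbations, keep the top fraction, and reconstruct from those samples alone.

        # Model-guided k-space subsampling and reconstruction, 1-D toy.
        import numpy as np

        n = 256
        t = np.linspace(-4, 4, n)
        x = np.exp(-t ** 2)                 # smooth "thermal" profile
        kspace = np.fft.fft(x)

        # Stand-in information score: signal variance across an ensemble of
        # model perturbations; high variance = informative k-space sample.
        widths = (0.8, 0.9, 1.0, 1.1, 1.2)
        ensemble = np.stack([np.fft.fft(np.exp(-(t / w) ** 2)) for w in widths])
        score = np.var(np.abs(ensemble), axis=0)

        keep = np.argsort(score)[-n // 8:]  # keep the top 12.5% of samples
        sub = np.zeros_like(kspace)
        sub[keep] = kspace[keep]
        recon = np.fft.ifft(sub).real

        err = np.linalg.norm(recon - x) / np.linalg.norm(x)
        print(f"relative L2 error with 12.5% of k-space: {err:.3f}")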

    Stochastic Multi-objective Optimization on a Budget: Application to multi-pass wire drawing with quantified uncertainties

    Design optimization of engineering systems with multiple competing objectives is a painstakingly tedious process, especially when the objective functions are expensive-to-evaluate computer codes with parametric uncertainties. The effectiveness of state-of-the-art techniques is greatly diminished because they require a large number of objective evaluations, which makes them impractical for problems of this kind. Bayesian global optimization (BGO) has managed to deal with these challenges in single-objective optimization problems and has recently been extended to multi-objective optimization (MOO). BGO models the objectives via probabilistic surrogates and uses the epistemic uncertainty to define an information acquisition function (IAF) that quantifies the merit of evaluating the objective at new designs. This iterative data acquisition process continues until a stopping criterion is met. The most commonly used IAF for MOO is the expected improvement over the dominated hypervolume (EIHV), which in its original form is unable to deal with parametric uncertainties or measurement noise. In this work, we provide a systematic reformulation of EIHV to deal with stochastic MOO problems. The primary contribution of this paper lies in being able to filter out the noise and reformulate the EIHV without having to observe or estimate the stochastic parameters. A byproduct of the probabilistic nature of our methodology is that it enables us to characterize our confidence about the predicted Pareto front. We verify and validate the proposed methodology by applying it to synthetic test problems with known solutions, and we demonstrate our approach on an industrial problem of die pass design for a steel wire drawing process. Comment: 19 pages, 14 figures.
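
    For the noise-free ingredient of the acquisition function, a minimal Monte Carlo sketch of expected hypervolume improvement with two minimized objectives (Gaussian predictive distributions and all numbers are assumptions; the paper's reformulation for stochastic parameters is not reproduced here):

        # Monte Carlo expected hypervolume improvement, 2 objectives.
        import numpy as np

        rng = np.random.default_rng(0)
        ref = np.array([10.0, 10.0])                 # hypervolume reference
        front = np.array([[2.0, 6.0], [4.0, 3.0]])   # current Pareto front

        def hypervolume_2d(points, ref):
            # Area dominated by `points` w.r.t. `ref`, both minimized.
            pts = points[(points[:, 0] < ref[0]) & (points[:, 1] < ref[1])]
            pts = pts[np.argsort(pts[:, 0])]
            hv, prev_y = 0.0, ref[1]
            for x, y in pts:
                if y < prev_y:                       # skip dominated points
                    hv += (ref[0] - x) * (prev_y - y)
                    prev_y = y
            return hv

        def eihv(mean, std, front, ref, n=4000):
            base = hypervolume_2d(front, ref)
            samples = rng.normal(mean, std, size=(n, 2))
            return np.mean([hypervolume_2d(np.vstack([front, s]), ref) - base
                            for s in samples])

        # Two candidate designs: a confident one and an uncertain one.
        print(eihv(np.array([3.0, 4.5]), np.array([0.1, 0.1]), front, ref))
        print(eihv(np.array([3.0, 4.5]), np.array([1.5, 1.5]), front, ref))
        # Higher predictive uncertainty can raise the expected gain, which
        # is what steers BGO toward informative designs.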

    Optimization under Uncertainty in the Era of Big Data and Deep Learning: When Machine Learning Meets Mathematical Programming

    This paper reviews recent advances in the field of optimization under uncertainty through a modern data lens, highlights key research challenges and the promise of data-driven optimization that organically integrates machine learning and mathematical programming for decision-making under uncertainty, and identifies potential research opportunities. A brief review of classical mathematical programming techniques for hedging against uncertainty is first presented, along with their wide spectrum of applications in Process Systems Engineering. A comprehensive review and classification of the relevant publications on data-driven distributionally robust optimization, data-driven chance-constrained programming, data-driven robust optimization, and data-driven scenario-based optimization is then presented. The paper also identifies fertile avenues for future research focused on a closed-loop data-driven optimization framework, which allows feedback from mathematical programming to machine learning, as well as scenario-based optimization leveraging the power of deep learning techniques. Perspectives on online learning-based data-driven multistage optimization with a learning-while-optimizing scheme are presented.
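
    As a generic illustration of the scenario-based branch of this literature (a textbook newsvendor example, not drawn from the paper): replace the unknown demand distribution with data scenarios and optimize the sample average, so the data feed directly into the mathematical program.

        # Data-driven scenario-based (sample average) optimization.
        import numpy as np

        rng = np.random.default_rng(0)
        demand_data = rng.lognormal(mean=3.0, sigma=0.4, size=500)  # "data"
        price, cost = 5.0, 2.0

        def neg_profit(order_qty, scenarios):
            sales = np.minimum(order_qty, scenarios)
            return -(price * sales - cost * order_qty).mean()

        # Optimize the scenario average over a grid of order quantities.
        grid = np.linspace(0, 60, 601)
        best = grid[np.argmin([neg_profit(q, demand_data) for q in grid])]
        print(f"data-driven order quantity: {best:.1f}")
        # A learned generative model could stand in for `demand_data`, which
        # is the machine-learning-to-programming coupling the review covers.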

    Locomotion Planning through a Hybrid Bayesian Trajectory Optimization

    Locomotion planning for legged systems requires reasoning about suitable contact schedules. The contact sequence and timings constitute a hybrid dynamical system and prescribe a subset of achievable motions. State-of-the-art approaches cast motion planning as an optimal control problem. To decrease computational complexity, one common strategy separates footstep planning from motion optimization and plans contacts using heuristics. In this paper, we propose to learn contact-schedule selection from high-level task descriptors using Bayesian optimization. A bi-level optimization is defined in which a Gaussian process model predicts the performance of trajectories generated by a motion-planning nonlinear program. The agent therefore retains the ability to reason about suitable contact schedules, while explicit computation of the corresponding gradients is avoided. We delineate the algorithm in its general form and provide results for planning single-legged hopping. Our method is capable of learning contact-schedule transitions that align with human intuition, and it performs competitively against a heuristic baseline in predicting task-appropriate contact schedules. Comment: Accepted for publication at the IEEE International Conference on Robotics and Automation (ICRA) 201
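
    A 1-D caricature of the bi-level scheme (the inner cost function, kernel, and hyperparameters are all assumptions): a Gaussian process surrogate over a contact-timing parameter, queried by expected improvement, with the expensive motion-planning NLP replaced by a cheap stand-in.

        # Bi-level Bayesian optimization over a contact-schedule parameter.
        import numpy as np
        from scipy.stats import norm

        def inner_planner_cost(t):
            # Stand-in for the motion-planning NLP: cost of contact timing t.
            return (t - 0.6) ** 2 + 0.05 * np.sin(12 * t)

        def gp_posterior(X, y, Xq, ls=0.15, noise=1e-6):
            # Zero-mean GP regression with a squared-exponential kernel.
            k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)
            K = k(X, X) + noise * np.eye(len(X))
            Ks = k(X, Xq)
            sol = np.linalg.solve(K, Ks)
            mu = sol.T @ y
            var = np.clip(1.0 - np.einsum('ij,ij->j', Ks, sol), 1e-12, None)
            return mu, np.sqrt(var)

        X = np.array([0.1, 0.5, 0.9])
        y = inner_planner_cost(X)
        Xq = np.linspace(0, 1, 201)
        for _ in range(10):
            mu, sd = gp_posterior(X, y, Xq)
            z = (y.min() - mu) / sd
            ei = (y.min() - mu) * norm.cdf(z) + sd * norm.pdf(z)
            t_next = Xq[np.argmax(ei)]         # query where EI is highest
            X = np.append(X, t_next)
            y = np.append(y, inner_planner_cost(t_next))
        print(f"best contact timing found: {X[np.argmin(y)]:.3f}")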