9 research outputs found

    PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning

    Full text link
    We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling-based path planning with reinforcement learning (RL). The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology. Next, the sampling-based planners provide roadmaps which connect robot configurations that can be successfully navigated by the RL agent. The same RL agents are used to control the robot under the direction of the planner, enabling long-range navigation. We use Probabilistic Roadmaps (PRMs) as the sampling-based planner. The RL agents are constructed using feature-based and deep neural net policies in continuous state and action spaces. We evaluate PRM-RL, both in simulation and on-robot, on two navigation tasks with non-trivial robot dynamics: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load displacement constraints. Our results show improvement in task completion over both RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes trajectories up to 215 m long under noisy sensor conditions, and the aerial cargo delivery completes flights over 1000 m without violating the task constraints in an environment 63 million times larger than used in training. Comment: 9 pages, 7 figures
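
    Since the abstract hinges on how the roadmap and the RL agent are combined, the sketch below illustrates the connection rule it describes: a roadmap edge is kept only if the short-range policy can reliably drive between the two configurations. The sampler, the Monte Carlo success model, and the thresholds are hypothetical placeholders, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): a PRM whose edges are kept only if a
# short-range point-to-point policy can reliably navigate between the two nodes.
import itertools
import math
import random

def sample_free_config():
    # Placeholder sampler: uniform points in a 100 m x 100 m free workspace.
    return (random.uniform(0, 100), random.uniform(0, 100))

def policy_success_rate(start, goal, trials=20):
    # Placeholder for Monte Carlo rollouts of the learned point-to-point policy.
    # Here success probability simply decays with distance, standing in for real rollouts.
    d = math.dist(start, goal)
    return sum(random.random() < math.exp(-d / 50.0) for _ in range(trials)) / trials

def build_prm_rl_roadmap(n_nodes=100, connect_radius=15.0, min_success=0.7):
    nodes = [sample_free_config() for _ in range(n_nodes)]
    edges = [
        (i, j)
        for i, j in itertools.combinations(range(n_nodes), 2)
        if math.dist(nodes[i], nodes[j]) <= connect_radius
        and policy_success_rate(nodes[i], nodes[j]) >= min_success
    ]
    # At execution time, the same policy would be invoked to traverse each edge.
    return nodes, edges

if __name__ == "__main__":
    nodes, edges = build_prm_rl_roadmap()
    print(f"{len(nodes)} nodes, {len(edges)} RL-navigable edges")
```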

    High-Dimensional Motion Planning and Learning Under Uncertain Conditions

    Get PDF
    Many existing path planning methods do not adequately account for uncertainty. Without uncertainty these existing techniques work well, but in real-world environments they struggle due to inaccurate sensor models, arbitrarily moving obstacles, and uncertain action consequences. For example, picking up and storing children's toys is a simple task for humans. Yet for a household robot the task can be daunting. The room must be modeled with sensors, which may or may not detect all the strewn toys. The robot must be able to detect and avoid the child who may be moving the very toys that the robot is tasked with cleaning. Finally, if the robot missteps and places a foot on a toy, it must be able to compensate for the unexpected consequences of its actions. This example demonstrates that even simple human tasks are fraught with uncertainties that must be accounted for in robotic path planning algorithms. This work presents the first steps towards migrating sampling-based path planning methods to real-world environments by addressing three different types of uncertainty: (1) model uncertainty, (2) spatio-temporal obstacle uncertainty (moving obstacles), and (3) action consequence uncertainty. Uncertainty is encoded directly into path planning through a data structure in order to successfully and efficiently identify safe robot paths in noisy sensed environments. This encoding produces paths with clearance comparable to other planning methods known for high clearance, but at an order of magnitude lower computational cost. It also shows that formal control theory methods combined with path planning provide a technique with a 95% collision-free navigation rate among 300 moving obstacles. Finally, it demonstrates that reinforcement learning can be combined with planning data structures to autonomously learn motion controls of a seven-degree-of-freedom robot at a low computational cost despite the number of dimensions.
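
    As one way to picture "encoding uncertainty directly into the planning data structure", the sketch below adds a sensing-noise penalty to a roadmap edge cost. The Gaussian noise model, the midpoint clearance check, and the weights are illustrative assumptions, not the thesis's actual encoding.

```python
# Hedged sketch: fold sensing uncertainty into a roadmap by penalizing edges that
# pass close to noisily sensed obstacles. All parameters below are illustrative.
import math

def edge_cost(p, q, sensed_obstacles, sigma=0.5, risk_weight=5.0):
    """Length of segment p-q plus a penalty for obstacles whose noisy sensed
    position (ox, oy, radius) lies close to the edge; sigma models sensor noise (m)."""
    length = math.dist(p, q)
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    risk = 0.0
    for ox, oy, radius in sensed_obstacles:
        clearance = max(math.dist(mid, (ox, oy)) - radius, 0.0)
        # Probability-like term: obstacles well outside sigma contribute little risk.
        risk += math.exp(-(clearance ** 2) / (2 * sigma ** 2))
    return length + risk_weight * risk

# Toy query: a 4 m edge passing near one sensed obstacle of radius 0.5 m.
print(edge_cost((0, 0), (4, 0), [(2, 1, 0.5)]))
```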

    Reinforcement Learning and Planning for Preference Balancing Tasks

    Get PDF
    Robots are often highly non-linear dynamical systems with many degrees of freedom, making solving motion problems computationally challenging. One solution has been reinforcement learning (RL), which learns through experimentation to automatically perform the near-optimal motions that complete a task. However, high-dimensional problems and task formulation often prove challenging for RL. We address these problems with PrEference Appraisal Reinforcement Learning (PEARL), which solves Preference Balancing Tasks (PBTs). PBTs define a problem as a set of preferences that the system must balance to achieve a goal. The method is appropriate for acceleration-controlled systems with continuous state spaces and either discrete or continuous action spaces, with unknown system dynamics. We show that PEARL learns a sub-optimal policy on a subset of states and actions, and transfers the policy to the expanded domain to produce a more refined plan on a class of robotic problems. We establish convergence to task goal conditions and, even when the preconditions are not verifiable, show that this is a valuable method to use before other, more expensive approaches. Evaluation is done on several robotic problems, such as Aerial Cargo Delivery, Multi-Agent Pursuit, Rendezvous, and Inverted Flying Pendulum, both in simulation and experimentally. Additionally, PEARL is leveraged outside of robotics as an array-sorting agent. The results demonstrate high accuracy and fast learning times on a large set of practical applications.
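
    To make the preference-balancing idea concrete, here is a minimal sketch in the spirit of the description above: preferences become features the agent tries to keep small, the value function is a weighted combination of them, and actions are chosen greedily for an acceleration-controlled point mass. The features, weights, and dynamics are illustrative placeholders rather than PEARL's actual formulation.

```python
# Minimal sketch, not PEARL itself: a task expressed as preference features, a value
# function that is a negative weighted sum of those features, and a greedy action
# choice for an acceleration-controlled point mass.
import numpy as np

def preference_features(state):
    pos, vel = state[:2], state[2:]
    return np.array([pos @ pos,      # preference: be at the goal (the origin)
                     vel @ vel])     # preference: keep speed (e.g., load swing) low

def value(state, weights):
    return -weights @ preference_features(state)

def step(state, accel, dt=0.1):
    vel = state[2:] + dt * np.asarray(accel, dtype=float)
    return np.concatenate([state[:2] + dt * vel, vel])

def greedy_action(state, weights, actions):
    return max(actions, key=lambda a: value(step(state, a), weights))

actions = [(ax, ay) for ax in (-1, 0, 1) for ay in (-1, 0, 1)]
weights = np.array([1.0, 0.5])               # learned from experience in the actual method
state = np.array([5.0, -3.0, 0.0, 0.0])      # start about 5.8 m from the goal, at rest
for _ in range(200):
    state = step(state, greedy_action(state, weights, actions))
print("final distance to goal:", round(float(np.linalg.norm(state[:2])), 3))
```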

    Real-Time Path Planning for Automating Optical Tweezers based Particle Transport Operations

    Get PDF
    Optical tweezers (OT) have been developed to successfully trap, orient, and transport micro- and nano-scale components of many different sizes and shapes in a fluid medium. They can be viewed as robots made out of light. Components can be simply released from optical traps by switching off laser beams. By utilizing the principle of time sharing or holograms, multiple optical traps can perform several operations in parallel. These characteristics make optical tweezers a very promising technology for creating directed micro- and nano-scale assemblies. In the infra-red regime, they are useful in a large number of biological applications as well. This dissertation explores the problem of real-time path planning for autonomous OT based transport operations. Such operations pose interesting challenges as the environment is uncertain and dynamic due to the random Brownian motion of the particles and noise in the imaging based measurements. Silica microspheres with diameters between 1 and 20 µm are selected as model components. Offline simulations are performed to gather trapping probability data that serves as a measure of trap strength and reliability as a function of the particle's position relative to the trap focus and of the trap velocity. Simplified models are generated using Gaussian Radial Basis Functions to represent the data in a compact form. These metamodels can be queried at run time to obtain estimated probability values accurately and efficiently. The trapping probability models are then utilized in a stochastic dynamic programming framework to compute optimum trap locations and velocities that minimize the total expected transport time while incorporating collision avoidance and recovery steps. A discrete version of an approximate partially observable Markov decision process algorithm, called the QMDP_NLTDV algorithm, is developed. Real-time performance is ensured by pruning the search space, and convergence rates are enhanced by introducing a non-linear value function. The algorithm is validated both in a simulator and on a physical holographic tweezer set-up. Successful runs show that the automated planner is flexible, works well in reasonably crowded scenes, and is capable of transporting a specific particle to a given goal location while avoiding collisions, either by circumventing or by trapping other freely diffusing particles. This technique for transporting individual particles is utilized within a decoupled and prioritized approach to move multiple particles simultaneously. An iterative version of a bipartite graph matching algorithm is also used to assign goal locations to target objects optimally. As in the case of single-particle transport, simulations and some physical experiments are performed to validate the multi-particle planning approach.
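
    The run-time query of a compact trapping-probability metamodel is the computational core described above, so the sketch below shows a Gaussian radial basis function interpolant fit to toy offline samples and queried online. The sample data, kernel width, and ridge term are assumptions for illustration, not the dissertation's fitted model.

```python
# Illustrative sketch (not the dissertation's model): a Gaussian RBF metamodel fit to
# offline trapping-probability samples, queried at run time inside the planner.
import numpy as np

def fit_rbf(centers, values, eps=1.0, ridge=1e-8):
    """Interpolate sampled trapping probabilities with Gaussian radial basis functions."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    phi = np.exp(-(eps * d) ** 2)
    weights = np.linalg.solve(phi + ridge * np.eye(len(centers)), values)
    return lambda x: float(np.clip(
        np.exp(-(eps * np.linalg.norm(centers - x, axis=-1)) ** 2) @ weights, 0.0, 1.0))

# Toy offline data: (radial offset in um, trap speed in um/s) -> trapping probability.
samples = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
probs = np.array([0.99, 0.95, 0.80, 0.40, 0.05])
trap_prob = fit_rbf(samples, probs)
print(trap_prob(np.array([0.8, 1.5])))   # fast run-time query for a candidate trap move
```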

    A Computational Model for Simulation, Visualization and Evaluation of Mandatory and Optional Building Occupancy Scenarios

    Get PDF
    Evaluating design decisions is an important factor in a post-positivist design process, and understanding how people move in space is an important part of that evaluation. However, making accurate predictions of occupants' movements is a challenge, mainly due to the differences between individual occupants, their unique preferences in relation to environmental qualities, the types of scenarios with which they become engaged, and the multiple dimensions of the environmental factors that affect occupants' decisions. This study suggests a model to simulate and visualize mandatory occupancy scenarios, which are task-based, and optional occupancy scenarios, which are attraction-based. The impact of environmental qualities is largely overlooked in existing simulation models for both of these scenarios: existing models for mandatory scenarios are often based on finding the shortest or fastest paths, while those for optional scenarios mainly rely on the field of visibility. The original contribution of the simulation models suggested in this study is the simultaneous consideration of environmental qualities, path simplicity, and visibility in addition to desires such as travel time or distance minimization. The integration of these models unlocks new potential that the individual components do not offer on their own. The individual techniques used to develop the occupancy simulation models have been validated experimentally in the existing literature; however, this study does not include field studies to validate the integrated model. If observed walking trails of humans are provided, the suggested models can be validated through a fine-tuning process that reproduces the observed trails. The simulation results can finally be used for evaluation purposes to help designers at the design phase, and facility managers in post-design phases, make informed decisions. This study provides a software solution that implements the suggested model to demonstrate its feasibility. This software uses a Building Information Model (BIM) to represent the built environment, an Agent-Based Model (ABM) to simulate the occupants, a list of research evidence to encode agents' reactions to the environment, a Discrete Event Simulation (DES) model to represent the tasks in mandatory scenarios, and the field of visibility (isovist) to simulate an occupant's viewshed. In this software, evaluation is a process of querying the data collected by the agents during the simulations; the query logic can be set according to the interests of designers or facility managers.
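
    As an illustration of combining distance, path simplicity, and visibility into a single route choice, the sketch below scores candidate waypoints with a weighted sum. The terms, weights, and visibility values are hypothetical stand-ins, not the model proposed in this study.

```python
# Hedged sketch: the kind of weighted scoring an occupancy agent could use to pick its
# next waypoint, trading off detour length, turning (path simplicity), and how much of
# the target is visible from the waypoint. All weights and values are illustrative.
import math

def score_candidate(agent_pos, heading, candidate, target, visible_fraction,
                    w_dist=1.0, w_turn=0.5, w_vis=2.0):
    dist = math.dist(agent_pos, candidate) + math.dist(candidate, target)
    desired = math.atan2(candidate[1] - agent_pos[1], candidate[0] - agent_pos[0])
    turn = abs(math.atan2(math.sin(desired - heading), math.cos(desired - heading)))
    # Lower score is better: long detours and sharp turns cost, visibility rewards.
    return w_dist * dist + w_turn * turn - w_vis * visible_fraction

candidates = {(2, 0): 0.9, (0, 2): 0.3, (1, 1): 0.6}   # waypoint -> isovist-style visibility
best = min(candidates, key=lambda c: score_candidate((0, 0), 0.0, c, (5, 0), candidates[c]))
print("chosen waypoint:", best)
```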

    A Framework for Planning Motion in Environments with Moving Obstacles

    No full text
    In this paper we present a heuristic approach to planning in an environment with moving obstacles. Our approach assumes that the robot has no knowledge of the future trajectories of the moving objects. Our framework also distinguishes between two types of moving objects in the environment, hard and soft objects, because different application domains may tolerate some collisions with certain types of moving objects. For example, a robot planning a path in an environment with people could model each person as a circular disk with a surrounding safe zone. Although the robot may try to stay out of each safe zone, violating that criterion would not necessarily result in planning failure. We will show the effectiveness of our planner in general dynamic environments. Planning a path for a robot has been widely studied. There has been quite a lot of work on planning a path for a holonomic, free-flying robot [1–3], and on planning a path for a nonholonomic robot with constraints on the movement of the robot [4–6]. One problem that has been less studied is planning a path for a robot with constraints in a realistic environment. This includes environments that can change dynamically and that includ
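
    The hard/soft distinction described above can be pictured as a path cost in which touching a hard obstacle invalidates the path while grazing a soft obstacle's safe zone only adds a penalty. The sketch below is an illustrative reading of that idea, with made-up geometry helpers and weights rather than the paper's planner.

```python
# Minimal sketch of the hard/soft obstacle distinction: hard collisions reject a path,
# safe-zone (soft) violations are merely penalized. Geometry and weights are illustrative.
import math

def segment_clearance(p, q, center):
    """Distance from an obstacle center to the segment p-q."""
    px, py = q[0] - p[0], q[1] - p[1]
    denom = px * px + py * py or 1e-12
    t = max(0.0, min(1.0, ((center[0] - p[0]) * px + (center[1] - p[1]) * py) / denom))
    return math.dist(center, (p[0] + t * px, p[1] + t * py))

def path_cost(path, hard_obstacles, soft_obstacles, soft_penalty=10.0):
    cost = 0.0
    for p, q in zip(path, path[1:]):
        cost += math.dist(p, q)
        for c, r in hard_obstacles:
            if segment_clearance(p, q, c) <= r:
                return math.inf                      # hard collision: path rejected
        for c, r in soft_obstacles:
            if segment_clearance(p, q, c) <= r:
                cost += soft_penalty                 # safe-zone violation: penalized only
    return cost

# One straight segment, one hard disk it avoids, one soft safe zone it clips.
print(path_cost([(0, 0), (5, 0)], [((2, 2), 1.0)], [((3, 0.5), 1.0)]))
```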

    Tree Paths: A New Model For Steering Behaviors

    No full text
    This paper describes a model for generating steering behaviors of groups of characters based on the biologically-motivated space colonization algorithm. This algorithm has been used in the past for generating leaf venation patterns and tree structures, simulating the competition for space between growing veins or branches. Adapted to character animation, this model is responsible for the motion control of characters, providing robust and realistic group behaviors by adjusting just a few parameters. The main contributions are related to the robustness, flexibility, and simplicity of controlling groups of characters. © 2009 Springer Berlin Heidelberg.
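
    Because the model rests on the space colonization algorithm, the sketch below shows one adapted update step: free-space markers attract their nearest character, each character steers toward the average direction of its markers, and markers a character reaches are consumed. The radii, step size, and 2D setup are illustrative assumptions, not the paper's parameters.

```python
# Minimal sketch of one space-colonization update, in the spirit of the model above.
import math

def colonization_step(characters, markers, influence_radius=3.0,
                      kill_radius=0.5, step=0.4):
    pulls = {i: [] for i in range(len(characters))}
    for m in markers:
        # Each marker influences only its closest character, if within the radius.
        i, d = min(((i, math.dist(c, m)) for i, c in enumerate(characters)),
                   key=lambda t: t[1])
        if d <= influence_radius:
            pulls[i].append(m)
    new_chars = []
    for i, c in enumerate(characters):
        if pulls[i]:
            # Steer toward the average direction of the attracting markers.
            dx = sum(m[0] - c[0] for m in pulls[i]) / len(pulls[i])
            dy = sum(m[1] - c[1] for m in pulls[i]) / len(pulls[i])
            norm = math.hypot(dx, dy) or 1.0
            c = (c[0] + step * dx / norm, c[1] + step * dy / norm)
        new_chars.append(c)
    # Markers that have been reached are removed (the space has been "colonized").
    markers = [m for m in markers
               if all(math.dist(c, m) > kill_radius for c in new_chars)]
    return new_chars, markers

chars, marks = [(0.0, 0.0)], [(1.0, 0.5), (2.0, 0.0), (2.5, 1.0)]
for _ in range(10):
    chars, marks = colonization_step(chars, marks)
print(chars, len(marks))
```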