1,035 research outputs found

    Probabilistic Traversability Model for Risk-Aware Motion Planning in Off-Road Environments

    A key challenge in off-road navigation is that even visually similar terrains or ones from the same semantic class may have substantially different traction properties. Existing work typically assumes no wheel slip or uses the expected traction for motion planning, where the predicted trajectories provide a poor indication of the actual performance if the terrain traction has high uncertainty. In contrast, this work proposes to analyze terrain traversability with the empirical distribution of traction parameters in unicycle dynamics, which can be learned by a neural network in a self-supervised fashion. The probabilistic traction model leads to two risk-aware cost formulations that account for the worst-case expected cost and traction. To help the learned model generalize to unseen environments, terrains with features that lead to unreliable predictions are detected via a density estimator fit to the trained network's latent space and avoided via auxiliary penalties during planning. Simulation results demonstrate that the proposed approach outperforms existing work that assumes no slip or uses the expected traction in both navigation success rate and completion time. Furthermore, avoiding terrains with low density-based confidence scores achieves up to 30% improvement in success rate when the learned traction model is used in a novel environment. Comment: To appear in IROS23. Video and code: https://github.com/mit-acl/mppi_numb

    Dynamic vehicle routing problems: Three decades and counting

    Since the late 70s, much research activity has taken place on the class of dynamic vehicle routing problems (DVRP), with the time period after year 2000 witnessing a real explosion in related papers. Our paper sheds more light on work in this area over more than 3 decades by developing a taxonomy of DVRP papers according to 11 criteria. These are (1) type of problem, (2) logistical context, (3) transportation mode, (4) objective function, (5) fleet size, (6) time constraints, (7) vehicle capacity constraints, (8) the ability to reject customers, (9) the nature of the dynamic element, (10) the nature of the stochasticity (if any), and (11) the solution method. We comment on technological vis-à-vis methodological advances for this class of problems and suggest directions for further research. The latter include alternative objective functions, vehicle speed as a decision variable, more explicit linkages of methodology to technological advances, and analysis of worst-case or average-case performance of heuristics. © 2015 Wiley Periodicals, Inc.

    The effect of simulation bias on action selection in Monte Carlo Tree Search

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, in fulfilment of the requirements for the degree of Master of Science. August 2016. Monte Carlo Tree Search (MCTS) is a family of directed search algorithms that has gained widespread attention in recent years. It combines a traditional tree-search approach with Monte Carlo simulations, using the outcome of these simulations (also known as playouts or rollouts) to evaluate states in a look-ahead tree. That MCTS does not require an evaluation function makes it particularly well-suited to the game of Go, seen by many to be chess’s successor as a grand challenge of artificial intelligence, with MCTS-based agents recently able to achieve expert-level play on 19×19 boards. Furthermore, its domain-independent nature also makes it a focus in a variety of other fields, such as Bayesian reinforcement learning and general game-playing. Despite the vast amount of research into MCTS, the dynamics of the algorithm are not yet fully understood. In particular, the effect of using knowledge-heavy or biased simulations in MCTS remains unknown, with interesting results indicating that better-informed rollouts do not necessarily result in stronger agents. This research provides support for the notion that MCTS is well-suited to a class of domains possessing a smoothness property. In these domains, biased rollouts are more likely to produce strong agents. Conversely, any error due to incorrect bias is compounded in non-smooth domains, and in particular for low-variance simulations. This is demonstrated empirically in a number of single-agent domains.

    Distributed Online Rollout for Multivehicle Routing in Unmapped Environments

    In this work we consider a generalization of the well-known multivehicle routing problem: given a network, a set of agents occupying a subset of its nodes, and a set of tasks, we seek a minimum cost sequence of movements subject to the constraint that each task is visited by some agent at least once. The classical version of this problem assumes a central computational server that observes the entire state of the system perfectly and directs individual agents according to a centralized control scheme. In contrast, we assume that there is no centralized server and that each agent is an individual processor with no a priori knowledge of the underlying network (including task and agent locations). Moreover, our agents possess strictly local communication and sensing capabilities (restricted to a fixed radius around their respective locations), aligning more closely with several real-world multiagent applications. These restrictions introduce many challenges that are overcome through local information sharing and direct coordination between agents. We present a fully distributed, online, and scalable reinforcement learning algorithm for this problem whereby agents self-organize into local clusters and independently apply a multiagent rollout scheme locally to each cluster. We demonstrate empirically via extensive simulations that there exists a critical sensing radius beyond which the distributed rollout algorithm begins to improve over a greedy base policy. This critical sensing radius grows proportionally to the $\log^*$ function of the size of the network, and is, therefore, a small constant for any relevant network. Our decentralized reinforcement learning algorithm achieves approximately a factor of two cost improvement over the base policy for a range of radii bounded from below and above by two and three times the critical sensing radius, respectively.

    Performance improvements of a highly integrated digital electronic control system for an F-15 airplane

    The NASA highly integrated digital electronic control (HIDEC) program is structured to conduct flight research into the benefits of integrating an aircraft flight control system with the engine control system. A brief description of the HIDEC system installed on an F-15 aircraft is provided. The adaptive engine control system (ADECS) mode is described in detail, together with simulation results and analyses that show the significant excess thrust improvements achievable with the ADECS mode. It was found that this increased thrust capability is accompanied by reduced fan stall margin and can be realized during flight conditions where engine face distortion is low. The results of analyses and simulations also show that engine thrust response is improved and that fuel consumption can be reduced. Although the performance benefits that accrue because of airframe and engine control integration are being demonstrated on an F-15 aircraft, the principles are applicable to advanced aircraft such as the advanced tactical fighter and advanced tactical aircraft.

    ℓ-CTP: Utilizing Multiple Agents to Find Efficient Routes in Disrupted Networks

    Recent hurricane seasons have demonstrated the need for more effective methods of coping with flooding of roadways. A key complaint of logistics managers is the lack of knowledge when developing routes for vehicles attempting to navigate through areas which may be flooded. In particular, it can be difficult to re-route large vehicles upon encountering a flooded roadway. We utilize the Canadian Traveller’s Problem (CTP) to construct an online framework for utilizing multiple vehicles to discover low-cost paths through networks with failed edges unknown to one or more agents a priori. This thesis demonstrates the following results: first, we develop the ℓ-CTP framework to extend a theoretically validated set of path planning policies for a single agent in combination with the iterative penalty method, which incentivizes a group of ℓ > 1 agents to explore dissimilar paths on a graph between a common origin and destination. Second, we carry out simulations on random graphs to determine the impact of the addition of agents on the path cost found. Through statistical analysis of graphs of multiple sizes, we validate our technique against prior work and demonstrate that path cost can be modeled as an exponential decay function on the number of agents. Finally, we demonstrate that our approach can scale to large graphs, and the results found on random graphs hold for a simulation of the Houston metro area during Hurricane Harvey.

    Applying machine learning techniques to an imperfect information game

    The game of poker presents a challenge to Artificial Intelligence researchers because it is a complex asymmetric information game. In such games, a player can improve his performance by inferring the private information held by the other players from their prior actions. A novel connectionist structure was designed to play a version of poker (multi-player limit Hold’em). This allows simple reinforcement learning techniques to be used which had not previously been considered for the game of multi-player hold’em. A related hidden Markov model was designed to be fitted to records of poker play without using any private information. Belief vectors generated by this model provide a more convenient and flexible representation of an opponent’s action history than alternative approaches. The structure was tested in two settings. Firstly, self-play simulation was used to generate an approximation to a Nash equilibrium strategy. A related, but slower, rollout strategy that uses Monte Carlo samples was used to evaluate the performance. Secondly, the structure was used to model and hence exploit a population of opponents within a relatively small number of games. When and how to adapt quickly to new opponents are open questions in poker AI research. An opponent model with a small number of discrete types is used to identify the largest differences in strategy between members of the population. A commercial software package (Poker Academy) was used to provide a population of sophisticated opponents to test against. A series of experiments was conducted to compare adaptive and static systems. All systems showed positive results but, surprisingly, the adaptive systems did not show a significant improvement over similar static systems. The possible reasons for this result are discussed. This work formed the basis of a series of entries to the computer poker competition hosted at the annual conferences of the Association for the Advancement of Artificial Intelligence (AAAI). Its best rankings were 3rd in the 2006 6-player limit hold’em competition and 2nd in the 2008 3-player limit hold’em competition.