147 research outputs found

    A Real-Time Game Theoretic Planner for Autonomous Two-Player Drone Racing

    To be successful in multi-player drone racing, a player must not only follow the race track in an optimal way, but also compete with other drones through strategic blocking, faking, and opportunistic passing while avoiding collisions. Since unveiling one's own strategy to the adversaries is not desirable, this requires each player to independently predict the other players' future actions. Nash equilibria are a powerful tool to model this and similar multi-agent coordination problems in which the absence of communication impedes full coordination between the agents. In this paper, we propose a novel receding horizon planning algorithm that, exploiting sensitivity analysis within an iterated best response computational scheme, can approximate Nash equilibria in real time. We also describe a vision-based pipeline that allows each player to estimate its opponent's relative position. We demonstrate that our solution effectively competes against alternative strategies in a large number of drone racing simulations. Hardware experiments with onboard vision sensing prove the practicality of our strategy
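
The iterated best response scheme named in the abstract can be illustrated on a toy two-player game. This is a minimal sketch under stated assumptions, not the paper's planner: the quadratic costs, the coupling weights `a` and `b`, and all function names are hypothetical, and the sensitivity-analysis term the authors describe is omitted. Each player repeatedly minimizes its own cost while holding the opponent's last decision fixed; a fixed point of that loop is a Nash equilibrium of the toy game.

```python
def best_response_p1(y, a=1.0):
    # Player 1 minimizes (x - 1)^2 + a*x*y over x (hypothetical cost).
    # Setting the derivative 2(x - 1) + a*y to zero gives the closed form.
    return 1.0 - a * y / 2.0


def best_response_p2(x, b=1.0):
    # Player 2 minimizes (y + 1)^2 + b*x*y over y (hypothetical cost).
    return -1.0 - b * x / 2.0


def iterated_best_response(x0=0.0, y0=0.0, tol=1e-10, max_iters=100):
    """Alternate best responses until neither player changes its decision."""
    x, y = x0, y0
    for _ in range(max_iters):
        x_new = best_response_p1(y)
        y_new = best_response_p2(x_new)
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y


if __name__ == "__main__":
    print(iterated_best_response())
```

With `a = b = 1` the loop is a contraction, so it settles on the point where each decision is a best response to the other. A real racing planner would replace the closed-form responses with trajectory optimizations over a receding horizon.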

    Benchmarking of Behavior Algorithms for Application in Motorsport Scenarios

    As autonomous driving develops, there are efforts to involve motorsport in that development as a technology driver and showcase. The challenges that motorsport poses for autonomous driving are manifold and include, besides perceiving the environment at high speeds, vehicle control and trajectory planning. While trajectory planning for a single vehicle is a task studied by many authors, only a few approaches exist so far for planning the behavior of several interacting vehicles. These approaches are often based on different principles and vehicle models. As a result, their evaluation is often comprehensible but cannot be extrapolated to other methods. This thesis therefore starts with the initial question of how, and under which conditions, it is possible to compare different algorithms that generate behavior for autonomous race cars. The analysis of the state of research reveals that previous approaches to comparison often rely on reward functions from machine learning or use evaluation metrics whose relevance to the scenario under consideration is not examined more closely. Existing ranking methods make it possible to rank large numbers of human players. For computer algorithms, by contrast, only methods for small numbers of players exist, because different strategies dominate one another on the rock-paper-scissors principle. Based on the analysis of benchmarking and of the goals pursued in motorsport, the boundary conditions for a scenario are discussed using the scenario definition by Ulbricht et al., with reference to a qualitative analysis and a quantitative classification. A method is then presented that can be used to rank even a large number of algorithms.
A proposed method that attempts to predict individual test results, together with the results of the ranking in a generally defined scenario, reveals the weaknesses of the behavior algorithms examined. A ranking can accordingly be created only on the condition that a concrete test scenario is defined. A generally valid direct comparison is not possible, because comparability is lacking across broad parameter variations. This thesis presents a comprehensive concept for analyzing and further developing behavior algorithms. The presented ranking method also allows a ranking to be formed, provided a test scenario is defined in advance. To evaluate the assessment method, a trajectory planner is developed and implemented. Its short computation times make it possible to combine the model predictive planner with a neural network. This opens up the field of behavior optimization through machine learning, which, owing to the particularly high number of time steps to be computed, is not practicable with other approaches to trajectory computation. In this way, an agent is implemented that fulfills the task of preventing another vehicle from overtaking and also of enabling an overtake. The presented deterministic approach to the problem of executing aggressive overtaking maneuvers shows promising performance in a first analysis. However, the approach still produces a high number of collisions when driving on real track layouts

    Validation of trajectory planning strategies for automated driving under cooperative, urban, and interurban scenarios.

    149 p. This thesis studies, designs, and implements a dual control architecture for automated vehicles that allows tests to be run in simulation and on real vehicles with as few changes as possible. The architecture rests on six modules: sensor data acquisition, environment perception, communications and interaction with other agents, maneuver decision, control, and actuation, in addition to map generation within the decision module, which uses simple points to describe the structures of the route (roundabouts, intersections, straight sections, and lane changes). Tecnali

    Building trust in autonomous vehicles: Role of virtual reality driving simulators in HMI design

    The investigation of factors that contribute to making humans trust Autonomous Vehicles (AVs) will play a fundamental role in the adoption of such technology. The user's ability to form a mental model of the AV, which is crucial to establishing trust, depends on effective user-vehicle communication; thus, the importance of Human-Machine Interaction (HMI) is poised to increase. In this work, we propose a methodology to validate the user experience in AVs based on continuous, objective information gathered from physiological signals while the user is immersed in a Virtual Reality-based driving simulation. We applied this methodology to the design of a head-up display interface delivering visual cues about the vehicle's sensory and planning systems. Through this approach, we obtained qualitative and quantitative evidence that a complete picture of the vehicle's surroundings, despite the higher cognitive load, is conducive to a less stressful experience. Moreover, after having been exposed to a more informative interface, users involved in the study were also more willing to test a real AV. The proposed methodology could be extended by adjusting the simulation environment, the HMI and/or the vehicle's Artificial Intelligence modules to dig into other aspects of the user experience

    Context dependent fuzzy modelling and its applications

    Fuzzy rule-based systems (FRBS) use the principles of fuzzy sets and fuzzy logic to describe vague and imprecise statements and provide a facility to express the behaviour of the system in a human-understandable language. Fuzzy information, once defined by a fuzzy system, is fixed regardless of the circumstances, which makes it very difficult to capture the effect of context on the meaning of the fuzzy terms. While efforts have been made to integrate contextual information into the representation of fuzzy sets, the context model often remains very restrictive and/or problem specific. The work reported in this thesis is our attempt to create a practical framework to integrate contextual information into the representation of fuzzy sets so as to improve the interpretability as well as the accuracy of the fuzzy system. Throughout this thesis, we have looked at the capability of the proposed context dependent fuzzy sets, both stand-alone and in combination with other methods, in various application scenarios ranging from time series forecasting to complicated car racing control systems. In all of the applications, the highly competitive performance of our approach has demonstrated its effectiveness and efficiency compared with existing techniques in the literature

    Application of reinforcement learning methods to computer game dynamics

    The dynamics of the game world present both challenges and opportunities for AI to make a useful difference. Learning smart behaviours for game assets is a first step towards realistic conflict or cooperation. The scope of this thesis is the application of Reinforcement Learning (RL) to moving assets in the game world. Game sessions generate streaming data on assets' performance, which must be processed on the fly. The lead objective is to produce fast, lightweight and flexible learning algorithms for run-time embedding. The motivation from current work is to shorten the time to achieve a workable policy solution by investigating the exploration/exploitation balance, to overcome the curse of dimensionality of complex systems, to avoid extra endogenous parameters which require multiple data passes, and to use a simple state aggregation rather than function approximation. How action selection (AS) contributes to efficient learning is a key issue in RL, since it determines the balance between exploiting and confirming the current policy and exploring an early, less likely policy which may prove better in the long run. The methodology simulates several AS methods on the 10-armed bandit problem, averaged over 10000 epochs. The results show a considerable variation in performance in terms of latency and asymptotic direction. The Upper Confidence Bound (UCB) comes out leader over most of the episode range, especially at about 100. Using insight from action selection, order statistics are applied to determine a criterion for the convergence of policy evaluation. The probability that the action of maximum sample mean is indeed the action of maximum population mean (PMSMMPM) is calculated using the 3-armed bandit problem. PMSMMPM reaches 0.988 by play 26, which provides evidence for it as a convergence criterion. An iteration stopping rule is defined using PMSMMPM, and it shows plausible properties as the population parameters are varied.
A mathematical analysis of the approximation (P21) of just taking the top two actions yields a minimum sampling size for any level of P21. Using the gradient of P21, a selection rule is derived and, when combined with UCB, a new complete exploratory policy is demonstrated for the 3-armed bandit that requires just over half the sample size of pure UCB. The results provide evidence that the augmented UCB selection rule will contribute to faster learning. The TD Sarsa(0) learning algorithm has been applied to learn a steering policy for the untried caravan reversing problem and for the kerb-avoiding steering problem of a racing car, both using negative rewards on failure and a simple aggregation. The output policy for the caravan is validated as non-jack-knifing for a high proportion of start states. The racing car policy has a similar validation outcome for two exploratory policies, which are compared and contrasted
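
For readers unfamiliar with Upper Confidence Bound action selection, the following is a minimal sketch of the standard UCB1 rule on a multi-armed bandit. It is not the thesis's augmented rule: the Gaussian reward model, the exploration constant `c`, the seed, and the function names are assumptions made for illustration only.

```python
import math
import random


def ucb1_select(counts, means, t, c=2.0):
    """Pick the arm with the largest sample mean plus exploration bonus."""
    # Play every arm once before the confidence bound is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(true_means, plays=1000, seed=0):
    """Simulate UCB1 on arms with unit-variance Gaussian rewards (assumed)."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    means = [0.0] * len(true_means)
    for t in range(1, plays + 1):
        arm = ucb1_select(counts, means, t)
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


if __name__ == "__main__":
    counts, means = run_bandit([0.1, 0.5, 0.9])
    print(counts, means)
```

The bonus term shrinks as an arm is sampled more often, so rarely tried arms keep being revisited until their estimates are trustworthy. That is the exploration/exploitation balance the abstract refers to.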

    Human-vehicle collaborative driving to improve transportation safety

    This dissertation proposes a collaborative driving framework based on assessments of both the internal and the external risks involved in driving a vehicle. The internal risk analysis includes driver drowsiness detection, driver distraction detection, and driver intention recognition, which help us better understand the human driver's behavior. Steering wheel data and facial expressions are used to detect drowsiness. Images from a camera observing the driver are used to detect various types of driver distraction with a deep learning approach. A Hidden Markov Model (HMM) is implemented to recognize the driver's intention using the vehicle's lane position, control and state data. For the external risk analysis, the co-pilot utilizes a Collision Avoidance System (CAS) to estimate the collision probability between the ego vehicle and other vehicles. Based on these two risk analyses, a novel collaborative driving scheme is proposed that fuses the control inputs from the human driver and the co-pilot to obtain the final control input for the vehicle under different circumstances. The proposed collaborative driving framework is validated in an Intelligent Transportation System (ITS) testbed which enables both autonomous and manual driving capabilities
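
The abstract does not say how the two control inputs are fused. One simple possibility, shown here purely as an assumption, is a convex blend whose weight grows with the larger of the two risk estimates. The function name, the `[0, 1]` risk scale, and the use of `max` to combine the risks are all hypothetical.

```python
def fuse_control(human_cmd, copilot_cmd, internal_risk, external_risk):
    """Blend two steering commands; higher risk gives the co-pilot more say.

    Both risks are assumed to lie in [0, 1]; values outside are clamped.
    """
    # Hypothetical rule: authority follows the worse of the two risks.
    alpha = min(1.0, max(0.0, max(internal_risk, external_risk)))
    return (1.0 - alpha) * human_cmd + alpha * copilot_cmd


if __name__ == "__main__":
    # Attentive driver, clear road: the human command passes through.
    print(fuse_control(0.2, -0.4, internal_risk=0.0, external_risk=0.0))
    # High collision probability: the co-pilot command dominates.
    print(fuse_control(0.2, -0.4, internal_risk=0.1, external_risk=1.0))
```

A blend of this kind changes authority smoothly rather than switching abruptly between driver and co-pilot. The dissertation's actual scheme may differ.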