
    Empowerment and State-dependent Noise : An Intrinsic Motivation for Avoiding Unpredictable Agents

    Empowerment is a recently introduced intrinsic motivation algorithm based on the embodiment of an agent and the dynamics of the world the agent is situated in. Computed as the channel capacity from an agent’s actuators to an agent’s sensors, it offers a quantitative measure of how much an agent is in control of the world it can perceive. In this paper, we expand the approximation of empowerment as a Gaussian linear channel to compute empowerment based on the covariance matrix between actuators and sensors, incorporating state-dependent noise. This allows for the first time the study of continuous systems with several agents. We found that if the behaviour of another agent cannot be predicted accurately, then interacting with that agent will decrease the empowerment of the original agent. This leads to behaviour realizing collision avoidance with other agents, purely from maximising an agent’s empowerment.
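    The Gaussian-linear-channel approximation described in this abstract can be sketched as follows. This is a minimal illustration of the mutual information of a linear Gaussian channel from actions to sensors, not the paper's implementation; the names `A`, `Sigma_a`, and `Sigma_n` are assumed, and a fixed Gaussian action distribution is used rather than the capacity-achieving one.

    ```python
    import numpy as np

    def gaussian_channel_empowerment(A, Sigma_a, Sigma_n):
        """Mutual information (in bits) of the linear Gaussian channel s = A a + n.

        A        : linearised actuator-to-sensor map (assumed name)
        Sigma_a  : covariance of the Gaussian action distribution
        Sigma_n  : (possibly state-dependent) sensor-noise covariance
        """
        k = Sigma_n.shape[0]
        signal = A @ Sigma_a @ A.T
        # I(a; s) = 1/2 log2 det(I + Sigma_n^{-1} A Sigma_a A^T)
        M = np.eye(k) + np.linalg.solve(Sigma_n, signal)
        _, logdet = np.linalg.slogdet(M)
        return 0.5 * logdet / np.log(2.0)
    ```

    Under this sketch, inflating the noise covariance (e.g. because a nearby agent is unpredictable) lowers the returned value, which is the effect the abstract describes.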

    Joint Channel Selection and Power Control in Infrastructureless Wireless Networks: A Multi-Player Multi-Armed Bandit Framework

    This paper deals with the problem of efficient resource allocation in dynamic infrastructureless wireless networks. Assuming a reactive interference-limited scenario, each transmitter is allowed to select one frequency channel (from a common pool) together with a power level at each transmission trial; hence, for all transmitters, not only the fading gain, but also the number of interfering transmissions and their transmit powers are varying over time. Due to the absence of a central controller and time-varying network characteristics, it is highly inefficient for transmitters to acquire global channel and network knowledge. Therefore, a reasonable assumption is that transmitters have no knowledge of fading gains, interference, and network topology. Each transmitting node selfishly aims at maximizing its average reward (or minimizing its average cost), which is a function of the action of that specific transmitter as well as those of all other transmitters. This scenario is modeled as a multi-player multi-armed adversarial bandit game, in which multiple players receive an a priori unknown reward with an arbitrarily time-varying distribution by sequentially pulling an arm, selected from a known and finite set of arms. Since players do not know the arm with the highest average reward in advance, they attempt to minimize their so-called regret, determined by the set of players' actions, while attempting to achieve equilibrium in some sense. To this end, we design in this paper two joint power level and channel selection strategies. We prove that the gap between the average reward achieved by our approaches and that based on the best fixed strategy converges to zero asymptotically. Moreover, the empirical joint frequencies of the game converge to the set of correlated equilibria. We further characterize this set for two special cases of our designed game.
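    The adversarial-bandit formulation in this abstract can be sketched with a plain Exp3 learner over joint (channel, power) arms. This is a generic textbook algorithm under assumed names, not the paper's two strategies, and it does not by itself provide the correlated-equilibrium convergence the paper proves.

    ```python
    import math
    import random

    class Exp3:
        """Exp3 adversarial-bandit learner over joint (channel, power) arms."""

        def __init__(self, n_channels, n_powers, gamma=0.1):
            self.arms = [(c, p) for c in range(n_channels) for p in range(n_powers)]
            self.gamma = gamma                    # exploration rate
            self.w = [1.0] * len(self.arms)       # exponential weights

        def probabilities(self):
            total = sum(self.w)
            K = len(self.arms)
            return [(1 - self.gamma) * wi / total + self.gamma / K for wi in self.w]

        def select(self):
            """Sample an arm index and the (channel, power) pair it denotes."""
            probs = self.probabilities()
            r, acc = random.random(), 0.0
            for i, p in enumerate(probs):
                acc += p
                if r <= acc:
                    return i, self.arms[i]
            return len(probs) - 1, self.arms[-1]

        def update(self, i, reward):
            """Reward in [0, 1]; importance-weighted so unplayed arms are unbiased."""
            probs = self.probabilities()
            xhat = reward / probs[i]
            self.w[i] *= math.exp(self.gamma * xhat / len(self.arms))
    ```

    Each transmitter would run its own instance, observing only its own reward (e.g. achieved throughput), which matches the no-global-knowledge assumption above.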

    Proximal operators for multi-agent path planning

    We address the problem of planning collision-free paths for multiple agents using optimization methods known as proximal algorithms. Recently this approach was explored in Bento et al. 2013, which demonstrated its ease of parallelization and decentralization, the speed with which the algorithms generate good quality solutions, and its ability to incorporate different proximal operators, each ensuring that paths satisfy a desired property. Unfortunately, the operators derived only apply to paths in 2D and require that any intermediate waypoints we might want agents to follow be preassigned to specific agents, limiting their range of applicability. In this paper we resolve these limitations. We introduce new operators to deal with agents moving in arbitrary dimensions that are faster to compute than their 2D predecessors, and we introduce landmarks, space-time positions that are automatically assigned to the set of agents under different optimality criteria. Finally, we report the performance of the new operators in several numerical experiments.
    Comment: See movie at http://youtu.be/gRnsjd_ocx
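    The kind of proximal operator this abstract refers to can be illustrated by the projection of a pair of agent positions onto a minimum-separation constraint, which works in any dimension. This is a hedged sketch of the general idea, not the operators derived in the paper; the function name and the equal-weight split are illustrative assumptions.

    ```python
    import numpy as np

    def prox_min_separation(x1, x2, d):
        """Project two agent positions onto the set {||x1 - x2|| >= d}.

        If the agents are already at least d apart, they are left unchanged;
        otherwise both are moved symmetrically along their difference
        direction until they are exactly d apart.
        """
        diff = x1 - x2
        dist = np.linalg.norm(diff)
        if dist >= d:
            return x1.copy(), x2.copy()        # constraint already satisfied
        if dist == 0.0:                        # degenerate case: pick an arbitrary axis
            diff = np.zeros_like(x1)
            diff[0] = 1.0
            dist = 1.0
        u = diff / dist
        shift = 0.5 * (d - dist)               # each agent absorbs half the deficit
        return x1 + shift * u, x2 - shift * u
    ```

    In a proximal splitting scheme, an operator like this would be applied to every agent pair at every waypoint, alongside other operators enforcing smoothness or endpoint constraints.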

    Interpretable Goal-Based model for Vehicle Trajectory Prediction in Interactive Scenarios

    The ability to understand the social interaction behaviors between a vehicle and its surroundings, and to predict its trajectory in an urban environment, is critical for road safety in autonomous driving. Social interactions are hard to explain because of their uncertainty. In recent years, neural network-based methods have been widely used for trajectory prediction and have been shown to outperform hand-crafted methods. However, these methods suffer from their lack of interpretability. In order to overcome this limitation, we combine the interpretability of a discrete choice model with the high accuracy of a neural network-based model for the task of vehicle trajectory prediction in an interactive environment. We implement and evaluate our model using the INTERACTION dataset and demonstrate the effectiveness of our proposed architecture to explain its predictions without compromising the accuracy.
    Comment: arXiv admin note: text overlap with arXiv:2105.03136 by other authors
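    The combination of a discrete choice model with trajectory outputs can be sketched as a goal-conditioned mixture: the choice model scores candidate goals, and each goal has an associated trajectory hypothesis. This is an assumed minimal structure for illustration, not the architecture evaluated on the INTERACTION dataset; the input shapes and names are hypothetical.

    ```python
    import numpy as np

    def goal_based_prediction(goal_utilities, goal_trajectories):
        """Combine an interpretable discrete goal choice with per-goal trajectories.

        goal_utilities    : (G,) utilities from a discrete choice model
        goal_trajectories : (G, T, 2) one predicted xy-trajectory per candidate goal
        Returns the goal probabilities (the interpretable part) and the
        probability-weighted expected trajectory.
        """
        u = goal_utilities - goal_utilities.max()   # numerically stable softmax
        p = np.exp(u) / np.exp(u).sum()
        expected = np.einsum('g,gtd->td', p, goal_trajectories)
        return p, expected
    ```

    The goal probabilities `p` are what makes such a model inspectable: one can read off which maneuver (e.g. yield vs. proceed) the model considers most likely.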

    HARL: A Novel Hierarchical Adversarial Reinforcement Learning for Autonomous Intersection Management

    Full text link
    As an emerging technology, Connected Autonomous Vehicles (CAVs) are believed to be able to move through intersections faster and more safely, through effective Vehicle-to-Everything (V2X) communication and global observation. Autonomous intersection management (AIM) is a key path to efficient crossing at intersections: it reduces unnecessary slowdowns and stops through the adaptive decision process of each CAV, enabling fuller utilization of the intersection space. Distributed reinforcement learning (DRL) offers a flexible, end-to-end model for AIM that adapts to many intersection scenarios. However, DRL is prone to collisions because the actions of the multiple parties in these complicated interactions are sampled from a generic policy, restricting the application of DRL in realistic scenarios. To address this, we propose a hierarchical RL framework in which models at different levels vary in receptive scope, action step length, and reward feedback period. The upper-layer model accelerates CAVs to prevent them from colliding, while the lower-layer model adjusts the trends from the upper-layer model so that changes of mobile state do not cause new conflicts. The real action of each CAV at each step is co-determined by the trends from both levels, forming a real-time balance in the adversarial process. The proposed model proves effective in experiments undertaken in a complicated intersection with four branches and four lanes per branch, and shows better performance compared with baselines.
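    The co-determination of a CAV's action by the two layers can be sketched as an upper-layer acceleration trend plus a lower-layer corrective adjustment, clipped to physical limits. This is a toy illustration of the combination step only; the function name and the acceleration bounds are assumed, and the paper's layers additionally differ in receptive scope and feedback period.

    ```python
    import numpy as np

    def combine_trends(upper_accel, lower_adjust, a_min=-3.0, a_max=3.0):
        """Co-determine a CAV's acceleration from the two layers' trends.

        upper_accel  : acceleration trend proposed by the upper-layer model
        lower_adjust : corrective adjustment from the lower-layer model
        The sum is clipped to assumed vehicle limits [a_min, a_max] (m/s^2).
        """
        return float(np.clip(upper_accel + lower_adjust, a_min, a_max))
    ```

    The clipping step stands in for the physical feasibility check a real controller would apply before issuing the command.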