Empowerment and State-dependent Noise : An Intrinsic Motivation for Avoiding Unpredictable Agents
Empowerment is a recently introduced intrinsic motivation algorithm based on the embodiment of an agent and the dynamics of the world the agent is situated in. Computed as the channel capacity from an agent’s actuators to an agent’s sensors, it offers a quantitative measure of how much an agent is in control of the world it can perceive. In this paper, we expand the approximation of empowerment as a Gaussian linear channel to compute empowerment based on the covariance matrix between actuators and sensors, incorporating state-dependent noise. This allows, for the first time, the study of continuous systems with several agents. We found that if the behaviour of another agent cannot be predicted accurately, then interacting with that agent will decrease the empowerment of the original agent. This leads to behaviour realizing collision avoidance with other agents, purely from maximising an agent’s empowerment.
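As a rough illustration of the Gaussian linear channel approximation: the capacity of a channel y = A a + n with Gaussian noise can be computed by water-filling over the singular values of the noise-whitened channel matrix. The sketch below is an assumption-laden illustration of that textbook computation (the function name, the total power constraint, and the whitening step are not taken from the paper):

```python
import numpy as np

def gaussian_empowerment(A, noise_cov, power=1.0):
    """Capacity (in nats) of the linear Gaussian channel y = A a + n,
    n ~ N(0, noise_cov), under E[a^T a] <= power, via water-filling.
    A state-dependent noise covariance is handled by whitening."""
    # Whiten the (possibly state-dependent) noise: y' = L^{-1} y
    L = np.linalg.cholesky(noise_cov)
    A_w = np.linalg.solve(L, A)
    # Water-fill over the squared singular values of the whitened channel
    s = np.linalg.svd(A_w, compute_uv=False) ** 2
    s = s[s > 1e-12]
    if s.size == 0:
        return 0.0
    levels = np.sort(1.0 / s)  # inverse gains, ascending
    # Find the water level mu: use the k best sub-channels such that
    # sum(max(mu - 1/s_i, 0)) = power and every used channel gets power > 0
    mu = None
    for k in range(len(levels), 0, -1):
        cand = (power + np.sum(levels[:k])) / k
        if cand > levels[k - 1]:
            mu = cand
            break
    p = np.maximum(mu - 1.0 / s, 0.0)
    return 0.5 * np.sum(np.log(1.0 + p * s))
```

For an identity channel with unit noise and power budget 2, this recovers the familiar value log 2 nats; noisier sensors (a larger noise covariance) yield lower empowerment, matching the paper's intuition that unpredictability reduces control.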
Joint Channel Selection and Power Control in Infrastructureless Wireless Networks: A Multi-Player Multi-Armed Bandit Framework
This paper deals with the problem of efficient resource allocation in dynamic
infrastructureless wireless networks. Assuming a reactive interference-limited
scenario, each transmitter is allowed to select one frequency channel (from a
common pool) together with a power level at each transmission trial; hence, for
all transmitters, not only the fading gain, but also the number of interfering
transmissions and their transmit powers are varying over time. Due to the
absence of a central controller and time-varying network characteristics, it is
highly inefficient for transmitters to acquire global channel and network
knowledge. Therefore a reasonable assumption is that transmitters have no
knowledge of fading gains, interference, and network topology. Each
transmitting node selfishly aims at maximizing its average reward (or
minimizing its average cost), which is a function of the action of that
specific transmitter as well as those of all other transmitters. This scenario
is modeled as a multi-player multi-armed adversarial bandit game, in which
multiple players receive an a priori unknown reward with an arbitrarily
time-varying distribution by sequentially pulling an arm, selected from a known
and finite set of arms. Since players do not know the arm with the highest
average reward in advance, they attempt to minimize their so-called regret,
determined by the set of players' actions, while attempting to achieve
equilibrium in some sense. To this end, we design in this paper two joint power
level and channel selection strategies. We prove that the gap between the
average reward achieved by our approaches and that based on the best fixed
strategy converges to zero asymptotically. Moreover, the empirical joint
frequencies of the game converge to the set of correlated equilibria. We
further characterize this set for two special cases of our designed game.
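A standard no-regret strategy for adversarial bandits of this kind is Exp3, which a single transmitter could run over a finite set of (channel, power-level) arms. The sketch below is a generic Exp3 illustration under that framing; the function name and parameters are assumptions, and the paper's own joint strategies may differ:

```python
import numpy as np

def exp3(n_arms, rewards, gamma=0.1, rng=None):
    """Exp3 on an adversarial bandit. rewards[t][a] in [0, 1] is the reward
    arm a would pay at round t; only the pulled arm's reward is used.
    Arms could encode (channel, power-level) pairs."""
    rng = rng or np.random.default_rng(0)
    w = np.ones(n_arms)
    chosen = []
    for r in rewards:
        # Mix the exponential weights with uniform exploration
        p = (1 - gamma) * w / w.sum() + gamma / n_arms
        a = rng.choice(n_arms, p=p)
        chosen.append(a)
        # Importance-weighted estimate: only the pulled arm is updated
        w[a] *= np.exp(gamma * r[a] / (p[a] * n_arms))
    return chosen
```

Against a fixed best arm, the fraction of rounds spent on suboptimal arms vanishes, which is the "gap to the best fixed strategy converges to zero" property the abstract describes.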
Proximal operators for multi-agent path planning
We address the problem of planning collision-free paths for multiple agents
using optimization methods known as proximal algorithms. Recently this approach
was explored in Bento et al. 2013, which demonstrated its ease of
parallelization and decentralization, the speed with which the algorithms
generate good quality solutions, and its ability to incorporate different
proximal operators, each ensuring that paths satisfy a desired property.
Unfortunately, the operators derived only apply to paths in 2D and require that
any intermediate waypoints we might want agents to follow be preassigned to
specific agents, limiting their range of applicability. In this paper we
resolve these limitations. We introduce new operators to deal with agents
moving in arbitrary dimensions that are faster to compute than their 2D
predecessors and we introduce landmarks, space-time positions that are
automatically assigned to the set of agents under different optimality
criteria. Finally, we report the performance of the new operators in several
numerical experiments.
Comment: See movie at http://youtu.be/gRnsjd_ocx
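As a minimal illustration of the kind of per-property operator involved, the sketch below enforces a pairwise minimum-separation property between two agent positions in arbitrary dimensions by moving both agents the least amount needed. This is a hypothetical projection-style example, not one of the paper's derived operators:

```python
import numpy as np

def prox_min_distance(x1, x2, d_min):
    """Minimally move two agent positions (any dimension) so that they
    end up at least d_min apart; a no-op if they already are."""
    diff = x2 - x1
    dist = np.linalg.norm(diff)
    if dist >= d_min:
        return x1, x2
    if dist < 1e-12:
        # Coincident agents: separate along an arbitrary axis
        u = np.zeros_like(x1)
        u[0] = 1.0
    else:
        u = diff / dist
    push = 0.5 * (d_min - dist)          # split the correction symmetrically
    return x1 - push * u, x2 + push * u
```

Because the operator works on position vectors of any length, the same code covers agents in 2D, 3D, or higher, which is the dimensionality generalization the paper emphasizes.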
Interpretable Goal-Based model for Vehicle Trajectory Prediction in Interactive Scenarios
The abilities to understand the social interaction behaviors between a
vehicle and its surroundings while predicting its trajectory in an urban
environment are critical for road safety in autonomous driving. Social
interactions are hard to explain because of their uncertainty. In recent years,
neural network-based methods have been widely used for trajectory prediction
and have been shown to outperform hand-crafted methods. However, these methods
suffer from their lack of interpretability. In order to overcome this
limitation, we combine the interpretability of a discrete choice model with the
high accuracy of a neural network-based model for the task of vehicle
trajectory prediction in an interactive environment. We implement and evaluate
our model using the INTERACTION dataset and demonstrate the effectiveness of
our proposed architecture to explain its predictions without compromising the
accuracy.
Comment: arXiv admin note: text overlap with arXiv:2105.03136 by other authors
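The interpretable discrete choice component can be thought of as a multinomial logit over candidate goals, where each goal's utility is a weighted sum of human-readable features. The sketch below is a generic logit illustration (the feature names and weights are assumptions, not the paper's model):

```python
import numpy as np

def goal_probabilities(features, beta):
    """Multinomial-logit choice over candidate goals.
    features: (n_goals, n_features) matrix of interpretable terms
    (e.g. distance to goal, required heading change); beta: learned weights."""
    utilities = features @ beta
    z = np.exp(utilities - utilities.max())  # stabilized softmax
    return z / z.sum()
```

The weights beta are directly inspectable, which is the interpretability the abstract contrasts with opaque neural predictors; a neural network can then refine the trajectory toward the chosen goal.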
HARL: A Novel Hierarchical Adversary Reinforcement Learning for Autonomous Intersection Management
As an emerging technology, Connected Autonomous Vehicles (CAVs) are believed
to have the ability to move through intersections in a faster and safer manner,
through effective Vehicle-to-Everything (V2X) communication and global
observation. Autonomous intersection management (AIM) is a key path to efficient crossing at intersections: it reduces unnecessary slowdowns and stops through the adaptive decision process of each CAV, enabling fuller utilization of the intersection space. Distributed reinforcement learning (DRL) offers a flexible, end-to-end model for AIM, adapting to many intersection scenarios.
However, DRL is prone to collisions, as the actions of the multiple parties in these complicated interactions are sampled from a generic policy, restricting the application of DRL in realistic scenarios. To address this, we propose a hierarchical RL framework in which models at different levels differ in receptive scope, action step length, and reward feedback period. The upper-layer model accelerates CAVs to keep them from colliding, while the lower-layer model adjusts the trends from the upper-layer model so that changes of mobile state do not cause new conflicts. The real action of each CAV at each step is co-determined by the trends from both levels, forming a real-time balance in the adversarial process. The proposed model is shown to be effective in experiments on a complicated intersection with 4 branches and 4 lanes per branch, and achieves better performance than the baselines.
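A minimal sketch of how two levels with different action step lengths and feedback periods could co-determine each CAV's real action: the upper layer refreshes a coarse trend only every few steps, while the lower layer adjusts it every step. The class name, the additive combination, and the placeholder policies are all illustrative assumptions, not the paper's exact design:

```python
class HierarchicalController:
    """Two-level controller sketch: the upper policy proposes a coarse
    acceleration trend every `period` steps; the lower policy adjusts it
    at every step; the real action is the combination of both."""

    def __init__(self, upper_policy, lower_policy, period=5):
        self.upper_policy = upper_policy
        self.lower_policy = lower_policy
        self.period = period
        self.t = 0
        self.trend = 0.0

    def act(self, obs):
        # Upper layer: longer action step length / feedback period
        if self.t % self.period == 0:
            self.trend = self.upper_policy(obs)
        # Lower layer: per-step adjustment of the upper-layer trend
        adjust = self.lower_policy(obs, self.trend)
        self.t += 1
        return self.trend + adjust  # real action co-determined by both levels
```

With an aggressive upper policy and a corrective lower policy, the executed action sits between the two trends, mimicking the real-time balance in the adversarial process described above.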