How Fast Can We Play Tetris Greedily With Rectangular Pieces?
Consider a variant of Tetris played on a board of width and infinite
height, where the pieces are axis-aligned rectangles of arbitrary integer
dimensions, the pieces can only be moved before letting them drop, and a row
does not disappear once it is full. Suppose we want to follow a greedy
strategy: let each rectangle fall where it will end up the lowest given the
current state of the board. To do so, we want a data structure which can always
suggest a greedy move. In other words, we want a data structure which maintains
a set of rectangles, supports queries which return where to drop the
rectangle, and updates which insert a rectangle dropped at a certain position
and return the height of the highest point in the updated set of rectangles. We
show via a reduction to the Multiphase problem [Pătraşcu, 2010] that on
a board of width , if the OMv conjecture [Henzinger et al., 2015]
is true, then both operations cannot be supported in time
simultaneously. The reduction also implies polynomial bounds from the 3-SUM
conjecture and the APSP conjecture. On the other hand, we show that there is a
data structure supporting both operations in time on
boards of width , matching the lower bound up to a factor.
Comment: Correction of typos and other minor corrections
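The two operations described in this abstract can be illustrated with a naive baseline. The sketch below is not the paper's data structure: it simply stores one height per column and scans every position. The class name `NaiveGreedyBoard`, its method names, and the leftmost tie-breaking rule are assumptions made for illustration, and rectangles are assumed to be no wider than the board.

```python
class NaiveGreedyBoard:
    """Baseline skyline for the rectangular-Tetris variant (rows never clear)."""

    def __init__(self, width):
        # heights[x] is the top of the stack in column x (hypothetical layout).
        self.heights = [0] * width

    def query(self, rect_width):
        """Return the leftmost column where a rectangle of this width lands lowest."""
        best_x, best_base = 0, None
        for x in range(len(self.heights) - rect_width + 1):
            # The rectangle rests on the tallest column it covers.
            base = max(self.heights[x:x + rect_width])
            if best_base is None or base < best_base:
                best_x, best_base = x, base
        return best_x

    def update(self, x, rect_width, rect_height):
        """Drop a rectangle with its left edge at column x; return the highest point."""
        top = max(self.heights[x:x + rect_width]) + rect_height
        for col in range(x, x + rect_width):
            self.heights[col] = top
        return max(self.heights)
```

Each call scans the whole board, so this serves only as a reference for what the query and update compute, not as an efficient structure.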
Population-coding and Dynamic-neurons improved Spiking Actor Network for Reinforcement Learning
With Deep Neural Networks (DNNs) as powerful function approximators,
Deep Reinforcement Learning (DRL) has performed well on robotic
control tasks. Compared to DNNs with vanilla artificial neurons, the
biologically plausible Spiking Neural Network (SNN) contains a diverse
population of spiking neurons, making it naturally powerful on state
representation with spatial and temporal information. Based on a hybrid
learning framework, where a spike actor-network infers actions from states and
a deep critic network evaluates the actor, we propose a Population-coding and
Dynamic-neurons improved Spiking Actor Network (PDSAN) for efficient state
representation from two different scales: input coding and neuronal coding. For
input coding, we apply population coding with dynamic receptive fields to
directly encode each input state component. For neuronal coding, we propose
different types of dynamic-neurons (containing 1st-order and 2nd-order neuronal
dynamics) to describe much more complex neuronal dynamics. Finally, the PDSAN
is trained in conjunction with deep critic networks using the Twin Delayed Deep
Deterministic policy gradient algorithm (TD3-PDSAN). Extensive experimental
results show that our TD3-PDSAN model achieves better performance than
state-of-the-art models on four OpenAI gym benchmark tasks. It is an important
attempt to improve RL with SNNs toward effective computation that satisfies
biological plausibility.
Comment: 27 pages, 11 figures, accepted by Journal of Neural Network
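As a rough illustration of the population coding mentioned in this abstract, the sketch below encodes one scalar state component as the activations of a small population of neurons with Gaussian receptive fields. The function name, the evenly spaced fixed centers, and the shared `sigma` are assumptions for illustration. The abstract describes the receptive fields as dynamic, which this fixed-parameter sketch does not model.

```python
import math


def population_encode(value, low, high, pop_size, sigma):
    """Encode one scalar as pop_size Gaussian activations in [0, 1].

    Assumes pop_size >= 2 and sigma > 0; centers are spread evenly over
    [low, high] (a simplification, not the paper's learned fields).
    """
    step = (high - low) / (pop_size - 1)
    centers = [low + i * step for i in range(pop_size)]
    # A neuron responds most strongly when the value sits on its center.
    return [math.exp(-0.5 * ((value - c) / sigma) ** 2) for c in centers]


activations = population_encode(0.0, -1.0, 1.0, 5, 0.5)
```

In a spiking actor, activations like these would typically drive spike generation for the input layer. That step is omitted here.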
On learning history based policies for controlling Markov decision processes
Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as
recurrent neural nets or history-based state abstraction, perform better than
their memory-less counterparts, due to the fact that function approximation in
Markov decision processes (MDP) can be viewed as inducing a Partially
observable MDP. However, there has been little formal analysis of such
history-based algorithms, as most existing frameworks focus exclusively on
memory-less features. In this paper, we introduce a theoretical framework for
studying the behaviour of RL algorithms that learn to control an MDP using
history-based feature abstraction mappings. Furthermore, we use this framework
to design a practical RL algorithm and we numerically evaluate its
effectiveness on a set of continuous control tasks.
Semantic HELM: A Human-Readable Memory for Reinforcement Learning
Reinforcement learning agents deployed in the real world often have to cope
with partially observable environments. Therefore, most agents employ memory
mechanisms to approximate the state of the environment. Recently, there have
been impressive success stories in mastering partially observable environments,
mostly in the realm of computer games like Dota 2, StarCraft II, or Minecraft.
However, existing methods lack interpretability in the sense that it is not
comprehensible for humans what the agent stores in its memory. In this regard,
we propose a novel memory mechanism that represents past events in human
language. Our method uses CLIP to associate visual inputs with language tokens.
Then we feed these tokens to a pretrained language model that serves the agent
as memory and provides it with a coherent and human-readable representation of
the past. We train our memory mechanism on a set of partially observable
environments and find that it excels on tasks that require a memory component,
while mostly attaining performance on-par with strong baselines on tasks that
do not. On a challenging continuous recognition task, where memorizing the past
is crucial, our memory mechanism converges two orders of magnitude faster than
prior methods. Since our memory mechanism is human-readable, we can peek at an
agent's memory and check whether crucial pieces of information have been
stored. This significantly enhances troubleshooting and paves the way toward
more interpretable agents.
Comment: To appear at NeurIPS 2023, 10 pages (+ references and appendix),
Code: https://github.com/ml-jku/hel
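One plausible reading of "associate visual inputs with language tokens" is nearest-neighbour retrieval in a shared embedding space. The sketch below assumes that reading and uses toy two-dimensional vectors in place of real CLIP embeddings. The function names and the tiny vocabulary are hypothetical and do not come from the paper or its code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_tokens(image_embedding, token_embeddings, k):
    """Return the k tokens whose embeddings are most similar to the image."""
    ranked = sorted(
        token_embeddings.items(),
        key=lambda item: cosine(image_embedding, item[1]),
        reverse=True,
    )
    return [token for token, _ in ranked[:k]]


# Toy stand-ins for embeddings that a vision-language model would produce.
toy_vocab = {"key": [1.0, 0.0], "door": [0.0, 1.0], "lava": [-1.0, 0.0]}
memory_tokens = top_k_tokens([0.9, 0.1], toy_vocab, 2)
```

The retrieved tokens are ordinary words, so a memory built this way can be read directly. This is the property the abstract emphasizes.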
Collaborative Information Dissemination with Graph-based Multi-Agent Reinforcement Learning
Efficient information dissemination is crucial for supporting critical
operations across domains like disaster response, autonomous vehicles, and
sensor networks. This paper introduces a Multi-Agent Reinforcement Learning
(MARL) approach as a significant step forward in achieving more decentralized,
efficient, and collaborative information dissemination. We propose a Partially
Observable Stochastic Game (POSG) formulation for information dissemination
empowering each agent to decide on message forwarding independently, based on
the observation of their one-hop neighborhood. This constitutes a significant
paradigm shift from heuristics currently employed in real-world broadcast
protocols. Our novel approach harnesses Graph Convolutional Reinforcement
Learning and Graph Attention Networks (GATs) with dynamic attention to capture
essential network features. We propose two approaches, L-DyAN and HL-DyAN,
which differ in terms of the information exchanged among agents. Our
experimental results show that our trained policies outperform existing
methods, including the state-of-the-art heuristic, in terms of network coverage
as well as communication overhead on dynamic networks of varying density and
behavior.
Comment: 13 pages, 5 figures, 4 tables