Learning Algorithms for Minimizing Queue Length Regret
We consider a system consisting of a single transmitter/receiver pair and
multiple channels over which they may communicate. Packets arrive randomly at the
transmitter's queue and wait to be successfully sent to the receiver. The
transmitter may attempt a frame transmission on one channel at a time, where
each frame includes a packet if one is in the queue. For each channel, an
attempted transmission is successful with an unknown probability. The
transmitter's objective is to quickly identify the best channel to minimize the
number of packets in the queue over a horizon of time slots. To analyze system
performance, we introduce queue length regret, which is the expected difference
between the total queue length of a learning policy and a controller that knows
the rates a priori. One approach to designing a transmission policy would be
to apply algorithms from the literature that solve the closely related
stochastic multi-armed bandit problem. These policies would focus on maximizing
the number of successful frame transmissions over time. However, we show that
these methods incur queue length regret that grows with the number of time slots. On the other hand, we
show that there exists a set of queue-length based policies that can obtain
order optimal queue length regret. We use our theoretical analysis to
devise heuristic methods that are shown to perform well in simulation.
Comment: 28 pages, 11 figures
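To make the definition of queue length regret concrete, here is a minimal Monte Carlo sketch. Everything in it is an illustrative assumption rather than the paper's model or policy: Bernoulli arrivals with rate `lam`, Bernoulli channel successes with probabilities `mu`, and UCB1 as a representative bandit policy that ignores the queue. The genie controller always uses the channel with the largest success probability, and both systems are driven by the same random arrivals and channel outcomes so that their queue lengths can be compared slot by slot.

```python
import math
import random


def simulate(mu, lam, horizon, seed):
    """One coupled sample path; returns (sum of Q_learn(t), sum of Q_genie(t)).

    Illustrative assumptions: Bernoulli(lam) arrivals, Bernoulli(mu[i]) channel
    outcomes, and arrivals added after the slot's transmission attempt.
    """
    rng = random.Random(seed)
    n = len(mu)
    best = max(range(n), key=lambda i: mu[i])  # channel the genie always uses
    q_learn = q_genie = 0
    total_learn = total_genie = 0
    pulls = [0] * n  # frames sent on each channel by the learner
    wins = [0] * n  # successful frames on each channel
    for t in range(1, horizon + 1):
        # Shared randomness: the same arrival and channel outcomes drive both systems.
        arrival = 1 if rng.random() < lam else 0
        success = [rng.random() < m for m in mu]

        # UCB1: try each channel once, then pick the largest optimistic index.
        if t <= n:
            k = t - 1
        else:
            k = max(
                range(n),
                key=lambda i: wins[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        # A frame goes out every slot, so feedback arrives even when the queue is empty.
        pulls[k] += 1
        wins[k] += success[k]

        # A successful frame removes a packet only if one is waiting.
        if success[k] and q_learn > 0:
            q_learn -= 1
        if success[best] and q_genie > 0:
            q_genie -= 1
        q_learn += arrival
        q_genie += arrival
        total_learn += q_learn
        total_genie += q_genie
    return total_learn, total_genie


def queue_length_regret(mu, lam, horizon, runs=200):
    """Monte Carlo estimate of E[sum_t Q_learn(t)] - E[sum_t Q_genie(t)]."""
    diff = 0
    for r in range(runs):
        learn, genie = simulate(mu, lam, horizon, seed=r)
        diff += learn - genie
    return diff / runs
```

For example, `queue_length_regret([0.3, 0.6, 0.9], lam=0.5, horizon=2000)` estimates the regret of UCB1 against the genie. A queue-aware selection rule could be substituted at the channel-selection step to compare the two kinds of policy the abstract contrasts.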
Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning
Recent advances in combining deep neural network architectures with
reinforcement learning techniques have shown promising results in
solving complex control problems with high dimensional state and action spaces.
Inspired by these successes, we build two kinds of reinforcement learning
agents, one based on deep policy gradients and one based on value functions, which
predict the best traffic signal for a traffic intersection. At
each time step, these adaptive traffic light control agents receive a snapshot
of the current state of a graphical traffic simulator and produce control
signals. The policy-gradient based agent maps its observation directly to the
control signal, whereas the value-function based agent first estimates values
for all legal control signals and then selects the control action with the
highest estimated value. Our methods show promising results in a traffic
network simulated in the SUMO traffic simulator, without suffering from
instability issues during the training process.
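The difference between the two agents lies in how they turn a network output into a signal choice. The sketch below uses plain Python lists in place of network outputs; the softmax policy head, the REINFORCE-style gradient, the helper names, and the `legal` list of allowed phase indices are illustrative assumptions, not the paper's architecture.

```python
import math
import random


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def policy_gradient_action(logits, rng):
    """Policy-based agent: read the output as a distribution and sample a signal."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_grad(logits, action, ret):
    """REINFORCE gradient of ret * log pi(action) with respect to the logits.

    For a softmax policy this is ret * (one_hot(action) - probs).
    """
    probs = softmax(logits)
    return [((1.0 if i == action else 0.0) - p) * ret for i, p in enumerate(probs)]


def value_based_action(q_values, legal):
    """Value-based agent: estimate a value per signal, then take the best legal one."""
    return max(legal, key=lambda a: q_values[a])
```

For instance, `value_based_action([0.1, 0.7, 0.4, 0.9], legal=[0, 1, 2])` picks phase 1 because phase 3, though higher valued, is not in the hypothetical legal set. The policy-based agent instead samples from its distribution, which keeps some exploration built into action selection.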
A Learning Based Approach to Control Synthesis of Markov Decision Processes for Linear Temporal Logic Specifications
We propose to synthesize a control policy for a Markov decision process (MDP)
such that the resulting traces of the MDP satisfy a linear temporal logic (LTL)
property. We construct a product MDP that incorporates a deterministic Rabin
automaton generated from the desired LTL property. The reward function of the
product MDP is defined from the acceptance condition of the Rabin automaton.
This construction allows us to apply techniques from learning theory to the
problem of synthesis for LTL specifications even when the transition
probabilities are not known a priori. We prove that our method is guaranteed to
find a controller that satisfies the LTL property with probability one if such
a policy exists, and we present empirical evidence from a case study in traffic
control that our method produces reasonable control strategies even when the
LTL property cannot be satisfied with probability one.
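The product construction pairs each MDP state `s` with a Rabin automaton state `q`, and the automaton advances on the label of the next MDP state. The sketch below builds that product from explicit transition probabilities for illustration only (the method itself is meant to work when they are unknown). The reward, which gives `+1` in `F` states and `-1` in `E` states for a single Rabin pair `(E, F)`, is a hypothetical choice, not the paper's exact definition, and all names and the two-state example are assumptions.

```python
# Minimal product-MDP sketch: product states are (s, q), s an MDP state, q a DRA state.


def initial_state(s0, label, dra_delta, q0):
    """Product start state: the DRA reads the label of the MDP's initial state."""
    return (s0, dra_delta[(q0, frozenset(label[s0]))])


def build_product(mdp_P, label, dra_delta):
    """Return product transitions {((s, q), a): {(s2, q2): prob}}.

    mdp_P[(s, a)] = {s2: prob}; label[s] = set of atomic propositions true in s;
    dra_delta[(q, frozenset(props))] = q2 must be defined for every label that occurs.
    """
    dra_states = {q for (q, _) in dra_delta}
    prod_P = {}
    for (s, a), succ in mdp_P.items():
        for q in dra_states:
            dist = {}
            for s2, p in succ.items():
                # The automaton moves deterministically on the label of the next MDP state.
                q2 = dra_delta[(q, frozenset(label[s2]))]
                dist[(s2, q2)] = dist.get((s2, q2), 0.0) + p
            prod_P[((s, q), a)] = dist
    return prod_P


def rabin_reward(state, pair, good=1.0, bad=-1.0):
    """Hypothetical reward for one Rabin pair (E, F): reward F states, penalize E states.

    A run satisfies the pair if it visits F infinitely often and E only finitely often.
    """
    _, q = state
    E, F = pair
    return (good if q in F else 0.0) + (bad if q in E else 0.0)


# Example: "GF goal" (visit goal infinitely often) on a two-state MDP.
mdp_P = {
    ("a", "stay"): {"a": 1.0},
    ("a", "go"): {"b": 0.8, "a": 0.2},
    ("b", "stay"): {"b": 1.0},
    ("b", "go"): {"a": 1.0},
}
label = {"a": set(), "b": {"goal"}}
# DRA state 1 means "goal was just seen"; the single pair has E empty and F = {1}.
dra_delta = {
    (0, frozenset()): 0,
    (0, frozenset({"goal"})): 1,
    (1, frozenset()): 0,
    (1, frozenset({"goal"})): 1,
}
pair = (set(), {1})

prod_P = build_product(mdp_P, label, dra_delta)
start = initial_state("a", label, dra_delta, q0=0)
```

A learning agent would then explore `(s, q)` states and collect `rabin_reward` without access to `mdp_P`. Here `prod_P` is built only to show how the product's transitions relate to the original MDP and automaton.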
- …