Learning Algorithms for Minimizing Queue Length Regret

Authors: Modiano Eytan; Shrader Brooke; Stahlbuhk Thomas
Publication date: 14/05/2020

We consider a system consisting of a single transmitter/receiver pair and $N$ channels over which they may communicate. Packets randomly arrive at the transmitter's queue and wait to be successfully sent to the receiver. The transmitter may attempt a frame transmission on one channel at a time, where each frame includes a packet if one is in the queue. For each channel, an attempted transmission is successful with an unknown probability. The transmitter's objective is to quickly identify the best channel to minimize the number of packets in the queue over $T$ time slots. To analyze system performance, we introduce queue length regret, which is the expected difference between the total queue length of a learning policy and that of a controller that knows the rates a priori. One approach to designing a transmission policy would be to apply algorithms from the literature that solve the closely related stochastic multi-armed bandit problem. These policies would focus on maximizing the number of successful frame transmissions over time. However, we show that these methods have $\Omega(\log T)$ queue length regret. On the other hand, we show that there exists a set of queue-length-based policies that can obtain order-optimal $O(1)$ queue length regret. We use our theoretical analysis to devise heuristic methods that are shown to perform well in simulation.

Comment: 28 pages, 11 figures
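The regret notion described in this abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm or experiment: it assumes Bernoulli arrivals with rate `lam`, Bernoulli channel successes with probabilities `mu`, a service-then-arrival update order, and it swaps in the standard UCB1 bandit rule as the learning policy. All function and parameter names are hypothetical. Both queues are driven by the same random draws, so the learning queue is never shorter than the oracle queue on any sample path.

```python
import math
import random


def simulate(mu, lam, T, seed=0):
    """Return (total queue length under UCB1, total under the oracle).

    mu  -- per-channel success probabilities (unknown to the learner)
    lam -- Bernoulli packet arrival rate (assumed arrival model)
    T   -- number of time slots
    """
    rng = random.Random(seed)
    n = len(mu)
    best = max(range(n), key=lambda i: mu[i])  # the oracle knows the rates
    pulls = [0] * n
    wins = [0] * n
    q_learn = q_oracle = 0
    total_learn = total_oracle = 0
    for t in range(1, T + 1):
        # UCB1: try every channel once, then pick the largest index.
        if t <= n:
            k = t - 1
        else:
            k = max(
                range(n),
                key=lambda i: wins[i] / pulls[i]
                + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        # Common random numbers: one uniform draw decides success on
        # every channel, so the best channel succeeds whenever any does.
        u = rng.random()
        arrival = 1 if rng.random() < lam else 0
        s_learn = 1 if u < mu[k] else 0
        s_oracle = 1 if u < mu[best] else 0
        pulls[k] += 1
        wins[k] += s_learn
        # Assumed order: serve one queued packet (if any), then admit arrival.
        q_learn = max(q_learn - s_learn, 0) + arrival
        q_oracle = max(q_oracle - s_oracle, 0) + arrival
        total_learn += q_learn
        total_oracle += q_oracle
    return total_learn, total_oracle


def queue_length_regret(mu, lam, T, runs=20):
    """Monte Carlo estimate of the expected total-queue-length difference."""
    diffs = []
    for seed in range(runs):
        learn, oracle = simulate(mu, lam, T, seed=seed)
        diffs.append(learn - oracle)
    return sum(diffs) / runs
```

Calling `queue_length_regret([0.3, 0.5, 0.8], 0.4, 2000)` gives an empirical estimate for one hypothetical parameter setting; the values are illustrative and do not come from the paper.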

arXiv.org e-Print Archive

DSpace@MIT

Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning

Authors: Howley Enda; Mousavi Seyed Sajad; Schukat Michael
Publication date: 27/05/2017

Recent advances in combining deep neural network architectures with reinforcement learning techniques have shown promising results in solving complex control problems with high-dimensional state and action spaces. Inspired by these successes, in this paper we build two kinds of reinforcement learning algorithms, deep policy-gradient and value-function-based agents, which can predict the best possible traffic signal for a traffic intersection. At each time step, these adaptive traffic light control agents receive a snapshot of the current state of a graphical traffic simulator and produce control signals. The policy-gradient-based agent maps its observation directly to the control signal, whereas the value-function-based agent first estimates values for all legal control signals and then selects the control action with the highest value. Our methods show promising results in a traffic network simulated in the SUMO traffic simulator, without suffering from instability issues during the training process.
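The difference between the two agents' action selection can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the value estimates and policy logits are plain lists standing in for network outputs, the softmax sampling for the policy-based agent is an assumption, and both function names are hypothetical.

```python
import math
import random


def greedy_action(q_values, legal_actions):
    """Value-based selection: pick the legal action with the highest value."""
    return max(legal_actions, key=lambda a: q_values[a])


def sample_policy_action(logits, legal_actions, rng=random):
    """Policy-based selection: sample from a softmax over legal actions.

    Assumes the policy network emits one logit per control signal.
    """
    m = max(logits[a] for a in legal_actions)  # subtract max for stability
    weights = [math.exp(logits[a] - m) for a in legal_actions]
    return rng.choices(legal_actions, weights=weights, k=1)[0]
```

For example, `greedy_action([0.1, 0.9, 0.5], [0, 2])` ignores the illegal action 1 and returns 2.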

arXiv.org e-Print Archive

Irish Universities

Access to Research at National University of Ireland, Galway


Speed trajectory data from adaptive eco-driving applications

Authors: Bai Zhenwei; Barth Matthew J; Hao Peng; Wei Zhensong
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

eScholarship - University of California

A Learning Based Approach to Control Synthesis of Markov Decision Processes for Linear Temporal Logic Specifications

Authors: Coogan Samuel; Kim Eric S.; Sadigh Dorsa; Sastry S. Shankar; Seshia Sanjit A.
Publication date: 01/01/2014

We propose to synthesize a control policy for a Markov decision process (MDP) such that the resulting traces of the MDP satisfy a linear temporal logic (LTL) property. We construct a product MDP that incorporates a deterministic Rabin automaton generated from the desired LTL property. The reward function of the product MDP is defined from the acceptance condition of the Rabin automaton. This construction allows us to apply techniques from learning theory to the problem of synthesis for LTL specifications even when the transition probabilities are not known a priori. We prove that our method is guaranteed to find a controller that satisfies the LTL property with probability one if such a policy exists, and we suggest empirically with a case study in traffic control that our method produces reasonable control strategies even when the LTL property cannot be satisfied with probability one.
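The product construction mentioned in this abstract can be sketched with plain dictionaries. This is a generic textbook-style product of an MDP with a deterministic automaton, not the paper's code: the data layout, the function name, and the convention that the automaton reads the label of the successor state are all assumptions, and the reward derived from the Rabin acceptance condition is omitted.

```python
def product_transitions(mdp_P, label, dra_delta):
    """Build transition probabilities of the product MDP.

    mdp_P     -- {(s, a): {s_next: prob}} for the original MDP
    label     -- {s: letter} labelling each MDP state
    dra_delta -- {(q, letter): q_next}, assumed complete and deterministic
    Returns {((s, q), a): {(s_next, q_next): prob}}.
    """
    dra_states = {q for (q, _) in dra_delta}
    prod = {}
    for (s, a), successors in mdp_P.items():
        for q in dra_states:
            out = {}
            for s_next, p in successors.items():
                # Assumed convention: the automaton reads the successor label.
                q_next = dra_delta[(q, label[s_next])]
                out[(s_next, q_next)] = out.get((s_next, q_next), 0.0) + p
            prod[((s, q), a)] = out
    return prod
```

Because the automaton is deterministic, each product row carries the same probability mass as the corresponding MDP row, so any uncertainty in the product comes only from the MDP's (possibly unknown) transition probabilities.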

arXiv.org e-Print Archive

CiteSeerX

Crossref

eScholarship - University of California