Safe Learning for Near Optimal Scheduling
In this paper, we investigate the combination of synthesis, model-based
learning, and online sampling techniques to obtain safe and near-optimal
schedulers for a preemptible task scheduling problem. Our algorithms can handle
Markov decision processes (MDPs) that have 10^20 states and beyond, which cannot
be handled with state-of-the-art probabilistic model checkers. We provide
probably approximately correct (PAC) guarantees for learning the model.
Additionally, we extend Monte-Carlo tree search with advice, computed using
safety games or obtained using the earliest-deadline-first scheduler, to safely
explore the learned model online. Finally, we implemented and compared our
algorithms empirically against shielded deep Q-learning on large task systems.
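
A minimal sketch of the kind of advice-guided tree search described above (not the authors' implementation): expansion and rollouts are restricted to actions a safety "advice" oracle permits, standing in for a safety-game strategy or the EDF rule. The toy model and the advice_safe oracle are made-up placeholders.

    import math, random

    class ToyModel:
        """Two-action toy MDP: action 1 is risky and may enter a 'bad' sink state."""
        def actions(self, state):
            return [0, 1]
        def sample_next(self, state, action):
            if state == "bad":
                return "bad"
            if action == 1 and random.random() < 0.3:
                return "bad"
            return "ok"
        def reward(self, state, action):
            return 0.0 if state == "bad" else (2.0 if action == 1 else 1.0)

    def advice_safe(state, action):
        return action == 0            # placeholder advice: forbid the risky action

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children, self.visits, self.value = {}, 0, 0.0

    def ucb(child, parent_visits, c=1.4):
        if child.visits == 0:
            return float("inf")
        return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

    def safe_mcts(root_state, model, iterations=500, horizon=20):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # Selection: descend through previously expanded (safe) actions.
            while node.children:
                node = max(node.children.values(), key=lambda ch: ucb(ch, node.visits))
            # Expansion: create children only for actions the advice considers safe.
            for a in model.actions(node.state):
                if advice_safe(node.state, a):
                    node.children[a] = Node(model.sample_next(node.state, a), parent=node)
            # Simulation: roll out using only safe actions.
            ret, state = 0.0, node.state
            for _ in range(horizon):
                safe = [a for a in model.actions(state) if advice_safe(state, a)]
                if not safe:
                    break
                a = random.choice(safe)
                ret += model.reward(state, a)
                state = model.sample_next(state, a)
            # Backpropagation of the sampled return.
            while node:
                node.visits += 1
                node.value += ret
                node = node.parent
        return max(root.children, key=lambda a: root.children[a].visits)

    print("recommended action:", safe_mcts("ok", ToyModel()))
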
Scheduling for Urban Air Mobility using Safe Learning
This work considers the scheduling problem for Urban Air Mobility (UAM)
vehicles travelling between origin-destination pairs with both hard and soft
trip deadlines. Each route is described by a discrete probability distribution
over trip completion times (or delay) and over inter-arrival times of requests
(or demand) for the route along with a fixed hard or soft deadline. Soft
deadlines carry a cost that is incurred when the deadline is missed. An online,
safe scheduler is developed that ensures that hard deadlines are never missed,
and that average cost of missing soft deadlines is minimized. The system is
modelled as a Markov Decision Process (MDP) and safe model-based learning is
used to find the probabilistic distributions over route delays and demand.
Monte Carlo Tree Search (MCTS) with Earliest Deadline First (EDF) is used to
safely explore the learned models in an online fashion and to develop a near-optimal
non-preemptive scheduling policy. These results are compared with Value
Iteration (VI) and MCTS (Random) scheduling solutions.
Comment: In Proceedings FMAS2022 ASYDE2022, arXiv:2209.1318
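
A small sketch of the model-learning and safe-dispatch idea described above (illustrative only, not the paper's implementation): empirical delay distributions are learned per route from observed trips, and requests are served non-preemptively by earliest deadline first, rejecting any dispatch whose worst-case completion time would make a queued hard deadline unmeetable. The route names, Request fields, and single-vehicle assumption are made up.

    from collections import Counter, namedtuple

    Request = namedtuple("Request", "route deadline hard")   # deadline is an absolute time

    class RouteModel:
        def __init__(self):
            self.delays = Counter()              # observed trip completion times

        def observe(self, delay):
            self.delays[delay] += 1

        def worst_case(self):
            return max(self.delays) if self.delays else 0

        def distribution(self):                  # empirical probability distribution
            total = sum(self.delays.values())
            return {d: n / total for d, n in self.delays.items()}

    def edf_dispatch(pending, models, now):
        """Pick the next request: earliest deadline first, but only if no queued
        hard deadline can be missed under the worst-case learned delays."""
        for req in sorted(pending, key=lambda r: r.deadline):
            finish = now + models[req.route].worst_case()
            others_ok = all(
                finish + models[o.route].worst_case() <= o.deadline
                for o in pending if o is not req and o.hard)
            if (not req.hard or finish <= req.deadline) and others_ok:
                return req
        return None    # idle rather than risk missing a hard deadline

    # Example: learn from a few observed trips, then dispatch.
    models = {"A": RouteModel(), "B": RouteModel()}
    for d in (3, 4, 5): models["A"].observe(d)
    for d in (2, 2, 6): models["B"].observe(d)
    pending = [Request("A", deadline=12, hard=True), Request("B", deadline=9, hard=False)]
    print(edf_dispatch(pending, models, now=0))
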
Learning Algorithms for Minimizing Queue Length Regret
We consider a system consisting of a single transmitter/receiver pair and
N channels over which they may communicate. Packets randomly arrive to the
transmitter's queue and wait to be successfully sent to the receiver. The
transmitter may attempt a frame transmission on one channel at a time, where
each frame includes a packet if one is in the queue. For each channel, an
attempted transmission is successful with an unknown probability. The
transmitter's objective is to quickly identify the best channel to minimize the
number of packets in the queue over T time slots. To analyze system
performance, we introduce queue length regret, which is the expected difference
between the total queue length of a learning policy and a controller that knows
the rates, a priori. One approach to designing a transmission policy would be
to apply algorithms from the literature that solve the closely-related
stochastic multi-armed bandit problem. These policies would focus on maximizing
the number of successful frame transmissions over time. However, we show that
these methods have Ω(log T) queue length regret. On the other hand, we
show that there exists a set of queue-length-based policies that can obtain
order-optimal O(1) queue length regret. We use our theoretical analysis to
devise heuristic methods that are shown to perform well in simulation.
Comment: 28 pages, 11 figures
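
A short simulation sketch of the queue length regret metric defined above: packets arrive as Bernoulli(lmbda) per slot, a policy picks one channel per slot, and regret is the gap in cumulative queue length versus a genie that always uses the best channel. The UCB learner stands in for a generic bandit policy; the rates and horizon are made-up example values, and a single run only estimates the expectation.

    import math, random

    def simulate(policy, rates, lmbda, T, seed=0):
        rng = random.Random(seed)
        q, total = 0, 0
        counts = [0] * len(rates)
        successes = [0] * len(rates)
        for t in range(1, T + 1):
            q += rng.random() < lmbda                     # packet arrival
            k = policy(t, counts, successes)              # channel choice
            served = rng.random() < rates[k]              # transmission attempt
            counts[k] += 1
            successes[k] += served
            if q > 0 and served:                          # departure
                q -= 1
            total += q
        return total                                      # cumulative queue length

    def ucb_policy(t, counts, successes):
        for k, n in enumerate(counts):
            if n == 0:
                return k                                  # sample every channel once
        return max(range(len(counts)),
                   key=lambda k: successes[k] / counts[k]
                   + math.sqrt(2 * math.log(t) / counts[k]))

    rates, lmbda, T = [0.3, 0.5, 0.7], 0.4, 10_000
    genie = lambda t, c, s: rates.index(max(rates))       # knows the rates a priori
    regret = simulate(ucb_policy, rates, lmbda, T) - simulate(genie, rates, lmbda, T)
    print("queue length regret estimate:", regret)
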