RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems
With the rapid advance of information technology, network systems have become
increasingly complex and hence the underlying system dynamics are often unknown
or difficult to characterize. Finding a good network control policy is of
significant importance to achieve desirable network performance (e.g., high
throughput or low delay). In this work, we consider using model-based
reinforcement learning (RL) to learn the optimal control policy for queueing
networks so that the average job delay (or equivalently the average queue
backlog) is minimized. Traditional approaches in RL, however, cannot handle the
unbounded state spaces of the network control problem. To overcome this
difficulty, we propose a new algorithm, called Reinforcement Learning for
Queueing Networks (RL-QN), which applies model-based RL methods over a finite
subset of the state space, while applying a known stabilizing policy for the
rest of the states. We establish that the average queue backlog under RL-QN
with an appropriately constructed subset can be arbitrarily close to the
optimal result. We evaluate RL-QN in dynamic server allocation, routing and
switching problems. Simulation results show that RL-QN minimizes the average
queue backlog effectively.
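
The split described in this abstract, a learned policy on a finite subset of states and a known stabilizing policy elsewhere, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the box-shaped subset defined by `threshold`, the lookup table `learned`, and the longest-queue fallback `serve_longest_queue` are all hypothetical placeholders chosen for illustration.

```python
from typing import Callable, Dict, Tuple

State = Tuple[int, ...]  # one queue length per queue


def serve_longest_queue(state: State) -> int:
    """Placeholder fallback rule (an assumption): serve the longest queue."""
    return max(range(len(state)), key=lambda i: state[i])


def make_hybrid_policy(
    learned: Dict[State, int],
    fallback: Callable[[State], int],
    threshold: int,
) -> Callable[[State], int]:
    """Use the learned action inside the truncated region, the fallback outside.

    The truncated region is assumed here to be the box where every queue
    length is at most `threshold`.
    """

    def policy(state: State) -> int:
        inside = all(q <= threshold for q in state)
        if inside and state in learned:
            return learned[state]
        # Outside the finite subset (or where nothing was learned),
        # defer to the known fallback policy.
        return fallback(state)

    return policy


# Hypothetical learned table for a two-queue system.
learned_table = {(1, 0): 0, (0, 1): 1, (1, 1): 1}
policy = make_hybrid_policy(learned_table, serve_longest_queue, threshold=2)
```

In this sketch, `policy((1, 1))` returns the learned action, while `policy((5, 1))` falls outside the box and is handled by the fallback rule.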
Skills in using equipment and software to produce survey products: a survey of land surveying diploma students at Politeknik Sultan Haji Ahmad Shah, Kuantan, Pahang
This project examines the skills that Diploma in Land Surveying students need in using surveying equipment and related software. The study sample consisted of 32 sixth-semester students enrolled in the Diploma in Land Surveying course at Politeknik Sultan Haji Ahmad Shah, Kuantan, Pahang. Data were collected through a questionnaire. The researcher focused on research questions covering three aspects: the types of land surveying equipment and software used at land surveying firms; the skills possessed by students, including skills in using surveying equipment and surveying software; and basic skills, including the theory required in surveying work and in producing survey products. The findings show that students are skilled in using manual surveying instruments, but that their skills in using software are not varied. The results also show that students are skilled in applying the theories used in surveying work and in producing survey products.
Optimal Network Control in Partially-Controllable Networks
The effectiveness of many optimal network control algorithms (e.g.,
BackPressure) relies on the premise that all of the nodes are fully
controllable. However, these algorithms may yield poor performance in a
partially-controllable network where a subset of nodes are uncontrollable and
use some unknown policy. Such a partially-controllable model is of increasing
importance in real-world networked systems such as overlay-underlay networks.
In this paper, we design optimal network control algorithms that can stabilize
a partially-controllable network. We first study the scenario where
uncontrollable nodes use a queue-agnostic policy, and propose a low-complexity
throughput-optimal algorithm, called Tracking-MaxWeight (TMW), which enhances
the original MaxWeight algorithm with an explicit learning of the policy used
by uncontrollable nodes. Next, we investigate the scenario where uncontrollable
nodes use a queue-dependent policy and the problem is formulated as an MDP with
unknown queueing dynamics. We propose a new reinforcement learning algorithm,
called Truncated Upper Confidence Reinforcement Learning (TUCRL), and prove
that TUCRL achieves tunable three-way tradeoffs between throughput, delay and
convergence rate.
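
For context, the classical MaxWeight rule that Tracking-MaxWeight builds on can be sketched in a few lines of Python. This shows only the standard baseline, not TMW itself, whose tracking step is not specified in the abstract. The function name `max_weight` and the representation of feasible schedules as tuples of service rates are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def max_weight(
    queues: Sequence[int],
    schedules: Sequence[Tuple[int, ...]],
) -> Tuple[int, ...]:
    """Pick the feasible schedule with the largest queue-weighted service rate.

    Each schedule is a tuple of per-queue service rates (an assumed encoding).
    Ties are broken in favor of the schedule listed first.
    """
    return max(
        schedules,
        key=lambda s: sum(q * rate for q, rate in zip(queues, s)),
    )


# Hypothetical example: three queues, four feasible schedules.
queues = [3, 1, 4]
schedules = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
chosen = max_weight(queues, schedules)
```

Here the weights of the four schedules are 3, 1, 4 and 4, so the rule selects the first schedule that attains the maximum.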
Unbounded Human Learning: Optimal Scheduling for Spaced Repetition
In the study of human learning, there is broad evidence that our ability to
retain information improves with repeated exposure and decays with delay since
last exposure. This plays a crucial role in the design of educational software,
leading to a trade-off between teaching new material and reviewing what has
already been taught. A common way to balance this trade-off is spaced
repetition, which uses periodic review of content to improve long-term
retention. Though spaced repetition is widely used in practice, e.g., in
electronic flashcard software, there is little formal understanding of the
design of these systems. Our paper addresses this gap in three ways. First, we
mine log data from spaced repetition software to establish the functional
dependence of retention on reinforcement and delay. Second, we use this memory
model to develop a stochastic model for spaced repetition systems. We propose a
queueing network model of the Leitner system for reviewing flashcards, along
with a heuristic approximation that admits a tractable optimization problem for
review scheduling. Finally, we empirically evaluate our queueing model through
a Mechanical Turk experiment, verifying a key qualitative prediction of our
model: the existence of a sharp phase transition in learning outcomes upon
increasing the rate of new item introductions. Comment: Accepted to the ACM SIGKDD Conference on Knowledge Discovery and Data
Mining 201
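
The Leitner system mentioned in this abstract can be viewed as a set of queues (decks) with items moving between them after each review. The sketch below is a minimal Python illustration under stated assumptions, not the paper's model: it uses the common variant in which a missed item returns to the first deck, keeps correctly recalled items in the top deck, and the class name `LeitnerSystem` is hypothetical.

```python
from collections import deque


class LeitnerSystem:
    """Decks are FIFO queues; deck 0 holds the least-known items."""

    def __init__(self, num_decks: int) -> None:
        self.decks = [deque() for _ in range(num_decks)]

    def add_new(self, item: str) -> None:
        # New items always enter the first deck.
        self.decks[0].append(item)

    def review(self, deck_index: int, correct: bool) -> int:
        """Review the item at the head of a deck and return its new deck."""
        item = self.decks[deck_index].popleft()
        if correct:
            # Move up one deck, staying in the top deck once reached (assumption).
            target = min(deck_index + 1, len(self.decks) - 1)
        else:
            # Assumed variant: a missed item goes back to the first deck.
            target = 0
        self.decks[target].append(item)
        return target


system = LeitnerSystem(num_decks=3)
system.add_new("card-a")
moves = [
    system.review(0, True),
    system.review(1, True),
    system.review(2, True),
    system.review(2, False),
]
```

Raising the rate at which `add_new` is called relative to the review rate is the knob the abstract refers to when it describes a phase transition in learning outcomes.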
Scheduling and Power Control for Wireless Multicast Systems via Deep Reinforcement Learning
Multicasting in wireless systems is a natural way to exploit the redundancy
in user requests in a Content Centric Network. Power control and optimal
scheduling can significantly improve the wireless multicast network's
performance under fading. However, the model-based approaches for power control
and scheduling studied earlier are not scalable to large state spaces or
changing system dynamics. In this paper, we use deep reinforcement learning
where we use function approximation of the Q-function via a deep neural network
to obtain a power control policy that matches the optimal policy for a small
network. We show that a power control policy can be learned for reasonably large
systems via this approach. Further, we use multi-timescale stochastic
optimization to maintain the average power constraint. We demonstrate that a
slight modification of the learning algorithm allows tracking of time-varying
system statistics. Finally, we extend the multi-timescale approach to
simultaneously learn the optimal queueing strategy along with power control. We
demonstrate the scalability, tracking and cross-layer optimization capabilities of
our algorithms via simulations. The proposed multi-timescale approach can be
used in general large state space dynamical systems with multiple objectives
and constraints, and may be of independent interest. Comment: arXiv admin note: substantial text overlap with arXiv:1910.0530
Learning Algorithms for Minimizing Queue Length Regret
We consider a system consisting of a single transmitter/receiver pair and
channels over which they may communicate. Packets randomly arrive to the
transmitter's queue and wait to be successfully sent to the receiver. The
transmitter may attempt a frame transmission on one channel at a time, where
each frame includes a packet if one is in the queue. For each channel, an
attempted transmission is successful with an unknown probability. The
transmitter's objective is to quickly identify the best channel to minimize the
number of packets in the queue over time slots. To analyze system
performance, we introduce queue length regret, which is the expected difference
between the total queue length of a learning policy and a controller that knows
the rates a priori. One approach to designing a transmission policy would be
to apply algorithms from the literature that solve the closely-related
stochastic multi-armed bandit problem. These policies would focus on maximizing
the number of successful frame transmissions over time. However, we show that
these methods have queue length regret. On the other hand, we
show that there exists a set of queue-length based policies that can obtain
order optimal queue length regret. We use our theoretical analysis to
devise heuristic methods that are shown to perform well in simulation. Comment: 28 Pages, 11 figures
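
The regret notion defined in this abstract compares the total queue length under a learning policy with that under a controller that knows the channel rates. A minimal Python sketch of that comparison on a single sample path follows. It is an illustration, not the paper's setup: the slot ordering (service attempt first, then arrival), the function names, and the pre-generated `arrivals` and `successes` arrays are assumptions. Averaging the result over many random sample paths would estimate the expected regret.

```python
from typing import Callable, List, Sequence, Tuple

History = List[Tuple[int, int]]  # (channel chosen, success flag) per slot
Policy = Callable[[int, int, History], int]


def total_queue_length(
    choose_channel: Policy,
    arrivals: Sequence[int],
    successes: Sequence[Sequence[int]],
) -> int:
    """Sum the queue length over all slots of one sample path.

    arrivals[t] is 0 or 1; successes[t][k] is 1 if channel k would succeed
    in slot t. Assumed slot order: transmit first, then admit the arrival.
    """
    queue, total = 0, 0
    history: History = []
    for t, arrival in enumerate(arrivals):
        k = choose_channel(t, queue, history)
        ok = successes[t][k]
        if ok and queue > 0:
            queue -= 1  # a packet departs only if one was waiting
        history.append((k, ok))
        queue += arrival
        total += queue
    return total


def sample_path_regret(
    policy: Policy,
    best_channel: int,
    arrivals: Sequence[int],
    successes: Sequence[Sequence[int]],
) -> int:
    """Queue length of `policy` minus that of a genie using `best_channel`."""
    genie: Policy = lambda t, queue, history: best_channel
    return (
        total_queue_length(policy, arrivals, successes)
        - total_queue_length(genie, arrivals, successes)
    )


# Hypothetical path: channel 1 always succeeds, channel 0 never does.
arrivals = [1, 1, 0, 0]
successes = [[0, 1]] * 4
always_channel_0: Policy = lambda t, queue, history: 0
regret = sample_path_regret(always_channel_0, 1, arrivals, successes)
```

On this path the policy stuck on the bad channel accumulates a total queue length of 7, against 2 for the genie.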