
    RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems

    Full text link
    With the rapid advance of information technology, network systems have become increasingly complex, and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance for achieving desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or, equivalently, the average queue backlog) is minimized. Traditional RL approaches, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called Reinforcement Learning for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.
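
    The abstract's core mechanism is a policy switch: use the RL-learned policy while the queue state lies inside a bounded region, and fall back to a known stabilizing rule outside it. Below is a minimal sketch of that switching structure under assumed names (STATE_BOUND, learned_policy, stabilizing_policy); the actual learned policy in RL-QN comes from model-based RL on the truncated state space, which is not reproduced here.

```python
import random

STATE_BOUND = 20  # hypothetical truncation level defining the finite subset

def stabilizing_policy(queues):
    """Known stabilizing fallback: serve the longest queue (MaxWeight-style)."""
    return max(range(len(queues)), key=lambda i: queues[i])

def learned_policy(queues):
    """Placeholder for the policy learned by model-based RL on the finite subset."""
    # For illustration this reuses the stabilizing rule; in RL-QN it would be the
    # (near-)optimal action computed from the learned transition model.
    return stabilizing_policy(queues)

def rl_qn_action(queues):
    """Learned policy inside the truncated region, stabilizing policy outside."""
    if all(q <= STATE_BOUND for q in queues):
        return learned_policy(queues)
    return stabilizing_policy(queues)

if __name__ == "__main__":
    queues = [random.randint(0, 30) for _ in range(3)]
    print(queues, "-> serve queue", rl_qn_action(queues))
```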

    Skills in using equipment and software to produce survey products: a survey of Diploma in Land Surveying students at Politeknik Sultan Haji Ahmad Shah, Kuantan, Pahang

    Get PDF
    This project examines the skills required by Diploma in Land Surveying students in using survey equipment and related software. The study sample consisted of 32 sixth-semester students enrolled in the Diploma in Land Surveying programme at Politeknik Sultan Haji Ahmad Shah, Kuantan, Pahang. Data were collected through a questionnaire. The researcher focused on research questions covering three aspects: the types of land-survey equipment and software used in land-survey firms; the skills possessed by the students, including skills in using survey equipment and survey software; and basic skills, including the theory required for survey work and for producing survey products. The findings show that the students are proficient in using manual survey instruments, whereas their skills in using software are not varied. The results also show that the students are proficient in applying the theory used in survey work and in producing survey products.

    Optimal Network Control in Partially-Controllable Networks

    Full text link
    The effectiveness of many optimal network control algorithms (e.g., BackPressure) relies on the premise that all of the nodes are fully controllable. However, these algorithms may yield poor performance in a partially-controllable network where a subset of nodes is uncontrollable and uses some unknown policy. Such a partially-controllable model is of increasing importance in real-world networked systems such as overlay-underlay networks. In this paper, we design optimal network control algorithms that can stabilize a partially-controllable network. We first study the scenario where uncontrollable nodes use a queue-agnostic policy, and propose a low-complexity throughput-optimal algorithm, called Tracking-MaxWeight (TMW), which enhances the original MaxWeight algorithm with explicit learning of the policy used by uncontrollable nodes. Next, we investigate the scenario where uncontrollable nodes use a queue-dependent policy, and the problem is formulated as a Markov decision process (MDP) with unknown queueing dynamics. We propose a new reinforcement learning algorithm, called Truncated Upper Confidence Reinforcement Learning (TUCRL), and prove that TUCRL achieves tunable three-way tradeoffs between throughput, delay and convergence rate.
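
    As a rough illustration of the "track, then schedule" idea behind Tracking-MaxWeight, the sketch below pairs a plain MaxWeight rule over the controllable links with a running empirical estimate of the uncontrollable node's observed actions. All class and parameter names are assumptions; the paper's actual algorithm (and the TUCRL analysis) is not reproduced here.

```python
from collections import defaultdict

class TrackingMaxWeightSketch:
    def __init__(self, links):
        # links: (src_queue, dst_queue) pairs the controllable node may serve
        self.links = links
        self.uncontrollable_counts = defaultdict(int)
        self.observations = 0

    def observe_uncontrollable(self, action):
        """Record one observed action of the uncontrollable node (e.g., which queue it served)."""
        self.uncontrollable_counts[action] += 1
        self.observations += 1

    def estimated_rate(self, action):
        """Empirical frequency with which the uncontrollable node takes `action`."""
        return self.uncontrollable_counts[action] / self.observations if self.observations else 0.0

    def schedule(self, queues):
        """MaxWeight over controllable links: serve the largest positive backlog differential."""
        def weight(link):
            src, dst = link
            return queues[src] - queues[dst]
        best = max(self.links, key=weight)
        return best if weight(best) > 0 else None  # stay idle if no positive weight

if __name__ == "__main__":
    tmw = TrackingMaxWeightSketch(links=[(0, 1), (0, 2)])
    tmw.observe_uncontrollable(action=1)
    print("serve link:", tmw.schedule(queues=[5, 1, 3]))
    print("estimated rate of action 1:", tmw.estimated_rate(1))
```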

    Unbounded Human Learning: Optimal Scheduling for Spaced Repetition

    Full text link
    In the study of human learning, there is broad evidence that our ability to retain information improves with repeated exposure and decays with delay since last exposure. This plays a crucial role in the design of educational software, leading to a trade-off between teaching new material and reviewing what has already been taught. A common way to balance this trade-off is spaced repetition, which uses periodic review of content to improve long-term retention. Though spaced repetition is widely used in practice, e.g., in electronic flashcard software, there is little formal understanding of the design of these systems. Our paper addresses this gap in three ways. First, we mine log data from spaced repetition software to establish the functional dependence of retention on reinforcement and delay. Second, we use this memory model to develop a stochastic model for spaced repetition systems. We propose a queueing network model of the Leitner system for reviewing flashcards, along with a heuristic approximation that admits a tractable optimization problem for review scheduling. Finally, we empirically evaluate our queueing model through a Mechanical Turk experiment, verifying a key qualitative prediction of our model: the existence of a sharp phase transition in learning outcomes upon increasing the rate of new item introductions. Comment: Accepted to the ACM SIGKDD Conference on Knowledge Discovery and Data Mining 201
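
    To make the retention/delay trade-off concrete, here is an illustrative sketch pairing an exponential-forgetting recall model with the Leitner promotion/demotion rule. The functional form P(recall) = exp(-theta * delay / strength) and the parameter theta are assumptions for illustration, not the model actually fitted from the log data in the paper.

```python
import math

THETA = 0.05  # hypothetical forgetting rate (per hour)

def recall_probability(delay_hours, num_reviews):
    """Recall decays with delay since the last review and improves with past reviews."""
    strength = 1.0 + num_reviews
    return math.exp(-THETA * delay_hours / strength)

def leitner_update(deck, recalled, num_decks=5):
    """Leitner rule: promote a card on successful recall, demote it to deck 1 on failure."""
    return min(deck + 1, num_decks) if recalled else 1

if __name__ == "__main__":
    for reviews in (0, 2, 5):
        p = recall_probability(delay_hours=24, num_reviews=reviews)
        print(f"{reviews} prior reviews, 24h delay -> recall prob {p:.3f}")
    print("deck after failed review from deck 3:", leitner_update(3, recalled=False))
```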

    Scheduling and Power Control for Wireless Multicast Systems via Deep Reinforcement Learning

    Full text link
    Multicasting in wireless systems is a natural way to exploit the redundancy in user requests in a Content Centric Network. Power control and optimal scheduling can significantly improve the wireless multicast network's performance under fading. However, the model-based approaches for power control and scheduling studied earlier are not scalable to large state spaces or changing system dynamics. In this paper, we use deep reinforcement learning, in which the Q-function is approximated via a deep neural network, to obtain a power control policy that matches the optimal policy for a small network. We show that a power control policy can be learnt for reasonably large systems via this approach. Further, we use multi-timescale stochastic optimization to maintain the average power constraint. We demonstrate that a slight modification of the learning algorithm allows tracking of time-varying system statistics. Finally, we extend the multi-timescale approach to simultaneously learn the optimal queueing strategy along with power control. We demonstrate the scalability, tracking and cross-layer optimization capabilities of our algorithms via simulations. The proposed multi-timescale approach can be used in general large state space dynamical systems with multiple objectives and constraints, and may be of independent interest. Comment: arXiv admin note: substantial text overlap with arXiv:1910.0530
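
    The average power constraint is described as being maintained via multi-timescale stochastic optimization. A common way to realize this, sketched below under assumed names and step sizes, is to penalize the reward seen by the Q-learner with a Lagrange multiplier and update that multiplier on a slower timescale; this illustrates the general technique, not the paper's exact update.

```python
P_AVG = 1.0     # hypothetical average power budget
SLOW_LR = 1e-3  # slow-timescale step size for the multiplier

def penalized_reward(raw_reward, power_used, lam):
    """Reward fed to the RL agent: original reward minus a power penalty."""
    return raw_reward - lam * power_used

def update_multiplier(lam, power_used):
    """Slow-timescale dual ascent: raise lambda when power spending exceeds the budget."""
    return max(0.0, lam + SLOW_LR * (power_used - P_AVG))

if __name__ == "__main__":
    lam = 0.0
    for t in range(5):
        power = 1.2  # suppose the current policy spends 1.2 units of power on average
        lam = update_multiplier(lam, power)
        print(f"step {t}: lambda = {lam:.4f}, "
              f"penalized reward example = {penalized_reward(2.0, power, lam):.4f}")
```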

    Learning Algorithms for Minimizing Queue Length Regret

    Full text link
    We consider a system consisting of a single transmitter/receiver pair and $N$ channels over which they may communicate. Packets randomly arrive to the transmitter's queue and wait to be successfully sent to the receiver. The transmitter may attempt a frame transmission on one channel at a time, where each frame includes a packet if one is in the queue. For each channel, an attempted transmission is successful with an unknown probability. The transmitter's objective is to quickly identify the best channel to minimize the number of packets in the queue over $T$ time slots. To analyze system performance, we introduce queue length regret, which is the expected difference between the total queue length of a learning policy and a controller that knows the rates a priori. One approach to designing a transmission policy would be to apply algorithms from the literature that solve the closely-related stochastic multi-armed bandit problem. These policies would focus on maximizing the number of successful frame transmissions over time. However, we show that these methods have $\Omega(\log T)$ queue length regret. On the other hand, we show that there exists a set of queue-length based policies that can obtain order-optimal $O(1)$ queue length regret. We use our theoretical analysis to devise heuristic methods that are shown to perform well in simulation. Comment: 28 pages, 11 figures
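
    As a hedged illustration of a queue-length based selection rule in the spirit described above (not the paper's exact policy), the sketch below exploits the empirically best channel whenever packets are waiting and uses empty-queue slots, in which a frame carries no packet anyway, to explore less-sampled channels. All names and parameters are assumptions.

```python
import random

class QueueAwareChannelSelector:
    def __init__(self, num_channels):
        self.successes = [0] * num_channels
        self.attempts = [0] * num_channels

    def pick_channel(self, queue_length):
        """Explore on empty-queue slots; otherwise exploit the best empirical channel."""
        if queue_length == 0:
            # A frame sent now carries no packet, so sample the least-tried channel.
            return min(range(len(self.attempts)), key=lambda c: self.attempts[c])
        def empirical_rate(c):
            return self.successes[c] / self.attempts[c] if self.attempts[c] else float("inf")
        return max(range(len(self.attempts)), key=empirical_rate)

    def record(self, channel, success):
        self.attempts[channel] += 1
        self.successes[channel] += int(success)

if __name__ == "__main__":
    true_rates = [0.3, 0.8, 0.5]  # unknown to the transmitter
    sel = QueueAwareChannelSelector(len(true_rates))
    queue = 0
    for _ in range(1000):
        queue += int(random.random() < 0.4)   # Bernoulli packet arrival
        channel = sel.pick_channel(queue)
        success = random.random() < true_rates[channel]
        sel.record(channel, success)
        if queue > 0 and success:
            queue -= 1                        # the queued packet departs
    print("attempts per channel:", sel.attempts)
```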