Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
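The exploration/exploitation trade-off named in this abstract can be illustrated with a small sketch. The following is not taken from the survey: it is a minimal epsilon-greedy strategy on a multi-armed bandit, and the function name, the Gaussian reward model, and all parameter values are assumptions chosen for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a bandit with Gaussian rewards (hypothetical setup).

    true_means: assumed mean reward of each arm, unknown to the agent.
    Returns the agent's reward estimates and the pull count of each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.1, 0.5, 0.9])
```

A larger `epsilon` spends more pulls on exploration; a smaller one commits sooner to the arm that currently looks best.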
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, and continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit Deterministic Büchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if there exist such policies. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.
Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
Energy-Efficient Transmission Scheduling with Strict Underflow Constraints
We consider a single source transmitting data to one or more receivers/users
over a shared wireless channel. Due to random fading, the wireless channel
conditions vary with time and from user to user. Each user has a buffer to
store received packets before they are drained. At each time step, the source
determines how much power to use for transmission to each user. The source's
objective is to allocate power in a manner that minimizes an expected cost
measure, while satisfying strict buffer underflow constraints and a total power
constraint in each slot. The expected cost measure is composed of costs
associated with power consumption from transmission and packet holding costs.
The primary application motivating this problem is wireless media streaming.
For this application, the buffer underflow constraints prevent the user buffers
from emptying, so as to maintain playout quality. In the case of a single user
with linear power-rate curves, we show that a modified base-stock policy is
optimal under the finite horizon, infinite horizon discounted, and infinite
horizon average expected cost criteria. For a single user with piecewise-linear
convex power-rate curves, we show that a finite generalized base-stock policy
is optimal under all three expected cost criteria. We also present the
sequences of critical numbers that complete the characterization of the optimal
control laws in each of these cases when some additional technical conditions
are satisfied. We then analyze the structure of the optimal policy for the case
of two users. We conclude with a discussion of methods to identify
implementable near-optimal policies for the most general case of M users.
Comment: 109 pages, 11 pdf figures, template.tex is main file. We have
significantly revised the paper from version 1. Additions include the case of
a single receiver with piecewise-linear convex power-rate curves, the case of
two receivers, and the infinite horizon average expected cost proble
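The abstract names a modified base-stock policy without defining it. As a rough illustration only, the sketch below uses the textbook inventory-theory reading: fill the buffer up to a critical level, limited by a per-slot capacity, while never sending less than the underflow constraint requires. The function name, the argument names, and the packet-count units are hypothetical and are not the paper's formulation.

```python
def modified_base_stock(buffer_level, critical_level, capacity, next_drain):
    """Packets to send this slot under an assumed modified base-stock rule.

    buffer_level:   packets currently in the receiver buffer
    critical_level: assumed base-stock (critical) number for this slot
    capacity:       most packets the power constraint allows this slot
    next_drain:     packets that will be drained before the next slot
    """
    # Minimum needed so the buffer does not underflow.
    required = max(next_drain - buffer_level, 0)
    if required > capacity:
        raise ValueError("underflow constraint cannot be met in this slot")
    # Amount that would bring the buffer up to the critical level.
    target = max(critical_level - buffer_level, 0)
    # Fill toward the critical level, capped by capacity, never below required.
    return min(capacity, max(target, required))


print(modified_base_stock(2, 10, 5, 1))   # capacity-limited
print(modified_base_stock(8, 10, 5, 1))   # fills to the critical level
print(modified_base_stock(12, 10, 5, 1))  # already above the critical level
```

In the paper's setting the critical numbers depend on the channel condition and the cost criterion; here a single fixed number stands in for them.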
Resource management in QoS-aware wireless cellular networks
2011 Summer. Includes bibliographical references.
Emerging broadband wireless networks that support high speed packet data with heterogeneous quality of service (QoS) requirements demand more flexible and efficient use of the scarce spectral resource. Opportunistic scheduling exploits the time-varying, location-dependent channel conditions to achieve multiuser diversity. In this work, we study two types of resource allocation problems in QoS-aware wireless cellular networks. First, we develop a rigorous framework to study opportunistic scheduling in multiuser OFDM systems. We derive optimal opportunistic scheduling policies under three common QoS/fairness constraints for multiuser OFDM systems--temporal fairness, utilitarian fairness, and minimum-performance guarantees. To implement these optimal policies efficiently, we provide a modified Hungarian algorithm and a simple suboptimal algorithm. We then propose a generalized opportunistic scheduling framework that incorporates multiple mixed QoS/fairness constraints, including providing both lower and upper bound constraints. Next, taking input queues and channel memory into consideration, we reformulate the transmission scheduling problem as a new class of Markov decision processes (MDPs) with fairness constraints. We investigate the throughput maximization and the delay minimization problems in this context. We study two categories of fairness constraints, namely temporal fairness and utilitarian fairness. We consider two criteria: infinite horizon expected total discounted reward and expected average reward. We derive and prove explicit dynamic programming equations for the above constrained MDPs, and characterize optimal scheduling policies based on those equations. An attractive feature of our proposed schemes is that they can easily be extended to fit different objective functions and other fairness measures. Although we only focus on uplink scheduling, the scheme is equally applicable to the downlink case.
Furthermore, we develop an efficient approximation method--temporal fair rollout--to reduce the computational cost