A Fast-CSMA Algorithm for Deadline-Constrained Scheduling over Wireless Fading Channels
Recently, low-complexity and distributed Carrier Sense Multiple Access
(CSMA)-based scheduling algorithms have attracted extensive interest due to
their throughput-optimal characteristics in general network topologies.
However, these algorithms are not well-suited for serving real-time traffic
under time-varying channel conditions for two reasons: (1) the mixing time of
the underlying CSMA Markov Chain grows with the size of the network, which, for
large networks, generates unacceptable delay for deadline-constrained traffic;
(2) since the dynamic CSMA parameters are influenced by the arrival and channel
state processes, the underlying CSMA Markov Chain may not converge to a
steady-state under strict deadline constraints and fading channel conditions.
In this paper, we attack the problem of distributed scheduling for serving
real-time traffic over time-varying channels. Specifically, we consider
fully-connected topologies with independently fading channels (which can model
cellular networks) in which flows with short-term deadline constraints and
long-term drop rate requirements are served. To that end, we first characterize
the maximal set of satisfiable arrival processes for this system and, then,
propose a Fast-CSMA (FCSMA) policy that is shown to be optimal in supporting
any real-time traffic that is within the maximal satisfiable set. These
theoretical results are further validated through simulations to demonstrate
the relative efficiency of the FCSMA policy compared to some of the existing
CSMA-based algorithms.
Comment: This work appears in the workshop on Resource Allocation and
Cooperation in Wireless Networks (RAWNET), Princeton, NJ, May, 201
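A minimal sketch of the one-shot timer competition that underlies FCSMA-style
scheduling in a fully-connected topology: each backlogged link draws an
exponential backoff whose rate grows with its weight (e.g., backlog times
channel state), and the first timer to expire wins the slot, sidestepping the
mixing-time delay of conventional CSMA chains. The weight function and all
numbers below are illustrative assumptions, not the paper's exact
construction.

import random

def fcsma_slot(weights):
    # Each backlogged link draws an exponential timer with rate equal to
    # its weight; the earliest timer wins the channel for the whole slot,
    # so a schedule is chosen in one shot (with probability proportional
    # to weight) instead of by letting a CSMA Markov chain mix.
    timers = {link: random.expovariate(w) for link, w in weights.items() if w > 0}
    return min(timers, key=timers.get) if timers else None

# Illustrative use: weights combine queue backlog with the fading channel.
queues = {"link1": 3, "link2": 7, "link3": 1}          # deadline-constrained backlog
channels = {"link1": 0.9, "link2": 0.4, "link3": 1.0}  # current fading gains
weights = {l: queues[l] * channels[l] for l in queues}
print(fcsma_slot(weights))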
Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach
Reinforcement learning (RL) agents have traditionally been tasked with
maximizing the value function of a Markov decision process (MDP), either in
continuing settings, with fixed discount factor gamma < 1, or in episodic
settings, with gamma = 1. While this has proven effective for specific tasks
with well-defined objectives (e.g., games), it has never been established that
fixed discounting is suitable for general purpose use (e.g., as a model of
human preferences). This paper characterizes rationality in sequential decision
making using a set of seven axioms and arrives at a form of discounting that
generalizes traditional fixed discounting. In particular, our framework admits
a state-action dependent "discount" factor that is not constrained to be less
than 1, so long as there is eventual long run discounting. Although this
broadens the range of possible preference structures in continuing settings, we
show that there exists a unique "optimizing MDP" with fixed gamma < 1 whose
optimal value function matches the true utility of the optimal policy, and we
quantify the difference between value and utility for suboptimal policies. Our
work can be seen as providing a normative justification for (a slight
generalization of) Martha White's RL task formalism (2017) and other recent
departures from traditional RL, and is relevant to task specification in
RL, inverse RL and preference-based RL.
Comment: 8 pages + 1 page supplement. In proceedings of AAAI 2019. Slides,
poster and bibtex available at
https://silviupitis.com/#rethinking-the-discount-factor-in-reinforcement-learning-a-decision-theoretic-approach
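The generalized backup the paper describes can be illustrated with a tabular
value iteration in which the discount enters as a state-action table G[s, a].
The two-state MDP below is a hypothetical toy (all numbers invented): one
"discount" entry exceeds 1, but long-run discounting still holds, so the
iteration converges.

import numpy as np

# Toy 2-state, 2-action MDP (hypothetical numbers, for illustration only).
# P[s, a, s'] are transition probabilities, R[s, a] are rewards, and
# G[s, a] is the state-action dependent "discount".
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
G = np.array([[0.9, 1.05],   # one entry above 1 ...
              [0.8, 0.7]])   # ... yet long-run products still shrink

def value_iteration(P, R, G, iters=500):
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        # Generalized Bellman backup: the discount sits inside the
        # expectation and varies with the state-action pair (s, a).
        Q = R + G * (P @ V)   # shape (S, A)
        V = Q.max(axis=1)
    return V

print(value_iteration(P, R, G))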
Controlled diffusion processes
This article gives an overview of the developments in controlled diffusion
processes, emphasizing key results regarding existence of optimal controls and
their characterization via dynamic programming for a variety of cost criteria
and structural assumptions. Stochastic maximum principle and control under
partial observations (equivalently, control of nonlinear filters) are also
discussed. Several other related topics are briefly sketched.
Comment: Published at http://dx.doi.org/10.1214/154957805100000131 in the
Probability Surveys (http://www.i-journals.org/ps/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
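The dynamic programming characterization referred to here takes, in the
classical infinite-horizon discounted-cost model, the form of the
Hamilton-Jacobi-Bellman (HJB) equation. The LaTeX sketch below uses standard
generic notation (drift b, diffusion sigma, running cost c, discount rate
alpha), not symbols quoted from the article.

% Model: dX_t = b(X_t, U_t)\,dt + \sigma(X_t, U_t)\,dW_t, with cost
% J(x, U) = \mathbb{E}_x \int_0^\infty e^{-\alpha t} c(X_t, U_t)\,dt and
% value function V(x) = \inf_U J(x, U). Formally, V satisfies
\alpha V(x) = \min_{u \in \mathbb{U}} \Big[\, b(x, u) \cdot \nabla V(x)
  + \tfrac{1}{2} \operatorname{tr}\!\big(\sigma(x, u)\sigma(x, u)^{\top} \nabla^{2} V(x)\big)
  + c(x, u) \,\Big].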
Markov Decision Processes with Multiple Long-run Average Objectives
We study Markov decision processes (MDPs) with multiple limit-average (or
mean-payoff) functions. We consider two different objectives, namely,
expectation and satisfaction objectives. Given an MDP with k limit-average
functions, in the expectation objective the goal is to maximize the expected
limit-average value, and in the satisfaction objective the goal is to maximize
the probability of runs such that the limit-average value stays above a given
vector. We show that under the expectation objective, in contrast to the case
of one limit-average function, both randomization and memory are necessary for
strategies even for epsilon-approximation, and that finite-memory randomized
strategies are sufficient for achieving Pareto optimal values. Under the
satisfaction objective, in contrast to the case of one limit-average function,
infinite memory is necessary for strategies achieving a specific value (i.e.
randomized finite-memory strategies are not sufficient), whereas memoryless
randomized strategies are sufficient for epsilon-approximation, for all
epsilon>0. We further prove that the decision problems for both expectation and
satisfaction objectives can be solved in polynomial time and the trade-off
curve (Pareto curve) can be epsilon-approximated in time polynomial in the size
of the MDP and 1/epsilon, and exponential in the number of limit-average
functions, for all epsilon>0. Our analysis also reveals flaws in previous work
for MDPs with multiple mean-payoff functions under the expectation objective,
corrects the flaws, and allows us to obtain improved results.
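As a concrete illustration of the two objectives, the toy sketch below
simulates a memoryless randomized strategy on an invented two-state MDP with
k = 2 mean-payoff functions, estimating both the expected limit-average
vector (expectation objective) and the probability that the whole vector
stays above a threshold (satisfaction objective); all numbers are
hypothetical.

import numpy as np

rng = np.random.default_rng(0)

# Invented 2-state MDP with k = 2 reward functions (illustration only).
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.4, 0.6], [0.9, 0.1]]])   # P[s, a, s']
R = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [0.2, 0.8]]])   # R[s, a, k]
pi = np.array([[0.5, 0.5],
               [0.3, 0.7]])                # memoryless randomized strategy

def limit_average(horizon=5000):
    # Approximate the limit-average (mean-payoff) vector of a single run.
    s, total = 0, np.zeros(2)
    for _ in range(horizon):
        a = rng.choice(2, p=pi[s])
        total += R[s, a]
        s = rng.choice(2, p=P[s, a])
    return total / horizon

runs = np.array([limit_average() for _ in range(100)])
threshold = np.array([0.4, 0.5])

# Expectation objective: maximize the expected limit-average vector.
print("expected mean-payoff:", runs.mean(axis=0))
# Satisfaction objective: maximize the probability that every component
# of the limit-average vector stays above the given threshold.
print("P(mean-payoff >= threshold):", (runs >= threshold).all(axis=1).mean())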