A Fast-CSMA Algorithm for Deadline-Constrained Scheduling over Wireless Fading Channels
Recently, low-complexity and distributed Carrier Sense Multiple Access
(CSMA)-based scheduling algorithms have attracted extensive interest due to
their throughput-optimal characteristics in general network topologies.
However, these algorithms are not well-suited for serving real-time traffic
under time-varying channel conditions for two reasons: (1) the mixing time of
the underlying CSMA Markov Chain grows with the size of the network, which, for
large networks, generates unacceptable delay for deadline-constrained traffic;
(2) since the dynamic CSMA parameters are influenced by the arrival and channel
state processes, the underlying CSMA Markov Chain may not converge to a
steady-state under strict deadline constraints and fading channel conditions.
In this paper, we attack the problem of distributed scheduling for serving
real-time traffic over time-varying channels. Specifically, we consider
fully-connected topologies with independently fading channels (which can model
cellular networks) in which flows with short-term deadline constraints and
long-term drop rate requirements are served. To that end, we first characterize
the maximal set of satisfiable arrival processes for this system and, then,
propose a Fast-CSMA (FCSMA) policy that is shown to be optimal in supporting
any real-time traffic that is within the maximal satisfiable set. These
theoretical results are further validated through simulations to demonstrate
the relative efficiency of the FCSMA policy compared to some of the existing
CSMA-based algorithms.
Comment: This work appears in the workshop on Resource Allocation and
Cooperation in Wireless Networks (RAWNET), Princeton, NJ, May, 201
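A minimal sketch of the one-shot timer competition that underlies FCSMA-style
scheduling in a fully-connected topology: each backlogged link draws an
exponential backoff whose rate grows with its weight (e.g., backlog times
channel state), and the first timer to expire wins the slot, sidestepping the
mixing-time delay of conventional CSMA chains. The weight function and all
numbers below are illustrative assumptions, not the paper's exact
construction.

import random

def fcsma_slot(weights):
    # Each backlogged link draws an exponential timer with rate equal to
    # its weight; the earliest timer wins the channel for the whole slot,
    # so a schedule is chosen in one shot (with probability proportional
    # to weight) instead of by letting a CSMA Markov chain mix.
    timers = {link: random.expovariate(w) for link, w in weights.items() if w > 0}
    return min(timers, key=timers.get) if timers else None

# Illustrative use: weights combine queue backlog with the fading channel.
queues = {"link1": 3, "link2": 7, "link3": 1}          # deadline-constrained backlog
channels = {"link1": 0.9, "link2": 0.4, "link3": 1.0}  # current fading gains
weights = {l: queues[l] * channels[l] for l in queues}
print(fcsma_slot(weights))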
Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach
Reinforcement learning (RL) agents have traditionally been tasked with
maximizing the value function of a Markov decision process (MDP), either in
continuing settings, with fixed discount factor gamma < 1, or in episodic
settings, with gamma = 1. While this has proven effective for specific tasks
with well-defined objectives (e.g., games), it has never been established that
fixed discounting is suitable for general purpose use (e.g., as a model of
human preferences). This paper characterizes rationality in sequential decision
making using a set of seven axioms and arrives at a form of discounting that
generalizes traditional fixed discounting. In particular, our framework admits
a state-action dependent "discount" factor that is not constrained to be less
than 1, so long as there is eventual long run discounting. Although this
broadens the range of possible preference structures in continuing settings, we
show that there exists a unique "optimizing MDP" with fixed gamma < 1 whose
optimal value function matches the true utility of the optimal policy, and we
quantify the difference between value and utility for suboptimal policies. Our
work can be seen as providing a normative justification for (a slight
generalization of) Martha White's RL task formalism (2017) and other recent
departures from traditional RL, and is relevant to task specification in
RL, inverse RL and preference-based RL.
Comment: 8 pages + 1 page supplement. In proceedings of AAAI 2019. Slides,
poster and bibtex available at
https://silviupitis.com/#rethinking-the-discount-factor-in-reinforcement-learning-a-decision-theoretic-approach
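The generalized backup the paper describes can be illustrated with a tabular
value iteration in which the discount enters as a state-action table G[s, a].
The two-state MDP below is a hypothetical toy (all numbers invented): one
"discount" entry exceeds 1, but long-run discounting still holds, so the
iteration converges.

import numpy as np

# Toy 2-state, 2-action MDP (hypothetical numbers, for illustration only).
# P[s, a, s'] are transition probabilities, R[s, a] are rewards, and
# G[s, a] is the state-action dependent "discount".
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
G = np.array([[0.9, 1.05],   # one entry above 1 ...
              [0.8, 0.7]])   # ... yet long-run products still shrink

def value_iteration(P, R, G, iters=500):
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        # Generalized Bellman backup: the discount sits inside the
        # expectation and varies with the state-action pair (s, a).
        Q = R + G * (P @ V)   # shape (S, A)
        V = Q.max(axis=1)
    return V

print(value_iteration(P, R, G))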
Controlled diffusion processes
This article gives an overview of the developments in controlled diffusion
processes, emphasizing key results regarding existence of optimal controls and
their characterization via dynamic programming for a variety of cost criteria
and structural assumptions. Stochastic maximum principle and control under
partial observations (equivalently, control of nonlinear filters) are also
discussed. Several other related topics are briefly sketched.
Comment: Published at http://dx.doi.org/10.1214/154957805100000131 in the
Probability Surveys (http://www.i-journals.org/ps/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
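The dynamic programming characterization referred to here takes, in the
classical infinite-horizon discounted-cost model, the form of the
Hamilton-Jacobi-Bellman (HJB) equation. The LaTeX sketch below uses standard
generic notation (drift b, diffusion sigma, running cost c, discount rate
alpha), not symbols quoted from the article.

% Model: dX_t = b(X_t, U_t)\,dt + \sigma(X_t, U_t)\,dW_t, with cost
% J(x, U) = \mathbb{E}_x \int_0^\infty e^{-\alpha t} c(X_t, U_t)\,dt and
% value function V(x) = \inf_U J(x, U). Formally, V satisfies
\alpha V(x) = \min_{u \in \mathbb{U}} \Big[\, b(x, u) \cdot \nabla V(x)
  + \tfrac{1}{2} \operatorname{tr}\!\big(\sigma(x, u)\sigma(x, u)^{\top} \nabla^{2} V(x)\big)
  + c(x, u) \,\Big].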
Markov Decision Processes with Multiple Long-run Average Objectives
We study Markov decision processes (MDPs) with multiple limit-average (or
mean-payoff) functions. We consider two different objectives, namely,
expectation and satisfaction objectives. Given an MDP with k limit-average
functions, in the expectation objective the goal is to maximize the expected
limit-average value, and in the satisfaction objective the goal is to maximize
the probability of runs such that the limit-average value stays above a given
vector. We show that under the expectation objective, in contrast to the case
of one limit-average function, both randomization and memory are necessary for
strategies even for epsilon-approximation, and that finite-memory randomized
strategies are sufficient for achieving Pareto optimal values. Under the
satisfaction objective, in contrast to the case of one limit-average function,
infinite memory is necessary for strategies achieving a specific value (i.e.
randomized finite-memory strategies are not sufficient), whereas memoryless
randomized strategies are sufficient for epsilon-approximation, for all
epsilon>0. We further prove that the decision problems for both expectation and
satisfaction objectives can be solved in polynomial time and the trade-off
curve (Pareto curve) can be epsilon-approximated in time polynomial in the size
of the MDP and 1/epsilon, and exponential in the number of limit-average
functions, for all epsilon>0. Our analysis also reveals flaws in previous work
for MDPs with multiple mean-payoff functions under the expectation objective,
corrects the flaws, and allows us to obtain improved results.
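As a concrete illustration of the two objectives, the toy sketch below
simulates a memoryless randomized strategy on an invented two-state MDP with
k = 2 mean-payoff functions, estimating both the expected limit-average
vector (expectation objective) and the probability that the whole vector
stays above a threshold (satisfaction objective); all numbers are
hypothetical.

import numpy as np

rng = np.random.default_rng(0)

# Invented 2-state MDP with k = 2 reward functions (illustration only).
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.4, 0.6], [0.9, 0.1]]])   # P[s, a, s']
R = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [0.2, 0.8]]])   # R[s, a, k]
pi = np.array([[0.5, 0.5],
               [0.3, 0.7]])                # memoryless randomized strategy

def limit_average(horizon=5000):
    # Approximate the limit-average (mean-payoff) vector of a single run.
    s, total = 0, np.zeros(2)
    for _ in range(horizon):
        a = rng.choice(2, p=pi[s])
        total += R[s, a]
        s = rng.choice(2, p=P[s, a])
    return total / horizon

runs = np.array([limit_average() for _ in range(100)])
threshold = np.array([0.4, 0.5])

# Expectation objective: maximize the expected limit-average vector.
print("expected mean-payoff:", runs.mean(axis=0))
# Satisfaction objective: maximize the probability that every component
# of the limit-average vector stays above the given threshold.
print("P(mean-payoff >= threshold):", (runs >= threshold).all(axis=1).mean())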