
    Two families of indexable partially observable restless bandits and Whittle index computation

    We consider restless bandits with a general state space under partial observability, with two observational models: in the first, the state of each bandit is not observable at all; in the second, the state of each bandit is observable only if it is chosen. We assume that both models satisfy the restart property, under which we prove indexability of the models, and we propose the Whittle index policy as the solution. For the first model, we derive a closed-form expression for the Whittle index. For the second model, we propose an efficient algorithm to compute the Whittle index by exploiting qualitative properties of the optimal policy. We present detailed numerical experiments for multiple instances of the machine maintenance problem. The results indicate that the Whittle index policy outperforms the myopic policy and can be close to optimal in different setups.
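    The abstract gives neither the closed form nor the algorithm, so the following is only a generic numerical sketch, not either of the paper's methods. It assumes a hypothetical, fully observed machine-maintenance arm with three wear levels, where the active action (repair) restarts the machine in state 0, a discount factor beta = 0.9, and a subsidy lam paid for the passive action. Each state's Whittle index is found by bisection on lam, which presumes the arm is indexable; the index policy then activates the arms with the largest indices. The names q_values, whittle_index, whittle_policy, cost and repair_cost are illustrative.

```python
import numpy as np

# Hypothetical toy arm (not the paper's model): states 0..n-1 are wear levels.
# Passive: keep running, pay cost[s], wear evolves by P.
# Active: repair, pay repair_cost, machine restarts in state 0 (restart property).
def q_values(P, cost, repair_cost, lam, beta=0.9, tol=1e-10):
    """Value iteration for one arm whose passive action earns subsidy lam."""
    V = np.zeros(len(cost))
    while True:
        q_pass = -cost + lam + beta * P @ V
        q_act = np.full(len(cost), -repair_cost + beta * V[0])
        V_new = np.maximum(q_pass, q_act)
        if np.max(np.abs(V_new - V)) < tol:
            return q_pass, q_act
        V = V_new


def whittle_index(P, cost, repair_cost, s, lo=-100.0, hi=100.0, beta=0.9, tol=1e-6):
    """Smallest subsidy at which passivity is optimal in state s (assumes indexability)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        q_pass, q_act = q_values(P, cost, repair_cost, mid, beta)
        if q_pass[s] >= q_act[s]:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def whittle_policy(indices, m):
    """Activate the m arms whose current-state indices are largest."""
    return np.argsort(indices)[::-1][:m]


P = np.array([[0.7, 0.3, 0.0],
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]])
cost = np.array([0.0, 1.0, 5.0])
w = [whittle_index(P, cost, repair_cost=2.0, s=s) for s in range(3)]
states = [2, 0, 1]                      # current wear level of three machines
chosen = whittle_policy(np.array([w[s] for s in states]), m=1)
print(w, chosen)                        # arm 0 (wear level 2) is selected
```

    Bisection is used here only because it needs nothing beyond a single-arm solver; with costs increasing in wear and stochastically monotone wear dynamics, the computed indices come out increasing in the wear level, so the most worn machine is repaired first.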

    Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret

    The problem of distributed learning and channel access is considered in a cognitive network with multiple secondary users. The availability statistics of the channels are initially unknown to the secondary users and are estimated from sensing decisions. There is no explicit information exchange or prior agreement among the secondary users. We propose policies for distributed learning and access that achieve order-optimal cognitive system throughput (the number of successful secondary transmissions) under self play, i.e., when implemented at all the secondary users. Equivalently, our policies minimize the regret in distributed learning and access. We first consider the scenario in which the number of secondary users is known to the policy, and prove that the total regret is logarithmic in the number of transmission slots. Order-optimality of our distributed learning and access policy follows from comparison with an asymptotic lower bound on regret under any uniformly-good learning and access policy. We then consider the case in which the number of secondary users is fixed but unknown and is estimated through feedback. For this scenario we propose a policy whose asymptotic sum regret grows slightly faster than logarithmically in the number of transmission slots. Comment: Submitted to IEEE JSAC on Advances in Cognitive Radio Networking and Communications, Dec. 2009, Revised May 201
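    The abstract does not spell out the policies, so the sketch below only illustrates the general idea under assumptions of our own and is not the authors' algorithm. Each secondary user keeps UCB1 indices of channel availability built from its own sensing results, transmits on the channel holding its current rank in its own ordering, and redraws a random rank after a collision, which needs no prior agreement. The number of users is assumed known. The functions ucb_indices, pick_channel and simulate, the UCB1 bonus and the random-rank rule are hypothetical choices.

```python
import math
import random


def ucb_indices(successes, pulls, t):
    """UCB1-style index per channel: empirical availability plus exploration bonus."""
    return [math.inf if n == 0 else s / n + math.sqrt(2 * math.log(t) / n)
            for s, n in zip(successes, pulls)]


def pick_channel(indices, rank):
    """Channel at position `rank` (0 = best) in this user's own index ordering."""
    order = sorted(range(len(indices)), key=lambda k: indices[k], reverse=True)
    return order[rank]


def simulate(avail_prob, n_users, horizon, seed=0):
    """Self play: all users run the same rule with no communication. Returns successes."""
    rng = random.Random(seed)
    k = len(avail_prob)
    succ = [[0] * k for _ in range(n_users)]
    pulls = [[0] * k for _ in range(n_users)]
    rank = [rng.randrange(n_users) for _ in range(n_users)]  # hypothetical random ranks
    total = 0
    for t in range(1, horizon + 1):
        free = [rng.random() < p for p in avail_prob]         # channel states this slot
        choice = [pick_channel(ucb_indices(succ[u], pulls[u], t), rank[u])
                  for u in range(n_users)]
        for u, c in enumerate(choice):
            pulls[u][c] += 1
            succ[u][c] += free[c]            # sensing reveals availability either way
            if choice.count(c) > 1:          # collision: nobody on channel c succeeds
                rank[u] = rng.randrange(n_users)
            else:
                total += free[c]
    return total


p = [0.9, 0.6, 0.2]
horizon = 5000
got = simulate(p, n_users=2, horizon=horizon, seed=1)
oracle = horizon * sum(sorted(p, reverse=True)[:2])  # expected successes, best orthogonal split
print(got, oracle)
```

    Random ranks are one simple way to reach an orthogonal channel assignment without coordination; the gap between got and oracle is the empirical counterpart of the regret discussed in the abstract.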

    Optimal and Suboptimal Policies for Opportunistic Spectrum Access: A Resource Allocation Approach

    In recent years there has been significant research on increasing the efficiency of spectrum use. This concept, known as smart radio or cognitive radio, has received widespread attention from companies such as Google and Motorola, which are seeking contracts with the FCC and designing smart radios that can use unused bandwidth and spectrum to transmit their signals without interfering with the signals of primary users. In this thesis, we study several problems related to resource allocation in wireless networks by modeling them as game-theoretic and stochastic control problems. In the first problem, we look at methods for designing optimal cognitive radios that use optimal and suboptimal sensing policies to maximize their long-term expected reward over a finite or infinite horizon. We prove that when channels are bursty and the user can select and probe only one channel, the optimal policy for the radio is greedy: at each moment, it selects the channel with the highest probability of being available for transmission. In the second problem, we model resource allocation as a congestion game and study the existence of Nash equilibria for this game. In the last problem, we study a more general case of the first problem in which the user can select multiple channels to sense at a time. Again, the goal of the cognitive radio is to select for sensing the channels that provide the highest expected reward over the respective horizon, where reward comes from successfully probing a channel and transmitting through it. All results are summarized in the conclusion chapter.
    Ph.D. thesis, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/78778/1/shajiali_1.pd
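    A minimal sketch of the greedy probing rule for one probe per slot, assuming each channel is an independent two-state Markov chain with idle-to-idle probability p11 and busy-to-idle probability p01, where p11 > p01 is taken as the meaning of "bursty". The belief omega is the probability that a channel is idle in the coming slot, and the radio probes the channel with the largest belief. The Markov parameterization and the names update_belief, myopic_choice and run_myopic are assumptions made for illustration, not the thesis's notation.

```python
import random


def update_belief(omega, p01, p11, sensed=None):
    """Next-slot probability that a channel is idle.
    sensed is None if the channel was not probed, else True (idle) or False (busy)."""
    if sensed is None:
        return omega * p11 + (1 - omega) * p01
    return p11 if sensed else p01


def myopic_choice(beliefs):
    """Greedy rule: probe the channel most likely to be idle."""
    return max(range(len(beliefs)), key=lambda i: beliefs[i])


def run_myopic(n_channels, p01, p11, horizon, seed=0):
    """Simulate greedy probing; return the number of slots with a successful transmission."""
    rng = random.Random(seed)
    stationary = p01 / (p01 + 1 - p11)   # long-run probability that a channel is idle
    state = [rng.random() < stationary for _ in range(n_channels)]
    belief = [stationary] * n_channels
    successes = 0
    for _ in range(horizon):
        k = myopic_choice(belief)
        successes += state[k]            # transmit only if the probed channel is idle
        belief = [update_belief(b, p01, p11, state[k] if i == k else None)
                  for i, b in enumerate(belief)]
        state = [rng.random() < (p11 if s else p01) for s in state]
    return successes


horizon = 10_000
wins = run_myopic(n_channels=3, p01=0.2, p11=0.8, horizon=horizon, seed=0)
print(wins / horizon)   # compare with 0.5, the idle fraction of any single channel
```

    With bursty channels the greedy radio stays on a channel while it keeps finding it idle and moves away after a busy observation, which is why its success rate exceeds the stationary idle probability of any single channel.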

    Restless bandit index policies for dynamic sensor scheduling optimization

    This dissertation addresses two complex stochastic and dynamic resource allocation problems with application to modern sensor systems: (i) hunting multiple elusive hiding targets and (ii) tracking multiple moving targets. These problems are naturally formulated as Multi-armed Restless Bandit Problems (MARBPs) with real-valued state variables, which introduce technical difficulties that make their optimal solution intractable. Hence, in this thesis we focus on designing tractable, well-performing heuristic policies of priority-index type. We consider the above MARBPs as Markov Decision Processes (MDPs) with special structure, and we deploy recent extensions to the unifying principle to design a dynamic priority-index policy based on a Lagrangian relaxation and decomposition approach. This approach makes it possible to design an index rule based on a structural property of the optimal solution to the decomposed parametric-optimization subproblems. The resulting index is a measure of the Marginal Productivity (MP) of resources invested in the subproblems, and it is then used to define a heuristic priority rule for the original intractable problems. For each of the problems under consideration we perform such a decomposition and analyze the conditions under which an index recovering the optimal policies for the subproblems exists. We further obtain formulae for the indices, which do not admit a closed-form expression but are computed approximately by a tractable evaluation method.
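    The Lagrangian relaxation and decomposition step can be illustrated on a toy problem. Relax the requirement that exactly m of N arms are active in every slot to a discounted-average constraint and pay a subsidy lam for passivity; the problem then splits into independent single-arm problems, and for every lam the optimal reward J* satisfies J* <= sum_i V_i^lam(x_i) - lam (N - m) / (1 - beta). The sketch below assumes finite-state, two-action arms with beta = 0.9 (the thesis deals with real-valued states) and checks the bound against the exact optimum of a two-arm, one-active instance. All names and numbers are hypothetical.

```python
import numpy as np


def arm_value(r_pass, r_act, P_pass, P_act, lam, beta, tol=1e-10):
    """Optimal single-arm value when the passive action earns subsidy lam."""
    V = np.zeros(len(r_pass))
    while True:
        V_new = np.maximum(r_pass + lam + beta * P_pass @ V,
                           r_act + beta * P_act @ V)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


def lagrangian_bound(arms, states, m, lam, beta=0.9):
    """Decomposed upper bound: sum of subsidized arm values minus the subsidy paid."""
    n = len(arms)
    total = sum(arm_value(*arm, lam, beta)[s] for arm, s in zip(arms, states))
    return total - lam * (n - m) / (1 - beta)


def exact_two_arm_value(arms, states, beta=0.9, tol=1e-10):
    """Exact optimum for N = 2, m = 1 via value iteration on the joint state."""
    (rp0, ra0, Pp0, Pa0), (rp1, ra1, Pp1, Pa1) = arms
    V = np.zeros((len(rp0), len(rp1)))
    while True:
        act0 = ra0[:, None] + rp1[None, :] + beta * Pa0 @ V @ Pp1.T  # arm 0 active
        act1 = rp0[:, None] + ra1[None, :] + beta * Pp0 @ V @ Pa1.T  # arm 1 active
        V_new = np.maximum(act0, act1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new[states[0], states[1]]
        V = V_new


P_pass = np.array([[0.8, 0.2], [0.0, 1.0]])   # state 0 decays to state 1 when passive
P_act = np.array([[1.0, 0.0], [1.0, 0.0]])    # activating restarts the arm in state 0
arms = [(np.array([1.0, 0.0]), np.array([0.3, 0.3]), P_pass, P_act),
        (np.array([2.0, 0.0]), np.array([0.5, 0.5]), P_pass, P_act)]
states = (0, 1)
grid = np.linspace(-5.0, 5.0, 41)
bounds = [lagrangian_bound(arms, states, 1, lam) for lam in grid]
exact = exact_two_arm_value(arms, states)
print(exact, min(bounds))
```

    Minimizing the bound over lam gives the tightest decomposed bound; the subsidy at which an arm's optimal action switches is what an index of this marginal-productivity type measures.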
Apart from the practical contribution of deriving tractable sensor scheduling policies that improve on existing heuristics, the main contributions of this thesis are the following: (i) deploying the recent extensions of the Sufficient Indexability Conditions (SIC) to the real-state case, for two problems in which direct verification of the SIC and a closed-form index formula are not possible; (ii) addressing the technical difficulties in analyzing PCL-indexability introduced by the uncountable state space of the MARBPs of concern and by the non-linear dynamics of the state evolution over it, by exploiting the special structure of the trajectories of the state and action processes under a threshold policy using properties of Möbius transformations; and (iii) providing a tractable approximate evaluation method for the resulting index policies.

This thesis studies two dynamic and stochastic resource allocation problems, with application to modern sensor systems: (i) locating multiple evasive targets that hide and (ii) tracking multiple moving targets. These problems are naturally modeled as multi-armed restless bandit problems with a real-valued state variable, which introduces technical difficulties that make their optimal solution computationally intractable. For this reason, in this thesis we instead focus on designing heuristic priority policies that are computationally tractable and whose performance is nearly optimal. We model the above problems as Markov decision problems with special structure and apply to them existing results in the literature, which constitute a unifying principle for designing priority-index policies based on the Lagrangian relaxation and decomposition of those problems.
This approach allows us to consider a property of the subproblems, indexability, by which they can be solved optimally by an index policy. The resulting index is a measure of the productivity of the resources invested in the subproblems, and it is then used as a dynamic priority measure for the original intractable problems. For each of the problems under study we carry out such a decomposition and analyze the conditions under which an index policy that recovers the optimal solution of the subproblems exists. We also obtain formulas for the indices which, although they do not admit a closed-form expression, are computed approximately and efficiently by a tractable method. Apart from the practical contribution of obtaining heuristic priority-index rules for operating multi-sensor systems in the context of the two problems analyzed, the main theoretical contributions are the following: (i) the application of recent extensions of the sufficient indexability conditions to the real-state case, for two problems in which neither their direct verification nor closed-form formulas are possible; (ii) the treatment of the technical difficulties in establishing indexability introduced by the infinite state space of the problems under consideration and by the non-linear dynamics of the state evolution over it, by exploiting structural properties of the state and work processes under threshold policies, such as recursions of Möbius transformations; and (iii) an approximate method for evaluating the resulting index policies.
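    Contribution (ii) relies on properties of Möbius transformations. One standard fact that makes such recursions tractable: the map x -> (a x + b) / (c x + d) can be stored as the 2x2 matrix [[a, b], [c, d]], and composing maps corresponds to multiplying their matrices, so a fixed sequence of actions, such as a run under a threshold policy, collapses into a single Möbius map. The sketch below checks this on a hypothetical scalar Kalman-filter variance recursion (prediction x -> a2 x + q, measurement update x -> r x / (x + r)); the recursion, the parameter values and the helper names are illustrative assumptions, not taken from the thesis.

```python
import numpy as np


def mobius(M, x):
    """Apply x -> (a*x + b) / (c*x + d) for M = [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    return (a * x + b) / (c * x + d)


def compose(*maps):
    """Matrix of f1 ∘ f2 ∘ ... ∘ fk; the rightmost map is applied first."""
    out = np.eye(2)
    for M in maps:
        out = out @ M
    return out


a2, q, r = 0.81, 1.0, 0.5                  # hypothetical parameters
predict = np.array([[a2, q], [0.0, 1.0]])  # x -> a2*x + q
update = np.array([[r, 0.0], [1.0, r]])    # x -> r*x / (x + r)
step = compose(update, predict)            # predict, then update

x = 2.0
for _ in range(3):                         # three steps, applied one at a time
    x = mobius(update, mobius(predict, x))
x_direct = mobius(compose(step, step, step), 2.0)
print(x, x_direct)                         # equal up to rounding
```

    Representing each action as a matrix turns the k-step trajectory under any fixed action sequence into one matrix power or product, which is what allows trajectories under a threshold policy to be analyzed in closed form.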