106 research outputs found

    Learning in A Changing World: Restless Multi-Armed Bandit with Unknown Dynamics

    We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a player chooses M out of N arms to play at each time. The reward state of each arm transits according to an unknown Markovian rule when it is played and evolves according to an arbitrary unknown random process when it is passive. The performance of an arm selection policy is measured by regret, defined as the reward loss with respect to the case where the player knows which M arms are the most rewarding and always plays the M best arms. We construct a policy with an interleaving exploration and exploitation epoch structure that achieves a regret with logarithmic order when arbitrary (but nontrivial) bounds on certain system parameters are known. When no knowledge about the system is available, we show that the proposed policy achieves a regret arbitrarily close to the logarithmic order. We further extend the problem to a decentralized setting where multiple distributed players share the arms without information exchange. Under both an exogenous restless model and an endogenous restless model, we show that a decentralized extension of the proposed policy preserves the logarithmic regret order as in the centralized setting. The results apply to adaptive learning in various dynamic systems and communication networks, as well as financial investment. Comment: 33 pages, 5 figures, submitted to IEEE Transactions on Information Theory, 201

    Distributive Stochastic Learning for Delay-Optimal OFDMA Power and Subband Allocation

    In this paper, we consider the distributive queue-aware power and subband allocation design for a delay-optimal OFDMA uplink system with one base station, $K$ users and $N_F$ independent subbands. Each mobile has an uplink queue with heterogeneous packet arrivals and delay requirements. We model the problem as an infinite horizon average reward Markov Decision Problem (MDP) where the control actions are functions of the instantaneous Channel State Information (CSI) as well as the joint Queue State Information (QSI). To address the distributive requirement and the issue of exponential memory requirement and computational complexity, we approximate the subband allocation Q-factor by the sum of the per-user subband allocation Q-factor and derive a distributive online stochastic learning algorithm to estimate the per-user Q-factor and the Lagrange multipliers (LM) simultaneously and determine the control actions using an auction mechanism. We show that under the proposed auction mechanism, the distributive online learning converges almost surely (with probability 1). For illustration, we apply the proposed distributive stochastic learning framework to an application example with exponential packet size distribution. We show that the delay-optimal power control has the multi-level water-filling structure where the CSI determines the instantaneous power allocation and the QSI determines the water-level. The proposed algorithm has linear signaling overhead and computational complexity $\mathcal{O}(KN)$, which is desirable from an implementation perspective. Comment: To appear in Transactions on Signal Processin

    Distortion-Tolerant Communications with Correlated Information

    This dissertation is devoted to the development of distortion-tolerant communication techniques by exploiting the spatial and/or temporal correlation in a broad range of wireless communication systems under various system configurations. Signals observed in wireless communication systems are often correlated in the spatial and/or temporal domains, and the correlation can be used to facilitate system designs and to improve system performance. First, the optimum node density, i.e., the optimum number of nodes in a unit area, is identified by utilizing the spatial data correlation in the one- and two-dimensional wireless sensor networks (WSNs), under the constraint of fixed power per unit area. The WSNs distortion is quantized as the mean square error between the original and the reconstructed signals. Then we extend the analysis into WSNs with spatial-temporally correlated data. The optimum sampling in the space and time domains is derived. The analytical optimum results can provide insights and guidelines on the design of practical WSNs. Second, distributed source coding schemes are developed by exploiting the data correlation in a wireless network with spatially distributed sources. A new symmetric distributed joint source-channel coding scheme (DJSCC) is proposed by utilizing the spatial source correlation. Then the DJSCC code is applied to spatial-temporally correlated sources. The temporal correlated data is modeled as the Markov chain. Correspondingly, two decoding algorithms are proposed. The first multi-codeword message passing algorithm (MCMP) is designed for spatially correlated memoryless sources. In the second algorithm, a hidden Markov decoding process is added to the MCMP decoder to effectively exploit the data correlation in both the space and time domains. 
    Third, we develop distortion-tolerant high mobility wireless communication systems by considering correlated channel state information (CSI) in the time domain, and study the optimum designs with imperfect CSI. The pilot-assisted channel estimation mean square error is expressed as a closed-form expression of various system parameters through asymptotic analysis. Based on the statistical properties of the channel estimation error, we quantify the impacts of imperfect CSI on system performance by developing the analytical symbol error rate and a spectral efficiency lower bound of the communication system

    Channel Estimation for LEO Satellite Massive MIMO OFDM Communications

    In this paper, we investigate the massive multiple-input multiple-output orthogonal frequency division multiplexing channel estimation for low-earth-orbit satellite communication systems. First, we use the angle-delay domain channel to characterize the space-frequency domain channel. Then, we show that the asymptotic minimum mean square error (MMSE) of the channel estimation can be minimized if the array response vectors of the user terminals (UTs) that use the same pilot are orthogonal. Inspired by this, we design an efficient graph-based pilot allocation strategy to enhance the channel estimation performance. In addition, we devise a novel two-stage channel estimation (TSCE) approach, in which the received signals at the satellite are manipulated with per-subcarrier space domain processing followed by per-user frequency domain processing. Moreover, the space domain processing of each UT is shown to be identical for all the subcarriers, and an asymptotically optimal vector for the per-subcarrier space domain linear processing is derived. The frequency domain processing can be efficiently implemented by means of the fast Toeplitz system solver. Simulation results show that the proposed TSCE approach can achieve performance close to that of MMSE estimation with much lower complexity. Comment: accepted by IEEE Transactions on Wireless Communication

    A Survey on Delay-Aware Resource Control for Wireless Systems --- Large Deviation Theory, Stochastic Lyapunov Drift and Distributed Stochastic Learning

    In this tutorial paper, a comprehensive survey is given on several major systematic approaches in dealing with delay-aware control problems, namely the equivalent rate constraint approach, the Lyapunov stability drift approach and the approximate Markov Decision Process (MDP) approach using stochastic learning. These approaches essentially embrace most of the existing literature regarding delay-aware resource control in wireless systems. They have their relative pros and cons in terms of performance, complexity and implementation issues. For each of the approaches, the problem setup, the general solution and the design methodology are discussed. Applications of these approaches to delay-aware resource allocation are illustrated with examples in single-hop wireless networks. Furthermore, recent results regarding delay-aware multi-hop routing designs in general multi-hop networks are elaborated. Finally, the delay performance of the various approaches is compared through simulations using an example of the uplink OFDMA systems. Comment: 58 pages, 8 figures; IEEE Transactions on Information Theory, 201

    EUROPEAN CONFERENCE ON QUEUEING THEORY 2016

    This booklet contains the proceedings of the second European Conference in Queueing Theory (ECQT) that was held from the 18th to the 20th of July 2016 at the engineering school ENSEEIHT, Toulouse, France. ECQT is a biannual event where scientists and technicians in queueing theory and related areas get together to promote research, encourage interaction and exchange ideas. The spirit of the conference is to be a queueing event organized from within Europe, but open to participants from all over the world. The technical program of the 2016 edition consisted of 112 presentations organized in 29 sessions covering all trends in queueing theory, including the development of the theory, methodology advances, computational aspects and applications. Another exciting feature of ECQT2016 was the institution of the Takács Award for outstanding PhD thesis on "Queueing Theory and its Applications"

    Degrees of Freedom of Time Correlated MISO Broadcast Channel with Delayed CSIT

    We consider the time correlated multiple-input single-output (MISO) broadcast channel where the transmitter has imperfect knowledge on the current channel state, in addition to delayed channel state information. By representing the quality of the current channel state information as $P^{-\alpha}$ for the signal-to-noise ratio $P$ and some constant $\alpha \geq 0$, we characterize the optimal degree of freedom region for this more general two-user MISO broadcast correlated channel. The essential ingredients of the proposed scheme lie in the quantization and multicasting of the overheard interferences, while broadcasting new private messages. Our proposed scheme smoothly bridges between the scheme recently proposed by Maddah-Ali and Tse with no current state information and a simple zero-forcing beamforming with perfect current state information. Comment: revised and final version, to appear in IEEE transactions on Information Theor