489 research outputs found

    Learning in A Changing World: Restless Multi-Armed Bandit with Unknown Dynamics

    We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a player chooses M out of N arms to play at each time. The reward state of each arm transits according to an unknown Markovian rule when it is played and evolves according to an arbitrary unknown random process when it is passive. The performance of an arm selection policy is measured by regret, defined as the reward loss with respect to the case where the player knows which M arms are the most rewarding and always plays the M best arms. We construct a policy with an interleaving exploration and exploitation epoch structure that achieves a regret with logarithmic order when arbitrary (but nontrivial) bounds on certain system parameters are known. When no knowledge about the system is available, we show that the proposed policy achieves a regret arbitrarily close to the logarithmic order. We further extend the problem to a decentralized setting where multiple distributed players share the arms without information exchange. Under both an exogenous restless model and an endogenous restless model, we show that a decentralized extension of the proposed policy preserves the logarithmic regret order as in the centralized setting. The results apply to adaptive learning in various dynamic systems and communication networks, as well as financial investment.
    Comment: 33 pages, 5 figures, submitted to IEEE Transactions on Information Theory, 201
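    The verbal definition of regret in this abstract can be written out as a formula. The following is only a sketch under assumed notation that does not appear in the listing: each arm i is taken to have a long-run mean reward \mu_i, \sigma sorts the arms so that \mu_{\sigma(1)} \ge \dots \ge \mu_{\sigma(N)}, \Phi denotes the arm selection policy, and Y_\Phi(t) is the total reward collected from the M played arms at time t.

```latex
% Regret after T plays (assumed notation, not taken from the paper's text):
% reward of always playing the M best arms minus the expected reward of policy \Phi.
R_\Phi(T) \;=\; T \sum_{j=1}^{M} \mu_{\sigma(j)} \;-\; \mathbb{E}_\Phi\!\left[ \sum_{t=1}^{T} Y_\Phi(t) \right]
```

    Under this reading, "logarithmic order" means that R_\Phi(T) grows no faster than a constant times \log T.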

    Discrete time analysis of cognitive radio networks with imperfect sensing and saturated source of secondary users, Computer Communications

    Sensing is one of the most challenging issues in cognitive radio networks. Selection of sensing parameters raises several tradeoffs between spectral efficiency, energy efficiency and interference caused to primary users (PUs). In this paper we provide representative mathematical models that can be used to analyze sensing strategies under a wide range of conditions. The activity of PUs in a licensed channel is modeled as a sequence of busy and idle periods, which is represented as an alternating Markov phase renewal process. The representation of the secondary users' (SUs) behavior is also largely general: the duration of transmissions, sensing periods and the intervals between consecutive sensing periods are modeled by phase type distributions, which constitute a very versatile class of distributions. Expressions for several key performance measures in cognitive radio networks are obtained from the analysis of the model. Most notably, we derive the distribution of the length of an effective white space; the distributions of the waiting times until the SU transmits a given amount of data, through several transmission epochs uninterruptedly; and the goodput when an interrupted SU transmission has to be restarted from the beginning due to the presence of a PU. (C) 2015 Elsevier B.V. All rights reserved.
    The research of A. S. Alfa was partially supported by the NSERC (Natural Sciences and Engineering Research Council) of Canada under Grant G00315156. Most of the contribution of V. Pla was done while visiting the University of Manitoba. This visit was supported by the Ministerio de Educacion of Spain under Grant PR2011-0055, and by the UPV through the Programa de Apoyo a la Investigacion y Desarrollo (PAID-00-12). The research of the authors from the Universitat Politecnica de Valencia was partially supported by the Ministry of Economy and Competitiveness of Spain under Grant TIN2013-47272-C2-1-R.
    Alfa, AS.; Pla, V.; Martínez Bauset, J.; Casares Giner, V. (2016). Discrete time analysis of cognitive radio networks with imperfect sensing and saturated source of secondary users. Computer Communications. 79:53-65. https://doi.org/10.1016/j.comcom.2015.11.012

    Spectrum sensing and occupancy prediction for cognitive machine-to-machine wireless networks

    A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of Doctor of Philosophy (PhD).
    The rapid growth of the Internet of Things (IoT) introduces an additional challenge to the existing spectrum under-utilisation problem, as large-scale deployments of thousands of devices are expected to require wireless connectivity. Dynamic Spectrum Access (DSA) has been proposed as a means of improving the spectrum utilisation of wireless systems. Based on the Cognitive Radio (CR) paradigm, DSA enables unlicensed spectrum users to sense their spectral environment and adapt their operational parameters to opportunistically access any temporally unoccupied bands without causing interference to the primary spectrum users. In the same context, CR-inspired Machine-to-Machine (M2M) communications have recently been proposed as a potential solution to the spectrum utilisation problem, which has been driven by the ever-increasing number of interconnected devices. M2M communications introduce new challenges for CR in terms of operational environments and design requirements. With spectrum sensing being the key function for CR, this thesis investigates the performance of spectrum sensing and proposes novel sensing approaches and models to address the sensing problem for cognitive M2M deployments. In this thesis, the behaviour of Energy Detection (ED) spectrum sensing for cognitive M2M nodes is modelled using the two-wave with diffuse power fading model. This channel model can describe a variety of realistic fading conditions, including worse-than-Rayleigh scenarios that are expected to occur within the operational environments of cognitive M2M communication systems. The results suggest that ED-based spectrum sensing fails to meet the sensing requirements over worse-than-Rayleigh conditions and consequently requires the signal-to-noise ratio (SNR) to be increased by up to 137%. However, by employing appropriate diversity and node cooperation techniques, the sensing performance can be improved by up to 11.5 dB in terms of the required SNR. These results are particularly useful in analysing the effects of severe fading in cognitive M2M systems, and thus they can be used to design efficient CR transceivers and to quantify the trade-offs between detection performance and energy efficiency. A novel predictive spectrum sensing scheme that exploits historical data of past sensing events to predict channel occupancy is proposed and analysed. This approach allows CR terminals to sense only the channels that are predicted to be unoccupied rather than the whole band of interest. Based on this approach, a spectrum occupancy predictor is developed and experimentally validated. The proposed scheme achieves a prediction accuracy of up to 93%, which in turn can lead to up to 84% reduction of the spectrum sensing cost. Furthermore, a novel probabilistic model for describing the channel availability in both the vertical and horizontal polarisations is developed. The proposed model is validated based on a measurement campaign for operational scenarios where CR terminals may change their polarisation during their operation. A Gaussian approximation is used to model the empirical channel availability data with more than 95% confidence bounds. The proposed model can be used as a means of improving spectrum sensing performance by using statistical knowledge of the primary users' occupancy pattern.
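    The predictive sensing idea in this abstract (sense only the channels that history suggests are unoccupied) can be illustrated with a very small sketch. This is not the predictor developed in the thesis: the empirical-frequency rule, the function names `predict_idle_channels` and `sensing_cost_reduction`, the 0.5 threshold, and the toy history are all assumptions made purely for illustration.

```python
from typing import List


def predict_idle_channels(history: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Return indices of channels predicted to be idle.

    history[t][c] is 1 if channel c was sensed occupied at past event t, else 0.
    A channel is predicted idle when its empirical occupancy rate is below
    `threshold` (a hypothetical rule, not the thesis's predictor).
    """
    n_obs = len(history)
    n_channels = len(history[0])
    idle = []
    for c in range(n_channels):
        occupancy_rate = sum(row[c] for row in history) / n_obs
        if occupancy_rate < threshold:
            idle.append(c)
    return idle


def sensing_cost_reduction(n_channels: int, n_sensed: int) -> float:
    """Fraction of sensing operations saved by sensing only n_sensed channels."""
    return 1.0 - n_sensed / n_channels


# Toy history: 4 past sensing events over 4 channels (made-up data).
history = [
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
]

idle = predict_idle_channels(history)
print("channels to sense:", idle)
print("cost reduction:", sensing_cost_reduction(len(history[0]), len(idle)))
```

    In this toy setting the terminal would sense only the channels returned by `predict_idle_channels`, so the cost saving is simply the share of channels it skips; a real predictor would also have to account for prediction errors.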