
    Q-learning with Nearest Neighbors

    We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm to learn the optimal Q function using the nearest neighbor regression method. As the main contribution, we provide a tight finite-sample analysis of the convergence rate. In particular, for MDPs with a $d$-dimensional state space and discount factor $\gamma \in (0,1)$, given an arbitrary sample path with "covering time" $L$, we establish that the algorithm is guaranteed to output an $\varepsilon$-accurate estimate of the optimal Q-function using $\tilde{O}\big(L/(\varepsilon^3(1-\gamma)^7)\big)$ samples. For instance, for a well-behaved MDP, the covering time of the sample path under the purely random policy scales as $\tilde{O}\big(1/\varepsilon^d\big)$, so the sample complexity scales as $\tilde{O}\big(1/\varepsilon^{d+3}\big)$. Indeed, we establish a lower bound that argues that the dependence of $\tilde{\Omega}\big(1/\varepsilon^{d+2}\big)$ is necessary. Comment: Accepted to NIPS 201
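The idea in the abstract above, Q-learning over a continuous state space with nearest-neighbor lookups along a single sample path, can be illustrated with a minimal sketch. This is not the paper's NNQL algorithm or its analysis: it is a simplified online variant that assumes a one-dimensional state, a fixed set of centers, and a 1/n step size. The names `nearest_center`, `nn_q_learning`, `centers`, and the toy dynamics in the usage section are all hypothetical.

```python
import random


def nearest_center(state, centers):
    """Return the index of the center closest to a scalar state."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - state))


def nn_q_learning(path, centers, n_actions, gamma=0.9):
    """Estimate Q-values at `centers` from one sample path.

    `path` is a list of (state, action, reward, next_state) tuples taken
    from a single trajectory. Each continuous state is mapped to its
    nearest center (an assumed simplification of nearest-neighbor
    regression), and a standard Q-learning update is applied there.
    """
    q = [[0.0] * n_actions for _ in centers]
    visits = [[0] * n_actions for _ in centers]
    for state, action, reward, next_state in path:
        i = nearest_center(state, centers)
        j = nearest_center(next_state, centers)
        visits[i][action] += 1
        alpha = 1.0 / visits[i][action]  # assumed 1/n step size
        target = reward + gamma * max(q[j])
        q[i][action] += alpha * (target - q[i][action])
    return q


if __name__ == "__main__":
    # Toy example (hypothetical dynamics): a noisy walk on [0, 1] under a
    # purely random policy, with reward equal to the next state.
    rng = random.Random(0)
    centers = [k / 10 for k in range(11)]
    state, path = 0.5, []
    for _ in range(5000):
        action = rng.randrange(2)  # 0 = step left, 1 = step right
        step = 0.1 if action == 1 else -0.1
        next_state = min(1.0, max(0.0, state + step + rng.gauss(0, 0.02)))
        path.append((state, action, next_state, next_state))
        state = next_state
    q = nn_q_learning(path, centers, n_actions=2)
    print([round(max(row), 2) for row in q])
```

The spacing of `centers` plays the role of the discretization resolution: a finer grid lowers the approximation error but requires a longer path before every center has been visited, which is the trade-off that the covering time $L$ captures in the abstract.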

    Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks

    Future wireless networks have substantial potential to support a broad range of complex, compelling applications in both military and civilian fields, where users can enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning (ML) algorithms have had great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning. Furthermore, we investigate their employment in the compelling applications of wireless networks, including heterogeneous networks (HetNets), cognitive radios (CR), Internet of things (IoT), machine to machine networks (M2M), and so on. This article aims to help readers clarify the motivation and methodology of the various ML algorithms, so as to invoke them for hitherto unexplored services and scenarios of future wireless networks. Comment: 46 pages, 22 fig