
    Transferring knowledge as heuristics in reinforcement learning: A case-based approach

    The goal of this paper is to propose and analyse a transfer learning meta-algorithm that allows the implementation of distinct methods using heuristics, obtained in a simpler source domain, to accelerate a reinforcement learning procedure in a target domain. This meta-algorithm works in three stages: first, it uses a reinforcement learning step to learn a task in the source domain, storing the knowledge thus obtained in a case base; second, it performs an unsupervised mapping of source-domain actions to target-domain actions; and third, the case base obtained in the first stage is used as a heuristic to speed up the learning process in the target domain. A set of empirical evaluations was conducted in two target domains: the 3D mountain car (using a case base learned from a 2D simulation) and stability learning for a humanoid robot in the RoboCup 3D Soccer Simulator (using knowledge learned from the Acrobot domain). The results attest that our transfer learning algorithm outperforms recent heuristically accelerated reinforcement learning and transfer learning algorithms. © 2015 Elsevier B.V.

    Luiz Celiberto Jr. and Reinaldo Bianchi acknowledge the support of FAPESP (grants 2012/14010-5 and 2011/19280-8). Paulo E. Santos acknowledges support from FAPESP (grant 2012/04089-3) and CNPq (grant PQ2 303331/2011-9). Peer Reviewed
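The third stage above, using a source-domain case base as a heuristic in the target domain, is the core idea of heuristically accelerated RL. A minimal sketch of heuristic-biased action selection (the dictionary shapes, the additive Q + ξ·H rule, and all names are illustrative assumptions, not the paper's exact formulation):

```python
import random

def select_action(Q, H, state, actions, epsilon=0.1, xi=1.0):
    """Epsilon-greedy action choice where Q-values are boosted by a
    heuristic H derived from a source-domain case base (hypothetical)."""
    if random.random() < epsilon:
        return random.choice(actions)  # explore
    # exploit: the heuristic biases ties and early, uninformed Q-values
    return max(actions,
               key=lambda a: Q.get((state, a), 0.0) + xi * H.get((state, a), 0.0))
```

With an uninformed Q-table, the transferred heuristic H immediately biases selection toward actions preferred in the source domain, which is what accelerates early learning.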

    Learning and Reasoning Strategies for User Association in Ultra-dense Small Cell Vehicular Networks

    Recent vehicular ad hoc network research has focused on providing intelligent transportation services by employing information and communication technologies in road transport. It has been understood that advanced demands such as reliable connectivity, high user throughput, and ultra-low latency required by these services cannot be met using traditional communication technologies. Consequently, this thesis reports on the application of artificial intelligence to user association as a technology enabler in ultra-dense small cell vehicular networks. In particular, the work focuses on mitigating mobility-related concerns and networking issues at different mobility levels by employing diverse heuristic as well as reinforcement learning (RL) methods. Firstly, driven by rapid fluctuations in the network topology and the radio environment, a conventional, three-step-sequence user association policy is designed to highlight and explore the impact of vehicle speed and different performance indicators on network quality of service (QoS) and user experience. Secondly, inspired by control-theoretic models and dynamic programming, a real-time controlled-feedback user association approach is proposed. The algorithm adapts to the changing vehicular environment by employing derived network performance information as a heuristic, resulting in improved network performance. Thirdly, a sequence of novel RL-based user association algorithms is developed that employs a variable learning rate, a variable reward function, and an adaptation of the control feedback framework to improve the initial and steady-state learning performance. Furthermore, to accelerate the learning process and enhance the adaptability and robustness of the developed RL algorithms, heuristically accelerated RL and case-based transfer learning methods are employed.
A comprehensive, two-tier, event-based, system-level simulator, integrating a dynamic vehicular network, a highway, and an ultra-dense small cell network, is developed. The model has enabled the analysis of user mobility effects on network performance across different mobility levels and has served as a firm foundation for the evaluation of the empirical properties of the investigated approaches.
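The controlled-feedback association idea in the second contribution can be sketched as a proportional controller acting on a cell-association bias; the variable names, gain, and clamping range below are illustrative assumptions, not the thesis's actual controller:

```python
def update_bias(bias, qos_measured, qos_target, gain=0.5, lo=-10.0, hi=10.0):
    """Proportional feedback on an association bias (e.g. in dB):
    raise the bias when measured QoS falls short of the target,
    lower it when QoS overshoots, and clamp to a safe range."""
    bias += gain * (qos_target - qos_measured)  # proportional correction
    return max(lo, min(hi, bias))               # saturate the actuator
```

Running this once per measurement interval lets the association policy track a changing vehicular environment without a full model of it, which is the control-theoretic appeal.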

    Accelerating Reinforcement Learning for Dynamic Spectrum Access in Cognitive Wireless Networks

    This thesis studies the applications of distributed reinforcement learning (RL) based machine intelligence to dynamic spectrum access (DSA) in future cognitive wireless networks. In particular, this work focuses on ways of accelerating distributed RL-based DSA algorithms in order to improve their adaptability in terms of the initial and steady-state performance, and the quality of service (QoS) convergence behaviour. The performance of the DSA schemes proposed in this thesis is empirically evaluated using large-scale system-level simulations of a temporary event scenario which involves a cognitive small cell network installed in a densely populated stadium, and in some cases a base station on an aerial platform and a number of local primary LTE base stations, all sharing the same spectrum. Some of the algorithms are also theoretically evaluated using a Bayesian-network-based probabilistic convergence analysis method proposed by the author. The thesis presents novel distributed RL-based DSA algorithms that employ a Win-or-Learn-Fast (WoLF) variable learning rate and an adaptation of the heuristically accelerated RL (HARL) framework in order to significantly improve the initial performance and the convergence speed of classical RL algorithms and, thus, increase their adaptability in challenging DSA environments. Furthermore, a distributed case-based RL approach to DSA is proposed. It combines RL and case-based reasoning to increase the robustness and adaptability of distributed RL-based DSA schemes in dynamically changing wireless environments.
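The WoLF variable learning rate mentioned above follows a simple rule: learn cautiously while winning, fast while losing. A sketch under the assumption that "winning" means the current policy's expected payoff exceeds that of the historical average policy (the default rates are placeholders):

```python
def wolf_rate(expected_current, expected_average,
              delta_win=0.05, delta_lose=0.2):
    """Win-or-Learn-Fast (WoLF): use a small learning rate when the
    current policy outperforms the average policy, a larger one when
    it is losing, so agents converge quickly yet remain stable."""
    return delta_win if expected_current > expected_average else delta_lose
```

In a distributed DSA setting, each node applies this test locally, which is what makes the acceleration compatible with fully decentralised learning.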

    Robotic Test Tube Rearrangement Using Combined Reinforcement Learning and Motion Planning

    A combined task-level reinforcement learning and motion planning framework is proposed in this paper to address a multi-class in-rack test tube rearrangement problem. At the task level, the framework uses reinforcement learning to infer a sequence of swap actions while ignoring robotic motion details. At the motion level, the framework accepts the swap action sequences inferred by task-level agents and plans the detailed robotic pick-and-place motion. The task- and motion-level planning form a closed loop with the help of a condition set maintained for each rack slot, which allows the framework to perform replanning and effectively find solutions in the presence of low-level failures. For reinforcement learning in particular, the framework leverages a distributed deep Q-learning structure with the Dueling Double Deep Q-Network (D3QN) to acquire near-optimal policies and uses an A*-based post-processing technique to amplify the collected training data. The D3QN and distributed learning help increase training efficiency. The post-processing helps complete unfinished action sequences and remove redundancy, thus making the training data more effective. We carry out both simulations and real-world studies to understand the performance of the proposed framework. The results verify the performance of the RL and post-processing and show that the closed-loop combination improves robustness. The framework is ready to incorporate various forms of sensory feedback, and the real-world studies also demonstrated this incorporation.
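The Dueling network inside D3QN splits the Q-function into a state value V and per-action advantages A, recombined with the advantages centred so that V remains identifiable. A minimal numeric sketch of that aggregation step (not the paper's network code):

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
    Subtracting the mean advantage removes the V/A ambiguity that would
    otherwise let the two streams trade constants back and forth."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]
```

In the full architecture, `value` and `advantages` would be the outputs of two network heads sharing a common feature trunk; here they are plain numbers to show the arithmetic.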

    Aerial base stations with opportunistic links for next generation emergency communications

    Rapidly deployable and reliable mission-critical communication networks are fundamental requirements to guarantee the successful operations of public safety officers during disaster recovery and crisis management preparedness. The ABSOLUTE project focused on designing, prototyping, and demonstrating a high-capacity IP mobile data network with low latency and large coverage suitable for many forms of multimedia delivery, including public safety scenarios. The ABSOLUTE project combines aerial, terrestrial, and satellite communication networks to provide a robust standalone system able to deliver resilient communications. This article focuses on describing the main outcomes of the ABSOLUTE project in terms of network and system architecture, regulations, and implementation of aerial base stations, portable land mobile units, satellite backhauling, S-MIM satellite messaging, and multimode user equipment.

    Energy sustainability of next generation cellular networks through learning techniques

    The trend for the next generation of cellular networks, the Fifth Generation (5G), predicts a 1000x increase in capacity demand with respect to 4G, which leads to new infrastructure deployments. In this respect, it is estimated that the energy consumption of ICT might reach 51% of global electricity production by 2030, mainly due to mobile networks and services. Consequently, the cost of energy may also become predominant in the operating expenses of a Mobile Network Operator (MNO). Therefore, efficient control of the energy consumption in 5G networks is not only desirable but essential. In fact, energy sustainability is one of the pillars in the design of next generation cellular networks. In the last decade, the research community has been paying close attention to the Energy Efficiency (EE) of radio communication networks, with particular attention to the dynamic switch ON/OFF of Base Stations (BSs). Besides, 5G architectures will introduce the Heterogeneous Network (HetNet) paradigm, where Small BSs (SBSs) are deployed to assist the standard macro BS in satisfying the high traffic demand and reducing the impact on energy consumption. However, only with the introduction of Energy Harvesting (EH) capabilities might the networks reach the energy savings needed to mitigate both the high costs and the environmental impact. In the case of HetNets with EH capabilities, the erratic and intermittent nature of renewable energy sources has to be considered, which entails some additional complexity. Solar energy has been chosen as the reference EH source due to its widespread adoption and its high efficiency in terms of energy produced compared to its costs. To this end, in the first part of the thesis, a harvested solar energy model is presented, based on accurate stochastic Markov processes for the description of the energy scavenged by outdoor solar sources.
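A discrete-state Markov model of scavenged solar energy, as described above, can be sampled as follows; the two-state transition matrix and energy levels used in the test are placeholders, not the thesis's fitted values:

```python
import random

def simulate_harvest(P, levels, start=0, steps=24, seed=0):
    """Sample a trajectory of harvested-energy levels from a discrete
    Markov chain with row-stochastic transition matrix P (illustrative).
    Each step draws the next state from the current state's row of P
    and records the energy level associated with that state."""
    rng = random.Random(seed)
    state, trace = start, []
    for _ in range(steps):
        state = rng.choices(range(len(P)), weights=P[state])[0]
        trace.append(levels[state])
    return trace
```

In practice the states would correspond to weather/irradiance regimes and the transition probabilities would be fitted from measured solar data.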
The typical HetNet scenario involves dense deployments with a high level of flexibility, which suggests the use of distributed rather than centralized control systems, since the scalability of the latter can rapidly become a bottleneck. For this reason, in the second part of the thesis, we propose to model the SBS tier as a Multi-agent Reinforcement Learning (MRL) system, where each SBS is an intelligent and autonomous agent that learns by directly interacting with the environment and by properly utilizing past experience. The agents implemented in each SBS independently learn a proper switch ON/OFF control policy, so as to jointly maximize the system performance in terms of throughput, drop rate and energy consumption, while adapting to the dynamic conditions of the environment in terms of energy inflow and traffic demand. However, MRL may suffer from a coordination problem when the agents must simultaneously find a solution that is good for the whole system. Consequently, the Layered Learning paradigm has been adopted to simplify the problem by decomposing it into subtasks. In particular, the global solution is obtained in a hierarchical fashion: the learning process of a subtask is aimed at facilitating the learning of the next higher subtask layer. The first layer implements an MRL approach and is in charge of the local online optimization at SBS level as a function of the traffic demand and the energy income. The second layer is in charge of the network-wide optimization and is based on Artificial Neural Networks aimed at estimating a model of the overall network.
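Each SBS agent in the first learning layer can be sketched as an independent tabular Q-learner over ON/OFF actions; the state encoding, reward, and hyperparameters below are illustrative assumptions, not the thesis's implementation:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step for an SBS choosing between 'ON' and
    'OFF'; the reward r is assumed to combine throughput, drop-rate and
    energy-consumption terms observed after taking action a in state s."""
    q = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = q + alpha * (r + gamma * best_next - q)
    return Q[(s, a)]
```

Because every SBS updates only its own table from local observations, the scheme stays distributed; the coordination gap this leaves is exactly what the layered-learning second tier is meant to address.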