
    Transferring knowledge as heuristics in reinforcement learning: A case-based approach

    This paper proposes and analyses a transfer learning meta-algorithm that allows distinct methods to be implemented using heuristics, obtained from a simpler domain (the source), to accelerate a Reinforcement Learning procedure in another domain (the target). The meta-algorithm works in three stages: first, it uses a Reinforcement Learning step to learn a task in the source domain, storing the knowledge thus obtained in a case base; second, it performs an unsupervised mapping of the source-domain actions to the target-domain actions; and third, it uses the case base obtained in the first stage as heuristics to speed up learning in the target domain. A set of empirical evaluations was conducted in two target domains: the 3D mountain car (using a case base learned from a 2D simulation) and stability learning for a humanoid robot in the Robocup 3D Soccer Simulator (using knowledge learned from the Acrobot domain). The results attest that the transfer learning algorithm outperforms recent heuristically accelerated reinforcement learning and transfer learning algorithms. © 2015 Elsevier B.V. Luiz Celiberto Jr. and Reinaldo Bianchi acknowledge the support of FAPESP (grants 2012/14010-5 and 2011/19280-8). Paulo E. Santos acknowledges support from FAPESP (grant 2012/04089-3) and CNPq (grant PQ2 -303331/2011-9). Peer Reviewed
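The third stage described above, using stored knowledge as a heuristic during action selection, can be sketched as follows. This is a minimal illustration in the style of heuristically accelerated Q-learning, not the authors' implementation: the function names, the weight `xi`, the margin `eta`, and the assumption that a retrieved case simply suggests one target-domain action are all hypothetical.

```python
import random


def heuristic_from_case(q_values, suggested_action, eta=1.0):
    """Build a heuristic vector H that lifts the suggested action
    just above the current greedy action (assumed form, for illustration)."""
    best = max(q_values)
    return [
        best - q + eta if a == suggested_action else 0.0
        for a, q in enumerate(q_values)
    ]


def heuristic_action(q_values, heuristic, xi=1.0, epsilon=0.1, rng=random):
    """Epsilon-greedy selection over Q(s, a) + xi * H(s, a)."""
    n = len(q_values)
    if rng.random() < epsilon:
        return rng.randrange(n)  # explore uniformly
    scores = [q + xi * h for q, h in zip(q_values, heuristic)]
    return max(range(n), key=scores.__getitem__)  # exploit biased scores


# Hypothetical usage: the retrieved case suggests action 2 in this state.
q = [0.5, 0.2, 0.1]
h = heuristic_from_case(q, suggested_action=2)
action = heuristic_action(q, h, epsilon=0.0)
```

The heuristic only biases which action is tried; the Q-values themselves are still updated by the ordinary learning rule, so a poor suggestion can be overridden as learning proceeds.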

    Accelerating Reinforcement Learning for Dynamic Spectrum Access in Cognitive Wireless Networks

    This thesis studies the application of distributed reinforcement learning (RL) based machine intelligence to dynamic spectrum access (DSA) in future cognitive wireless networks. In particular, it focuses on ways of accelerating distributed RL-based DSA algorithms in order to improve their adaptability in terms of initial and steady-state performance and quality of service (QoS) convergence behaviour. The performance of the proposed DSA schemes is empirically evaluated using large-scale system-level simulations of a temporary event scenario involving a cognitive small cell network installed in a densely populated stadium and, in some cases, a base station on an aerial platform and a number of local primary LTE base stations, all sharing the same spectrum. Some of the algorithms are also theoretically evaluated using a Bayesian network based probabilistic convergence analysis method proposed by the author. The thesis presents novel distributed RL-based DSA algorithms that employ a Win-or-Learn-Fast (WoLF) variable learning rate and an adaptation of the heuristically accelerated RL (HARL) framework to significantly improve the initial performance and convergence speed of classical RL algorithms and thus increase their adaptability in challenging DSA environments. Furthermore, a distributed case-based RL approach to DSA is proposed. It combines RL and case-based reasoning to increase the robustness and adaptability of distributed RL-based DSA schemes in dynamically changing wireless environments.
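The Win-or-Learn-Fast idea mentioned above can be illustrated with a short sketch: learn cautiously while doing well, and quickly while doing badly. The specifics here are assumptions for illustration only and are not taken from the thesis: a stateless value estimate `q`, a "winning" test that compares the reward with the current estimate, and the two rate values.

```python
def wolf_update(q, reward, alpha_win=0.01, alpha_lose=0.1):
    """One value update with a WoLF-style variable learning rate.

    Assumption: the agent counts as "winning" when the observed reward
    is at least its current estimate; it then uses the smaller rate.
    """
    winning = reward >= q
    alpha = alpha_win if winning else alpha_lose
    return q + alpha * (reward - q)


# Hypothetical usage: a bad outcome moves the estimate faster than a good one.
after_good = wolf_update(0.0, 1.0)  # small step upwards
after_bad = wolf_update(1.0, 0.0)   # larger step downwards
```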

    Energy sustainability of next generation cellular networks through learning techniques

    The trend for the next generation of cellular networks, the Fifth Generation (5G), predicts a 1000x increase in capacity demand with respect to 4G, which leads to new infrastructure deployments. In this respect, it is estimated that the energy consumption of ICT might reach 51% of global electricity production by 2030, mainly due to mobile networks and services. Consequently, the cost of energy may also become predominant in the operating expenses of a Mobile Network Operator (MNO). Therefore, efficient control of the energy consumption in 5G networks is not only desirable but essential. In fact, energy sustainability is one of the pillars in the design of next generation cellular networks. In the last decade, the research community has been paying close attention to the Energy Efficiency (EE) of radio communication networks, with particular attention to dynamically switching Base Stations (BSs) ON and OFF. In addition, 5G architectures will introduce the Heterogeneous Network (HetNet) paradigm, where Small BSs (SBSs) are deployed to assist the standard macro BS in satisfying the high traffic demand and reducing the impact on energy consumption. However, only with the introduction of Energy Harvesting (EH) capabilities might the networks reach the energy savings needed to mitigate both the high costs and the environmental impact. In the case of HetNets with EH capabilities, the erratic and intermittent nature of renewable energy sources has to be considered, which entails additional complexity. Solar energy has been chosen as the reference EH source due to its widespread adoption and its high efficiency in terms of energy produced relative to its cost. To this end, the first part of the thesis presents a harvested solar energy model based on accurate stochastic Markov processes that describes the energy scavenged by outdoor solar sources.
The typical HetNet scenario involves dense deployments with a high level of flexibility, which suggests the use of distributed rather than centralized control systems, since scalability can rapidly become a bottleneck in the latter. For this reason, in the second part of the thesis, we propose to model the SBS tier as a Multi-agent Reinforcement Learning (MRL) system, where each SBS is an intelligent and autonomous agent that learns by directly interacting with the environment and by properly utilizing past experience. The agents implemented in each SBS independently learn a suitable switch ON/OFF control policy, so as to jointly maximize the system performance in terms of throughput, drop rate and energy consumption, while adapting to the dynamic conditions of the environment in terms of energy inflow and traffic demand. However, MRL may suffer from coordination problems when all agents must simultaneously find a solution that is good for the whole system. Consequently, the Layered Learning paradigm has been adopted to simplify the problem by decomposing it into subtasks. In particular, the global solution is obtained in a hierarchical fashion: the learning process of a subtask is aimed at facilitating the learning of the next higher subtask layer. The first layer implements an MRL approach and is in charge of the local online optimization at SBS level as a function of the traffic demand and the incoming energy. The second layer is in charge of the network-wide optimization and is based on Artificial Neural Networks aimed at estimating the model of the overall network. Postprint (published version)
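A single agent in the first (local) layer described above could look like the following sketch. The abstract does not name the learning rule, so tabular Q-learning is swapped in here as an assumption; the class name `SwitchAgent`, the state encoding (discretised battery and traffic levels), and the reward are likewise hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = ("OFF", "ON")


class SwitchAgent:
    """Independent tabular Q-learning agent for one small base station."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> value estimate
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        """Epsilon-greedy choice between switching OFF and ON."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        """Standard one-step Q-learning update."""
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Hypothetical usage: state = (battery level, traffic level).
agent = SwitchAgent(epsilon=0.0)
state = ("low_battery", "high_traffic")
agent.learn(state, "ON", reward=1.0, next_state=state)
choice = agent.act(state)
```

Each SBS would hold its own `SwitchAgent`; the coordination issue noted in the abstract arises because every agent's reward depends on what the others do, which this purely local sketch does not address.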

    Development of cooperative behavioural model for autonomous multi-robots system deployed to underground mines

    The number of disasters that occur every month in underground mine environments all over the world cannot be ignored. Some of these disasters, such as roof falls, explosions, toxic gas inhalation, and in-mine vehicle accidents, can cause fatalities and/or disabilities. When such accidents happen during mining operations, rescuers find it difficult to respond to them immediately. This creates the need to bridge the gap between the lives of miners and the product acquired from underground mines by using multi-robot systems. This thesis proposes an autonomous multi-robot cooperative behavioural model that can help guide multiple robots in pre-entry safety inspection of underground mines. A hybrid swarm intelligence model termed QLACS, based on Q-Learning (QL) and the Ant Colony System (ACS), is proposed to achieve cooperative behaviour in a multi-robot system (MRS). The model was developed by harnessing the strengths of both the QL and ACS algorithms. ACS is used to optimize the routes taken by each robot, while the QL algorithm is used to enhance cooperation among the autonomous robots. The communication within the QLACS model for cooperative behaviour is varied, and the performance of the algorithms in terms of communication was evaluated using a simulation approach. The scalability of the model is investigated using different numbers of robots. Simulation results show that the methods proposed in this thesis achieve cooperative behaviour among the robots better than state-of-the-art or other common approaches. Using time and memory consumption as performance metrics, the results reveal that the proposed model can guide two, three and up to four robots to achieve efficient cooperative inspection behaviour in underground terrains.

    Neural Approximate Dynamic Programming for the Ultra-fast Order Dispatching Problem

    Same-Day Delivery (SDD) services aim to maximize the fulfillment of online orders while minimizing delivery delays, but are beset by operational uncertainties such as those in order volumes and courier planning. Our work aims to enhance the operational efficiency of SDD by focusing on the ultra-fast Order Dispatching Problem (ODP), which involves matching and dispatching orders to couriers within a centralized warehouse setting and completing the delivery within a strict timeline (e.g., within minutes). We introduce important extensions to ultra-fast ODP, such as order batching and explicit courier assignments, to provide a more realistic representation of dispatching operations and improve delivery efficiency. As a solution method, we primarily focus on NeurADP, a methodology that combines Approximate Dynamic Programming (ADP) and Deep Reinforcement Learning (DRL); our work constitutes the first application of NeurADP outside of the ride-pool matching problem. NeurADP is particularly suitable for ultra-fast ODP as it addresses complex one-to-many matching and routing intricacies through a neural network-based value function approximation (VFA) that captures high-dimensional problem dynamics without requiring the manual feature engineering of generic ADP methods. We test our proposed approach using four distinct realistic datasets tailored for ODP and compare the performance of NeurADP against myopic and DRL baselines, also making use of non-trivial bounds to assess the quality of the policies. Our numerical results indicate that the inclusion of order batching and courier queues enhances the efficiency of delivery operations and that NeurADP significantly outperforms other methods. Detailed sensitivity analysis with important parameters confirms the robustness of NeurADP under different scenarios, including variations in courier numbers, spatial setup, vehicle capacity, and permitted delay time.