1,121 research outputs found

    Stacked Auto Encoder Based Deep Reinforcement Learning for Online Resource Scheduling in Large-Scale MEC Networks

    An online resource scheduling framework is proposed to minimize the sum of weighted task latency across all Internet-of-Things (IoT) users by optimizing the offloading decision, transmission power, and resource allocation in a large-scale mobile-edge computing (MEC) system. To this end, a deep reinforcement learning (DRL)-based solution is proposed, which includes the following components. First, a related and regularized stacked autoencoder (2r-SAE) with unsupervised learning is applied to perform data compression and representation for high-dimensional channel quality information (CQI) data, which reduces the state space for DRL. Second, we present an adaptive simulated annealing (ASA) approach as the action search method of DRL, in which an adaptive h-mutation guides the search direction and an adaptive iteration enhances the search efficiency during the DRL process. Third, a preserved and prioritized experience replay (2p-ER) is introduced to help the DRL train the policy network and find the optimal offloading policy. Numerical results demonstrate that the proposed algorithm can achieve near-optimal performance while significantly decreasing the computational time compared with existing benchmarks.
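To make the idea of annealing-based action search concrete, here is a minimal sketch of plain simulated annealing over binary offloading decisions. It is not the paper's ASA: the adaptive h-mutation and adaptive iteration are not reproduced, and the function names, the geometric cooling schedule, and the `toy_latency` cost model are all assumptions made purely for illustration.

```python
import math
import random


def simulated_annealing(cost, n_users, iters=2000, t0=1.0, cooling=0.995, seed=0):
    """Plain simulated annealing over binary offloading vectors (not the paper's ASA)."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_users)]
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    temp = t0
    for _ in range(iters):
        # Propose a neighbour by flipping one user's offloading decision.
        candidate = current[:]
        i = rng.randrange(n_users)
        candidate[i] = 1 - candidate[i]
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= cooling  # geometric cooling (an assumed schedule)
    return best, best_cost


def toy_latency(decision, local=(4.0, 1.0, 3.0, 2.0, 5.0), edge=1.5, congestion=0.5):
    """Hypothetical cost: a user pays its local latency if x == 0, otherwise an
    edge latency that grows with the number of users offloading at once."""
    n_off = sum(decision)
    return sum(edge + congestion * n_off if x else l for x, l in zip(decision, local))


best, best_cost = simulated_annealing(toy_latency, n_users=5)
print(best, best_cost)
```

In a DRL setting such a search would typically start from the policy network's relaxed output rather than from a random vector; that coupling is left out here.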

    Generic Online Learning for Partial Visible & Dynamic Environment with Delayed Feedback

    Reinforcement learning (RL) has been applied to robotics and many other domains in which a system must learn in real time and interact with a dynamic environment. In most studies, the state-action space that is the key part of RL is predefined. The integration of RL with deep learning methods has, however, taken a tremendous leap forward in solving novel, challenging problems such as mastering the board game Go. The environment surrounding the agent may not be fully visible, it can change over time, and the feedback that the agent receives for its actions can have a fluctuating delay. In this paper, we propose a Generic Online Learning (GOL) system for such environments. GOL is based on RL with a hierarchical structure to form abstract features in time and adapt to the optimal solutions. The proposed method has been applied to load balancing in 5G cloud random access networks. Simulation results show that GOL successfully achieves the system objectives of reducing cache misses and communication load, while incurring only limited system overhead in terms of the number of high-level patterns needed. We believe that the proposed GOL architecture is significant for future online learning in dynamic, partially visible environments, and would be very useful for many autonomous control systems.

    A Survey of Adaptive Resonance Theory Neural Network Models for Engineering Applications

    This survey samples from the ever-growing family of adaptive resonance theory (ART) neural network models used to perform the three primary machine learning modalities, namely, unsupervised, supervised, and reinforcement learning. It comprises a representative list from classic to modern ART models, thereby painting a general picture of the architectures developed by researchers over the past 30 years. The learning dynamics of these ART models are briefly described, and their distinctive characteristics, such as code representation, long-term memory, and corresponding geometric interpretation, are discussed. Useful engineering properties of ART (speed, configurability, explainability, parallelization, and hardware implementation) are examined along with current challenges. Finally, a compilation of online software libraries is provided. It is expected that this overview will be helpful to new and seasoned ART researchers.

    Integrating Symbolic and Neural Processing in a Self-Organizing Architecture for Pattern Recognition and Prediction

    British Petroleum (89A-1204); Defense Advanced Research Projects Agency (N00014-92-J-4015); National Science Foundation (IRI-90-00530); Office of Naval Research (N00014-91-J-4100); Air Force Office of Scientific Research (F49620-92-J-0225)

    Artificial cognitive architecture with self-learning and self-optimization capabilities. Case studies in micromachining processes

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 22-09-201

    Deep reinforcement learning approach for MPPT control of partially shaded PV systems in Smart Grids

    Photovoltaic (PV) systems are increasingly important in modern smart grid systems. Usually, a maximum power point tracking (MPPT) algorithm is used to maximize the energy output of the PV arrays. However, once the arrays are deployed, weather conditions such as clouds can cast shade on them, affecting the dynamics of each panel differently. These conditions directly affect the available energy output of the arrays and in turn make the MPPT task extremely difficult. For these reasons, under partial shading conditions, it is necessary to have algorithms that are able to learn and adapt online to the changing state of the system. In this work we propose the use of deep reinforcement learning (DRL) techniques to address the MPPT problem of a PV array under partial shading conditions. We develop a model-free RL algorithm to maximize the efficiency of MPPT control. The agent's policy is parameterized by neural networks, which take the sensory information as input and directly output the control signal. Furthermore, a PV environment under shading conditions was developed in the open-source OpenAI Gym platform and is made available in an open repository. Several tests are performed, using the developed simulated environment, to test the robustness of the proposed control strategies to different climate conditions. The results show the feasibility of our proposal, with fast responses and stable behavior. The best results for the presented methodology show that the maximum operating power point achieved deviates by less than 1% from the theoretical maximum power point.
    Affiliation: Avila, Luis Omar. Universidad Nacional de San Luis. Facultad de Ciencias Físico Matemáticas y Naturales. Departamento de Informática. Laboratorio Investigación y Desarrollo en Inteligencia Computacional; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: de Paula, Mariano. Universidad Nacional del Centro de la Provincia de Buenos Aires. Centro de Investigaciones en Física e Ingeniería del Centro de la Provincia de Buenos Aires. - Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Centro de Investigaciones en Física e Ingeniería del Centro de la Provincia de Buenos Aires. - Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Centro de Investigaciones en Física e Ingeniería del Centro de la Provincia de Buenos Aires; Argentina
    Affiliation: Trimboli, Maximiliano Daniel. Universidad Nacional de San Luis. Facultad de Ingeniería y Ciencias Agropecuarias. Laboratorio de Control Automático; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Carlucho, Ignacio. State University of Louisiana; United States. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentin
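The difficulty that partial shading creates for MPPT can be illustrated with a small toy environment. This is only a sketch under stated assumptions: it is not the authors' published environment and does not use the OpenAI Gym API (the `reset`/`step` methods merely imitate its style in simplified form), and the two-peak power curve in `power` is a made-up shape, not a physical PV model. The class and function names are hypothetical.

```python
import math


class ToyShadedPV:
    """Gym-style toy environment; the power curve is illustrative, not physical."""

    def __init__(self, v_max=40.0, step=0.5):
        self.v_max = v_max
        self.step_size = step
        self.voltage = 0.0

    def power(self, v):
        # Two bumps stand in for the multiple peaks that shading can create
        # (hypothetical shape: a local peak near 12 V, a global peak near 30 V).
        return (60.0 * math.exp(-((v - 12.0) / 4.0) ** 2)
                + 100.0 * math.exp(-((v - 30.0) / 4.0) ** 2))

    def reset(self, voltage=5.0):
        self.voltage = voltage
        return self.voltage

    def step(self, action):
        # action in {-1, 0, +1}: lower, hold, or raise the voltage by one step.
        self.voltage = min(self.v_max, max(0.0, self.voltage + action * self.step_size))
        reward = self.power(self.voltage)  # reward is the extracted power
        return self.voltage, reward


def hill_climb(env, start, max_steps=200):
    """Greedy baseline: keep moving while power improves, stop at the first peak."""
    v = env.reset(start)
    p = env.power(v)
    for _ in range(max_steps):
        up = env.power(min(env.v_max, v + env.step_size))
        down = env.power(max(0.0, v - env.step_size))
        if up <= p and down <= p:
            break  # neither neighbour is better: a (possibly local) peak
        v, p = env.step(1 if up > down else -1)
    return v, p


low_start = hill_climb(ToyShadedPV(), start=5.0)
high_start = hill_climb(ToyShadedPV(), start=25.0)
print(low_start, high_start)
```

On this toy curve the greedy tracker's result depends on where it starts, which is the kind of behavior a learned policy that adapts online is meant to avoid.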