14 research outputs found

    Optimal Inventory Management in a Fluctuating Market


    Modular self-organization

    The aim of this paper is to provide a sound framework for addressing a difficult problem: the automatic construction of an autonomous agent's modular architecture. We combine results from two apparently unrelated domains: autonomous planning through Markov Decision Processes, and a general data clustering approach using a kernel-like method. Our fundamental idea is that the former is a good framework for addressing autonomy, whereas the latter makes it possible to tackle self-organization problems.
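    The abstract names Markov Decision Processes as the planning framework. As a point of reference only, the following minimal sketch runs value iteration on a small, randomly generated MDP; the transition model, rewards, and discount factor are illustrative and are not taken from the paper.

```python
# Minimal value-iteration sketch for a small, hypothetical MDP.
# States, actions, transitions and rewards are illustrative only.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.95

rng = np.random.default_rng(0)
# P[a, s, s'] = transition probability, R[s, a] = immediate reward (made up).
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(-1.0, 1.0, size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(500):
    # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)
print("Optimal values:", np.round(V, 3))
print("Greedy policy:", policy)
```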

    Deep active localization

    Mobile robots have made significant advances in recent decades and are now able to perform tasks that were once thought to be impossible. One critical factor that has enabled robots to perform these various challenging tasks is their ability to determine where they are located in a given environment (localization). Further automation is achieved by letting the robot choose its own actions instead of a human teleoperating it. However, determining the robot's pose (position + orientation) precisely and scaling this capability to larger environments have been long-standing challenges in the field of mobile robotics. Traditional approaches to this task of active localization use an information-theoretic criterion for action selection and hand-crafted perceptual models. With a steady rise in available computation over the last three decades, the back-propagation algorithm has found use in much deeper neural networks and in numerous applications. When labelled data is not available, the paradigm of reinforcement learning (RL) is used, in which an agent learns by interacting with the environment. However, it is impractical for most RL algorithms to learn reasonably well from limited real-world experience alone. Hence, it is common practice to train RL-based models in a simulator and efficiently transfer these trained models onto real robots without any significant loss of performance.
    In this thesis, we propose an end-to-end differentiable method for learning to take informative actions for robot localization that is trainable entirely in simulation and then transferable onto real robot hardware with zero refinement. This is achieved by leveraging recent advances in deep learning and reinforcement learning, combined with domain randomization techniques. The system is composed of two learned modules: a convolutional neural network for perception, and a planning module trained with deep reinforcement learning. We use a multi-scale approach in the perceptual model, since the accuracy needed to take actions using reinforcement learning is much lower than the accuracy needed for robot control. We demonstrate that the resulting system outperforms traditional approaches for either perception or planning. We also demonstrate our approach's robustness to different map configurations and other nuisance parameters through the use of domain randomization in training. The code has been released at https://github.com/montrealrobotics/dal and is compatible with the OpenAI gym framework as well as the Gazebo simulator.
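    The released dal code is linked above; the sketch below is not that code, only a hedged illustration of the kind of deep-RL planning module the abstract describes: a small convolutional Q-network that maps a coarse belief grid over poses to Q-values for a few discrete motion actions, with epsilon-greedy action selection. The grid size, action set, and network layout are assumptions.

```python
# Hedged sketch (not the released dal code): a convolutional Q-network over a
# coarse pose-belief grid, producing Q-values for discrete motion actions.
import torch
import torch.nn as nn

class BeliefQNet(nn.Module):
    def __init__(self, grid=32, n_actions=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (grid // 4) * (grid // 4), 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, belief):          # belief: (batch, 1, grid, grid)
        return self.head(self.conv(belief))

# Epsilon-greedy action selection over a dummy belief, as a planner might do.
qnet = BeliefQNet()
belief = torch.softmax(torch.randn(1, 1, 32, 32).flatten(1), dim=1).view(1, 1, 32, 32)
with torch.no_grad():
    q = qnet(belief)
if torch.rand(1) < 0.1:
    action = int(torch.randint(0, 3, (1,)))   # explore
else:
    action = int(q.argmax(dim=1))             # exploit
print("Q-values:", q.numpy().round(3), "chosen action:", action)
```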

    Reinforcement learning for power scheduling in a grid-tied PV-battery electric vehicle charging station

    Grid-tied renewable energy source (RES) based electric vehicle (EV) charging stations are an example of a distributed generator behind-the-meter system (DGBMS), which characterizes most modern power infrastructure. To perform power scheduling in such a DGBMS, stochastic variables such as the load profile of the charging station, the output profile of the RES, and the tariff profile of the utility must be considered at every decision step. The stochasticity of this kind of optimization environment makes power scheduling a challenging task that deserves substantial research attention. This dissertation investigates the application of reinforcement learning (RL) techniques to the power scheduling problem in a grid-tied PV-powered EV charging station that incorporates a battery energy storage system. RL is a reward-motivated optimization technique derived from the way animals learn to optimize their behaviour in a new environment. Unlike other optimization methods such as numerical and soft-computing techniques, RL does not require an accurate model of the optimization environment in order to arrive at an optimal solution. This study developed two RL algorithms, namely an asynchronous Q-learning algorithm and an advantage actor-critic (A2C) algorithm, and evaluated their feasibility for power scheduling in the EV charging station under static conditions. To assess the performance of the proposed algorithms, the conventional Q-learning and actor-critic algorithms were implemented and compared in terms of global cost convergence and learning characteristics. First, the power scheduling problem was expressed as a sequential decision-making process, and an asynchronous Q-learning algorithm was developed to solve it. An advantage actor-critic (A2C) algorithm was then developed and applied to the same power scheduling problem. The two algorithms were tested using 24-hour load, generation, and utility grid tariff profiles under static optimization conditions. The performance of the asynchronous Q-learning algorithm was compared with that of the conventional Q-learning method in terms of global cost, stability, and scalability. Likewise, the A2C was compared with the conventional actor-critic method in terms of stability, scalability, and convergence. Simulation results showed that both developed algorithms (the asynchronous Q-learning algorithm and A2C) converged to lower global costs and displayed more stable learning characteristics than their conventional counterparts. This research established that properly restricting the action space of a Q-learning algorithm improves its stability and convergence, although such a restriction may come at the cost of computational speed and scalability. Of the four algorithms analyzed, the A2C was found to produce the power schedule with the lowest global cost and the best usage of the battery energy storage system.
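    The dissertation's asynchronous Q-learning and A2C algorithms are not reproduced here. As a hedged illustration of how power scheduling can be cast as a sequential decision-making process, the sketch below runs plain tabular Q-learning on a toy 24-hour problem; the tariff, PV, and load profiles, battery model, and reward are invented for the example.

```python
# Illustrative tabular Q-learning on a toy 24-hour scheduling problem:
# state = (hour, battery level), action = {discharge, idle, charge}.
# Tariff, PV and load profiles below are made up, not from the study.
import numpy as np

hours, levels, actions = 24, 5, 3                       # battery levels 0..4
tariff = 0.1 + 0.1 * (np.arange(hours) % 12 > 6)        # toy time-of-use tariff
pv = np.clip(np.sin(np.pi * (np.arange(hours) - 6) / 12), 0, None)  # toy PV output
load = 0.6 + 0.2 * (np.arange(hours) > 16)              # toy charging-station load

Q = np.zeros((hours, levels, actions))
alpha, gamma, eps = 0.1, 0.99, 0.1
rng = np.random.default_rng(1)

for episode in range(5000):
    level = levels // 2
    for h in range(hours):
        a = rng.integers(actions) if rng.random() < eps else int(Q[h, level].argmax())
        new_level = int(np.clip(level + a - 1, 0, levels - 1))
        grid_power = load[h] - pv[h] + (new_level - level)   # net import from grid
        reward = -tariff[h] * max(grid_power, 0.0)           # pay only for imports
        target = reward if h == hours - 1 else reward + gamma * Q[h + 1, new_level].max()
        Q[h, level, a] += alpha * (target - Q[h, level, a])
        level = new_level

# Roll out the greedy policy once to inspect the learned schedule.
level, schedule = levels // 2, []
for h in range(hours):
    a = int(Q[h, level].argmax())
    schedule.append(a)
    level = int(np.clip(level + a - 1, 0, levels - 1))
print("Greedy schedule (0=discharge, 1=idle, 2=charge):", schedule)
```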

    Optimal energy management for a grid-tied solar PV-battery microgrid: A reinforcement learning approach

    There has been a shift towards energy sustainability in recent years, and this shift should continue. The steady growth of energy demand due to population growth, heightened concerns about the volume of anthropogenic gases released into the atmosphere, and the deployment of advanced grid technologies have spurred the penetration of renewable energy resources (RERs) at different locations and scales in the power grid. As a result, the energy system is moving away from the centralized paradigm of large, controllable power plants and toward a decentralized network based on renewables. Microgrids, either grid-connected or islanded, provide a key solution for integrating RERs, load demand flexibility, and energy storage systems within this framework. Nonetheless, renewable energy resources such as solar and wind can be extremely stochastic because they are weather dependent. These resources, coupled with load demand uncertainties, lead to random variations on both the generation and load sides, thus challenging optimal energy management. This thesis develops an optimal energy management system (EMS) for a grid-tied solar PV-battery microgrid. The goal of the EMS is to minimize operational costs (the cost of power exchanged with the utility and the battery wear cost) while respecting network constraints, which ensure grid violations are avoided. A reinforcement learning (RL) approach is proposed to minimize the operational cost of the microgrid under this stochastic setting. RL is a reward-motivated optimization technique derived from how animals learn to optimize their behaviour in new environments. Unlike conventional model-based optimization approaches, RL does not need an explicit model of the optimization system to obtain optimal solutions. The EMS is modelled as a Markov Decision Process (MDP) defined by its state, action, and reward function. Two RL algorithms, namely the conventional Q-learning algorithm and the deep Q network algorithm, are developed, and their efficacy in performing optimal energy management for the designed system is evaluated in this thesis. First, the energy management problem is expressed as a sequential decision-making process, after which two algorithms, a trading and a non-trading algorithm, are developed. In the trading case, the microgrid's excess energy can be sold back to the utility to increase revenue, while in the non-trading case constraining rules are embedded in the designed EMS to ensure that no excess energy is sold back to the utility. A Q-learning algorithm is then developed to minimize the operational cost of the microgrid under unknown future information. Finally, to evaluate the performance of the proposed EMS, a comparison between the trading-case and non-trading-case EMS models is performed using a typical commercial load curve and PV generation profile over a 24-hour horizon. Numerical simulation results indicate that the algorithm learned to select an optimized energy schedule that minimizes energy cost (the cost of power purchased from the utility based on the time-varying tariff, plus the battery wear cost) in both the summer and winter case studies. Compared with the non-trading EMS, however, the trading EMS model reduced operational costs by 4.033% in the summer season and 2.199% in the winter season.
Secondly, a deep Q network (DQN) method that uses recent learning-algorithm enhancements, including experience replay and a target network, is developed to learn the system uncertainties, including load demand, grid prices, and volatile power supply from the renewables, and to solve the optimal energy management problem. Unlike the Q-learning method, which updates the Q-function using a lookup table (which limits its scalability and overall performance in stochastic optimization), the DQN method uses a deep neural network that approximates the Q-function via statistical regression. The performance of the proposed method is evaluated with differently fluctuating load profiles, i.e., slow, medium, and fast. Simulation results substantiated the efficacy of the proposed method, as the algorithm learned from experience to raise the battery state of charge and optimally shift loads from one time instance to another, thus supporting the utility grid in reducing aggregate peak load. Furthermore, the performance of the proposed DQN approach was compared with that of the conventional Q-learning algorithm in terms of achieving the minimum global cost. Simulation results showed that the DQN algorithm outperformed the conventional Q-learning approach, reducing system operational costs by 15%, 24%, and 26% for the slow, medium, and fast fluctuating load profiles in the studied cases.
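    The exact EMS model is not given in the abstract; the following is a minimal, hedged sketch of the two DQN ingredients it names, experience replay and a target network, using a stand-in environment and made-up state and action dimensions.

```python
# Hedged DQN sketch: experience replay plus a periodically synced target
# network, with a dummy environment in place of the microgrid EMS model.
import random
from collections import deque

import torch
import torch.nn as nn

state_dim, n_actions = 4, 3   # e.g. (hour, SOC, load, price) -> discharge/idle/charge (assumed)
qnet = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target.load_state_dict(qnet.state_dict())
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)
gamma, batch_size = 0.99, 32

def dummy_env_step(state, action):
    """Stand-in for the microgrid environment: random next state, toy cost-based reward."""
    return torch.randn(state_dim), -random.random() * abs(action - 1)

state = torch.randn(state_dim)
for step in range(2000):
    if random.random() < 0.1:                     # epsilon-greedy exploration
        action = random.randrange(n_actions)
    else:
        with torch.no_grad():
            action = int(qnet(state).argmax())
    next_state, reward = dummy_env_step(state, action)
    replay.append((state, action, reward, next_state))
    state = next_state

    if len(replay) >= batch_size:                 # learn from a replayed minibatch
        batch = random.sample(replay, batch_size)
        s = torch.stack([b[0] for b in batch])
        a = torch.tensor([b[1] for b in batch])
        r = torch.tensor([b[2] for b in batch])
        s2 = torch.stack([b[3] for b in batch])
        with torch.no_grad():
            y = r + gamma * target(s2).max(dim=1).values   # bootstrapped target
        q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q, y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    if step % 200 == 0:                           # periodically sync the target network
        target.load_state_dict(qnet.state_dict())
```

    Holding the target network fixed between syncs is what keeps the bootstrapped targets stable relative to a plain Q-learning update.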

    Using Reinforcement Learning in the tuning of Central Pattern Generators

    Master's dissertation in Informatics Engineering. This work applies Reinforcement Learning techniques to tasks involving learning and robot locomotion. Reinforcement Learning is a very useful learning technique for legged robot locomotion because of the direct interaction it provides between the agent and the environment, and because, in contrast with other classical approaches, it does not require supervision or complete models. Its aim is to decide which actions to take so as to maximize a cumulative reward or reinforcement signal, taking into account that decisions may affect not only the immediate reward but also future ones. This work studies and presents the Reinforcement Learning framework and its application to the tuning of Central Pattern Generators, with the aim of generating optimized, adaptive robot locomotion. In order to investigate the strengths and abilities of Reinforcement Learning, and to demonstrate in a simple way the learning process of such algorithms, two case studies were implemented based on the state of the art. With regard to the main purpose of the thesis, two different solutions are addressed: the first based on Natural Actor-Critic methods, and the second on the Cross-Entropy Method. The latter algorithm proved capable of handling the integration of the two proposed approaches. The integration solutions were tested and validated using the Webots simulator and the DARwIN-OP robot model.
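    As a hedged illustration of the Cross-Entropy Method side of this work, the sketch below tunes three oscillator-style parameters against a made-up fitness function standing in for a Webots walking trial; the parameterization and objective are assumptions, not the thesis's actual CPG model.

```python
# Hedged Cross-Entropy Method (CEM) sketch for tuning oscillator parameters.
# The fitness function is a toy stand-in for a simulated walking trial.
import numpy as np

def fitness(params):
    """Stand-in for a rollout score: higher is better, peak at (0.4, 1.5, 0.8)."""
    amplitude, frequency, phase = params
    return -(amplitude - 0.4) ** 2 - (frequency - 1.5) ** 2 - (phase - 0.8) ** 2

rng = np.random.default_rng(0)
mean, std = np.array([0.5, 1.0, 0.5]), np.array([0.5, 0.5, 0.5])
n_samples, n_elite = 50, 10

for generation in range(30):
    samples = rng.normal(mean, std, size=(n_samples, 3))     # sample candidate parameters
    scores = np.array([fitness(s) for s in samples])
    elite = samples[np.argsort(scores)[-n_elite:]]           # keep the best candidates
    mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3 # refit the sampling distribution

print("Tuned parameters (amplitude, frequency, phase):", np.round(mean, 3))
```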

    A hybrid plan recognition model for Alzheimer's patients: the interleaved/erroneous dilemma

    Recent developments in information technology and growing pressures from the healthcare domain (an aging population, a shortage of medical personnel) have given rise to promising new research directions such as cognitive assistance for people with Alzheimer's disease inside a smart home. One of the major difficulties inherent in this type of assistance is recognizing the activities of daily living (ADLs) the patient carries out inside the home, as determined by the actions the patient performs, in order to predict the patient's behaviour and identify opportunities to guide him or her in completing those ADLs. However, this situation raises a dilemma that has not yet been addressed in work on plan recognition: observing a new action that differs from the expected one can be interpreted in two opposite ways, either as an error on the patient's part, or as the patient starting a new activity carried out interleaved with the activity already in progress. To resolve this paradoxical situation, this thesis proposes a hybrid recognition model as a possible solution. It consists of a probabilistic extension of a logical model based on lattice theory and on an action formalism in description logic, developed during Bruno Bouchard's doctoral work at the DOMUS laboratory of the Université de Sherbrooke. The logical model structures the activity recognition process as classification-based reasoning over the possible plans, which yields a set of hypotheses about the patient's behaviour. Our hybrid approach, based on probabilistic description logic, resolves the equiprobability problem by favouring certain hypotheses about the patient's behaviour. This extension therefore reduces the uncertainty in predicting the patient's future actions and, more importantly, makes it possible to anticipate the different categories of behavioural deviations typical of an Alzheimer's patient. We implemented and validated the proposed model using a set of scenarios drawn from real cases and observation frequencies inspired by a study conducted on real patients.
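    The thesis's model is a probabilistic extension of a description-logic and lattice-based formalism, which is not reproduced here; the toy sketch below only illustrates the dilemma it resolves, showing how assigning likelihoods to the two competing readings of an unexpected action (a new interleaved activity versus an error) breaks their equiprobability. All numbers are invented.

```python
# Toy illustration of breaking equiprobability between two hypotheses about an
# unexpected observed action, via a simple Bayesian update (not the thesis model).
hypotheses = {"interleaved_activity": 0.5, "erroneous_action": 0.5}   # equiprobable priors

# Made-up likelihoods of observing this particular unexpected action under each
# hypothesis, e.g. estimated from observation frequencies on real patients.
likelihood = {"interleaved_activity": 0.7, "erroneous_action": 0.2}

posterior = {h: hypotheses[h] * likelihood[h] for h in hypotheses}
total = sum(posterior.values())
posterior = {h: p / total for h, p in posterior.items()}
print(posterior)   # the interleaved-activity hypothesis is now favoured
```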