
    Projective simulation for classical learning agents: a comprehensive investigation

    We study the model of projective simulation (PS), a novel approach to artificial intelligence based on stochastic processing of episodic memory which was recently introduced [1]. Here we provide a detailed analysis of the model and examine its performance, including its achievable efficiency, its learning times, and the way both properties scale with the problems' dimension. In addition, we situate the PS agent in different learning scenarios and study its learning abilities. A variety of new scenarios are considered, thereby demonstrating the model's flexibility. Furthermore, to put the PS scheme in context, we compare its performance with those of Q-learning and learning classifier systems, two popular models in the field of reinforcement learning. It is shown that PS is a competitive artificial intelligence model of unique properties and strengths.
    Funding: Austrian Science Fund (FWF) SFB FoQuS F4012; Templeton World Charity Foundation (TWCF).
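    For readers unfamiliar with PS, the following is a minimal sketch of a basic two-layer agent, assuming percept clips wired directly to action clips: hopping probabilities are proportional to the edge h-values, which are damped toward 1 by a forgetting parameter gamma and reinforced by reward. The toy task at the end is illustrative only, not one of the paper's benchmarks.

```python
import random

class PSAgent:
    """Minimal two-layer projective simulation agent (a sketch, not the
    authors' reference implementation). Percept clips connect directly
    to action clips, so an excursion through memory is a single hop."""

    def __init__(self, n_percepts, n_actions, gamma=0.01):
        self.gamma = gamma  # forgetting/damping parameter
        # h-values (edge weights of the clip network), initialised to 1
        self.h = [[1.0] * n_actions for _ in range(n_percepts)]
        self._last = None   # remember the traversed edge for the update

    def act(self, percept):
        # Random walk over the clip network: hop probability is
        # proportional to the h-value of each percept->action edge.
        weights = self.h[percept]
        action = random.choices(range(len(weights)), weights=weights)[0]
        self._last = (percept, action)
        return action

    def learn(self, reward):
        # Damp all h-values toward 1, then reinforce the traversed edge.
        for row in self.h:
            for a in range(len(row)):
                row[a] -= self.gamma * (row[a] - 1.0)
        p, a = self._last
        self.h[p][a] += reward

# Toy usage: an invasion-game-style task where percept i rewards action i.
agent = PSAgent(n_percepts=2, n_actions=2)
for t in range(1000):
    percept = random.randrange(2)
    action = agent.act(percept)
    agent.learn(1.0 if action == percept else 0.0)
```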

    The roles of online and offline replay in planning

    Animals and humans replay neural patterns encoding trajectories through their environment, both whilst they solve decision-making tasks and during rest. Both on-task and off-task replay are believed to contribute to flexible decision making, though how their relative contributions differ remains unclear. We investigated this question by using magnetoencephalography (MEG) to study human subjects while they performed a decision-making task that was designed to reveal the decision algorithms employed. We characterised subjects in terms of how flexibly each adjusted their choices to changes in temporal, spatial and reward structure. The more flexible a subject, the more they replayed trajectories during task performance, and this replay was coupled with re-planning of the encoded trajectories. The less flexible a subject, the more they replayed previously preferred trajectories during rest periods between task epochs. The data suggest that online and offline replay both participate in planning but support distinct decision strategies.

    Handling Out-of-Sequence Data: Kalman Filter Methods or Statistical Imputation?

    We consider the problem of handling sensor measurements that arrive with single- and multiple-lag delays, also known as out-of-sequence measurements (OOSM). We argue that this problem can also be addressed using model-based imputation strategies, and we demonstrate their application, in comparison to Kalman filter (KF)-based approaches, on a multi-sensor tracking prediction problem. The effectiveness of two model-based imputation procedures against five OOSM methods was investigated in Monte Carlo simulation experiments. The delayed measurements were either incorporated (or fused) at the time they finally became available (using OOSM methods) or imputed at random, with a higher probability of delay for multiple lags and a lower probability for a single lag (using single or multiple imputation). For a single lag, tracking estimates computed from the observed data and those based on a data set in which the delayed measurements were imputed were equally unbiased; however, the KF estimates obtained using the Bayesian framework (BF-KF) were more precise. When the measurements were delayed over multiple lags, there were significant differences in bias and precision between multiple imputation (MI) and OOSM methods, with the former exhibiting superior performance at nearly all levels of measurement-delay probability and across the range of manoeuvring indices. Researchers working with sensor data are encouraged to take advantage of software implementing MI for delayed measurements, as tracking estimates are more precise and less biased in the presence of delayed multi-sensor data than those derived from an observed-data analysis.
    Defence Science Journal, 2010, 60(1), pp. 87-99. DOI: http://dx.doi.org/10.14429/dsj.60.11
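    By way of illustration, here is a minimal sketch of the simplest exact way to fuse a delayed measurement: store the filter's priors and measurements, and re-run the filter from the step to which the late measurement belongs. It assumes a 1D constant-velocity model with hypothetical noise values; the BF-KF and imputation procedures compared in the paper are more sophisticated than this re-filtering baseline.

```python
import numpy as np

# 1D constant-velocity model: state x = [position, velocity]
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])              # we observe position only
Q = 0.01 * np.eye(2)                    # process noise covariance (assumed)
R = np.array([[0.5]])                   # measurement noise covariance (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ (z - H @ x), (np.eye(2) - K @ H) @ P

def refilter_with_oosm(history, k, z_delayed):
    # Start from the prior at step k, fuse both the on-time and the
    # delayed measurement for that step, then replay the later steps.
    x, P = history[k]["prior"]
    x, P = update(x, P, history[k]["z"])
    x, P = update(x, P, z_delayed)
    for step in history[k + 1:]:
        x, P = predict(x, P)
        x, P = update(x, P, step["z"])
    return x, P

# Toy run: filter 5 steps, then fuse a measurement that arrives 3 steps late.
x, P = np.array([0.0, 1.0]), np.eye(2)
history = []
rng = np.random.default_rng(0)
for k in range(5):
    x, P = predict(x, P)
    prior = (x.copy(), P.copy())
    z = np.array([k * dt + rng.normal(0.0, 0.5)])
    x, P = update(x, P, z)
    history.append({"prior": prior, "z": z})

x_fused, P_fused = refilter_with_oosm(history, k=2, z_delayed=np.array([2.1]))
```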

    Model-based hyperparameter optimization

    The primary goal of this work is to propose a methodology for discovering hyperparameters. Well-tuned, handcrafted hyperparameters help systems converge; poorly chosen ones leave practitioners in limbo, unsure whether the fault lies with the implementation or with the choice of hyperparameters and system configuration. We specifically analyze the choice of learning rate in stochastic gradient descent (SGD), a popular algorithm. As a secondary goal, we attempt to discover fixed points by smoothing the loss landscape, exploiting assumptions about its distribution to improve the SGD update rule. Smoothing the loss landscape has been shown to make convergence possible in large-scale systems and in difficult black-box optimization problems. Here, we use stochastic value gradients (SVG) to smooth the loss landscape by learning a surrogate model, and then backpropagate through this model to discover fixed points of the real task SGD is trying to solve. Additionally, we construct a gym environment for testing model-free algorithms, such as Proximal Policy Optimization (PPO), as hyperparameter optimizers for SGD. For tasks, we focus on a toy problem and analyze the convergence of SGD on MNIST using model-free and model-based reinforcement learning methods for control. The model is learned from the parameters of the true optimizer and is used specifically for learning rates rather than for prediction. Our experiments cover both an online and an offline setting. In the online setting, we learn a surrogate model alongside the true optimizer, and hyperparameters are tuned in real time for the true optimizer. In the offline setting, we show that the model-based methodology has more potential than the model-free configuration, owing to the surrogate model that smooths the loss landscape and yields more helpful gradients during backpropagation.
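    As a rough illustration of the gym-environment idea, the sketch below (hypothetical names and task, not the thesis code) exposes the SGD learning rate as the agent's action on a toy quadratic loss, with the per-step loss decrease as reward; a model-free learner such as PPO could be trained against this interface.

```python
import numpy as np

class LRControlEnv:
    """Gym-style toy environment (illustrative sketch): the action is the
    SGD learning rate for one step on a quadratic loss L(w) = 0.5 w^T A w,
    and the reward is the resulting decrease in loss."""

    def __init__(self, dim=10, seed=0):
        rng = np.random.default_rng(seed)
        M = rng.normal(size=(dim, dim))
        self.A = M @ M.T / dim + np.eye(dim)  # SPD Hessian of the toy loss
        self.dim = dim

    def _loss(self, w):
        return 0.5 * w @ self.A @ w

    def reset(self):
        self.w = np.ones(self.dim)
        return np.array([self._loss(self.w)])   # observation: current loss

    def step(self, action):
        lr = float(np.clip(action, 1e-5, 1.0))  # action = learning rate
        grad = self.A @ self.w                  # exact gradient of the quadratic
        before = self._loss(self.w)
        self.w = self.w - lr * grad             # one (noise-free) SGD step
        after = self._loss(self.w)
        reward = before - after                 # reward = loss decrease
        done = after < 1e-6
        return np.array([after]), reward, done, {}

# Sanity check with a fixed learning rate; an RL agent would choose lr itself.
env = LRControlEnv()
obs = env.reset()
for _ in range(50):
    obs, r, done, _ = env.step(0.1)
    if done:
        break
```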

    On the Modeling of Dynamic-Systems using Sequence-based Deep Neural-Networks

    The objective of this thesis is the adaptation and development of sequence-based neural networks (NNs) applied to the modeling of dynamic systems. More specifically, we focus on two sub-problems: the modeling of time series, and the modeling and control of multiple-input multiple-output (MIMO) systems. These two sub-problems are explored through the modeling of crops and the modeling and control of robots. To solve these problems, we build on NNs and training schemes that allow our models to outperform state-of-the-art results in their respective fields. In irrigation, we show that NNs are powerful tools capable of modeling the water consumption of crops while observing only a portion of what reference methods currently require. We further demonstrate the potential of NNs by inferring irrigation recommendations in real time. In robotics, we show that prioritization techniques can be used to learn better robot dynamic models. We apply the models learned with these methods inside a Model Predictive Control (MPC) controller, further demonstrating their benefits. Additionally, we leverage Dreamer, a Model-Based Reinforcement Learning (MBRL) agent, to solve visuomotor tasks. We demonstrate that MBRL controllers can be used for sensor-based control on real robots without being trained on real systems. Building on this result, we developed a physics-guided variant of Dreamer. This variation of the original algorithm is more flexible and designed for mobile robots. The framework enables reusing previously learned dynamics and transferring environment knowledge to other robots. Furthermore, using this new model, we train agents to reach various goals without interacting with the system. This increases the reusability of the learned models and makes for a highly data-efficient learning scheme. Moreover, it allows for efficient dynamics randomization, creating robust agents that transfer well to unseen dynamics.
    Ph.D. thesis.
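    To make the sequence-based modeling idea concrete, here is a hedged sketch (hypothetical architecture and names, not the thesis code) of a GRU that consumes a history of (state, action) pairs and predicts the next state as a delta, the kind of model that can be rolled out inside a sampling-based MPC loop.

```python
import torch
import torch.nn as nn

class SeqDynamicsModel(nn.Module):
    """Sketch of a sequence-based dynamics model: a GRU reads a
    (state, action) history and predicts the next-state delta, a common
    parameterisation for learned models used in MPC."""

    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.gru = nn.GRU(state_dim + action_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, state_dim)

    def forward(self, states, actions):
        # states: (batch, T, state_dim), actions: (batch, T, action_dim)
        x = torch.cat([states, actions], dim=-1)
        out, _ = self.gru(x)
        delta = self.head(out[:, -1])   # predicted change in state
        return states[:, -1] + delta    # next-state estimate

# Minimal one-step usage; in MPC, candidate action sequences would be
# rolled out through the model and scored by a cost function.
model = SeqDynamicsModel(state_dim=4, action_dim=2)
hist_s = torch.randn(1, 8, 4)   # 8-step state history
hist_a = torch.randn(1, 8, 2)   # 8-step action history
next_state = model(hist_s, hist_a)
```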

    Numerical modelling of additive manufacturing process for stainless steel tension testing samples

    Additive manufacturing (AM) technologies, including 3D printing, are growing rapidly and are expected to replace conventional subtractive manufacturing technologies to some extent. In selective laser melting (SLM), one of the most popular AM technologies for metals, a large amount of heat is required to melt the metal powder, and this leads to distortion and/or shrinkage of the additively manufactured parts. It is therefore useful to predict the behaviour of 3D-printed parts before printing so that unwanted distortion and shrinkage can be controlled. This study develops a two-phase numerical modelling and simulation process for the AM of 17-4PH stainless steel, taking into account the importance of post-processing and the need for calibration to achieve a high-quality print. Using the proposed modelling and simulation process, optimal process parameters, material properties, and topology can be obtained to ensure a part is 3D printed successfully.
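    As a purely illustrative sketch of the thermal phase of such a simulation (the study itself uses a calibrated two-phase thermo-mechanical model for 17-4PH; the material values below are order-of-magnitude placeholders), a 1D explicit finite-difference heat-conduction update with a fixed heat source looks like this:

```python
import numpy as np

# Highly simplified 1D explicit finite-difference sketch of the *thermal*
# phase of an SLM simulation. Values are assumed, not from the study.
alpha = 5e-6               # thermal diffusivity, m^2/s (placeholder)
L, n = 0.01, 101           # 10 mm domain, 101 nodes
dx = L / (n - 1)
dt = 0.4 * dx**2 / alpha   # below the explicit stability limit dx^2/(2*alpha)

T = np.full(n, 300.0)      # initial temperature, K
laser_node, q = n // 2, 5e4  # laser spot location and heating rate, K/s

for step in range(200):
    lap = (np.roll(T, -1) - 2 * T + np.roll(T, 1)) / dx**2
    lap[0] = lap[-1] = 0.0         # crude insulated boundaries
    T = T + dt * alpha * lap       # diffuse heat through the domain
    T[laser_node] += q * dt        # deposit laser energy at the spot

print(f"peak temperature after {200 * dt:.3f} s: {T.max():.0f} K")
```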