Understanding Model-Based Reinforcement Learning and its Application in Safe Reinforcement Learning
Model-based reinforcement learning algorithms have been shown to achieve successful results on various continuous control benchmarks, but the understanding of model-based methods is limited. We try to interpret how model-based methods work through novel experiments on state-of-the-art algorithms, with an emphasis on the model learning part. We evaluate the role of model learning in policy optimization and propose methods to learn a more accurate model. With a better understanding of model-based reinforcement learning, we then apply model-based methods to solve safe reinforcement learning (RL) problems with near-zero violation of hard constraints throughout training. Drawing an analogy with how humans and animals learn to perform safe actions, we break down the safe RL problem into three stages. First, we train agents in a constraint-free environment to learn a performant policy for reaching high rewards, and simultaneously learn a model of the dynamics. Second, we use model-based methods to plan safe actions and train a safeguarding policy from these actions through imitation. Finally, we propose a factored framework to train an overall policy that mixes the performant policy and the safeguarding policy. This three-step curriculum ensures near-zero violation of safety constraints at all times. An advantage of the model-based approach is that the sample complexity required at the second and third steps is significantly lower than that of model-free methods, which can enable online safe learning. We demonstrate the effectiveness of our methods in various continuous control problems and analyze the advantages over state-of-the-art approaches.
Learning action-oriented models through active inference
Converging theories suggest that organisms learn and exploit probabilistic models of their environment. However, it remains unclear how such models can be learned in practice. The open-ended complexity of natural environments means that it is generally infeasible for organisms to model their environment comprehensively. Alternatively, action-oriented models attempt to encode a parsimonious representation of adaptive agent-environment interactions. One approach to learning action-oriented models is to learn online in the presence of goal-directed behaviours. This constrains an agent to behaviourally relevant trajectories, reducing the diversity of the data a model need account for. Unfortunately, this approach can cause models to prematurely converge to sub-optimal solutions, through a process we refer to as a bad-bootstrap. Here, we exploit the normative framework of active inference to show that efficient action-oriented models can be learned by balancing goal-oriented and epistemic (information-seeking) behaviours in a principled manner. We illustrate our approach using a simple agent-based model of bacterial chemotaxis. We first demonstrate that learning via goal-directed behaviour indeed constrains models to behaviorally relevant aspects of the environment, but that this approach is prone to sub-optimal convergence. We then demonstrate that epistemic behaviours facilitate the construction of accurate and comprehensive models, but that these models are not tailored to any specific behavioural niche and are therefore less efficient in their use of data. Finally, we show that active inference agents learn models that are parsimonious, tailored to action, and which avoid bad bootstraps and sub-optimal convergence. Critically, our results indicate that models learned through active inference can support adaptive behaviour in spite of, and indeed because of, their departure from veridical representations of the environment. 
Our approach provides a principled method for learning adaptive models from limited interactions with an environment, highlighting a route to sample-efficient learning algorithms.
Using similar tasks to increase negotiation of meaning and language production in an online second language learning environment
This study investigates the use of authentic subtitled similar task videos (ASSTVs) and their relationship to second language negotiation of meaning and language production among non-native speakers of English in an online task-based language learning (TBLL) environment. Over the course of two weeks, twenty intermediate non-native speakers (NNSs) of English from the English Language Institute at Texas A&M University engaged in four communicative tasks in pairs, using an online TBLL environment designed specifically for this study and a chat tool in WebCT Vista, a course management system provided by the university. ASSTVs were videotaped and integrated into the online TBLL environment. To test the effects of ASSTVs, participants were divided into two groups of five dyads each. Five dyads were provided with the ASSTVs before the task completion process and the remaining five dyads were not. The first section of this study examines the effects of ASSTVs on negotiation of meaning, and the second section examines the effects on language production. The amount of negotiation of meaning was calculated through the negotiation of meaning sequences model developed by Gass and Varonis and revised for online communication by Smith. Language production was investigated in terms of fluency and of lexical and syntactic complexity. A detailed analysis of the data from the chat scripts showed that NNSs engage in more negotiation of meaning and produce more fluent and lexically diverse language when provided with the ASSTVs than NNSs who were not provided with them. Based on these findings, this study concludes that using ASSTVs in an online TBLL environment is a viable and effective tool for promoting negotiation of meaning and language production in terms of fluency and lexical complexity.
The Effect of Aleks on Students' Mathematics Achievement in an Online Learning Environment and the Cognitive Complexity of the Initial and Final Assessments
For many courses, mathematics included, there is an associated interactive e-learning system that provides assessment and tutoring. Some of these systems are classified as Intelligent Tutoring Systems. MyMathLab, Mathzone, and Assessment and LEarning in Knowledge Spaces (ALEKS) are just a few of the interactive e-learning systems in mathematics. In ALEKS, assessment and tutoring are based on Knowledge Space Theory. Previous studies in a traditional learning environment have shown that ALEKS users perform as well as or better than non-users in mathematics achievement.
The purpose of this research was to investigate the effect of ALEKS on students’ achievement in mathematics in an online learning environment and to determine the cognitive complexity of mathematical tasks enacted by ALEKS’s initial (pretest) and final (posttest) assessments. The targeted population for this study was undergraduate students in College Mathematics I, in an online course at a private university in the southwestern United States. The study used a quasi-experimental One-Group non-randomized pretest and posttest design.
Five methods of analysis and one model were used in analyzing data: t-test, correlation analysis, simple and multiple regression analysis, Cronbach’s Alpha reliability test, and Webb’s depth of knowledge model. A t-test showed a difference between the pretest and posttest reports, meaning ALEKS had a significant effect on students’ mathematics achievement. The correlation analysis showed a significant positive linear relationship between the concept mastery reports and the formative and summative assessment reports, meaning there is a direct relationship between ALEKS concept mastery and the assessments. The regression equation showed a better model for predicting mathematics achievement with ALEKS when the time spent learning in ALEKS and the concept mastery scores are used as part of the model.
According to Webb’s depth of knowledge model, the cognitive complexity of the pretest and posttest question items used by ALEKS was as follows: 50.5% required application of skills and concepts, 37.1% required recall of information, and 12.4% required strategic thinking. None of the question items required extended thinking or complex reasoning, implying ALEKS is appropriate for skills and concepts building at this level of mathematics.
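The pretest/posttest comparison described above can be illustrated with a short sketch. It assumes the t-test was a paired (dependent-samples) test, which fits a one-group pretest and posttest design but is not stated in the abstract. The function name `paired_t_statistic` and all scores are hypothetical and are not the study's data.

```python
import math
import statistics


def paired_t_statistic(pre, post):
    """Return (t, degrees of freedom) for a paired t-test on two score lists."""
    # Work on per-student differences, since each student is measured twice.
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for six students, for illustration only.
pretest = [40, 55, 62, 48, 70, 51]
posttest = [58, 60, 75, 66, 81, 64]

t_stat, dof = paired_t_statistic(pretest, posttest)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

The resulting statistic would then be compared against the t distribution with `dof` degrees of freedom to judge significance.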
Embedding Model-Based Fast Meta Learning for Downlink Beamforming Adaptation
This paper studies fast adaptive beamforming for the multiuser multiple-input single-output downlink. Existing deep learning-based approaches assume that training and testing channels follow the same distribution, which causes task mismatch when the testing environment changes. Although meta learning can deal with the task mismatch, it relies on labelled data and incurs high complexity in the pre-training and fine-tuning stages. We propose a simple yet effective adaptive framework to solve the mismatch issue, which trains an embedding model as a transferable feature extractor and then fits a support vector regression. Compared to the existing meta learning algorithm, our method does not necessarily need labelled data in the pre-training and does not need fine-tuning of the pre-trained model in the adaptation. The effectiveness of the proposed method is verified through two well-known applications, i.e., the signal-to-interference-plus-noise ratio balancing problem and the sum rate maximization problem. Furthermore, we extend our proposed method to online scenarios in non-stationary environments. Simulation results demonstrate the advantages of the proposed algorithm in terms of both performance and complexity. The proposed framework can also be applied to general radio resource management problems.
Field dose radiation determination by active learning with Gaussian process for autonomous robot guiding
This article proposes an approach for determining the radiation dose profile in a radiation-susceptible environment, aiming to guide an autonomous robot acting in such environments and so reduce human exposure to dangerous amounts of dose. The approach consists of an active learning method based on information entropy reduction, using a log-normally warped Gaussian Process (GP) as surrogate model, resulting in non-linear online regression with sequential measurements. Experiments with simulated radiation dose fields of varying complexity were made, and results showed that the approach was effective in reconstructing the field with high accuracy from relatively few measurements. The technique also showed some robustness in the presence of measurement noise, as found in real measurements, by assuming Gaussian noise.
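The sequential measurement strategy described above can be sketched in one dimension. This is a minimal illustration under stated assumptions, not the article's method: the log-normal warping is approximated by fitting a zero-mean GP to the logarithm of the dose, the next point is chosen by the entropy of the latent Gaussian prediction (equivalent to picking the largest predictive variance), and the kernel settings, `toy_field`, and all function names are hypothetical.

```python
import math


def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel (hyperparameters are assumed values)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def predict(xs, ys, x_new, noise=1e-2):
    """GP posterior mean and variance at x_new, with Gaussian noise variance."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, max(var, 1e-12)


def entropy(var):
    """Differential entropy of a Gaussian with the given variance."""
    return 0.5 * math.log(2 * math.pi * math.e * var)


def toy_field(x):
    """Hypothetical dose field with a single source at x = 3."""
    return math.exp(-0.5 * (x - 3.0) ** 2)


def run_active_learning(candidates, n_steps):
    """Start at the first candidate, then repeatedly measure where entropy is highest."""
    xs = [candidates[0]]
    ys = [math.log(toy_field(xs[0]))]  # the GP is fitted to log-dose
    for _ in range(n_steps):
        x_next = max(candidates, key=lambda c: entropy(predict(xs, ys, c)[1]))
        xs.append(x_next)
        ys.append(math.log(toy_field(x_next)))
    return xs, ys


candidates = [0.5 * i for i in range(13)]  # positions 0.0 to 6.0
measured, log_doses = run_active_learning(candidates, 5)
estimate = math.exp(predict(measured, log_doses, 3.0)[0])
print("measured positions:", measured)
print("estimated dose at x = 3.0:", round(estimate, 3))
```

Because already-measured positions have low predictive variance, the loop spreads measurements over the domain before refining, which is the behaviour an entropy-reduction criterion is meant to produce.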