4 research outputs found

    Oppositional Reinforcement Learning with Applications

    Machine intelligence techniques contribute to solving real-world problems. Reinforcement learning (RL) is one of these techniques, and several of its characteristics make it suitable for applications in which a model of the environment is not available to the agent. In real-world applications, intelligent agents generally face a very large state space, which limits the usability of reinforcement learning. Convergence of reinforcement learning requires that each state-action pair be visited infinitely often, a condition that is impossible to satisfy in many practical situations. The goal of this work is to propose a class of new techniques to overcome this problem for off-policy, step-by-step (incremental), model-free reinforcement learning with discrete state and action spaces. The focus of this research is on using the design characteristics of the RL agent to improve its running time while maintaining an acceptable level of accuracy.

    One way of improving the performance of intelligent agents is to use a model of the environment. Because in many applications the model of the environment may be known only partially, or not at all, this work instead employs a special type of knowledge about the agent's actions: the concept of opposition is incorporated into the framework of reinforcement learning. One of the components of an RL agent is the action, and for each action we define its associated opposite action. Actions and opposite actions are used together to update the value function, resulting in faster convergence. At the beginning of this research, the concept of opposition is incorporated into the components of reinforcement learning (states, actions, and the reinforcement signal), which results in the oppositional target domain estimation algorithm, OTE. OTE reduces the search and navigation area and accelerates the search for a target. The OTE algorithm is limited to applications in which a model of the environment is provided to the agent. Hence, further investigation is conducted to extend the concept of opposition to model-free reinforcement learning algorithms. This extension yields several algorithms that apply the concept of opposition to the Q(lambda) technique.

    The design of a reinforcement learning agent depends on the application, and the emphasis of this research is on the characteristics of the actions. Hence, the primary challenge of this work is the design and incorporation of opposite actions in the framework of RL agents. Three different applications, namely grid navigation, the elevator control problem, and image thresholding, are implemented to address this challenge in different application contexts. The design challenges, and some solutions to overcome them and improve the algorithms, are also investigated. The opposition-based Q(lambda) algorithms are tested on the applications mentioned above. The general idea behind the opposition-based Q(lambda) algorithms is that in Q-value updating, the agent updates the value of an action in a given state. If the agent also knows the corresponding opposite action, then instead of one value it can update two Q-values at the same time, without actually taking the opposite action and thereby causing an explicit transition to the opposite state. This accelerates the learning process in general and the exploration phase in particular.

    Several algorithms are outlined in this work. OQ(lambda) is introduced to accelerate the Q(lambda) algorithm in discrete state spaces. The NOQ(lambda) method is an extension of OQ(lambda) that operates in a broader range of non-deterministic environments. The update of the opposition trace in OQ(lambda) depends on the next state of the opposite action (which is generally not taken by the agent); this limits the technique to deterministic environments, because that next state must be known to the agent. NOQ(lambda) is presented to update the opposition trace without knowledge of the next state for the opposite action. The results show an improvement in running time for the proposed algorithms compared to the standard Q(lambda) technique.
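
    To make the double-update idea concrete, the following is a minimal Python sketch of a tabular Q-learning step extended with an opposite-action update. It is illustrative only: the mapping from actions to opposite actions, the assumption that the opposite reward is known (e.g. the negated reward in a symmetric grid-navigation task), and the reuse of the same bootstrapped next-state value for both updates are simplifying assumptions, not the exact OQ(lambda) or NOQ(lambda) formulation from the thesis.

    import numpy as np

    def opposition_q_update(Q, state, action, reward, next_state,
                            opposite_action, opposite_reward,
                            alpha=0.1, gamma=0.95):
        """One tabular Q-learning step that also updates the opposite action.

        Q               : array of shape (n_states, n_actions)
        opposite_action : index of the action defined as opposite to `action`
        opposite_reward : reward the agent would have received for the opposite
                          action (assumed known, e.g. -reward in a symmetric task)
        """
        best_next = np.max(Q[next_state])

        # Ordinary update for the action actually taken.
        Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])

        # Second update for the opposite action, done without taking it and
        # therefore without an explicit transition to the opposite state.
        # Reusing best_next here is a simplification; handling the unknown
        # next state of the opposite action is what distinguishes NOQ(lambda).
        Q[state, opposite_action] += alpha * (
            opposite_reward + gamma * best_next - Q[state, opposite_action]
        )
        return Q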

    Roulette Wheel Selection Algorithm (RWSA) and Reinforcement Learning (RL) for personalizing and improving e-learning system

    Thesis submitted in total fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science at Strathmore University.

    Various mechanisms have been developed in the field of personalized learning to improve the learning process, with the main objective of maximizing learning and dynamically selecting the best teaching operation to achieve learning goals. Despite recommending personalized learning sequences, e-learning instructional strategists have failed to apply the corrective measures needed to immediately remediate learning misconceptions or difficulties. As e-learning materials continue to evolve, an alternative, dynamic, real-time multi-performance approach needs to be developed and implemented in e-learning systems. This study makes two major contributions to the field of e-learning: it personalizes the learning sequence using a reversed roulette wheel selection algorithm blended with linear ranking over a real-time, dynamic multi-based performance matrix; and it implements reinforcement and mastery learning to motivate students and improve their learning output. In experiments, the personalized learning sequences (PLS) were dynamic and heuristic, simultaneously considering curriculum difficulty level and the continuity of successive curricula during the personalized learning process. The students' passing rate increased from 34% to an overall passing rate of 88%, an increase of 54 percentage points. The increase can be attributed to the reinforcement process and mastery learning, in which various control mechanisms are implemented to safeguard the learning process. Digital transcripts of students' perceptions and experiences correlate positively with a document sentiment score of +.321, while theme analysis revealed a positive attitude, with extracted words such as: very happy, friends, motivate, improve, understanding, knowledge, and good. Overall, the e-learning prototype was able to show improved student academic performance, address different academic and social problems, and allow students to study anywhere, at their own convenience, whenever online learning is possible and accessible.
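
    As a rough illustration of the selection step, the sketch below combines linear ranking with a roulette wheel draw in Python. It assumes one plausible reading of "reversed" selection, namely that weakly mastered learning objects receive the highest selection probability; the function names, the selective-pressure parameter, and that inversion are assumptions made for illustration, not the thesis's exact algorithm.

    import random

    def linear_rank_weights(n, selective_pressure=1.7):
        """Linear-ranking weights for n ranked items (rank 0 gets the largest weight)."""
        if n == 1:
            return [1.0]
        s = selective_pressure  # typically chosen in [1, 2]
        return [2 - s + 2 * (s - 1) * (n - 1 - i) / (n - 1) for i in range(n)]

    def reversed_roulette_select(items, scores):
        """Pick one item by roulette wheel, favouring *low* performance scores.

        items  : candidate learning objects (e.g. topics in the sequence)
        scores : current performance per item (higher = better mastered)
        """
        # Rank items from worst to best so that weakly mastered items
        # receive the largest linear-ranking weights ("reversed" selection).
        order = sorted(range(len(items)), key=lambda i: scores[i])
        weights = linear_rank_weights(len(items))
        # Standard roulette wheel draw over the rank-based weights.
        pick = random.choices(order, weights=weights, k=1)[0]
        return items[pick]

    # Example: the least-mastered topic is the most likely to be selected next.
    topics = ["fractions", "decimals", "ratios"]
    mastery = [0.35, 0.80, 0.60]
    next_topic = reversed_roulette_select(topics, mastery)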

    Emotionally motivated reinforcement learning based controller

    No full text