Application of reinforcement learning methods to computer game dynamics
The dynamics of the game world present both challenges and opportunities for AI to make a useful difference. Learning smart behaviours for game assets is a first step towards realistic conflict or cooperation. The scope of this thesis is the application of Reinforcement Learning (RL) to moving assets in the game world. Game sessions generate a stream of data on each asset's performance which must be processed on the fly. The lead objective is to produce fast, lightweight and flexible learning algorithms for run-time embedding. The motivation from current work is to shorten the time to achieve a workable policy by investigating the exploration/exploitation balance, to overcome the curse of dimensionality of complex systems, and to avoid extra endogenous parameters which require multiple data passes, using a simple state aggregation rather than functional approximation. How action selection (AS) contributes to efficient learning is a key issue in RL, since it determines the balance between exploiting and confirming the current policy and exploring an apparently less likely policy which may prove better in the long run. The methodology simulates several AS schemes on the 10-armed bandit problem, averaged over 10,000 epochs. The results show considerable variation in performance in terms of latency and asymptotic direction. The Upper Confidence Bound (UCB) leads over most of the episode range, especially at about 100 episodes. Using insight from action selection, order statistics are applied to derive a criterion for the convergence of policy evaluation. The probability that the action of maximum sample mean is indeed the action of maximum population mean (PMSMMPM) is calculated for the 3-armed bandit problem. PMSMMPM reaches 0.988 by play 26, which provides evidence for it as a convergence criterion. An iteration stopping rule is defined using PMSMMPM, and it shows plausible properties as the population parameters are varied.
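The bandit methodology described above can be sketched as follows. This is a minimal illustration of UCB action selection on a stationary k-armed testbed; the exploration constant, arm means and number of plays are illustrative choices, not values taken from the thesis.

```python
import math
import random

def ucb_bandit(true_means, plays=1000, c=2.0, seed=0):
    """Simulate UCB action selection on a stationary multi-armed bandit.

    true_means: the arms' true mean rewards (unit-variance Gaussian noise).
    Returns per-arm sample means and pull counts after `plays` steps.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, plays + 1):
        if t <= k:
            a = t - 1  # play each arm once before applying the UCB formula
        else:
            # UCB: sample mean plus an exploration bonus that shrinks
            # as an arm is sampled more often.
            a = max(range(k),
                    key=lambda i: means[i] + c * math.sqrt(math.log(t) / counts[i]))
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]  # incremental sample mean
    return means, counts

means, counts = ucb_bandit([0.2, 0.5, 0.9], plays=1000)
```

With a clear gap between the best arm and the rest, the exploration bonus decays and the best arm accumulates the great majority of the pulls, which is the latency/asymptote trade-off the simulations compare.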
A mathematical analysis of the approximation (P21) of considering just the top two actions yields a minimum sample size for any level of P21. Using the gradient of P21, a selection rule is derived; when combined with UCB, a new complete exploratory policy is demonstrated for the 3-armed bandit that requires just over half the sample size of pure UCB. The results provide evidence that the augmented UCB selection rule will contribute to faster learning. The TD sarsa(0) learning algorithm has been applied to learn a steering policy for the previously untried caravan-reversing problem and for the kerb-avoiding steering problem of a racing car, both using negative rewards on failure and a simple state aggregation. The output policy for the caravan is validated as non-jack-knifing for a high proportion of start states. The racing car policy has a similar validation outcome for two exploratory policies, which are compared and contrasted.
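A minimal tabular sarsa(0) sketch in the spirit of the steering experiments. The corridor environment, reward values and parameters below are illustrative stand-ins (failure earns a negative reward, as in the thesis), not the actual caravan or racing-car simulators.

```python
import random

def sarsa0(env_reset, env_step, n_states, n_actions,
           episodes=500, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Tabular sarsa(0) with epsilon-greedy exploration over aggregated states."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def policy(s):
        if rng.random() < eps:
            return rng.randrange(n_actions)
        return max(range(n_actions), key=lambda a: Q[s][a])

    for _ in range(episodes):
        s = env_reset()
        a = policy(s)
        done = False
        while not done:
            s2, r, done = env_step(s, a)
            a2 = policy(s2)
            # sarsa(0): bootstrap on the action actually taken next
            Q[s][a] += alpha * (r + gamma * Q[s2][a2] * (not done) - Q[s][a])
            s, a = s2, a2
    return Q

# Toy stand-in for the steering tasks: a 5-state corridor where drifting
# to state 0 is a failure (penalised), while state 4 ends the episode safely.
def reset():
    return 2

def step(s, a):
    s2 = s + (1 if a == 1 else -1)
    if s2 == 0:
        return s2, -1.0, True   # failure: negative reward, episode ends
    if s2 == 4:
        return s2, 0.0, True    # safe terminal state
    return s2, 0.0, False

Q = sarsa0(reset, step, n_states=5, n_actions=2)
```

Because only failure is penalised, the learned values for "steer away from the kerb" dominate those for "steer towards it" in every non-terminal state, which is how a policy can be validated as avoiding failure from a range of start states.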
Dynamic Generalisation of Continuous Action Spaces in Reinforcement Learning: A Neurally Inspired Approach
Institute for Adaptive and Neural Computation. Award number: 98318242.
This thesis is about the dynamic generalisation of continuous action spaces in reinforcement learning problems.
The standard Reinforcement Learning (RL) account provides a principled and comprehensive
means of optimising a scalar reward signal in a Markov Decision Process.
However, the theory itself does not directly address the imperative issue of generalisation
which naturally arises as a consequence of large or continuous state and action
spaces. A current thrust of research is aimed at fusing the generalisation capabilities
of supervised (and unsupervised) learning techniques with the RL theory. An example
par excellence is Tesauro’s TD-Gammon.
Although much effort has gone into researching ways to represent and generalise over
the input space, much less attention has been paid to the action space. This thesis
first considers the motivation for learning real-valued actions, and then proposes a
set of key properties desirable in any candidate algorithm addressing generalisation
of both input and action spaces. These properties include: Provision of adaptive and
online generalisation, adherence to the standard theory with a central focus on estimating
expected reward, provision for real-valued states and actions, and full support
for a real-valued discounted reward signal. Of particular interest are issues pertaining
to robustness in non-stationary environments, scalability, and efficiency for real-time
learning in applications such as robotics. Since exploring the action space is discovered
to be a potentially costly process, the system should also be flexible enough to
enable maximum reuse of learned actions.
A new approach is proposed which succeeds for the first time in addressing all of the
key issues identified. The algorithm, which is based on the ubiquitous self-organising
map, is analysed and compared with other techniques including those based on the
backpropagation algorithm. The investigation uncovers some important implications
of the differences between these two particular approaches with respect to RL. In particular,
the distributed representation of the multi-layer perceptron is judged to be
something of a double-edged sword offering more sophisticated and more scalable
generalising power, but potentially causing problems in dynamic or non-equiprobable
environments, and tasks involving a highly varying input-output mapping.
The thesis concludes that the self-organising map can be used in conjunction with current
RL theory to provide real-time dynamic representation and generalisation of continuous
action spaces. The proposed model is shown to be reliable in non-stationary,
unpredictable and noisy environments and judged to be unique in addressing and satisfying
a number of desirable properties identified as important to a large class of RL
problems.
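As a rough illustration of the core idea, the sketch below fits a small one-dimensional self-organising map to sampled scalar actions, so that the map's prototype units come to tile the region of action space actually visited. The update rule and all parameters are generic SOM choices for illustration, not the thesis's algorithm.

```python
import math
import random

def train_som(samples, n_units=10, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a 1-D self-organising map to scalar action samples.

    Each unit holds a prototype action; the winning unit and its lattice
    neighbours move toward each observed action, giving an adaptive,
    online discretisation of a continuous action space.
    """
    rng = random.Random(seed)
    units = [rng.uniform(min(samples), max(samples)) for _ in range(n_units)]
    for e in range(epochs):
        lr = lr0 * (1 - e / epochs)                   # decaying learning rate
        sigma = max(sigma0 * (1 - e / epochs), 0.5)   # shrinking neighbourhood
        data = samples[:]
        rng.shuffle(data)                             # avoid ordering bias
        for x in data:
            w = min(range(n_units), key=lambda i: abs(units[i] - x))  # winner
            for i in range(n_units):
                h = math.exp(-((i - w) ** 2) / (2 * sigma ** 2))      # kernel
                units[i] += lr * h * (x - units[i])
    return sorted(units)

actions = [i / 99 for i in range(100)]  # actions observed over [0, 1]
units = train_som(actions)
```

In an RL setting each unit would additionally carry a value estimate, and because the prototypes track the sampled distribution, the map re-allocates its resolution if the useful region of the action space drifts, which is the dynamic generalisation at issue.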
Learned Feedback & Feedforward Perception & Control
The notions of feedback and feedforward information processing gained prominence under cybernetics, an early movement at the dawn of computer science and theoretical neuroscience. Negative feedback processing corrects errors, whereas feedforward processing makes predictions, thereby preemptively reducing errors. A key insight of cybernetics was that such processes can be applied to both perception, or state estimation, and control, or action selection. The remnants of this insight are found in many modern areas, including predictive coding in neuroscience and deep latent variable models in machine learning. This thesis draws on feedback and feedforward ideas developed within predictive coding, adapting them to improve machine learning techniques for perception (Part II) and control (Part III). Upon establishing these conceptual connections, in Part IV, we traverse this bridge, from machine learning back to neuroscience, arriving at new perspectives on the correspondences between these fields.
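The feedback/feedforward interplay described here can be illustrated with a toy predictive coding step: a single latent variable generates a feedforward prediction of an observation, and feedback prediction errors iteratively correct the latent estimate. The one-dimensional linear Gaussian model below is an illustrative assumption, not the thesis's architecture.

```python
def pc_infer(obs, W, prior, steps=200, lr=0.1):
    """Predictive coding inference for one latent z by gradient descent.

    Feedforward: the latent's prediction W * z of the observation.
    Feedback: prediction errors push z toward values that explain obs.
    Minimises E = 0.5*(obs - W*z)**2 + 0.5*(z - prior)**2 (unit variances).
    """
    z = prior
    for _ in range(steps):
        e_obs = obs - W * z          # bottom-up (sensory) prediction error
        e_pri = z - prior            # top-down (prior) prediction error
        grad = W * e_obs - e_pri     # descent direction on E
        z += lr * grad               # feedback correction of the estimate
        if abs(grad) < 1e-9:
            break
    return z

# Closed form for this model: z* = (W*obs + prior) / (W**2 + 1) = 1.2
z = pc_infer(obs=3.0, W=2.0, prior=0.0)
```

State estimation here is exactly error correction: the estimate settles where the sensory and prior prediction errors balance, the same principle the thesis carries over to control.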
Factors Influencing Customer Satisfaction towards E-shopping in Malaysia
Online shopping, or e-shopping, has changed the world of business, and quite a few firms have decided to adopt it; their primary concern, in responding to globalisation, is how competently to incorporate e-shopping into their businesses. E-shopping has also increased substantially in Malaysia in recent years. The rapid growth of the e-commerce industry in Malaysia has created a demand to focus on how to increase customer satisfaction in the e-retailing environment. It is very important that customers are satisfied with the website, or else they will not return; companies must therefore ensure that their customers are satisfied with their purchases, which is essential from an e-commerce point of view. With this in mind, this study aimed to investigate customer satisfaction towards e-shopping in Malaysia. A total of 400 questionnaires were distributed among students randomly selected from various public and private universities located within the Klang Valley area. In total, 369 questionnaires were returned, of which 341 were found usable for further analysis. Structural equation modelling (SEM) was then employed to test the hypotheses. This study found that customer satisfaction towards e-shopping in Malaysia is to a great extent influenced by ease of use, trust, website design, online security and e-service quality. Finally, recommendations and future research directions are provided.
Keywords: E-shopping, Customer satisfaction, Trust, Online security, E-service quality, Malaysia