6 research outputs found

    Application of reinforcement learning methods to computer game dynamics

    Get PDF
    The dynamics of the game world present both challenges and opportunities for AI to make a useful difference. Learning smart behaviours for game assets is a first step towards realistic conflict or cooperation. The scope this thesis is the application of Reinforcement Learning to moving assets in the game world. Game sessions a generate stream data on asset's performance which must be processed on the fly. The lead objective is to produce fast, lightweight and flexible learning algorithms for run-time embedding. The motivation from current work is to shorten the time to achieve a workable policy solution by investigating the exploration / exploitation balance, overcome the curse of dimensionality of complex systems, and avoid the use of extra endogenous parameters which require multiple data passes and use a simple state aggregation rather than functional approximation. How action selection (AS) contributes to efficient learning is a key issue in RL since is determines the balance between exploiting and confirming the current policy or exploring an early less likely policy which may prove better in the long run. The methodology deploys the simulation of several AS using 10-armed bandit problem averaged over 10000 epochs. The results show a considerable variation in performance in terms of latency and asymptotic direction. The Upper Confidence Bound comes out leader over most of the episode range, especially at about 100. Using insight from action selection order statistics are applied to determine a criterion for the convergence of policy evaluation. The probability that the action of maximum sample mean is indeed the action of maximum population mean (PMSMMPM) is calculated using the 3 armed bandit problem. PMSMMPM reaches 0.988 by play 26 which provides evidence for it as a convergence criterion. An iteration stopping rule is defined using PMSMMPM and it shows plausible properties as the population parameters are varied. A mathematical analysis of the approximation (P21) of just taking the top two actions yields a minimum sampling size for any level of P21. Using the gradient of P21 a selection rule is derived and when combined with UCB a new complete exploratory policy is demonstrated for 3-arm bandit that requires just over half the sample size when compared with pure UCB. The results provide evidence that the augmented UCB selection rule will contribute to faster learning. TD sarsa(0) learning algorithm has been applied to learn a steering policy for the untried caravan reversing problem and for the kerb avoiding steering problem of racing car both using negative rewards on failure and a simple aggregation. The output policy for the caravan is validated as non jack-knifing for a high proportion of start states. The racing car policy has a similar validation outcome for two exploratory polies which are compared and contrasted

    Dynamic Generalisation of Continuous Action Spaces in Reinforcement Learning: A Neurally Inspired Approach

    Get PDF
    Institute for Adaptive and Neural ComputationAward number: 98318242.This thesis is about the dynamic generalisation of continuous action spaces in reinforcement learning problems. The standard Reinforcement Learning (RL) account provides a principled and comprehensive means of optimising a scalar reward signal in a Markov Decision Process. However, the theory itself does not directly address the imperative issue of generalisation which naturally arises as a consequence of large or continuous state and action spaces. A current thrust of research is aimed at fusing the generalisation capabilities of supervised (and unsupervised) learning techniques with the RL theory. An example par excellence is Tesauro’s TD-Gammon. Although much effort has gone into researching ways to represent and generalise over the input space, much less attention has been paid to the action space. This thesis first considers the motivation for learning real-valued actions, and then proposes a set of key properties desirable in any candidate algorithm addressing generalisation of both input and action spaces. These properties include: Provision of adaptive and online generalisation, adherence to the standard theory with a central focus on estimating expected reward, provision for real-valued states and actions, and full support for a real-valued discounted reward signal. Of particular interest are issues pertaining to robustness in non-stationary environments, scalability, and efficiency for real-time learning in applications such as robotics. Since exploring the action space is discovered to be a potentially costly process, the system should also be flexible enough to enable maximum reuse of learned actions. A new approach is proposed which succeeds for the first time in addressing all of the key issues identified. The algorithm, which is based on the ubiquitous self-organising map, is analysed and compared with other techniques including those based on the backpropagation algorithm. The investigation uncovers some important implications of the differences between these two particular approaches with respect to RL. In particular, the distributed representation of the multi-layer perceptron is judged to be something of a double-edged sword offering more sophisticated and more scalable generalising power, but potentially causing problems in dynamic or non-equiprobable environments, and tasks involving a highly varying input-output mapping. The thesis concludes that the self-organising map can be used in conjunction with current RL theory to provide real-time dynamic representation and generalisation of continuous action spaces. The proposed model is shown to be reliable in non-stationary, unpredictable and noisy environments and judged to be unique in addressing and satisfying a number of desirable properties identified as important to a large class of RL problems

    Learned Feedback & Feedforward Perception & Control

    Get PDF
    The notions of feedback and feedforward information processing gained prominence under cybernetics, an early movement at the dawn of computer science and theoretical neuroscience. Negative feedback processing corrects errors, whereas feedforward processing makes predictions, thereby preemptively reducing errors. A key insight of cybernetics was that such processes can be applied to both perception, or state estimation, and control, or action selection. The remnants of this insight are found in many modern areas, including predictive coding in neuroscience and deep latent variable models in machine learning. This thesis draws on feedback and feedforward ideas developed within predictive coding, adapting them to improve machine learning techniques for perception (Part II) and control (Part III). Upon establishing these conceptual connections, in Part IV, we traverse this bridge, from machine learning back to neuroscience, arriving at new perspectives on the correspondences between these fields.</p

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference

    Get PDF

    Factors Influencing Customer Satisfaction towards E-shopping in Malaysia

    Get PDF
    Online shopping or e-shopping has changed the world of business and quite a few people have decided to work with these features. What their primary concerns precisely and the responses from the globalisation are the competency of incorporation while doing their businesses. E-shopping has also increased substantially in Malaysia in recent years. The rapid increase in the e-commerce industry in Malaysia has created the demand to emphasize on how to increase customer satisfaction while operating in the e-retailing environment. It is very important that customers are satisfied with the website, or else, they would not return. Therefore, a crucial fact to look into is that companies must ensure that their customers are satisfied with their purchases that are really essential from the ecommerce’s point of view. With is in mind, this study aimed at investigating customer satisfaction towards e-shopping in Malaysia. A total of 400 questionnaires were distributed among students randomly selected from various public and private universities located within Klang valley area. Total 369 questionnaires were returned, out of which 341 questionnaires were found usable for further analysis. Finally, SEM was employed to test the hypotheses. This study found that customer satisfaction towards e-shopping in Malaysia is to a great extent influenced by ease of use, trust, design of the website, online security and e-service quality. Finally, recommendations and future study direction is provided. Keywords: E-shopping, Customer satisfaction, Trust, Online security, E-service quality, Malaysia
    corecore