
    Simulated Experience Evaluation in Developing Multi-agent Coordination Graphs

    Cognitive science has proposed that one way people learn is by self-critiquing, generating 'what-if' strategies for events (simulation). It is theorized that people use this method both to learn new things and to learn more quickly. This research adds this concept to a graph-based genetic program. Memories are recorded during fitness assessment and retained in a global memory bank according to the magnitude of the change in the agent's energy and the age of the memory. Between generations, candidate agents perform in simulations of the stored memories; candidates that perform similarly to good memories and differently from bad memories are more likely to be included in the next generation. The simulation-informed genetic program is evaluated in two domains: sequence matching and Robocode. Results indicate the algorithm does not perform equally well in all environments. In sequence matching, experiential evaluation fails to outperform the control. In Robocode, the experiential evaluation method initially outperforms the control but then stagnates and often regresses. This likely indicates that the algorithm is over-learning a single solution rather than adapting to the environment, and that learning through simulation includes a satisficing component.
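
    As a rough illustration of the memory-bank mechanism described above, the Python sketch below records memories scored by the magnitude of the energy change and their age, and evaluates candidates by how often they agree with good memories and disagree with bad ones. The salience formula, capacity, and decay rate are hypothetical; the paper's actual retention rule is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    """One recorded episode: the situation, the action taken, and the
    resulting change in the agent's energy (positive = good memory)."""
    state: tuple
    action: int
    energy_delta: float
    age: int = 0

class MemoryBank:
    """Global memory bank; hypothetical retention rule combining the
    magnitude of the energy change and the age of the memory."""
    def __init__(self, capacity=100, decay=0.95):
        self.capacity = capacity
        self.decay = decay
        self.memories: list[Memory] = []

    def record(self, memory: Memory):
        self.memories.append(memory)
        # Retain only the memories with the highest salience score.
        self.memories.sort(key=self._salience, reverse=True)
        del self.memories[self.capacity:]

    def _salience(self, m: Memory) -> float:
        return abs(m.energy_delta) * (self.decay ** m.age)

    def age_all(self):
        for m in self.memories:
            m.age += 1

def experiential_fitness(candidate_policy, bank: MemoryBank) -> float:
    """Replay stored memories: reward candidates that reproduce the
    actions of good memories and avoid those of bad memories."""
    score = 0.0
    for m in bank.memories:
        agrees = candidate_policy(m.state) == m.action
        if m.energy_delta > 0:
            score += 1.0 if agrees else 0.0
        else:
            score += 0.0 if agrees else 1.0
    return score / max(len(bank.memories), 1)

bank = MemoryBank(capacity=3)
bank.record(Memory(state=(0, 1), action=1, energy_delta=+5.0))
bank.record(Memory(state=(1, 0), action=0, energy_delta=-4.0))
print(experiential_fitness(lambda s: 1, bank))  # agrees with good, avoids bad
```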

    Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising

    Real-time advertising allows advertisers to bid for each impression from a visiting user. To optimize goals such as revenue and return on investment (ROI) from ad placements, advertisers must not only estimate the relevance between an ad and a user's interests, but, most importantly, respond strategically to the other advertisers bidding in the market. In this paper, we formulate bidding optimization as a multi-agent reinforcement learning problem. To handle a large number of advertisers, we propose a clustering method and assign each cluster a strategic bidding agent. A practical Distributed Coordinated Multi-Agent Bidding (DCMAB) algorithm is proposed and implemented to balance the trade-off between competition and cooperation among advertisers. An empirical study on industry-scale real-world data demonstrates the effectiveness of our methods. Our results show that cluster-based bidding largely outperforms single-agent and bandit approaches, and that coordinated bidding achieves better overall objectives than purely self-interested bidding agents.
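
    A minimal sketch of the cluster-then-assign idea: advertisers are grouped (k-means here is an assumption; the paper's clustering method may differ), and each cluster shares one bidding agent. The linear bid-scaling policy and its toy update are placeholders for the learned DDPG-style policies in DCMAB, not a reproduction of them.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical advertiser features (e.g., budget, historical CTR, category);
# the paper's actual feature set and agent architecture are not shown here.
advertiser_features = np.random.rand(1000, 8)  # 1000 advertisers, 8 features

# Step 1: cluster advertisers so that each cluster shares one bidding agent.
n_clusters = 10
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(advertiser_features)

class ClusterBiddingAgent:
    """One strategic agent per cluster; a linear bid-scaling policy stands
    in for the learned reinforcement learning policy."""
    def __init__(self):
        self.alpha = 1.0  # bid-scaling parameter, adjusted by learning

    def bid(self, estimated_value: float) -> float:
        return self.alpha * estimated_value

    def update(self, reward: float, lr: float = 0.01):
        # Toy update: raise alpha after profitable outcomes, lower it after
        # losses, keeping the scaling factor non-negative.
        self.alpha = max(0.0, self.alpha + lr * reward)

agents = [ClusterBiddingAgent() for _ in range(n_clusters)]

def bid_for_impression(advertiser_idx: int, estimated_value: float) -> float:
    """Route each impression to the bidding agent of the advertiser's cluster."""
    return agents[cluster_ids[advertiser_idx]].bid(estimated_value)

print(bid_for_impression(42, estimated_value=0.8))
```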

    Sample-Efficient Reinforcement Learning of Robot Control Policies in the Real World

    The goal of reinforcement learning is to enable systems to autonomously solve tasks in the real world, even in the absence of prior data. To succeed in such situations, reinforcement learning algorithms collect new experience through interactions with the environment to further the learning process. The behaviour is optimized by maximizing a reward function, which assigns high numerical values to desired behaviours. In robotics especially, such interactions with the environment are expensive in terms of the required execution time, human involvement, and mechanical degradation of the system itself. This thesis therefore introduces sample-efficient reinforcement learning methods that are applicable to real-world settings and to control tasks such as bimanual manipulation and locomotion. Sample efficiency is achieved through directed exploration, using either dimensionality reduction or trajectory optimization methods. Finally, it is demonstrated how data-efficient reinforcement learning methods can be used to optimize the behaviour and the morphology of robots at the same time.
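
    To make the expensive interaction loop concrete, here is a toy sketch in which each rollout stands in for a costly real-world trial and behaviour is improved by maximizing the reward. The cross-entropy search is only a stand-in for the dissertation's directed-exploration methods; the task, sample sizes, and elite fraction are illustrative assumptions.

```python
import numpy as np

# Toy 1-D reaching task: the "robot" picks an action a, and the reward
# penalizes distance to an unknown target. Each call stands in for one
# costly real-world rollout.
TARGET = 0.7

def rollout(action: float) -> float:
    """One (expensive) interaction with the environment."""
    return -(action - TARGET) ** 2  # reward: high near the target

# Cross-entropy method: a simple directed search that narrows its sampling
# distribution toward high-reward behaviours, limiting wasted rollouts.
mean, std = 0.0, 1.0
for iteration in range(20):
    actions = np.random.normal(mean, std, size=16)   # 16 rollouts per iter
    rewards = np.array([rollout(a) for a in actions])
    elite = actions[np.argsort(rewards)[-4:]]        # keep the best 4
    mean, std = elite.mean(), elite.std() + 1e-3     # refit the search dist.

print(f"learned action ~ {mean:.3f} (target {TARGET})")
```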

    Indirect Methods for Robot Skill Learning

    Robot learning algorithms are appealing alternatives for acquiring rational robotic behaviors from data collected during the execution of tasks. However, most robot learning techniques are framed as isolated stages focused on directly obtaining rational policies by optimizing only the performance measure of a single task. Formulating robotic skill acquisition this way has several disadvantages. If the same skill has to be learned by different robots, independent learning processes must be carried out to acquire an exclusive policy for each robot. Similarly, if a robot has to learn diverse skills, it must acquire the policy for each task in a separate learning process, in sequential order and commonly starting from scratch. Likewise, formulating the learning process in terms of only the performance measure makes robots avoid situations that should not be repeated, but without any mechanism that captures the necessity of not repeating those wrong behaviors. In contrast, humans and other animals exploit their experience not only to improve performance on the task they are currently executing, but to indirectly construct multiple models that help them with that particular task and generalize to new problems. Accordingly, the models and algorithms proposed in this thesis seek to be more data efficient and to extract more information from the interaction data collected either from an expert's demonstrations or from the robot's own experience. The first approach encodes robotic skills with shared latent variable models, obtaining latent representations that can be transferred from one robot to others, thereby avoiding learning the same task from scratch. The second approach learns complex rational policies by representing them as hierarchical models that can perform multiple concurrent tasks, and whose components are learned in a single learning process rather than in separate ones. Finally, the third approach uses the interaction data to learn two alternative and antagonistic policies that capture what to do and what not to do, and which influence the learning process in addition to the performance measure defined for the task.
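
    The first approach (shared latent variable models) can be sketched with plain linear algebra: a low-dimensional latent trajectory is recovered from one robot's demonstrations and decoded onto a second robot from a short calibration segment, instead of learning the second robot's skill from scratch. The linear model, dimensions, and synthetic data below are illustrative assumptions, not the thesis's actual models.

```python
import numpy as np

# Hypothetical paired demonstrations: the same skill executed on two robots
# with different joint dimensionalities (7-DoF and 5-DoF), both generated
# from a common 3-D latent trajectory.
T = 200                                              # timesteps per demo
latent = np.cumsum(np.random.randn(T, 3), axis=0)    # shared latent skill
W_a = np.random.randn(3, 7)                          # latent -> robot A joints
W_b = np.random.randn(3, 5)                          # latent -> robot B joints
demo_a = latent @ W_a                                # observed demo, robot A
demo_b = latent @ W_b                                # observed demo, robot B

# Recover a latent trajectory from robot A's demonstration alone (SVD here
# stands in for the richer latent variable models used in the thesis).
U, S, Vt = np.linalg.svd(demo_a, full_matrices=False)
Z = U[:, :3] * S[:3]                                 # recovered latent skill

# Transfer: fit robot B's decoder from only the first 20 timesteps (a short
# calibration demo), then reproduce the full skill on robot B.
decoder_b, *_ = np.linalg.lstsq(Z[:20], demo_b[:20], rcond=None)
reproduced = Z @ decoder_b
print("reconstruction error on robot B:", np.abs(reproduced - demo_b).max())
```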

    Shapley Q-value: A Local Reward Approach to Solve Global Reward Games

    Cooperative games are a critical research area in multi-agent reinforcement learning (MARL). Global reward games are a subclass of cooperative games in which all agents aim to maximize a global reward, and credit assignment is an important problem studied in this setting. Most previous work took the view of a non-cooperative-game theoretical framework with a shared reward approach, i.e., each agent is directly assigned the shared global reward. This, however, may give each agent an inaccurate signal about its contribution to the group, which can cause inefficient learning. To deal with this problem, we i) introduce a cooperative-game theoretical framework called the extended convex game (ECG), a superset of global reward games, and ii) propose a local reward approach called the Shapley Q-value. The Shapley Q-value distributes the global reward so as to reflect each agent's own contribution, in contrast to the shared reward approach. Moreover, we derive an MARL algorithm called Shapley Q-value deep deterministic policy gradient (SQDDPG), which uses the Shapley Q-value as the critic for each agent. We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator, and Traffic Junction, compared with state-of-the-art algorithms, e.g., MADDPG, COMA, Independent DDPG, and Independent A2C. In the experiments, SQDDPG shows a significant improvement in convergence rate. Finally, we plot the Shapley Q-value and validate the property of fair credit assignment.
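
    The Shapley value underlying the Shapley Q-value can be illustrated with a Monte Carlo sketch: each agent is credited with its average marginal contribution over random join orders. In SQDDPG the coalition value would be a learned Q-function; the toy game and sample count below are assumptions for illustration only.

```python
import random
from statistics import mean

def shapley_values(agents, coalition_value, n_samples=1000):
    """Monte Carlo estimate of each agent's Shapley value: the average
    marginal contribution of the agent over random join orders.
    coalition_value is any callable mapping a frozenset of agents to a
    scalar payoff."""
    contributions = {a: [] for a in agents}
    for _ in range(n_samples):
        order = random.sample(agents, len(agents))
        coalition = frozenset()
        for agent in order:
            with_agent = coalition | {agent}
            marginal = coalition_value(with_agent) - coalition_value(coalition)
            contributions[agent].append(marginal)
            coalition = with_agent
    return {a: mean(c) for a, c in contributions.items()}

# Toy global-reward game: agents 0 and 1 are complementary (their joint
# presence is worth more than the sum of parts); agent 2 contributes nothing.
def value(coalition: frozenset) -> float:
    v = 0.0
    if 0 in coalition: v += 1.0
    if 1 in coalition: v += 1.0
    if {0, 1} <= coalition: v += 2.0  # synergy bonus
    return v

print(shapley_values([0, 1, 2], value))
# Roughly {0: 2.0, 1: 2.0, 2: 0.0} -- credit reflects each agent's
# contribution, unlike a shared global reward that gives every agent
# the same learning signal.
```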