
    Neuroevolution of Self-Interpretable Agents

    Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of the selective attention in perception that lets us remain focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining these agents' access to only a small fraction of the visual input, we show that their policies are directly interpretable in pixel space. We find neuroevolution ideal for training self-attention architectures for vision-based reinforcement learning (RL) tasks, because it allows us to incorporate modules containing discrete, non-differentiable operations that are useful for our agent. We argue that self-attention has properties similar to indirect encoding, in the sense that large implicit weight matrices are generated from a small number of key-query parameters, thus enabling our agent to solve challenging vision-based tasks with at least 1000x fewer parameters than existing methods. Since our agent attends only to task-critical visual hints, it is able to generalize to environments where task-irrelevant elements are modified, while conventional methods fail. Videos of our results and source code are available at https://attentionagent.github.io/
    Comment: To appear at the Genetic and Evolutionary Computation Conference (GECCO 2020) as a full paper.
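
    The self-attention bottleneck can be illustrated with a small sketch. This is not the authors' released code: it is a pure-Python illustration assuming the image has already been cut into N flattened patches. Each patch is projected by a query matrix and a key matrix, the rows of the N x N score matrix are normalized with a softmax, each column is summed to measure how much attention a patch receives, and only the top-k patches are kept. The names `patch_importance`, `top_k_patches`, `w_q`, `w_k` and all sizes are hypothetical. Only the two d_in x d projections are parameters; the N x N attention matrix is generated from them, which is the sense in which a large implicit weight matrix comes from a few key-query parameters.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of a (n x m) and b (m x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(scores: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def patch_importance(patches: Matrix, w_q: Matrix, w_k: Matrix) -> List[float]:
    """Score each patch by the total attention it receives from all patches."""
    d = len(w_q[0])                      # projection width
    q = matmul(patches, w_q)             # (N x d) queries
    k = matmul(patches, w_k)             # (N x d) keys
    scores = matmul(q, transpose(k))     # (N x N) implicit attention matrix
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    attn = softmax_rows(scaled)          # each row sums to 1
    # Column sums: how much attention patch j receives from every patch.
    return [sum(attn[i][j] for i in range(len(attn))) for j in range(len(attn))]


def top_k_patches(importance: List[float], k: int) -> List[int]:
    """Indices of the k most-attended patches; all other patches are discarded."""
    return sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    n_patches, d_in, d = 16, 12, 4       # hypothetical sizes
    patches = [[rng.random() for _ in range(d_in)] for _ in range(n_patches)]
    w_q = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d_in)]
    w_k = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d_in)]
    imp = patch_importance(patches, w_q, w_k)
    # 2 * d_in * d = 96 weights parameterize a 16 x 16 attention matrix.
    print(top_k_patches(imp, k=4))
```

    The `sorted(...)[:k]` selection is the kind of discrete, non-differentiable step the abstract mentions: no gradient flows through it, which is one reason a gradient-free optimizer such as an evolution strategy is a natural fit for training the projection weights.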

    Maximizing User Engagement In Short Marketing Campaigns Within An Online Living Lab: A Reinforcement Learning Perspective

    By Aniekan Michael Ini-Abasi, August 2021. Advisor: Dr. Ratna Babu Chinnam. Major: Industrial & Systems Engineering. Degree: Doctor of Philosophy.
    User engagement has emerged as the engine driving online business growth. Many firms tie pay incentives to engagement and growth metrics, and these corporations are turning to recommender systems as the tool of choice for maximizing engagement. LinkedIn reported a 40% higher email response rate after introducing a new recommender system. At Amazon, 35% of sales originate from recommendations, while Netflix reports that ‘75% of what people watch is from some sort of recommendation,’ with an estimated business value of $1 billion per year. While the leading companies have been quite successful at harnessing recommenders to boost user engagement across the digital ecosystem, small and medium businesses (SMBs) are struggling with declining engagement across many channels as competition for user attention intensifies. SMBs often lack the technical expertise and big-data infrastructure needed to operationalize recommender systems.
    The purpose of this study is to explore methods for building a learning agent that personalizes a persuasive request to maximize user engagement in a data-efficient setting. We frame the task as a sequential decision-making problem, model it as a Markov decision process (MDP), and solve it with a generalized reinforcement learning (RL) algorithm. We leverage an approach that eliminates, or at least greatly reduces, the need for massive amounts of training data, moving away from a purely data-driven approach. By incorporating domain knowledge from the literature on persuasion into message composition, we are able to train the RL agent in a sample-efficient and operant manner. In our methodology, the RL agent nominates a candidate from a catalog of persuasion principles to drive higher user response and engagement. To enable the effective use of RL in this setting, we first build a reduced state-space representation by compressing the data with an exponential moving average scheme. A regularized DQN agent then learns an optimal policy, which is applied to recommend one (or a combination) of six universal principles most likely to trigger user responses during the next message cycle.
    In this study, email is the vehicle for delivering persuasion principles to the user. Despite declining click-through rates for marketing emails, business executives continue to show heightened interest in the email channel owing to its higher-than-usual return on investment of $42 for every dollar spent, compared with other marketing channels such as social media. Coupled with the state-space transformation, our novel regularized Deep Q-Network (DQN) agent was able to train and perform well from only a few observed user responses.
    First, we explored the average positive effect of using persuasion-based messages in a live email marketing campaign without deploying a learning algorithm to recommend the influence principles; persuasion tactics were selected heuristically, using only domain knowledge. Our results suggest that embedding certain principles of persuasion in campaign emails can significantly increase user engagement for an online business (and have a positive impact on revenue) without putting pressure on marketing or advertising budgets. During the study, the store had a customer retention rate of 76%, and sales grew by half a million dollars across the three field trials combined. The key assumption was that users are predisposed to respond to certain persuasion principles, and that learning the right principles to incorporate in the message header or body copy would lead to higher response and engagement. With this hypothesis validated, we set out to build a DQN agent that recommends candidate actions from a catalog of persuasion principles most likely to drive higher engagement in the next messaging cycle. A simulation and a real live campaign were implemented to verify the proposed methodology.
    The results show that the agent outperforms both a human expert and a control baseline by a significant margin (up to ~300%). As the quest for effective methods and tools to maximize user engagement intensifies, our methodology could help boost user engagement for struggling SMBs without a prohibitive increase in cost, by targeting messages (with the right persuasion principle) to the right user.
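
    The state compression step can be sketched as follows. The abstract does not specify which features are averaged, so this is a hypothetical encoding: each user's chronological history of (principle sent, response) pairs is reduced to one exponential moving average per principle, giving a fixed-length state vector that a DQN could consume no matter how long the history is. The assumption that the six principles are Cialdini's (reciprocity, commitment, social proof, authority, liking, scarcity), the response coding (1.0 for a click, 0.0 otherwise), and the names `ema_state`, `PRINCIPLES` and `alpha` are all illustrative, not taken from the dissertation.

```python
from typing import Dict, List, Tuple

# Assumption: the "six universal principles" are Cialdini's six principles of persuasion.
PRINCIPLES = ["reciprocity", "commitment", "social_proof", "authority", "liking", "scarcity"]


def ema_state(history: List[Tuple[str, float]], alpha: float = 0.3,
              init: float = 0.0) -> List[float]:
    """Compress a variable-length response history into a fixed-size state.

    `history` holds (principle_sent, response) pairs in chronological order
    (hypothetical coding: 1.0 = clicked, 0.0 = no response). Each principle
    keeps its own exponential moving average, updated only in cycles where
    that principle was sent:
        s_p <- alpha * response + (1 - alpha) * s_p
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    state: Dict[str, float] = {p: init for p in PRINCIPLES}
    for principle, response in history:
        state[principle] = alpha * response + (1.0 - alpha) * state[principle]
    return [state[p] for p in PRINCIPLES]


if __name__ == "__main__":
    history = [("scarcity", 1.0), ("authority", 0.0), ("scarcity", 1.0), ("liking", 1.0)]
    print(ema_state(history, alpha=0.5))  # → [0.0, 0.0, 0.0, 0.0, 0.5, 0.75]
```

    A larger `alpha` weights recent campaigns more heavily, while a smaller one smooths over noisy single responses; the state stays six numbers long either way, which is what keeps the learning problem small.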

    Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation

    Deep reinforcement learning has the potential to train robots to perform complex tasks in the real world without requiring accurate models of the robot or its environment. A practical approach is to train agents in simulation and then transfer them to the real world. One popular method for achieving transferability is domain randomisation, which involves randomly perturbing various aspects of a simulated environment in order to make trained agents robust to the reality gap. However, less work has gone into understanding such agents, which are deployed in the real world, beyond their task performance. In this work we examine such agents through qualitative and quantitative comparisons between agents trained with and without visual domain randomisation. We train agents for Fetch and Jaco robots on a visuomotor control task and evaluate how well they generalise under different testing conditions. Finally, we investigate the internals of the trained agents using a suite of interpretability techniques. Our results show that the primary outcome of domain randomisation is more robust, entangled representations, accompanied by larger weights with greater spatial structure; moreover, the types of changes are heavily influenced by the task setup and the presence of additional proprioceptive inputs. Additionally, we demonstrate that our domain-randomised agents have higher sample complexity, can overfit, and rely more heavily on recurrent processing. Furthermore, even with an improved saliency method introduced in this work, we show that qualitative studies may not always correspond with quantitative measures, necessitating a combination of inspection tools to provide sufficient insight into the behaviour of trained agents.
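
    The core of visual domain randomisation is resampling the scene's appearance at the start of every training episode, so the agent never trains on one fixed rendering. The sketch below assumes a simulator that accepts appearance parameters at reset; `VisualParams`, `NOMINAL`, the parameter ranges and the commented `env.reset(visuals=...)` hook are hypothetical and are not the settings used for the Fetch and Jaco experiments. The `randomise` flag mirrors the paper's comparison between agents trained with and without visual randomisation.

```python
import random
from dataclasses import dataclass
from typing import Tuple

RGB = Tuple[float, float, float]


@dataclass(frozen=True)
class VisualParams:
    """Visual properties of a simulated scene that randomisation perturbs."""
    table_rgb: RGB
    object_rgb: RGB
    light_intensity: float
    camera_offset: Tuple[float, ...]  # metres, relative to the nominal camera pose


# Fixed appearance used by the non-randomised baseline (hypothetical values).
NOMINAL = VisualParams((0.5, 0.5, 0.5), (1.0, 0.0, 0.0), 1.0, (0.0, 0.0, 0.0))


def sample_visual_params(rng: random.Random) -> VisualParams:
    """Draw one random scene appearance from hypothetical uniform ranges."""
    def rand_rgb() -> RGB:
        return (rng.random(), rng.random(), rng.random())

    return VisualParams(
        table_rgb=rand_rgb(),
        object_rgb=rand_rgb(),
        light_intensity=rng.uniform(0.5, 1.5),
        camera_offset=tuple(rng.uniform(-0.02, 0.02) for _ in range(3)),
    )


def episode_params(rng: random.Random, randomise: bool) -> VisualParams:
    """Per-episode appearance: resampled for randomised agents, fixed otherwise."""
    return sample_visual_params(rng) if randomise else NOMINAL


if __name__ == "__main__":
    rng = random.Random(42)
    for episode in range(3):
        params = episode_params(rng, randomise=True)
        # env.reset(visuals=params)  # hypothetical simulator hook
        print(episode, round(params.light_intensity, 3))
```

    Seeding the `random.Random` instance makes each randomised training run reproducible, which helps when comparing randomised and baseline agents under the same held-out testing conditions.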