4,491 research outputs found
Online Learning for Energy Efficient Navigation in Stochastic Transport Networks
Reducing the dependence on fossil fuels in the transport sector is crucial to have a realistic chance of halting climate change. The automotive industry is, therefore, transitioning towards an electrified future at an unprecedented pace. However, in order for electric vehicles to be an attractive alternative to conventional vehicles, some issues, like range anxiety, need to be mitigated. One way to address these problems is by developing more accurate and robust navigation systems for electric vehicles. Furthermore, with highly stochastic and changing traffic conditions, it is useful to continuously update prior knowledge about the traffic environment by gathering data. Passively collecting energy consumption data from vehicles in the traffic network might lead to insufficient information gathered in places where there are few vehicles. Hence, in this thesis, we study the possibility of adapting the routes presented by the navigation system to adequately explore the road network, and properly learn the underlying energy model.The first part of the thesis introduces an online machine learning framework for navigation of electric vehicles, with the objective of adaptively and efficiently navigating the vehicle in a stochastic traffic environment. We assume that the road-specific probability distributions of vehicle energy consumption are unknown, and thus, we need to learn their parameters through observations. Furthermore, we take a Bayesian approach and assign prior beliefs to the parameters based on longitudinal vehicle dynamics. We view the task as a combinatorial multi-armed bandit problem, and utilize Bayesian bandit algorithms, such as Thompson Sampling, to address it. We establish theoretical performance guarantees for Thompson Sampling, in the form of upper bounds on the Bayesian regret, on single-agent, multi-agent and batched feedback variants of the problem. To demonstrate the effectiveness of the framework, we perform simulation experiments on various real-life road networks.In the second half of the thesis, we extend the online learning framework to find paths which minimize or avoid bottlenecks. Solutions to the online minimax path problem represent risk-averse behaviors, by avoiding road segments with high variance in costs. We derive upper bounds on the Bayesian regret of Thompson Sampling adapted to this problem, by carefully handling the non-linear path cost function. We identify computational tractability issues with the original problem formulation, and propose an alternative approximate objective with an associated algorithm based on Thompson Sampling. Finally, we conduct several experimental studies to evaluate the performance of the approximate algorithm
Recommended from our members
Neural Correlates of Temporal Credit Assignment in the Parietal Lobe
Empirical studies of decision making have typically assumed that value learning is governed by time, such that a reward prediction error arising at a specific time triggers temporally-discounted learning for all preceding actions. However, in natural behavior, goals must be acquired through multiple actions, and each action can have different significance for the final outcome. As is recognized in computational research, carrying out multi-step actions requires the use of credit assignment mechanisms that focus learning on specific steps, but little is known about the neural correlates of these mechanisms. To investigate this question we recorded neurons in the monkey lateral intraparietal area (LIP) during a serial decision task where two consecutive eye movement decisions led to a final reward. The underlying decision trees were structured such that the two decisions had different relationships with the final reward, and the optimal strategy was to learn based on the final reward at one of the steps (the “F” step) but ignore changes in this reward at the remaining step (the “I” step). In two distinct contexts, the F step was either the first or the second in the sequence, controlling for effects of temporal discounting. We show that LIP neurons had the strongest value learning and strongest post-decision responses during the transition after the F step regardless of the serial position of this step. Thus, the neurons encode correlates of temporal credit assignment mechanisms that allocate learning to specific steps independently of temporal discounting
Maximizing User Engagement In Short Marketing Campaigns Within An Online Living Lab: A Reinforcement Learning Perspective
ABSTRACT
MAXIMIZING USER ENGAGEMENT IN SHORT MARKETING CAMPAIGNS WITHIN AN ONLINE LIVING LAB: A REINFORCEMENT LEARNING PERSPECTIVE
by
ANIEKAN MICHAEL INI-ABASI
August 2021
Advisor: Dr. Ratna Babu Chinnam Major: Industrial & Systems Engineering Degree: Doctor of Philosophy
User engagement has emerged as the engine driving online business growth. Many firms have pay incentives tied to engagement and growth metrics. These corporations are turning to recommender systems as the tool of choice in the business of maximizing engagement. LinkedIn reported a 40% higher email response with the introduction of a new recommender system. At Amazon 35% of sales originate from recommendations, while Netflix reports that ‘75% of what people watch is from some sort of recommendation,’ with an estimated business value of 42 for every dollar spent when compared to other marketing channels such as social media.
Coupled with the state space transformation, our novel regularized Deep Q-learning (DQN) agent was able to train and perform well based on a few observed users’ responses. First, we explored the average positive effect of using persuasion-based messages in a live email marketing campaign, without deploying a learning algorithm to recommend the influence principles. The selection of persuasion tactics was done heuristically, using only domain knowledge. Our results suggest that embedding certain principles of persuasion in campaign emails can significantly increase user engagement for an online business (and have a positive impact on revenues) without putting pressure on marketing or advertising budgets. During the study, the store had a customer retention rate of 76% and sales grew by a half-million dollars from the three field trials combined. The key assumption was that users are predisposed to respond to certain persuasion principles and learning the right principles to incorporate in the message header or body copy would lead to higher response and engagement.
With the hypothesis validated, we set forth to build a DQN agent to recommend candidate actions from a catalog of persuasion principles most likely to drive higher engagement in the next messaging cycle. A simulation and a real live campaign are implemented to verify the proposed methodology. The results demonstrate the agent’s superior performance compared to a human expert and a control baseline by a significant margin (~ up to 300%). As the quest for effective methods and tools to maximize user engagement intensifies, our methodology could help to boost user engagement for struggling SMBs without prohibitive increase in costs, by enabling the targeting of messages (with the right persuasion principle) to the right user
Virtual-to-Real-World Transfer Learning for Robots on Wilderness Trails
Robots hold promise in many scenarios involving outdoor use, such as
search-and-rescue, wildlife management, and collecting data to improve
environment, climate, and weather forecasting. However, autonomous navigation
of outdoor trails remains a challenging problem. Recent work has sought to
address this issue using deep learning. Although this approach has achieved
state-of-the-art results, the deep learning paradigm may be limited due to a
reliance on large amounts of annotated training data. Collecting and curating
training datasets may not be feasible or practical in many situations,
especially as trail conditions may change due to seasonal weather variations,
storms, and natural erosion. In this paper, we explore an approach to address
this issue through virtual-to-real-world transfer learning using a variety of
deep learning models trained to classify the direction of a trail in an image.
Our approach utilizes synthetic data gathered from virtual environments for
model training, bypassing the need to collect a large amount of real images of
the outdoors. We validate our approach in three main ways. First, we
demonstrate that our models achieve classification accuracies upwards of 95% on
our synthetic data set. Next, we utilize our classification models in the
control system of a simulated robot to demonstrate feasibility. Finally, we
evaluate our models on real-world trail data and demonstrate the potential of
virtual-to-real-world transfer learning.Comment: iROS 201
A Unified Theory of Dual-Process Control
Dual-process theories play a central role in both psychology and
neuroscience, figuring prominently in fields ranging from executive control to
reward-based learning to judgment and decision making. In each of these
domains, two mechanisms appear to operate concurrently, one relatively high in
computational complexity, the other relatively simple. Why is neural
information processing organized in this way? We propose an answer to this
question based on the notion of compression. The key insight is that
dual-process structure can enhance adaptive behavior by allowing an agent to
minimize the description length of its own behavior. We apply a single model
based on this observation to findings from research on executive control,
reward-based learning, and judgment and decision making, showing that seemingly
diverse dual-process phenomena can be understood as domain-specific
consequences of a single underlying set of computational principles
Remembering Forward: Neural Correlates of Memory and Prediction in Human Motor Adaptation
We used functional MR imaging (FMRI), a robotic manipulandum and systems identification techniques to examine neural correlates of predictive compensation for spring-like loads during goal-directed wrist movements in neurologically-intact humans. Although load changed unpredictably from one trial to the next, subjects nevertheless used sensorimotor memories from recent movements to predict and compensate upcoming loads. Prediction enabled subjects to adapt performance so that the task was accomplished with minimum effort. Population analyses of functional images revealed a distributed, bilateral network of cortical and subcortical activity supporting predictive load compensation during visual target capture. Cortical regions – including prefrontal, parietal and hippocampal cortices – exhibited trial-by-trial fluctuations in BOLD signal consistent with the storage and recall of sensorimotor memories or “states” important for spatial working memory. Bilateral activations in associative regions of the striatum demonstrated temporal correlation with the magnitude of kinematic performance error (a signal that could drive reward-optimizing reinforcement learning and the prospective scaling of previously learned motor programs). BOLD signal correlations with load prediction were observed in the cerebellar cortex and red nuclei (consistent with the idea that these structures generate adaptive fusimotor signals facilitating cancelation of expected proprioceptive feedback, as required for conditional feedback adjustments to ongoing motor commands and feedback error learning). Analysis of single subject images revealed that predictive activity was at least as likely to be observed in more than one of these neural systems as in just one. We conclude therefore that motor adaptation is mediated by predictive compensations supported by multiple, distributed, cortical and subcortical structures
- …