Online Learning for Energy Efficient Navigation in Stochastic Transport Networks

Author: Åkerblom Niklas
Publication date: 01/01/2021

Reducing the dependence on fossil fuels in the transport sector is crucial to have a realistic chance of halting climate change. The automotive industry is, therefore, transitioning towards an electrified future at an unprecedented pace. However, in order for electric vehicles to be an attractive alternative to conventional vehicles, some issues, like range anxiety, need to be mitigated. One way to address these problems is by developing more accurate and robust navigation systems for electric vehicles. Furthermore, with highly stochastic and changing traffic conditions, it is useful to continuously update prior knowledge about the traffic environment by gathering data. Passively collecting energy consumption data from vehicles in the traffic network might lead to insufficient information gathered in places where there are few vehicles. Hence, in this thesis, we study the possibility of adapting the routes presented by the navigation system to adequately explore the road network, and properly learn the underlying energy model. The first part of the thesis introduces an online machine learning framework for navigation of electric vehicles, with the objective of adaptively and efficiently navigating the vehicle in a stochastic traffic environment. We assume that the road-specific probability distributions of vehicle energy consumption are unknown, and thus, we need to learn their parameters through observations. Furthermore, we take a Bayesian approach and assign prior beliefs to the parameters based on longitudinal vehicle dynamics. We view the task as a combinatorial multi-armed bandit problem, and utilize Bayesian bandit algorithms, such as Thompson Sampling, to address it. We establish theoretical performance guarantees for Thompson Sampling, in the form of upper bounds on the Bayesian regret, on single-agent, multi-agent and batched feedback variants of the problem. To demonstrate the effectiveness of the framework, we perform simulation experiments on various real-life road networks. In the second half of the thesis, we extend the online learning framework to find paths which minimize or avoid bottlenecks. Solutions to the online minimax path problem represent risk-averse behaviors, by avoiding road segments with high variance in costs. We derive upper bounds on the Bayesian regret of Thompson Sampling adapted to this problem, by carefully handling the non-linear path cost function. We identify computational tractability issues with the original problem formulation, and propose an alternative approximate objective with an associated algorithm based on Thompson Sampling. Finally, we conduct several experimental studies to evaluate the performance of the approximate algorithm.
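
The combinatorial-bandit view described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the four-edge network, the Gaussian energy model with known noise variance, the prior values, and every name below are assumptions chosen for illustration. Sampled edge costs are clipped to stay positive so that Dijkstra's algorithm applies, which is a further simplification.

```python
import heapq
import random

# Hypothetical toy network: edge -> true mean energy (unknown to the learner).
TRUE_MEAN = {
    ("s", "a"): 2.0, ("a", "t"): 2.0,   # path s-a-t costs 4.0 on average
    ("s", "b"): 1.0, ("b", "t"): 1.5,   # path s-b-t costs 2.5 on average
}
NOISE_VAR = 0.25  # assumed known variance of each energy observation


def shortest_path(weights, source, target):
    """Dijkstra over a dict {(u, v): weight} with non-negative weights."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # Walk back from the target to recover the list of edges.
    path, node = [], target
    while node != source:
        path.append((prev[node], node))
        node = prev[node]
    return path[::-1]


def thompson_sampling(rounds, seed=0):
    rng = random.Random(seed)
    # Gaussian prior [mean, variance] on each edge's mean energy (assumed values).
    post = {e: [3.0, 4.0] for e in TRUE_MEAN}
    counts = {}
    for _ in range(rounds):
        # 1. Sample a plausible mean energy for every edge from its posterior.
        sampled = {e: max(1e-6, rng.gauss(m, v ** 0.5))
                   for e, (m, v) in post.items()}
        # 2. Drive the path that is optimal under the sampled energies.
        path = shortest_path(sampled, "s", "t")
        counts[tuple(path)] = counts.get(tuple(path), 0) + 1
        # 3. Observe noisy energy on each traversed edge and apply the
        #    conjugate Gaussian update to that edge's posterior.
        for e in path:
            obs = rng.gauss(TRUE_MEAN[e], NOISE_VAR ** 0.5)
            m, v = post[e]
            new_v = 1.0 / (1.0 / v + 1.0 / NOISE_VAR)
            post[e] = [new_v * (m / v + obs / NOISE_VAR), new_v]
    return post, counts
```

Calling `thompson_sampling(500)` returns the per-edge posteriors and a count of how often each path was driven. Because only traversed edges are observed, exploration comes entirely from the randomness of the posterior samples.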

Chalmers Research

Neural Correlates of Temporal Credit Assignment in the Parietal Lobe

Authors: Eisenberg Ian; Foley Nicholas C.; Gersch Timothy M.; Gottlieb Jacqueline
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2014

Empirical studies of decision making have typically assumed that value learning is governed by time, such that a reward prediction error arising at a specific time triggers temporally-discounted learning for all preceding actions. However, in natural behavior, goals must be acquired through multiple actions, and each action can have different significance for the final outcome. As is recognized in computational research, carrying out multi-step actions requires the use of credit assignment mechanisms that focus learning on specific steps, but little is known about the neural correlates of these mechanisms. To investigate this question we recorded neurons in the monkey lateral intraparietal area (LIP) during a serial decision task where two consecutive eye movement decisions led to a final reward. The underlying decision trees were structured such that the two decisions had different relationships with the final reward, and the optimal strategy was to learn based on the final reward at one of the steps (the “F” step) but ignore changes in this reward at the remaining step (the “I” step). In two distinct contexts, the F step was either the first or the second in the sequence, controlling for effects of temporal discounting. We show that LIP neurons had the strongest value learning and strongest post-decision responses during the transition after the F step regardless of the serial position of this step. Thus, the neurons encode correlates of temporal credit assignment mechanisms that allocate learning to specific steps independently of temporal discounting.
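
As a rough illustration of the contrast this abstract draws, the sketch below compares spreading a final prediction error backwards with temporal discounting against assigning it to one relevant step only. The function names, the discount factor `gamma`, and the two-step example are hypothetical and are not the authors' model.

```python
def discounted_credit(num_steps, prediction_error, gamma=0.5):
    """Spread an error observed after the last step back over all steps.

    Step t receives prediction_error * gamma ** (steps elapsed since t),
    so earlier actions always get less credit than later ones.
    """
    return [prediction_error * gamma ** (num_steps - 1 - t)
            for t in range(num_steps)]


def targeted_credit(num_steps, prediction_error, relevant_step):
    """Give the full error to one step (the "F" step) and none to the others."""
    return [prediction_error if t == relevant_step else 0.0
            for t in range(num_steps)]
```

With two steps and a unit error, `discounted_credit(2, 1.0)` favours the second step regardless of the task structure, whereas `targeted_credit(2, 1.0, relevant_step=0)` places all learning on the first step.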

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

FigShare

Maximizing User Engagement In Short Marketing Campaigns Within An Online Living Lab: A Reinforcement Learning Perspective

Author: Ini-Abasi Aniekan Michael
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2021

August 2021. Advisor: Dr. Ratna Babu Chinnam. Major: Industrial & Systems Engineering. Degree: Doctor of Philosophy. User engagement has emerged as the engine driving online business growth. Many firms have pay incentives tied to engagement and growth metrics. These corporations are turning to recommender systems as the tool of choice in the business of maximizing engagement. LinkedIn reported a 40% higher email response with the introduction of a new recommender system. At Amazon 35% of sales originate from recommendations, while Netflix reports that ‘75% of what people watch is from some sort of recommendation,’ with an estimated business value of $1 billion per year. While the leading companies have been quite successful at harnessing the power of recommenders to boost user engagement across the digital ecosystem, small and medium businesses (SMB) are struggling with declining engagement across many channels as competition for user attention intensifies. The SMBs often lack the technical expertise and big data infrastructure necessary to operationalize recommender systems. The purpose of this study is to explore the methods of building a learning agent that can be used to personalize a persuasive request to maximize user engagement in a data-efficient setting. We frame the task as a sequential decision-making problem, modelled as an MDP, and solved using a generalized reinforcement learning (RL) algorithm. We leverage an approach that eliminates or at least greatly reduces the need for massive amounts of training data, thus moving away from a purely data-driven approach. By incorporating domain knowledge from the literature on persuasion into the message composition, we are able to train the RL agent in a sample efficient and operant manner. In our methodology, the RL agent nominates a candidate from a catalog of persuasion principles to drive higher user response and engagement. To enable the effective use of RL in our specific setting, we first build a reduced state space representation by compressing the data using an exponential moving average scheme. A regularized DQN agent is deployed to learn an optimal policy, which is then applied in recommending one (or a combination) of six universal principles most likely to trigger responses from users during the next message cycle. In this study, email messaging is used as the vehicle to deliver persuasion principles to the user. At a time of declining click-through rates with marketing emails, business executives continue to show heightened interest in the email channel owing to higher-than-usual return on investment of $42 for every dollar spent when compared to other marketing channels such as social media. Coupled with the state space transformation, our novel regularized Deep Q-learning (DQN) agent was able to train and perform well based on a few observed users’ responses. First, we explored the average positive effect of using persuasion-based messages in a live email marketing campaign, without deploying a learning algorithm to recommend the influence principles. The selection of persuasion tactics was done heuristically, using only domain knowledge. Our results suggest that embedding certain principles of persuasion in campaign emails can significantly increase user engagement for an online business (and have a positive impact on revenues) without putting pressure on marketing or advertising budgets. During the study, the store had a customer retention rate of 76% and sales grew by a half-million dollars from the three field trials combined. The key assumption was that users are predisposed to respond to certain persuasion principles and learning the right principles to incorporate in the message header or body copy would lead to higher response and engagement. With the hypothesis validated, we set forth to build a DQN agent to recommend candidate actions from a catalog of persuasion principles most likely to drive higher engagement in the next messaging cycle. A simulation and a real live campaign are implemented to verify the proposed methodology. The results demonstrate the agent’s superior performance compared to a human expert and a control baseline by a significant margin (up to roughly 300%). As the quest for effective methods and tools to maximize user engagement intensifies, our methodology could help to boost user engagement for struggling SMBs without a prohibitive increase in costs, by enabling the targeting of messages (with the right persuasion principle) to the right user.
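
The exponential moving average compression mentioned in this abstract might look roughly like the sketch below. The abstract does not give the actual state features or smoothing factor, so the per-user response history, the `alpha` parameter, and the function names are assumptions for illustration only.

```python
def ema_update(state, observation, alpha=0.3):
    """Blend the newest observation into the running summary."""
    return (1 - alpha) * state + alpha * observation


def compress_history(responses, alpha=0.3, initial=0.0):
    """Collapse a response history (1 = engaged, 0 = ignored) into one number.

    Recent responses weigh more than old ones, so the summary tracks a
    user's current responsiveness without storing the full history.
    """
    state = initial
    for response in responses:
        state = ema_update(state, response, alpha)
    return state
```

For example, `compress_history([0, 0, 1])` yields a larger value than `compress_history([1, 0, 0])`, because the most recent response dominates. One such value per persuasion principle would give a small, fixed-size state vector for the learning agent.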

Digital Commons@Wayne State University

Virtual-to-Real-World Transfer Learning for Robots on Wilderness Trails

Authors: Iuzzolino Michael L.; Szafir Daniel; Walker Michael E.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/01/2019

Robots hold promise in many scenarios involving outdoor use, such as search-and-rescue, wildlife management, and collecting data to improve environment, climate, and weather forecasting. However, autonomous navigation of outdoor trails remains a challenging problem. Recent work has sought to address this issue using deep learning. Although this approach has achieved state-of-the-art results, the deep learning paradigm may be limited due to a reliance on large amounts of annotated training data. Collecting and curating training datasets may not be feasible or practical in many situations, especially as trail conditions may change due to seasonal weather variations, storms, and natural erosion. In this paper, we explore an approach to address this issue through virtual-to-real-world transfer learning using a variety of deep learning models trained to classify the direction of a trail in an image. Our approach utilizes synthetic data gathered from virtual environments for model training, bypassing the need to collect a large amount of real images of the outdoors. We validate our approach in three main ways. First, we demonstrate that our models achieve classification accuracies upwards of 95% on our synthetic data set. Next, we utilize our classification models in the control system of a simulated robot to demonstrate feasibility. Finally, we evaluate our models on real-world trail data and demonstrate the potential of virtual-to-real-world transfer learning. Comment: iROS 201

arXiv.org e-Print Archive

Crossref

A Unified Theory of Dual-Process Control

Authors: Botvinick Matthew M.; Miller Kevin; Moskovitz Ted; Sahani Maneesh
Publication date: 10/10/2023

Dual-process theories play a central role in both psychology and neuroscience, figuring prominently in fields ranging from executive control to reward-based learning to judgment and decision making. In each of these domains, two mechanisms appear to operate concurrently, one relatively high in computational complexity, the other relatively simple. Why is neural information processing organized in this way? We propose an answer to this question based on the notion of compression. The key insight is that dual-process structure can enhance adaptive behavior by allowing an agent to minimize the description length of its own behavior. We apply a single model based on this observation to findings from research on executive control, reward-based learning, and judgment and decision making, showing that seemingly diverse dual-process phenomena can be understood as domain-specific consequences of a single underlying set of computational principles.

arXiv.org e-Print Archive

Remembering Forward: Neural Correlates of Memory and Prediction in Human Motor Adaptation

Authors: Houk James; Mosier Kristine M.; Salowitz Nicole M.G.; Scheidt Robert A.; Simo Lucia; Suminski Aaron J.; Zimbelman Janice
Publication venue: e-Publications@Marquette
Publication date: 01/01/2012

We used functional MR imaging (FMRI), a robotic manipulandum and systems identification techniques to examine neural correlates of predictive compensation for spring-like loads during goal-directed wrist movements in neurologically-intact humans. Although load changed unpredictably from one trial to the next, subjects nevertheless used sensorimotor memories from recent movements to predict and compensate upcoming loads. Prediction enabled subjects to adapt performance so that the task was accomplished with minimum effort. Population analyses of functional images revealed a distributed, bilateral network of cortical and subcortical activity supporting predictive load compensation during visual target capture. Cortical regions – including prefrontal, parietal and hippocampal cortices – exhibited trial-by-trial fluctuations in BOLD signal consistent with the storage and recall of sensorimotor memories or “states” important for spatial working memory. Bilateral activations in associative regions of the striatum demonstrated temporal correlation with the magnitude of kinematic performance error (a signal that could drive reward-optimizing reinforcement learning and the prospective scaling of previously learned motor programs). BOLD signal correlations with load prediction were observed in the cerebellar cortex and red nuclei (consistent with the idea that these structures generate adaptive fusimotor signals facilitating cancelation of expected proprioceptive feedback, as required for conditional feedback adjustments to ongoing motor commands and feedback error learning). Analysis of single subject images revealed that predictive activity was at least as likely to be observed in more than one of these neural systems as in just one. We conclude therefore that motor adaptation is mediated by predictive compensations supported by multiple, distributed, cortical and subcortical structures.

epublications@Marquette

PubMed Central