Incentivized Exploration for Multi-Armed Bandits under Reward Drift

Authors: Chen Lijun; Liu Kai; Liu Zhiyuan; Shen Fan; Wang Huazheng
Publication date: 15/12/2019

We study incentivized exploration for the multi-armed bandit (MAB) problem where the players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on reward. We seek to understand the impact of this drifted reward feedback by analyzing the performance of three instantiations of the incentivized MAB algorithm: UCB, $\varepsilon$-Greedy, and Thompson Sampling. Our results show that they all achieve $\mathcal{O}(\log T)$ regret and compensation under the drifted reward, and are therefore effective in incentivizing exploration. Numerical examples are provided to complement the theoretical analysis. Comment: 10 pages, 2 figures, AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Towards the Design of Hybrid Intelligence Frontline Service Technologies – A Novel Human-in-the-Loop Configuration for Human-Machine Interactions

Authors: Li Mahei; Löfflad Denise; Oeste-Reiß Sarah; Reh Cornelius
Publication date: 03/01/2023

Rapid adoption of innovative technologies confronts IT-Service-Management (ITSM) with incoming support requests of increasing complexity. As a consequence, job demands and turnover rates of ITSM support agents increase. Recent technological advances have introduced assistance systems that rely on hybrid intelligence to provide support agents with contextually suitable historical solutions to help them solve customer requests. Hybrid intelligence systems rely on human input to provide high-quality data to train their underlying AI models. Yet most agents have little incentive to label their data, which lowers data quality and leads to diminishing returns of AI systems due to concept drift. Following a design science research approach, we provide a novel Human-in-the-Loop design and hybrid intelligence system for ITSM support ticket recommendations, which incentivizes agents to provide high-quality labels. Specifically, we leverage agents' need for instant gratification by simultaneously providing better results if they improve labeling automatically labeled support tickets

ScholarSpace at University of Hawai'i at Manoa

Learning to Price Supply Chain Contracts against a Learning Retailer

Authors: Haskell William B.; Zhao Xuejun; Zhu Ruihao
Publication date: 02/11/2022

The rise of big data analytics has automated the decision-making of companies and increased supply chain agility. In this paper, we study the supply chain contract design problem faced by a data-driven supplier who needs to respond to the inventory decisions of the downstream retailer. Both the supplier and the retailer are uncertain about the market demand and need to learn about it sequentially. The goal for the supplier is to develop data-driven pricing policies with sublinear regret bounds under a wide range of possible retailer inventory policies for a fixed time horizon. To capture the dynamics induced by the retailer's learning policy, we first make a connection to non-stationary online learning by following the notion of variation budget. The variation budget quantifies the impact of the retailer's learning strategy on the supplier's decision-making. We then propose dynamic pricing policies for the supplier for both discrete and continuous demand. We also note that our proposed pricing policy only requires access to the support of the demand distribution, but critically, does not require the supplier to have any prior knowledge about the retailer's learning policy or the demand realizations. We examine several well-known data-driven policies for the retailer, including sample average approximation, distributionally robust optimization, and parametric approaches, and show that our pricing policies lead to sublinear regret bounds in all these cases. At the managerial level, we answer affirmatively that there is a pricing policy with a sublinear regret bound under a wide range of retailer's learning policies, even though she faces a learning retailer and an unknown demand distribution. Our work also provides a novel perspective in data-driven operations management where the principal has to learn to react to the learning policies employed by other agents in the system

arXiv.org e-Print Archive

Distributed Control Approaches to Network Optimization

Author: Sah Sankalp

The objective of this research is to develop distributed approaches to optimizing network traffic. Two problems are studied, which include exploiting social networks in routing packets (coupons) to desired network nodes (users in the social network), and developing a rate based transport protocol, which will guarantee that all the flows in a network (e.g. Internet) meet a delay constraint per packet. Firstly, we will study social networks as a means of obtaining information about a system. They are increasingly seen as a means of obtaining awareness of user preferences. Such awareness could be used to target goods and services at them. We consider a general user model, wherein users could buy different numbers of goods at a marked and at a discounted price. Our first objective is to learn which users would be interested in a particular good. Second, we would like to know how much to discount these users such that the entire demand is realized, but not so much that profits are decreased. We develop algorithms for multihop forwarding of such discount coupons over an online social network, in which users forward coupons to each other in return for a reward. Coupling this idea with the implicit learning associated with backpressure routing (originally developed for multihop wireless networks), we would like to demonstrate how to realize optimal revenue. We will then propose a simpler heuristic algorithm and try to show, using simulations, that its performance approaches that of backpressure routing. As the second problem, we look at the traditional formulation of the total value of information transfer, which is a multi-commodity flow problem. Here, each data source is seen as generating a commodity along a fixed route, and the objective is to maximize the total system throughput under some concept of fairness, subject to capacity constraints of the links used. 
This problem is well studied under the framework of network utility maximization and has led to several different distributed congestion control schemes. However, this idea of value does not capture the fact that flows might associate value, not just with throughput, but with link-quality metrics such as packet delay, jitter and so on. The traditional congestion control problem is redefined to include individual source preferences. It is assumed that degradation in link quality seen by a flow adds up on the links it traverses, and the total utility is maximized in such a way that the quality degradation seen by each source is bounded by a value that it declares. Decoupling source-dissatisfaction and link-degradation through an "effective capacity" variable, a distributed and provably optimal resource allocation algorithm is designed, to maximize system utility subject to these quality constraints. The applicability of our controller in different situations is illustrated, and results are supported through numerical examples

Texas A&M Repository

Semi-Cooperative Learning in Smart Grid Agents

Publication venue: 'Defense Technical Information Center (DTIC)'

Crossref

Visually-guided timing and its neural representation

Author: Levy Joshua
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 22/05/2018

Stimulus-driven timing is a fundamental aspect of human and animal behavior. This type of timing can be subdivided into three principal axes: interval generation, storage, and evaluation. In this thesis, we present results related to each of these axes and describe their implications for how we understand timed behavior. In Chapter 2, we address interval generation, which is the process of creating an internal representation of an ongoing temporal interval. While several studies have found evidence for neural oscillators which may subserve this function, it has remained an open question whether such a mechanism can be useful for timing at even the lowest level of cortex. To address this question, we analyze electrophysiological data collected from rats performing a timing task and find evidence that, indeed, timed reward-seeking behavior tracks oscillatory states in primary visual cortex. This kind of finding raises an important question: how is this temporal information stored after the interval has been generated? This process is called interval storage, and we address the sources of noise that might corrupt it in Chapter 3. Specifically, we devise a novel timing task for humans (BiCaP) to address whether memory biases can account for performance on a classification task, in which a subject must decide whether a test interval is more similar to one or another reference interval. We find that they do, and argue that these sources of noise must be accounted for in theories of timing. In Chapter 4, we deal with interval evaluation which is the process of using this stored temporal information to make valuation decisions. We study this process through the lens of foraging behavior. Specifically, we develop and test a framework that rationalizes observed spatial search patterns of wild animals and humans by accounting for the temporal information they gather about their environment, and how they discount delayed rewards (temporal discounting). 
Lastly, in Chapter 5, we discuss how these processes are integrated and the implications of these findings for theories of timing

JScholarship

Personal accounts: managing households during conflict

Author: Smith Julia
Publication venue: University of York
Publication date: 01/12/2016

This thesis examines the impact of political conflict on microfinance engagement to put forth a theory of sparse networks traps. It leverages a natural experiment to distinguish between the effects of conflict on determinants of microfinance efficiency and impact, and includes qualitative evidence from 235 interviews (208 with microfinance users and 27 with microfinance providers) in the Northeastern Kivu province of the Democratic Republic of Congo. Through a combination of regression analyses and panel data modelling with fixed effects, the research indicates that conflict has a stronger effect on the nature of demand for credit and savings services than it has on the actual performance of financial institutions. By introducing informal financial service providers, including community level rotating savings and credit associations, payday lenders, and moneylenders, the research indicates that the demand for financial services is not greatly reduced during conflict. The reduction in demand reported in the literature is seen in the formal sector, while in the conflict area the demand shifts to the informal sector, resulting in a threefold increase in the likelihood of borrowing from an informal source of credit in times of political violence. This shift in user preferences reflects an overall decrease in engagement in formal networks and increased reliance on informal ones, and it also appears in other coping mechanisms such as reduced investment in business creation and increased expenditures in areas that can be considered charitable. The mechanisms by which these choices occur are hyperbolic discounting and reduced trust. In turn, these individual level decisions lead to a sparse networks trap, defined as a fragmentation of the economy into independent enclaves of production and the correlating reduction in interregional interdependence, which may have compounding consequences for post-conflict economic recovery and stability

White Rose E-theses Online

Incentivized Exploration for Multi-Armed Bandits under Reward Drift

Publication venue: 'Association for the Advancement of Artificial Intelligence (AAAI)'

Crossref

Parameters Summer 2021

Author: Press USAWC
Publication venue: USAWC Press
Publication date: 18/05/2021

The US Army War College Quarterly, Parameters, is a refereed forum for contemporary strategy and Landpower issues. It furthers the education and professional development of senior military officers and members of government and academia concerned with national security affairs

US Army War College Press (USAWC)