Incentivized Exploration for Multi-Armed Bandits under Reward Drift
We study incentivized exploration for the multi-armed bandit (MAB) problem
where the players receive compensation for exploring arms other than the greedy
choice and may provide biased feedback on reward. We seek to understand the
impact of this drifted reward feedback by analyzing the performance of three
instantiations of the incentivized MAB algorithm: UCB, ε-Greedy,
and Thompson Sampling. Our results show that they all achieve O(log T) regret and compensation under the drifted reward, and are therefore
effective in incentivizing exploration. Numerical examples are provided to
complement the theoretical analysis.
Comment: 10 pages, 2 figures, AAAI 202
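The incentive mechanism described above can be sketched in Python. This is a minimal illustration, not the authors' algorithm. It assumes Bernoulli rewards and a principal that recommends the UCB arm and pays the myopic player the gap between the greedy arm's and the recommended arm's empirical means. Drift is modeled with a hypothetical rule in which the reported reward is inflated by `drift_coef` times the compensation received. The names `incentivized_ucb` and `drift_coef` are illustrative only.

```python
import math
import random


def incentivized_ucb(true_means, horizon, drift_coef=0.1, seed=0):
    """Toy incentivized UCB with drifted feedback (hypothetical model, not the paper's).

    Returns (pull counts per arm, pseudo-regret, total compensation paid).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k  # sums of *reported* (possibly drifted) rewards
    total_comp = 0.0
    for t in range(horizon):
        if t < k:
            # ASSUMPTION: the first k pulls (one per arm) need no compensation.
            arm, comp = t, 0.0
        else:
            means = [sums[i] / counts[i] for i in range(k)]
            ucb = [means[i] + math.sqrt(2.0 * math.log(t + 1) / counts[i]) for i in range(k)]
            arm = max(range(k), key=lambda i: ucb[i])  # principal's recommendation
            greedy = max(range(k), key=lambda i: means[i])  # myopic player's choice
            comp = means[greedy] - means[arm]  # pay the empirical gap (always >= 0)
        total_comp += comp
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli draw
        # ASSUMPTION: hypothetical drift rule, compensated players over-report.
        reported = reward + drift_coef * comp
        counts[arm] += 1
        sums[arm] += reported
    best = max(true_means)
    # Pseudo-regret: expected shortfall of the arms actually pulled.
    regret = sum(counts[i] * (best - true_means[i]) for i in range(k))
    return counts, regret, total_comp
```

Calling `incentivized_ucb([0.3, 0.5, 0.7], 5000)` returns per-arm pull counts, the pseudo-regret and the total compensation paid. Comparing these quantities across growing horizons is one informal way to check for logarithmic growth under this toy drift model.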
Towards the Design of Hybrid Intelligence Frontline Service Technologies – A Novel Human-in-the-Loop Configuration for Human-Machine Interactions
Rapid adoption of innovative technologies confronts IT Service Management (ITSM) with incoming support requests of increasing complexity. As a consequence, the job demands and turnover rates of ITSM support agents increase. Recent technological advances have introduced assistance systems that rely on hybrid intelligence to provide support agents with contextually suitable historical solutions to help them solve customer requests. Hybrid intelligence systems rely on human input to provide high-quality data for training their underlying AI models. Yet most agents have little incentive to label their data, which lowers data quality and leads to diminishing returns from AI systems due to concept drift. Following a design science research approach, we provide a novel Human-in-the-Loop design and a hybrid intelligence system for ITSM support ticket recommendations that incentivizes agents to provide high-quality labels. Specifically, we leverage agents' need for instant gratification by immediately providing better results when they improve the labels of automatically labeled support tickets.
Learning to Price Supply Chain Contracts against a Learning Retailer
The rise of big data analytics has automated the decision-making of companies
and increased supply chain agility. In this paper, we study the supply chain
contract design problem faced by a data-driven supplier who needs to respond to
the inventory decisions of the downstream retailer. Both the supplier and the
retailer are uncertain about the market demand and need to learn about it
sequentially. The goal for the supplier is to develop data-driven pricing
policies with sublinear regret bounds under a wide range of possible retailer
inventory policies for a fixed time horizon.
To capture the dynamics induced by the retailer's learning policy, we first
make a connection to non-stationary online learning by adopting the notion of a
variation budget. The variation budget quantifies the impact of the retailer's
learning strategy on the supplier's decision-making. We then propose dynamic
pricing policies for the supplier for both discrete and continuous demand. We
also note that our proposed pricing policy only requires access to the support
of the demand distribution, but critically, does not require the supplier to
have any prior knowledge about the retailer's learning policy or the demand
realizations. We examine several well-known data-driven policies for the
retailer, including sample average approximation, distributionally robust
optimization, and parametric approaches, and show that our pricing policies
lead to sublinear regret bounds in all these cases.
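A variation budget, in the sense of Besbes, Gur and Zeevi, bounds the cumulative change of a time-varying environment: roughly, the sum over t of sup |f_t(x) - f_{t+1}(x)| must not exceed V_T. The sketch below makes this concrete for one of the retailer policies named above, under loudly stated assumptions. The retailer is a newsvendor using sample average approximation (SAA), so its order is the empirical critical-fractile quantile of past demand. The total variation of its order path serves as a hypothetical proxy for how much the supplier's environment drifts. Prices, the demand distribution and all function names are illustrative and are not taken from the paper.

```python
import math
import random


def saa_order(samples, critical_ratio):
    """SAA newsvendor order: smallest past demand d with empirical CDF(d) >= critical_ratio."""
    ordered = sorted(samples)
    j = max(0, math.ceil(critical_ratio * len(ordered)) - 1)  # 0-based quantile index
    return ordered[j]


def retailer_order_path(horizon, retail_price=10.0, wholesale_price=6.0, seed=1):
    """Orders placed by a hypothetical SAA retailer that learns demand over time."""
    rng = random.Random(seed)
    # Critical ratio = underage cost (p - w) / (underage + overage cost p), no salvage value.
    ratio = (retail_price - wholesale_price) / retail_price
    history = [rng.randint(20, 80)]  # ASSUMPTION: one initial demand observation
    orders = []
    for _ in range(horizon):
        orders.append(saa_order(history, ratio))  # order using only data seen so far
        history.append(rng.randint(20, 80))  # ASSUMPTION: i.i.d. uniform integer demand
    return orders


def total_variation(path):
    """Sum of absolute period-to-period changes, used here as a variation-budget proxy."""
    return sum(abs(b - a) for a, b in zip(path, path[1:]))
```

For example, `total_variation(retailer_order_path(200))` measures how much the retailer's orders moved over 200 periods. One would expect most of that movement early on, while the retailer's sample is still small.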
At the managerial level, we answer in the affirmative that there exists a pricing
policy with a sublinear regret bound under a wide range of retailer learning
policies, even though the supplier faces a learning retailer and an unknown demand
distribution. Our work also provides a novel perspective in data-driven
operations management, where the principal has to learn to react to the learning
policies employed by other agents in the system.
Distributed Control Approaches to Network Optimization
The objective of this research is to develop distributed approaches to optimizing
network traffic. Two problems are studied: exploiting social networks to route
packets (coupons) to desired network nodes (users in the social network), and
developing a rate-based transport protocol that guarantees that all the flows in
a network (e.g., the Internet) meet a per-packet delay constraint.
First, we will study social networks as a means of obtaining information about
a system. They are increasingly seen as a way to learn user preferences, and
such knowledge could be used to target goods and services at those users. We
consider a general user model, wherein users could buy different numbers of goods
at a marked price and at a discounted price. Our first objective is to learn which users
would be interested in a particular good. Second, we would like to know how large a
discount to offer these users such that the entire demand is realized, but not so large
that profits decrease. We develop algorithms for multihop forwarding of such discount
coupons over an online social network, in which users forward coupons to each
other in return for a reward. By coupling this idea with the implicit learning associated
with backpressure routing (originally developed for multihop wireless networks), we
aim to demonstrate how optimal revenue can be realized. We will then propose a
simpler heuristic algorithm and try to show, using simulations, that its performance
approaches that of backpressure routing.
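The backpressure rule referenced above can be illustrated with a single scheduling slot. In this minimal sketch, each node keeps a per-commodity backlog. For coupons, a commodity could be a hypothetical coupon type or destination user. Each link forwards the commodity with the largest positive backlog differential. The sketch assumes links do not interfere and each carries one packet per slot. That assumption sidesteps the max-weight scheduling step needed in multihop wireless networks. `backpressure_schedule` is a hypothetical name.

```python
def backpressure_schedule(queues, links):
    """One slot of backpressure routing on interference-free links (sketch).

    queues: node -> {commodity: backlog}; a missing entry means an empty queue.
    links: directed (a, b) pairs, each assumed to carry one packet per slot.
    Returns {(a, b): commodity to forward, or None if no differential is positive}.
    """
    decisions = {}
    for a, b in links:
        best_commodity, best_weight = None, 0
        for commodity, backlog in queues.get(a, {}).items():
            # Backlog differential: how much more of this commodity waits at a than at b.
            weight = backlog - queues.get(b, {}).get(commodity, 0)
            if weight > best_weight:
                best_commodity, best_weight = commodity, weight
        decisions[(a, b)] = best_commodity
    return decisions
```

For queues `{"s": {"d": 5}, "u": {"d": 2}}` and links `[("s", "u"), ("u", "d")]`, both links forward commodity `"d"`, because the differentials are 3 and 2.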
For the second problem, we look at the traditional formulation of the total value
of information transfer, which is a multi-commodity flow problem. Here, each data source is seen as generating a commodity along a fixed route, and the objective is
to maximize the total system throughput under some notion of fairness, subject
to the capacity constraints of the links used. This problem is well studied under the
framework of network utility maximization and has led to several different distributed
congestion control schemes. However, this notion of value does not capture the fact that
flows might associate value not just with throughput but also with link-quality metrics
such as packet delay and jitter. The traditional congestion control problem is
therefore redefined to include individual source preferences. It is assumed that the degradation in
link quality seen by a flow adds up over the links it traverses, and the total utility is
maximized in such a way that the quality degradation seen by each source is bounded
by a value that it declares. By decoupling source dissatisfaction and link degradation
through an "effective capacity" variable, we design a distributed and provably optimal resource
allocation algorithm that maximizes system utility subject to these quality
constraints. The applicability of our controller in different situations is illustrated,
and the results are supported by numerical examples.
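The network utility maximization framework mentioned above is often solved with a price-based dual algorithm. The sketch below shows the standard version for logarithmic (proportionally fair) utilities. Each link adjusts a price by gradient steps on its excess load. Each source picks the rate x_s = 1/q_s, which maximizes log(x_s) - q_s x_s given its route price q_s. It does not implement the thesis's delay-constrained controller or its "effective capacity" variable. The step size, iteration count and rate cap are illustrative assumptions.

```python
def num_dual_decomposition(routes, capacities, step=0.1, iters=2000, x_max=10.0):
    """Price-based dual algorithm for proportionally fair NUM (sketch).

    routes: one list of link indices per flow.
    capacities: capacity of each link.
    Returns (per-flow rates, per-link prices).
    """
    prices = [1.0] * len(capacities)
    rates = [0.0] * len(routes)
    for _ in range(iters):
        # Source update: x_s = 1 / q_s, capped so a zero route price cannot give infinity.
        rates = [min(x_max, 1.0 / max(sum(prices[l] for l in r), 1e-9)) for r in routes]
        # Aggregate the load each link carries.
        loads = [0.0] * len(capacities)
        for x, r in zip(rates, routes):
            for l in r:
                loads[l] += x
        # Link update: raise the price when overloaded, lower it (not below 0) otherwise.
        prices = [max(0.0, p + step * (y - c)) for p, y, c in zip(prices, loads, capacities)]
    return rates, prices
```

For two unit-capacity links, with one long flow over both and one short flow on each (`routes=[[0, 1], [0], [1]]`), the rates approach the proportionally fair allocation (1/3, 2/3, 2/3), and both link prices approach 1.5.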
Visually-guided timing and its neural representation
Stimulus-driven timing is a fundamental aspect of human and animal behavior. This type of timing can be subdivided into three principal axes: interval generation, storage, and evaluation. In this thesis, we present results related to each of these axes and describe their implications for how we understand timed behavior. In Chapter 2, we address interval generation, which is the process of creating an internal representation of an ongoing temporal interval. While several studies have found evidence for neural oscillators that may subserve this function, it has remained an open question whether such a mechanism can be useful for timing even at the lowest level of cortex. To address this question, we analyze electrophysiological data collected from rats performing a timing task and find evidence that, indeed, timed reward-seeking behavior tracks oscillatory states in primary visual cortex. This finding raises an important question: how is this temporal information stored after the interval has been generated? This process is called interval storage, and in Chapter 3 we address the sources of noise that might corrupt it. Specifically, we devise a novel timing task for humans (BiCaP) to test whether memory biases can account for performance on a classification task in which a subject must decide whether a test interval is more similar to one reference interval or to another. We find that they do, and argue that these sources of noise must be accounted for in theories of timing. In Chapter 4, we deal with interval evaluation, which is the process of using this stored temporal information to make valuation decisions. We study this process through the lens of foraging behavior. Specifically, we develop and test a framework that rationalizes the observed spatial search patterns of wild animals and humans by accounting for the temporal information they gather about their environment and for how they discount delayed rewards (temporal discounting). Lastly, in Chapter 5, we discuss how these processes are integrated and the implications of these findings for theories of timing.
Personal accounts: managing households during conflict
This thesis examines the impact of political conflict on microfinance engagement to put forth a theory of sparse networks traps. It leverages a natural experiment to distinguish between the effects of conflict on the determinants of microfinance efficiency and impact, and includes qualitative evidence from 235 interviews (208 with microfinance users and 27 with microfinance providers) in the Northeastern Kivu province of the Democratic Republic of Congo. Through a combination of regression analyses and panel data modelling with fixed effects, the research indicates that conflict has a stronger effect on the nature of demand for credit and savings services than on the actual performance of financial institutions. By including informal financial service providers, such as community-level rotating savings and credit associations, payday lenders, and moneylenders, the research indicates that the demand for financial services is not greatly reduced during conflict. The reduction in demand reported in the literature is seen in the formal sector, while in the conflict area demand shifts to the informal sector, resulting in a threefold increase in the likelihood of borrowing from an informal source of credit in times of political violence. This shift in user preferences reflects an overall decrease in engagement with formal networks and greater reliance on informal ones, and is mirrored in other coping mechanisms such as reduced investment in business creation and increased expenditures in areas that can be considered charitable. The mechanisms by which these choices occur are hyperbolic discounting and reduced trust. In turn, these individual-level decisions lead to a sparse networks trap, defined as a fragmentation of the economy into independent enclaves of production and the corresponding reduction in interregional interdependence, which may have compounding consequences for post-conflict economic recovery and stability.
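Hyperbolic discounting is named above as one mechanism behind these choices. The textbook property that separates it from exponential discounting is preference reversal. When both rewards are near, a smaller-sooner reward can beat a larger-later one, but the ranking flips when both are pushed equally far into the future. The sketch below uses illustrative parameters (k = 1, r = 0.5, delays in arbitrary time units), which are assumptions and not estimates from the thesis.

```python
import math


def hyperbolic_value(amount, delay, k=1.0):
    """Hyperbolic discounting: V = A / (1 + k * D). k is an illustrative parameter."""
    return amount / (1.0 + k * delay)


def exponential_value(amount, delay, r=0.5):
    """Exponential discounting: V = A * exp(-r * D). r is an illustrative parameter."""
    return amount * math.exp(-r * delay)


def prefers_larger_later(value_fn, small, large, delay_small, gap, **params):
    """True if the larger reward, arriving `gap` units after the smaller one, is valued higher."""
    return value_fn(large, delay_small + gap, **params) > value_fn(small, delay_small, **params)
```

With these parameters, the hyperbolic chooser prefers 50 now over 100 after 5 units, but prefers 100 after 35 units over 50 after 30 units. The exponential chooser keeps the same ranking in both cases, because shifting both delays multiplies both values by the same factor.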
Parameters Summer 2021
The US Army War College Quarterly, Parameters, is a refereed forum for contemporary strategy and Landpower issues. It furthers the education and professional development of senior military officers and members of government and academia concerned with national security affairs.