42 research outputs found

    Incentivized Exploration for Multi-Armed Bandits under Reward Drift

    We study incentivized exploration for the multi-armed bandit (MAB) problem where the players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on reward. We seek to understand the impact of this drifted reward feedback by analyzing the performance of three instantiations of the incentivized MAB algorithm: UCB, ε-Greedy, and Thompson Sampling. Our results show that they all achieve O(log T) regret and compensation under the drifted reward, and are therefore effective in incentivizing exploration. Numerical examples are provided to complement the theoretical analysis. Comment: 10 pages, 2 figures, AAAI 202

    Towards the Design of Hybrid Intelligence Frontline Service Technologies – A Novel Human-in-the-Loop Configuration for Human-Machine Interactions

    Rapid adoption of innovative technologies confronts IT-Service-Management (ITSM) with incoming support requests of increasing complexity. As a consequence, job demands and turnover rates of ITSM support agents increase. Recent technological advances have introduced assistance systems that rely on hybrid intelligence to provide support agents with contextually suitable historical solutions to help them solve customer requests. Hybrid intelligence systems rely on human input to provide high-quality data to train their underlying AI models. Yet most agents have little incentive to label their data, which lowers data quality and leads to diminishing returns of AI systems due to concept drift. Following a design science research approach, we provide a novel Human-in-the-Loop design and hybrid intelligence system for ITSM support ticket recommendations, which incentivize agents to provide high-quality labels. Specifically, we leverage agents’ need for instant gratification by simultaneously providing better results if they improve the labels of automatically labeled support tickets

    Learning to Price Supply Chain Contracts against a Learning Retailer

    The rise of big data analytics has automated the decision-making of companies and increased supply chain agility. In this paper, we study the supply chain contract design problem faced by a data-driven supplier who needs to respond to the inventory decisions of the downstream retailer. Both the supplier and the retailer are uncertain about the market demand and need to learn about it sequentially. The goal for the supplier is to develop data-driven pricing policies with sublinear regret bounds under a wide range of possible retailer inventory policies for a fixed time horizon. To capture the dynamics induced by the retailer's learning policy, we first make a connection to non-stationary online learning by following the notion of variation budget. The variation budget quantifies the impact of the retailer's learning strategy on the supplier's decision-making. We then propose dynamic pricing policies for the supplier for both discrete and continuous demand. We also note that our proposed pricing policy requires only access to the support of the demand distribution and, critically, does not require the supplier to have any prior knowledge about the retailer's learning policy or the demand realizations. We examine several well-known data-driven policies for the retailer, including sample average approximation, distributionally robust optimization, and parametric approaches, and show that our pricing policies lead to sublinear regret bounds in all these cases. At the managerial level, we show that the supplier has a pricing policy with a sublinear regret bound under a wide range of retailer learning policies, even though she faces a learning retailer and an unknown demand distribution. Our work also provides a novel perspective in data-driven operations management where the principal has to learn to react to the learning policies employed by other agents in the system

    Distributed Control Approaches to Network Optimization

    The objective of this research is to develop distributed approaches to optimizing network traffic. Two problems are studied, which include exploiting social networks in routing packets (coupons) to desired network nodes (users in the social network), and developing a rate based transport protocol, which will guarantee that all the flows in a network (e.g. Internet) meet a delay constraint per packet. Firstly, we will study social networks as a means of obtaining information about a system. They are increasingly seen as a means of obtaining awareness of user preferences. Such awareness could be used to target goods and services at them. We consider a general user model, wherein users could buy different numbers of goods at a marked and at a discounted price. Our first objective is to learn which users would be interested in a particular good. Second, we would like to know how much to discount these users such that the entire demand is realized, but not so much that profits are decreased. We develop algorithms for multihop forwarding of such discount coupons over an online social network, in which users forward coupons to each other in return for a reward. Coupling this idea with the implicit learning associated with backpressure routing (originally developed for multihop wireless networks), we would like to demonstrate how to realize optimal revenue. We will then propose a simpler heuristic algorithm and try to show, using simulations, that its performance approaches that of backpressure routing. As the second problem, we look at the traditional formulation of the total value of information transfer, which is a multi-commodity flow problem. Here, each data source is seen as generating a commodity along a fixed route, and the objective is to maximize the total system throughput under some concept of fairness, subject to capacity constraints of the links used. 
This problem is well studied under the framework of network utility maximization and has led to several different distributed congestion control schemes. However, this idea of value does not capture the fact that flows might associate value, not just with throughput, but with link-quality metrics such as packet delay, jitter and so on. The traditional congestion control problem is redefined to include individual source preferences. It is assumed that degradation in link quality seen by a flow adds up on the links it traverses, and the total utility is maximized in such a way that the quality degradation seen by each source is bounded by a value that it declares. Decoupling source-dissatisfaction and link-degradation through an "effective capacity" variable, a distributed and provably optimal resource allocation algorithm is designed, to maximize system utility subject to these quality constraints. The applicability of our controller in different situations is illustrated, and results are supported through numerical examples

    Semi-Cooperative Learning in Smart Grid Agents

    Visually-guided timing and its neural representation

    Stimulus-driven timing is a fundamental aspect of human and animal behavior. This type of timing can be subdivided into three principal axes: interval generation, storage, and evaluation. In this thesis, we present results related to each of these axes and describe their implications for how we understand timed behavior. In Chapter 2, we address interval generation, which is the process of creating an internal representation of an ongoing temporal interval. While several studies have found evidence for neural oscillators which may subserve this function, it has remained an open question whether such a mechanism can be useful for timing at even the lowest level of cortex. To address this question, we analyze electrophysiological data collected from rats performing a timing task and find evidence that, indeed, timed reward-seeking behavior tracks oscillatory states in primary visual cortex. This kind of finding raises an important question: how is this temporal information stored after the interval has been generated? This process is called interval storage, and we address the sources of noise that might corrupt it in Chapter 3. Specifically, we devise a novel timing task for humans (BiCaP) to address whether memory biases can account for performance on a classification task, in which a subject must decide whether a test interval is more similar to one or another reference interval. We find that they do, and argue that these sources of noise must be accounted for in theories of timing. In Chapter 4, we deal with interval evaluation which is the process of using this stored temporal information to make valuation decisions. We study this process through the lens of foraging behavior. Specifically, we develop and test a framework that rationalizes observed spatial search patterns of wild animals and humans by accounting for the temporal information they gather about their environment, and how they discount delayed rewards (temporal discounting). 
Lastly, in Chapter 5, we discuss how these processes are integrated and the implications of these findings for theories of timing

    Personal accounts: managing households during conflict

    This thesis examines the impact of political conflict on microfinance engagement to put forth a theory of sparse networks traps. It leverages a natural experiment to distinguish between the effects of conflict on determinants of microfinance efficiency and impact, and includes qualitative evidence from 235 interviews (208 with microfinance users and 27 with microfinance providers) in the Northeastern Kivu province of the Democratic Republic of Congo. Through a combination of regression analyses and panel data modelling with fixed effects, the research indicates that conflict has a stronger effect on the nature of demand for credit and savings services than it has on the actual performance of financial institutions. By introducing informal financial service providers, including community level rotating savings and credit associations, payday lenders, and moneylenders, the research indicates that the demand for financial services is not greatly reduced during conflict. The reduction in demand reported in the literature is seen in the formal sector, while in the conflict area the demand shifts to the informal sector, resulting in a threefold increase in the likelihood of borrowing from an informal source of credit in times of political violence. This shift in user preferences reflects an overall decrease in engagement in formal networks and greater reliance on informal ones, and is mirrored in other coping mechanisms such as reduced investment in business creation and increased expenditures in areas that can be considered charitable. The mechanisms by which these choices occur are hyperbolic discounting and reduced trust. In turn, these individual level decisions lead to a sparse networks trap, defined as a fragmentation of the economy into independent enclaves of production and the correlating reduction in interregional interdependence, which may have compounding consequences for post-conflict economic recovery and stability

    Parameters Summer 2021

    The US Army War College Quarterly, Parameters, is a refereed forum for contemporary strategy and Landpower issues. It furthers the education and professional development of senior military officers and members of government and academia concerned with national security affairs