339 research outputs found

    LIPIcs, Volume 251, ITCS 2023, Complete Volume


    Bandit Social Learning: Exploration under Myopic Behavior

    We study social learning dynamics where the agents collectively follow a simple multi-armed bandit protocol. Agents arrive sequentially, choose arms, and receive associated rewards. Each agent observes the full history (arms and rewards) of the previous agents, and there are no private signals. While collectively the agents face an exploration-exploitation tradeoff, each agent acts myopically, without regard to exploration. Motivating scenarios concern reviews and ratings on online platforms. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals, including the "unbiased" behavior as well as various behavioral biases. While extreme versions of these behaviors correspond to well-known bandit algorithms, we prove that more moderate versions lead to stark exploration failures, and consequently to regret rates that are linear in the number of agents. We provide matching upper bounds on regret by analyzing "moderately optimistic" agents. As a special case of independent interest, we obtain a general result on the failure of the greedy algorithm in multi-armed bandits. To the best of our knowledge, this is the first such result in the literature.
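    The exploration failure above can be illustrated with a minimal simulation (a sketch under assumed parameters, not the paper's construction: the Bernoulli arm means, the optimistic initial estimate, and the horizon are all made up). Fully greedy agents sharing a public history herd on the inferior arm with constant probability, so average regret grows linearly in the number of agents:

```python
import numpy as np

def greedy_social_learning(means, n_agents, rng):
    """Each agent greedily picks the arm with the best empirical mean in the
    shared public history; unplayed arms get an optimistic initial estimate."""
    counts = np.zeros(len(means))
    sums = np.zeros(len(means))
    regret = 0.0
    for _ in range(n_agents):
        est = np.where(counts > 0, sums / np.maximum(counts, 1), 1.0)
        arm = int(np.argmax(est))                  # myopic: no deliberate exploration
        reward = float(rng.random() < means[arm])  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        regret += max(means) - means[arm]
    return regret

means = [0.6, 0.5]  # hypothetical arm means
avg = np.mean([greedy_social_learning(means, 2000, np.random.default_rng(s))
               for s in range(100)])
print(avg)  # in a constant fraction of runs the agents herd on the worse arm,
            # so the average regret grows linearly in the number of agents
```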

    Reinforcement learning in large state action spaces

    Reinforcement learning (RL) is a promising framework for training intelligent agents that learn to optimize long-term utility by directly interacting with the environment. Creating RL methods that scale to large state-action spaces is a critical problem for real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings: difficulties with exploration, low sample efficiency, computational intractability, task constraints such as decentralization, and a lack of guarantees about important properties such as performance, generalization, and robustness in potentially unseen scenarios. This thesis is aimed at bridging these gaps. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings (single- and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we present the first results on several different problems, e.g., tensorization of the Bellman equation, which allows exponential sample efficiency gains (Chapter 4); provable suboptimality arising from structural constraints in MAS (Chapter 3); combinatorial generalization results in cooperative MAS (Chapter 5); generalization results on observation shifts (Chapter 7); and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on generalization aspects of the agents under different frameworks. These properties are driven by the use of several advanced tools (e.g., statistical machine learning, state abstraction, variational inference, tensor theory). In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.
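    For background (standard material, not a contribution of the thesis): the Bellman optimality equation that the tensorization in Chapter 4 reportedly builds on couples a maximization over actions with an expectation over successor states, and both operations are exactly what becomes expensive in large state-action spaces:

```latex
% Standard Bellman optimality equation for a discounted MDP
V^*(s) = \max_{a \in \mathcal{A}} \Big[ r(s, a)
  + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a) \, V^*(s') \Big]
```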

    Wardrop Equilibrium Can Be Boundedly Rational: A New Behavioral Theory of Route Choice

    As one of the most fundamental concepts in transportation science, Wardrop equilibrium (WE) has always had a relatively weak behavioral underpinning. To strengthen this foundation, one must reckon with bounded rationality in human decision-making, such as the lack of accurate information, limited computing power, and sub-optimal choices. In the literature, however, this retreat from behavioral perfectionism has typically been accompanied by a conceptual modification of WE. Here we show that giving up perfect rationality need not force a departure from WE. On the contrary, WE can be reached with global stability in a routing game played by boundedly rational travelers. We achieve this result by developing a day-to-day (DTD) dynamical model that mimics how travelers gradually adjust their route valuations, and hence their choice probabilities, based on past experiences. Our model, called cumulative logit (CULO), resembles the classical DTD models but makes a crucial change: whereas the classical models assume routes are valued based on the cost averaged over historical data, ours values routes based on the accumulated cost. To describe route choice behavior, the CULO model uses only two parameters: one accounting for the rate at which the future route cost is discounted in the valuation relative to past ones, and the other describing the sensitivity of route choice probabilities to valuation differences. We prove that the CULO model always converges to WE, regardless of the initial point, as long as the behavioral parameters satisfy certain mild conditions. Our theory thus upholds WE's role as a benchmark in transportation systems analysis. It also resolves the theoretical challenge posed by Harsanyi's instability problem by explaining why equally good routes at WE are selected with different probabilities.
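    A minimal sketch of one plausible reading of the CULO dynamics (the paper's exact update rule may differ; the two-route network, its linear cost functions, and both parameter values are assumptions for illustration). Valuations accumulate experienced costs rather than averaging them, and choices follow a logit over the negated valuations; on this toy network the flow split converges to the Wardrop equilibrium, where the costs of the used routes equalize:

```python
import numpy as np

def culo_day_to_day(n_days=500, w=1.0, theta=0.5):
    """One plausible reading of CULO: valuations accumulate experienced
    costs (weight w on the newly experienced cost), and route choice is
    logit with sensitivity theta over the negated valuations."""
    # Hypothetical two-route network with linear flow-dependent costs.
    costs = [lambda p: 1.0 + 2.0 * p,   # route 0: cheap when lightly used
             lambda p: 2.0 + 1.0 * p]   # route 1
    V = np.zeros(2)                     # accumulated route valuations
    for _ in range(n_days):
        z = -theta * (V - V.min())       # shift for numerical stability
        p = np.exp(z) / np.exp(z).sum()  # logit choice probabilities
        c = np.array([costs[k](p[k]) for k in range(2)])
        V = V + w * c                    # accumulate, not average, the cost
    return p, c

p, c = culo_day_to_day()
print(p, c)  # p -> [2/3, 1/3] and both route costs -> 7/3: Wardrop equilibrium
```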

    Context and uncertainty in decisions from experience

    From the moment we wake up each morning, we are faced with countless choices. Should we press snooze on our alarm? Have toast or cereal for breakfast? Bring an umbrella? Agree to work on that new project? Go to the gym, or eat a whole pizza while watching Netflix? The challenge when studying decision-making is to collapse these diverse scenarios into feasible experimental methods. The standard theoretical approach is to represent options using outcomes and probabilities, which has provided a rationale for studying decisions using gambling tasks. These tasks typically involve repeated choices between a single pair of options, with outcomes determined probabilistically. Thus, the two sections in this thesis ask a simple question: are we missing something by using pairs of options that are divorced from the context in which we make choices outside the psychology laboratory? The first section focuses on the impact of extreme outcomes within a decision context. Chapter 2 addresses whether there is a rational explanation for why these outcomes appear in decisions from experience and numerous other cognitive domains. Chapters 3-5 describe six experiments that distinguish between plausible theories based on whether they measure extremity as categorical, ordinal, or continuous; whether extremity refers to the centre, the edges, or neighbouring outcomes; whether outcomes are represented as types or tokens; and whether extreme outcomes are defined using temporal or distributional characteristics. In the second section, we shift our focus to how people perceive uncertainty. We examine a distinction between uncertainty attributed to inadequate knowledge and uncertainty attributed to an inherently random process. Chapter 6 describes three experiments that examine whether allowing participants to map their uncertainty onto observable variability leads them to perceive it as potentially resolvable rather than purely stochastic. We then examine how this influences whether they seek additional information. In summary, the experiments described in these two sections demonstrate the importance of context and uncertainty in understanding how we make decisions.

    Efficient Prior-Free Mechanisms for No-Regret Agents

    We study a repeated Principal-Agent problem between a long-lived Principal and Agent pair in a prior-free setting. In our setting, the sequence of realized states of nature may be adversarially chosen, the Agent is non-myopic, and the Principal aims for a strong form of policy regret. Following Camara, Hartline, and Johnson, we model the Agent's long-run behavior with behavioral assumptions that relax the common prior assumption (for example, that the Agent has no swap regret). Within this framework, we revisit the mechanism proposed by Camara et al., which informally uses calibrated forecasts of the unknown states of nature in place of a common prior. We give two main improvements. First, we give a mechanism with an exponentially improved dependence (in terms of both running time and regret bounds) on the number of distinct states of nature. To do this, we show that our mechanism does not require truly calibrated forecasts, but rather forecasts that are unbiased subject to only a polynomially sized collection of events -- which can be produced with polynomial overhead. Second, in several important special cases -- including the focal linear contracting setting -- we show how to remove strong "Alignment" assumptions (which informally require that near-ties are always broken in favor of the Principal) by specifically deploying "stable" policies that do not have any near-ties that are payoff-relevant to the Principal. Taken together, our new mechanism makes the compelling framework proposed by Camara et al. much more powerful: it can now be realized over polynomially sized state spaces while requiring only mild assumptions on Agent behavior.
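    The weaker forecasting requirement described above can be made concrete with a small sketch (the event definitions here are hypothetical placeholders; the paper ties its event collection to the mechanism's decision rules). Instead of full calibration, the forecasts only need near-zero average bias conditional on each event in a fixed, polynomially sized collection:

```python
import numpy as np

def conditional_bias(forecasts, outcomes, events):
    """Average bias of the forecasts conditional on each event in a small,
    fixed collection. events[name][t] is truthy iff event `name` holds at
    round t. The mechanism needs each |bias| to be small -- a much weaker
    requirement than full calibration."""
    f = np.asarray(forecasts, float)
    y = np.asarray(outcomes, float)
    biases = {}
    for name, e in events.items():
        mask = np.asarray(e, bool)
        biases[name] = float(np.mean(f[mask] - y[mask])) if mask.any() else 0.0
    return biases

T = 1000
rng = np.random.default_rng(1)
y = (rng.random(T) < 0.4).astype(float)   # hypothetical binary states of nature
f = np.full(T, 0.4)                       # a marginally unbiased forecast
events = {"all rounds": np.ones(T), "even rounds": np.arange(T) % 2 == 0}
print(conditional_bias(f, y, events))     # both conditional biases are near 0
```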

    Hybrid Algorithm Selection and Hyperparameter Tuning on Distributed Machine Learning Resources: A Hierarchical Agent-based Approach

    Algorithm selection and hyperparameter tuning are critical steps in both academic and applied machine learning. At the same time, these steps are becoming increasingly delicate due to the rapid growth in the number, diversity, and distributed nature of machine learning resources. Multi-agent systems, when applied to the design of machine learning platforms, bring several distinctive benefits, such as scalability, flexibility, and robustness. This paper proposes a fully automatic and collaborative agent-based mechanism for selecting among distributedly organized machine learning algorithms and simultaneously tuning their hyperparameters. Our method builds upon an existing agent-based hierarchical machine-learning platform and augments its query structure to support these functionalities without being limited to specific learning, selection, or tuning mechanisms. We have conducted theoretical assessments, formal verification, and an analytical study to demonstrate the correctness, resource utilization, and computational efficiency of our technique. According to the results, our solution is totally correct and exhibits linear time and space complexity in the size of the available resources. To provide concrete examples of how the proposed methodology adapts and performs across a range of algorithmic options and datasets, we have also conducted a series of experiments using a system comprising 24 algorithms and 9 datasets.
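    A minimal sketch of the hierarchical query pattern suggested by the abstract (the agent classes, message format, and scoring function are hypothetical; the platform's actual query structure is not spelled out here). A query carrying an evaluation callback fans out from a manager agent to leaf agents that each wrap one algorithm and tune its hyperparameters locally; every leaf is visited exactly once per query, which matches the claimed linear complexity in the size of the available resources:

```python
from dataclasses import dataclass, field

@dataclass
class LeafAgent:
    """Wraps a single ML algorithm and tunes its own hyperparameters."""
    name: str
    candidates: list  # hypothetical hyperparameter configurations

    def handle(self, evaluate):
        # evaluate(name, config) -> validation score, carried by the query
        score, config = max(((evaluate(self.name, c), c) for c in self.candidates),
                            key=lambda t: t[0])
        return score, self.name, config

@dataclass
class ManagerAgent:
    """Internal node: forwards the query to its children, keeps the best reply."""
    children: list = field(default_factory=list)

    def handle(self, evaluate):
        return max((child.handle(evaluate) for child in self.children),
                   key=lambda t: t[0])

def fake_eval(name, config):  # hypothetical stand-in for cross-validated scoring
    return {"knn": 0.81, "svm": 0.86, "tree": 0.79}[name] - 0.01 * config["c"]

root = ManagerAgent([
    ManagerAgent([LeafAgent("knn", [{"c": 1}, {"c": 2}]),
                  LeafAgent("svm", [{"c": 0}, {"c": 1}])]),
    LeafAgent("tree", [{"c": 0}]),
])
print(root.handle(fake_eval))  # -> (0.86, 'svm', {'c': 0})
```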