70 research outputs found

    Risk-averse multi-armed bandits and game theory

    The multi-armed bandit (MAB) and game theory literature is mainly focused on the expected cumulative reward and the expected payoffs in a game, respectively. In contrast, the rewards and the payoffs are often random variables whose expected values only capture a vague idea of the overall distribution. The focus of this dissertation is to study the fundamental limits of the existing bandits and game theory problems in a risk-averse framework and propose new ideas that address the shortcomings. The author believes that human beings are mostly risk-averse, so studying multi-armed bandits and game theory from the point of view of risk aversion, rather than expected reward/payoff, better captures reality. In this manner, a specific class of multi-armed bandits, called explore-then-commit bandits, and stochastic games are studied in this dissertation, which are based on the notion of Risk-Averse Best Action Decision with Incomplete Information (R-ABADI, Abadi is the maiden name of the author's mother). The goal of the classical multi-armed bandits is to exploit the arm with the maximum score defined as the expected value of the arm reward. Instead, we propose a new definition of score that is derived from the joint distribution of all arm rewards and captures the reward of an arm relative to those of all other arms. We use a similar idea for games and propose a risk-averse R-ABADI equilibrium in game theory that is possibly different from the Nash equilibrium. The payoff distributions are taken into account to derive the risk-averse equilibrium, while the expected payoffs are used to find the Nash equilibrium. The fundamental properties of games, e.g. pure and mixed risk-averse R-ABADI equilibrium and strict dominance, are studied in the new framework and the results are expanded to finite-time games. Furthermore, the stochastic congestion games are studied from a risk-averse perspective and three classes of equilibria are proposed for such games. 
It is shown by examples that the risk-averse behavior of travelers in a stochastic congestion game can improve the price of anarchy in Pigou and Braess networks. Furthermore, the Braess paradox does not occur to the extent proposed originally when travelers are risk-averse. We also study an online affinity scheduling problem with no prior knowledge of the task arrival rates and processing rates of different task types on different servers. We propose the Blind GB-PANDAS algorithm that utilizes an exploration-exploitation scheme to load balance incoming tasks on servers in an online fashion. We prove that Blind GB-PANDAS is throughput optimal, i.e. it stabilizes the system as long as the task arrival rates are inside the capacity region. The Blind GB-PANDAS algorithm is compared to FCFS, Max-Weight, and c-$\mu$-rule algorithms in terms of average task completion time through simulations, where the same exploration-exploitation approach as Blind GB-PANDAS is used for Max-Weight and c-$\mu$-rule. The extensive simulations show that the Blind GB-PANDAS algorithm conspicuously outperforms the three other algorithms at high loads.

    Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing

    Motivated by practical considerations in machine learning for financial decision-making, such as risk-aversion and large action space, we initiate the study of risk-aware linear bandits. Specifically, we consider regret minimization under the mean-variance measure when facing a set of actions whose rewards can be expressed as linear functions of (initially) unknown parameters. Driven by the variance-minimizing G-optimal design, we propose the Risk-Aware Explore-then-Commit (RISE) algorithm and the Risk-Aware Successive Elimination (RISE++) algorithm. Then, we rigorously analyze their regret upper bounds to show that, by leveraging the linear structure, the algorithms can dramatically reduce the regret when compared to existing methods. Finally, we demonstrate the performance of the algorithms by conducting extensive numerical experiments in a synthetic smart order routing setup. Our results show that both RISE and RISE++ can outperform the competing methods, especially in complex decision-making scenarios
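To make the idea of explore-then-commit under a mean-variance criterion concrete, here is a minimal sketch. It is not the RISE algorithm from the abstract: it ignores the linear reward structure and the G-optimal design, and treats the arms as independent Gaussians. The score convention (empirical mean minus `rho` times empirical variance, higher is better), the function names, and all parameter values are assumptions made for illustration only.

```python
import random
import statistics


def mean_variance(samples, rho=1.0):
    """Assumed convention: empirical mean minus rho times empirical variance."""
    return statistics.fmean(samples) - rho * statistics.pvariance(samples)


def explore_then_commit(arms, horizon, pulls_per_arm, rho=1.0, seed=0):
    """Pull every arm `pulls_per_arm` times, then commit to the best-scoring arm.

    `arms` is a list of (mean, std) pairs for hypothetical Gaussian arms.
    Returns the index of the committed arm and the per-arm exploration scores.
    """
    if pulls_per_arm * len(arms) > horizon:
        raise ValueError("exploration budget exceeds the horizon")
    rng = random.Random(seed)

    # Exploration phase: sample each arm the same number of times.
    history = [
        [rng.gauss(mu, sigma) for _ in range(pulls_per_arm)]
        for mu, sigma in arms
    ]

    # Score each arm by its empirical mean-variance and pick the best one.
    scores = [mean_variance(samples, rho) for samples in history]
    best = max(range(len(arms)), key=lambda i: scores[i])

    # Commit phase: play the chosen arm for all remaining rounds.
    mu, sigma = arms[best]
    for _ in range(horizon - pulls_per_arm * len(arms)):
        history[best].append(rng.gauss(mu, sigma))

    return best, scores


if __name__ == "__main__":
    # Arm 1 has the higher mean but a much larger variance.
    best, scores = explore_then_commit(
        [(1.0, 0.1), (1.2, 2.0)], horizon=500, pulls_per_arm=50
    )
    print(best, scores)
```

With `rho = 1.0`, the low-variance arm wins despite its lower mean, which is the behavior a risk-aware criterion is meant to capture; setting `rho = 0` reduces the score to the plain empirical mean.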

    Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms

    This paper considers a stochastic multi-armed bandit (MAB) problem with dual objectives: (i) quick identification and commitment to the optimal arm, and (ii) reward maximization throughout a sequence of $T$ consecutive rounds. Though each objective has been individually well-studied, i.e., best arm identification for (i) and regret minimization for (ii), the simultaneous realization of both objectives remains an open problem, despite its practical importance. This paper introduces Regret Optimal Best Arm Identification (ROBAI) which aims to achieve these dual objectives. To solve ROBAI with both pre-determined stopping time and adaptive stopping time requirements, we present the EOCP algorithm and its variants respectively, which not only achieve asymptotic optimal regret in both Gaussian and general bandits, but also commit to the optimal arm in $\mathcal{O}(\log T)$ rounds with pre-determined stopping time and $\mathcal{O}(\log^2 T)$ rounds with adaptive stopping time. We further characterize lower bounds on the commitment time (equivalent to sample complexity) of ROBAI, showing that EOCP and its variants are sample optimal with pre-determined stopping time, and almost sample optimal with adaptive stopping time. Numerical results confirm our theoretical analysis and reveal an interesting "over-exploration" phenomenon carried by classic UCB algorithms, such that EOCP has smaller regret even though it stops exploration much earlier than UCB ($\mathcal{O}(\log T)$ versus $\mathcal{O}(T)$), which suggests over-exploration is unnecessary and potentially harmful to system performance.

    A Novel Approach to the Behavioral Aspects of Cybersecurity

    The Internet and cyberspace are inseparable aspects of everyone's life. Cyberspace is a concept that describes widespread, interconnected, and online digital technology. Cyberspace refers to the online world that is separate from everyday reality. Since the internet is a recent advance in human lives, there are many unknown and unpredictable aspects to it that sometimes can be catastrophic to users in financial aspects, high-tech industry, and healthcare. Cybersecurity failures are usually caused by human errors or their lack of knowledge. According to the International Business Machines Corporation (IBM) X-Force Threat Intelligence Index in 2020, around 8.5 billion records were compromised in 2019 due to failures of insiders, which is an increase of more than 200 percent compared to the compromised records in 2018. In another survey performed by the Ernst and Young Global Information Security during 2018-2019, it is reported that 34% of the organizations stated that employees who are inattentive or do not have the necessary knowledge are the principal vulnerabilities of cybersecurity, and 22% of the organizations indicated that phishing is the main threat to them. Inattentive users are one of the reasons for data breaches and cyberattacks. The National Cyber Security Centre (NCSC) in the United Kingdom observed that 23.2 million users who were victims of cybersecurity attacks used a carelessly selected password, which is 123456, as their account password. The Annual Cybersecurity Report published by Cisco in 2018 announced that phishing and spear phishing emails are the root causes of many cybersecurity attacks in recent years. Hence, enhancing the cybersecurity behaviors of both personal users and organizations can protect vulnerable users from cyber threats. Both human factors and technological aspects of cybersecurity should be addressed in organizations for a safer environment

    Developing Hybrid Machine Learning Models to Assign Health Score to Railcar Fleets for Optimal Decision Making

    A large amount of data is generated during the operation of a railcar fleet, which can easily lead to the curse of dimensionality and reduce the resiliency of the railcar network. To solve these issues and offer predictive maintenance, this research introduces a hybrid fault diagnosis expert system method that combines density-based spatial clustering of applications with noise (DBSCAN) and principal component analysis (PCA). Firstly, the DBSCAN method is used to cluster categorical data that are similar to one another within the same group. Secondly, the PCA algorithm is applied to reduce the dimensionality of the data and eliminate redundancy in order to improve the accuracy of fault diagnosis. Finally, we explain the engineered features and evaluate the selected models by using the Gain Chart and Area Under Curve (AUC) metrics. We use the hybrid expert system model to enhance maintenance planning decisions by assigning a health score to the railcar system of the North American Railcar Owner (NARO). According to the experimental results, our expert model can detect 96.4% of failures within 50% of the sample. This suggests that our method is effective at diagnosing failures in railcar fleets. Comment: 21 pages, 7 figures, 3 tables
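The claim "detect 96.4% of failures within 50% of the sample" reads as a point on a cumulative gain chart. The sketch below shows how such a point could be computed, assuming items are ranked by a model score and labeled 1 for failure and 0 otherwise. The function name and the toy data are hypothetical and are not taken from the paper.

```python
def cumulative_gain(scores, labels, fraction):
    """Share of all failures (label 1) found in the top `fraction` of items by score."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total_failures = sum(labels)
    if total_failures == 0:
        raise ValueError("no failures in the sample")

    # Rank items from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Inspect the top slice (at least one item) and count the failures in it.
    top_k = max(1, round(fraction * len(ranked)))
    found = sum(label for _, label in ranked[:top_k])
    return found / total_failures


if __name__ == "__main__":
    # Toy data, invented for illustration: 4 failures among 8 items.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]
    print(cumulative_gain(toy_scores, toy_labels, 0.5))  # 3 of 4 failures -> 0.75
```

A model that ranks at random would be expected to capture about 50% of failures in the top 50% of the sample, so the gap between that baseline and the reported figure is what the gain chart measures.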

    Entrepreneurial Operations Management

    In the presence of tight capital, time and talent constraints, many traditional operational challenges are reinforced (and sometimes redefined) in the entrepreneurial setting. This dissertation addresses some of these challenges by examining theoretically and experimentally several problems in entrepreneurship and innovation for which the existing literature offers little guidance. The dissertation is organized into three chapters. When tight time-to-market constraints are binding, an important question in product development is how much time a development team should spend on generating new ideas and designs vs. executing the idea, and who should make that decision. In the first chapter of this dissertation I develop an experimental approach to examining this question. Entrepreneurial ventures can have limited (often zero) cash inflow and limited access to capital, and so use equity ownership to compensate founders and early employees. In the second chapter I focus on the challenges of equity-based incentive design, examining the effects of contract form (equal vs. non-equal equity splits) and time (upfront vs. delayed contracting) on effort and value generation in startups. In "technology-push" (relative to "demand-pull") innovation, technology teams often develop a new capability that may find voice in a wide range of industrial settings. However, the team may lack the appropriate marketing budget to explore each in great depth, or even all of them at any depth. In the third chapter I study entrepreneurial market identification, developing and testing search strategies for choosing a market for a new technology when the number of potential markets is large but the search budget is small. PhD, Business Administration, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145946/1/ekagan_1.pd