53,957 research outputs found

    Playing the Large Margin Preference Game


    Racial Discrimination Among NBA Referees

    The NBA provides an intriguing place to test for taste-based discrimination: referees and players are involved in repeated interactions in a high-pressure setting with referees making the type of split-second decisions that might allow implicit racial biases to manifest themselves. Moreover, the referees receive constant monitoring and feedback on their performance. (Commissioner Stern has claimed that NBA referees "are the most ranked, rated, reviewed, statistically analyzed and mentored group of employees of any company in any place in the world.") The essentially arbitrary assignment of refereeing crews to basketball games and the number of repeated interactions allow us to convincingly test for own-race preferences. We find -- even conditioning on player and referee fixed effects (and specific game fixed effects) -- that more personal fouls are called against players when they are officiated by an opposite-race refereeing crew than when officiated by an own-race crew. These biases are sufficiently large that we find appreciable differences in whether predominantly black teams are more likely to win or lose, based on the racial composition of the refereeing crew.
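    As a rough, purely illustrative sketch of the kind of specification this abstract describes (not the authors' actual code or data), a foul regression with player, crew and game fixed effects could be set up as below; the data frame and every column name (fouls, opposite_race, player, crew, game) are hypothetical and the data are synthetic.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical player-game observations; in the real study these would be
# box-score foul counts matched to the racial composition of the refereeing crew.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "fouls": rng.poisson(3, n),                    # personal fouls in the game
    "opposite_race": rng.integers(0, 2, n),        # 1 if the crew is majority opposite-race
    "player": rng.integers(0, 50, n).astype(str),  # player fixed effects
    "crew": rng.integers(0, 20, n).astype(str),    # referee-crew fixed effects
    "game": rng.integers(0, 100, n).astype(str),   # game fixed effects
})

# OLS with categorical dummies standing in for the fixed effects;
# the coefficient on opposite_race is the quantity of interest.
model = smf.ols("fouls ~ opposite_race + C(player) + C(crew) + C(game)", data=df).fit()
print(model.params["opposite_race"])
```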

    SAI, a Sensible Artificial Intelligence that plays Go

    We propose a multiple-komi modification of the AlphaGo Zero/Leela Zero paradigm. The winrate as a function of the komi is modeled with a two-parameter sigmoid function, so that the neural network must predict just one more variable to assess the winrate for all komi values. A second novel feature is that training is based on self-play games that occasionally branch -- with changed komi -- when the position is uneven. With this setting, reinforcement learning is shown to work on 7x7 Go, obtaining very strong playing agents. As a useful byproduct, the sigmoid parameters given by the network allow us to estimate the score difference on the board and to evaluate how much the game is decided.
    Comment: Updated for the IJCNN 2019 conference
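    A minimal sketch of the two-parameter sigmoid winrate model the abstract refers to. The exact parametrisation, sign convention and parameter names (alpha, beta) are assumptions here: alpha plays the role of the komi at which the winrate crosses 0.5 (and hence of the score difference on the board), while beta controls how sharply the winrate saturates, i.e. how much the game is decided.

```python
import numpy as np

# Assumed logistic form of the two-parameter sigmoid; not necessarily the
# exact parametrisation used by SAI.
def winrate(komi, alpha, beta):
    """Predicted winrate as a function of komi."""
    return 1.0 / (1.0 + np.exp(-beta * (alpha - komi)))

# A single pair (alpha, beta) from the network is enough to read off the
# winrate for every komi value, not just the one used in the current game.
alpha, beta = 3.0, 0.8                      # hypothetical network outputs for one position
for komi in (-7.5, 0.5, 3.5, 7.5):
    print(komi, round(winrate(komi, alpha, beta), 3))
```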

    Probabilistic inverse reinforcement learning in unknown environments

    We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics.
    Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI 2013)
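    The sketch below illustrates, under stated assumptions, the estimation idea in the abstract rather than the paper's actual models: the unknown dynamics are replaced by an empirical transition estimate built from the demonstrations, the demonstrator is modelled as a softmax (Boltzmann) policy over Q-values of a tabular state reward, and MAP estimation under a flat prior reduces to maximising the demonstration log-likelihood. All function names and the temperature parameter are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_dynamics(demos, n_states, n_actions):
    """Empirical transition model P[s, a, s'] from (s, a, s') triples, with add-one smoothing."""
    counts = np.ones((n_states, n_actions, n_states))
    for s, a, s_next in demos:
        counts[s, a, s_next] += 1
    return counts / counts.sum(axis=2, keepdims=True)

def q_values(reward, P, gamma=0.95, iters=200):
    """Q-values for a tabular state reward via value iteration on the estimated dynamics."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = reward[:, None] + gamma * (P @ V)      # shape (n_states, n_actions)
        V = Q.max(axis=1)
    return Q

def neg_log_likelihood(reward, demos, P, beta=5.0):
    """Negative log-likelihood of the demonstrated actions under a softmax policy over Q."""
    logits = beta * q_values(reward, P)
    log_pi = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
    return -sum(log_pi[s, a] for s, a, _ in demos)

def map_reward(demos, n_states, n_actions):
    """MAP reward estimate; with a flat prior this coincides with maximum likelihood."""
    P = estimate_dynamics(demos, n_states, n_actions)
    res = minimize(neg_log_likelihood, np.zeros(n_states), args=(demos, P))
    return res.x
```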

    Reducing Dueling Bandits to Cardinal Bandits

    We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions -- named Doubler, MultiSbm and DoubleSbm -- provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSbm we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of DoubleSbm, which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.
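    The following is a hedged, self-contained sketch of the general reduction schema, not the paper's exact Doubler/MultiSbm/DoubleSbm constructions (epoch schedules and reset rules are omitted): a plain UCB1 bandit is run unchanged, the left arm of each duel is drawn from the arms the bandit played previously, and the binary "right arm beat left arm" outcome is fed back to it as a cardinal reward. The preference matrix, the UCB1 class and the bookkeeping are illustrative assumptions.

```python
import numpy as np

class UCB1:
    """Standard UCB1 multi-armed bandit, used as the cardinal black box."""
    def __init__(self, n_arms):
        self.counts = np.zeros(n_arms)
        self.sums = np.zeros(n_arms)

    def select(self, t):
        if np.any(self.counts == 0):                       # play every arm once first
            return int(np.argmin(self.counts))
        ucb = self.sums / self.counts + np.sqrt(2 * np.log(t) / self.counts)
        return int(np.argmax(ucb))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def duel(left, right, pref, rng):
    """Synthetic ordinal feedback: 1 if `right` is preferred to `left`."""
    return int(rng.random() < pref[right, left])

rng = np.random.default_rng(0)
pref = np.array([[0.5, 0.4, 0.3],                          # pref[i, j] = P(arm i beats arm j)
                 [0.6, 0.5, 0.4],
                 [0.7, 0.6, 0.5]])
bandit = UCB1(n_arms=3)
recent = [0]                                               # arms the bandit chose so far
for t in range(1, 2001):
    left = recent[rng.integers(len(recent))]               # opponent drawn from past plays
    right = bandit.select(t)
    bandit.update(right, duel(left, right, pref, rng))     # ordinal outcome used as 0/1 reward
    recent.append(right)
print("pulls per arm:", bandit.counts)                     # should concentrate on the best arm
```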