Racial Discrimination Among NBA Referees
The NBA provides an intriguing place to test for taste-based discrimination: referees and players are involved in repeated interactions in a high-pressure setting with referees making the type of split-second decisions that might allow implicit racial biases to manifest themselves. Moreover, the referees receive constant monitoring and feedback on their performance. (Commissioner Stern has claimed that NBA referees "are the most ranked, rated, reviewed, statistically analyzed and mentored group of employees of any company in any place in the world.") The essentially arbitrary assignment of refereeing crews to basketball games, and the number of repeated interactions allow us to convincingly test for own-race preferences. We find -- even conditioning on player and referee fixed effects (and specific game fixed effects) -- that more personal fouls are called against players when they are officiated by an opposite-race refereeing crew than when officiated by an own-race crew. These biases are sufficiently large that we find appreciable differences in whether predominantly black teams are more likely to win or lose, based on the racial composition of the refereeing crew.
SAI, a Sensible Artificial Intelligence that plays Go
We propose a multiple-komi modification of the AlphaGo Zero/Leela Zero
paradigm. The winrate as a function of the komi is modeled with a
two-parameter sigmoid function, so that the neural network must predict just
one additional variable to assess the winrate at all komi values. A second
novel feature is that training is based on self-play games that occasionally
branch -- with changed komi -- when the position is uneven. With this setting,
reinforcement learning is shown to work on 7x7 Go, yielding very strong
playing agents. As a useful byproduct, the sigmoid parameters given by the
network allow one to estimate the score difference on the board and to
evaluate to what extent the game is already decided.
Comment: Updated for the IJCNN 2019 conference
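The two-parameter sigmoid can be sketched in a few lines. This is a minimal illustration, not the paper's code: the parameter names `alpha` and `beta` are our own labels for the steepness and the even-komi point.

```python
import math

def winrate(komi, alpha, beta):
    """Two-parameter sigmoid model of the winrate as a function of komi.

    Illustrative sketch: alpha controls the steepness (how decided the
    position is), and beta is the komi at which the position would be even,
    which serves as an estimate of the score difference on the board.
    """
    return 1.0 / (1.0 + math.exp(alpha * (komi - beta)))

# At komi == beta the position is even, so the winrate is exactly 0.5,
# and the winrate decreases as the komi handicap grows.
print(winrate(7.5, alpha=0.8, beta=7.5))
print(winrate(0.0, alpha=0.8, beta=7.5) > winrate(15.0, alpha=0.8, beta=7.5))
```

With this parameterization the network only has to output `alpha` and `beta` per position to give the winrate at every komi at once, rather than one winrate per komi value.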
Probabilistic inverse reinforcement learning in unknown environments
We consider the problem of learning by demonstration from agents acting in
unknown stochastic Markov environments or games. Our aim is to estimate agent
preferences in order to construct improved policies for the same task that the
agents are trying to solve. To do so, we extend previous probabilistic
approaches for inverse reinforcement learning in known MDPs to the case of
unknown dynamics or opponents. We do this by deriving two simplified
probabilistic models of the demonstrator's policy and utility. For
tractability, we use maximum a posteriori estimation rather than full Bayesian
inference. Under a flat prior, this results in a convex optimisation problem.
We find that the resulting algorithms are highly competitive against a variety
of other methods for inverse reinforcement learning that do have knowledge of
the dynamics.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI 2013)
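The key simplification in the abstract -- MAP estimation under a flat prior reduces to a convex maximum-likelihood problem -- can be sketched with a deliberately simplified demonstrator model. This is not the paper's model: we assume a myopic softmax demonstrator, pi(a|s) proportional to exp(theta . phi(s, a)), which makes the negative log-likelihood a logistic-regression-style convex objective in theta.

```python
import numpy as np

# Simplified sketch (our assumption, not the paper's model): the demonstrator
# acts via a softmax over linear reward features, so MAP estimation with a
# flat prior is just maximum likelihood, solvable by gradient ascent.
rng = np.random.default_rng(0)
n_states, n_actions, n_features = 5, 3, 4
phi = rng.normal(size=(n_states, n_actions, n_features))  # feature map phi(s, a)
theta_true = np.array([1.0, -0.5, 0.3, 0.8])              # hypothetical true weights

def policy(theta):
    """Softmax demonstrator policy pi(a|s) over immediate feature rewards."""
    logits = phi @ theta                          # shape (states, actions)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Demonstrations: state-action pairs sampled from the true policy.
states = rng.integers(0, n_states, size=500)
pi_true = policy(theta_true)
actions = np.array([rng.choice(n_actions, p=pi_true[s]) for s in states])

# Gradient ascent on the log-likelihood (the MAP objective with a flat prior):
# the gradient is observed minus expected features, as in logistic regression.
theta = np.zeros(n_features)
for _ in range(300):
    p = policy(theta)
    grad = np.zeros(n_features)
    for s, a in zip(states, actions):
        grad += phi[s, a] - p[s] @ phi[s]
    theta += 0.01 * grad / len(states)

print(np.round(theta, 2))
```

Because the objective is concave in `theta`, plain gradient ascent converges to the global optimum; in the unknown-dynamics setting of the paper the models are richer, but the flat-prior MAP problem retains this convexity.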
Reducing Dueling Bandits to Cardinal Bandits
We present algorithms for reducing the Dueling Bandits problem to the
conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits
problem is an online model of learning with ordinal feedback of the form "A is
preferred to B" (as opposed to cardinal feedback like "A has value 2.5"),
giving it wide applicability in learning from implicit user feedback and
revealed and stated preferences. In contrast to existing algorithms for the
Dueling Bandits problem, our reductions -- named \Doubler, \MultiSbm and
\DoubleSbm -- provide a generic schema for translating the extensive body of
known results about conventional Multi-Armed Bandit algorithms to the Dueling
Bandits setting. For \Doubler and \MultiSbm we prove regret upper bounds in
both finite and infinite settings, and conjecture about the performance of
\DoubleSbm which empirically outperforms the other two as well as previous
algorithms in our experiments. In addition, we provide the first near-optimal
regret bound in terms of second-order terms, such as the differences between
the values of the arms.
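A schematic reading of the \Doubler reduction can be sketched as follows. This is our illustration under stated assumptions, not the authors' code: a black-box cardinal bandit (here a generic UCB1) picks the right arm, the left arm is drawn from the multiset of arms the bandit played in the previous epoch, epochs double in length, and the ordinal win/loss outcome is fed back to the bandit as a 0/1 reward.

```python
import math
import random

class UCB1:
    """Generic cardinal (stochastic) multi-armed bandit used as the black box."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self):
        for a, c in enumerate(self.counts):
            if c == 0:
                return a                      # play each arm once first
        t = sum(self.counts)
        return max(range(len(self.counts)),
                   key=lambda a: self.sums[a] / self.counts[a]
                   + math.sqrt(2 * math.log(t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def duel(i, j, pref):
    """Ordinal feedback: 1 if arm i beats arm j, 0 otherwise."""
    return 1 if random.random() < pref[i][j] else 0

def doubler(n_arms, pref, horizon):
    """Sketch of a \Doubler-style reduction with doubling epochs."""
    bandit = UCB1(n_arms)
    prev_epoch = list(range(n_arms))          # seed multiset for the first epoch
    t, epoch_len = 0, 1
    while t < horizon:
        cur_epoch = []
        for _ in range(min(epoch_len, horizon - t)):
            left = random.choice(prev_epoch)  # opponent from the previous epoch
            right = bandit.select()           # black-box bandit's choice
            bandit.update(right, duel(right, left, pref))
            cur_epoch.append(right)
            t += 1
        prev_epoch, epoch_len = cur_epoch, epoch_len * 2
    return bandit

random.seed(0)
# Hypothetical preference matrix: pref[i][j] = P(arm i beats arm j); arm 0 is best.
pref = [[0.5, 0.7, 0.8],
        [0.3, 0.5, 0.6],
        [0.2, 0.4, 0.5]]
b = doubler(3, pref, horizon=2000)
print(b.counts)  # the best arm should accumulate the most plays
```

The point of the schema is that `UCB1` could be swapped for any conventional multi-armed bandit algorithm, carrying its regret guarantees over to the ordinal-feedback setting.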