2,407 research outputs found

    Gentlemen, stop your engines!

    Get PDF
    For fifty years, computer chess has pursued an original goal of Artificial Intelligence, to produce a chess-engine to compete at the highest level. The goal has arguably been achieved, but that success has made it harder to answer questions about the relative playing strengths of man and machine. The proposal here is to approach such questions in a counter-intuitive way, handicapping or stopping-down chess engines so that they play less well. The intrinsic lack of man-machine games may be side-stepped by analysing existing games to place computer engines as accurately as possible on the FIDE ELO scale of human play. Move-sequences may also be assessed for likelihood if computer-assisted cheating is suspected

    Understanding distributions of chess performances

    Get PDF
    This paper presents evidence for several features of the population of chess players, and the distribution of their performances measured in terms of Elo ratings and by computer analysis of moves. Evidence that ratings have remained stable since the inception of the Elo system in the 1970’s is given in several forms: by showing that the population of strong players fits a simple logistic-curve model without inflation, by plotting players’ average error against the FIDE category of tournaments over time, and by skill parameters from a model that employs computer analysis keeping a nearly constant relation to Elo rating across that time. The distribution of the model’s Intrinsic Performance Ratings can hence be used to compare populations that have limited interaction, such as between players in a national chess federation and FIDE, and ascertain relative drift in their respective rating systems

    Skill Rating by Bayesian Inference

    Get PDF
    Systems Engineering often involves computer modelling the behaviour of proposed systems and their components. Where a component is human, fallibility must be modelled by a stochastic agent. The identification of a model of decision-making over quantifiable options is investigated using the game-domain of Chess. Bayesian methods are used to infer the distribution of players’ skill levels from the moves they play rather than from their competitive results. The approach is used on large sets of games by players across a broad FIDE Elo range, and is in principle applicable to any scenario where high-value decisions are being made under pressure

    Improving ranking quality and fairness in Swiss-system chess tournaments

    Get PDF
    The International Chess Federation (FIDE) imposes a voluminous and complex set of player pairing criteria in Swiss-system chess tournaments and endorses computer programs that are able to calculate the prescribed pairings. The purpose of these formalities is to ensure that players are paired fairly during the tournament and that the final ranking corresponds to the players' true strength order. We contest the official FIDE player pairing routine by presenting alternative pairing rules. These can be enforced by computing maximum weight matchings in a carefully designed graph. We demonstrate by extensive experiments that a tournament format using our mechanism 1) yields fairer pairings in the rounds of the tournament and 2) produces a final ranking that reflects the players' true strengths better than the state-of-the-art FIDE pairing system

    The role of constraints in expert memory

    Get PDF
    A great deal of research has been devoted to developing process models of expert memory. However, K. J. Vicente and J. H. Wang (1998) proposed (a) that process theories do not provide an adequate account of expert recall in domains in which memory recall is a contrived task and (b) that a product theory, the constraint attunement hypothesis (CAH), has received a significant amount of empirical support. We compared 1 process theory (the template theory; TT; F. Gobet & H. A. Simon, 1996c) with the CAH in chess. Chess players (N = 36) differing widely in skill levels were required to recall briefly presented chess positions that were randomized in various ways. Consistent with TT, but inconsistent with the CAH, there was a significant skill effect in a condition in which both the location and distribution of the pieces were randomized. These and other results suggest that process models such as TT can provide a viable account of expert memory in chess

    Data assurance in opaque computations

    Get PDF
    The chess endgame is increasingly being seen through the lens of, and therefore effectively defined by, a data ‘model’ of itself. It is vital that such models are clearly faithful to the reality they purport to represent. This paper examines that issue and systems engineering responses to it, using the chess endgame as the exemplar scenario. A structured survey has been carried out of the intrinsic challenges and complexity of creating endgame data by reviewing the past pattern of errors during work in progress, surfacing in publications and occurring after the data was generated. Specific measures are proposed to counter observed classes of error-risk, including a preliminary survey of techniques for using state-of-the-art verification tools to generate EGTs that are correct by construction. The approach may be applied generically beyond the game domain

    Performance and prediction: Bayesian modelling of fallible choice in chess

    Get PDF
    Evaluating agents in decision-making applications requires assessing their skill and predicting their behaviour. Both are well developed in Poker-like situations, but less so in more complex game and model domains. This paper addresses both tasks by using Bayesian inference in a benchmark space of reference agents. The concepts are explained and demonstrated using the game of chess but the model applies generically to any domain with quantifiable options and fallible choice. Demonstration applications address questions frequently asked by the chess community regarding the stability of the rating scale, the comparison of players of different eras and/or leagues, and controversial incidents possibly involving fraud. The last include alleged under-performance, fabrication of tournament results, and clandestine use of computer advice during competition. Beyond the model world of games, the aim is to improve fallible human performance in complex, high-value tasks

    On the limits of engine analysis for cheating detection in chess

    Get PDF
    The integrity of online games has important economic consequences for both the gaming industry and players of all levels, from professionals to amateurs. Where there is a high likelihood of cheating, there is a loss of trust and players will be reluctant to participate — particularly if this is likely to cost them money. Chess is a game that has been established online for around 25 years and is played over the Internet commercially. In that environment, where players are not physically present “over the board” (OTB), chess is one of the most easily exploitable games by those who wish to cheat, because of the widespread availability of very strong chess-playing programs. Allegations of cheating even in OTB games have increased significantly in recent years, and even led to recent changes in the laws of the game that potentially impinge upon players’ privacy. In this work, we examine some of the difficulties inherent in identifying the covert use of chess-playing programs purely from an analysis of the moves of a game. Our approach is to deeply examine a large collection of games where there is confidence that cheating has not taken place, and analyse those that could be easily misclassified. We conclude that there is a serious risk of finding numerous “false positives” and that, in general, it is unsafe to use just the moves of a single game as prima facie evidence of cheating. We also demonstrate that it is impossible to compute definitive values of the figures currently employed to measure similarity to a chess-engine for a particular game, as values inevitably vary at different depths and, even under identical conditions, when multi-threading evaluation is used
    corecore