48 research outputs found
Recommended from our members
I’ve (Urn)ed This: An Application and Criterion-based Evaluation of the Urnings Algorithm
There is increased interest in personalized learning and making e-learning environments more adaptable. Some e-learning systems may use an Item Response Theory (IRT)-based assessment system. An important distinction between assessment and learning contexts is that learner proficiency is expected to remain constant across an assessment, while it is expected to change over time in a learning context. Constant learner proficiency during an assessment enables conventional approaches to estimating person and item parameters using IRT. These IRT-based systems could be abandoned for alternative approaches to modeling learners and system learning content, but assessments may provide more functions than adapting learning material to students. Thus, there is the question, how can e-learning systems with IRT-based assessment components more dynamically adapt their learning content? Is there a solution that leverages IRT for adapting the learning content of the system? A promising solution is the Urnings algorithm. Like other candidate algorithms, it is computationally light, but this algorithm has mechanisms for preventing variance inflation and is suitable for e-learning contexts. It also provides a measure of uncertainty around estimates. It has been studied both through simulations and applications to e-learning systems. Results are promising; however, there has not been an application of the Urnings algorithm to an e-learning context where there are conventionally estimated person parameters to compare the algorithm estimates to. This study addresses this gap by applying the Urnings algorithm to a K–8 reading and mathematics learning platform. In data from this platform, we have person parameter estimates across academic years from an in-system diagnostic assessment. Results from this study will help industry researchers understand the feasibility of the Urnings algorithm for large e-learning systems with IRT-based assessment components
Predicting Dominance Rankings for Score-Based Games
Game competitions may involve different player roles and be score-based rather than win/loss based. This raises the issue of how best to draw opponents for matches in ongoing competitions, and how best to rank the players in each role. An example is the Ms Pac-Man versus Ghosts Competition which requires competitors to develop software controllers to take charge of the game's protagonists: participants may develop software controllers for either or both Ms Pac-Man and the team of four ghosts. In this paper, we compare two ranking schemes for win-loss games, Bayes Elo and Glicko. We convert the game into one of win-loss ("dominance") by matching controllers of identical type against the same opponent in a series of pair-wise comparisons. This implicitly creates a "solution concept" as to what constitutes a good player. We analyze how many games are needed under two popular ranking algorithms, Glicko and Bayes Elo, before one can infer the strength of the players, according to our proposed solution concept, without performing an exhaustive evaluation. We show that Glicko should be the method of choice for online score-based game competitions.
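The conversion from scores to win-loss "dominance" described in this abstract can be illustrated with a minimal sketch. It assumes the standard Elo update (logistic curve, base 10, scale 400) rather than the Bayes Elo or Glicko schemes the paper actually compares, and the function names, the K-factor of 32, and the example scores are hypothetical.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B (logistic, base 10, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def dominance_outcome(score_a: float, score_b: float) -> float:
    """Turn two scores earned against the same opponent into a result for A.

    Returns 1.0 if A dominates B, 0.0 if B dominates A, and 0.5 for a tie.
    """
    if score_a > score_b:
        return 1.0
    if score_a < score_b:
        return 0.0
    return 0.5


def elo_update(r_a: float, r_b: float, outcome_a: float, k: float = 32.0):
    """Return the updated (r_a, r_b) after one pairwise comparison.

    k is an assumed K-factor; the update is zero-sum between the two players.
    """
    delta = k * (outcome_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


# Two hypothetical Ms Pac-Man controllers, each scored against the same ghost team.
outcome = dominance_outcome(score_a=12500, score_b=9800)
new_a, new_b = elo_update(1500.0, 1500.0, outcome)
print(new_a, new_b)  # 1516.0 1484.0
```

Glicko additionally tracks a rating deviation per player, which this sketch omits.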
Team formation using recommendation systems
The importance of team formation has long been recognized, but finding the most effective team from the available human resources remains an open problem. Members with complementary skills, along with a few must-have behavioral traits such as trust and collaborativeness, are the key ingredients behind team synergy and performance. This thesis designs and implements two different algorithms for the team formation problem using ideas adapted from the recommender systems literature. One of the proposed solutions uses the Glicko-2 rating system to rate the employees’ skills, which can easily separate the skill ability and experience of the employees. The final contribution of this thesis is to build a system with “plug-in” capability, meaning any new recommendation algorithm could be easily plugged into the system. Our extensive experimental analyses explore nuances of data sources, data storage methodologies, as well as characteristics of different recommendation algorithms with rating and ranking sub-systems.
Tailoring a psychophysiologically driven rating system
Humans have always been interested in ways to measure and compare their performances to establish who is best at a particular activity. The first Olympic Games, for instance, were held in 776 BC, a defining moment in history at which ranking-based competitive activities reached the general populace. Every competition must face the issue of how to evaluate and rank competitors, and often rules are required to account for many different aspects such as variations in conditions, the ability to cheat, and, of course, the value of entertainment. Nowadays, measurements are performed through various rating systems, which consider the outcomes of the activity to rate the participants. However, they do not seem to address the psychological aspects of an individual in a competition.
This dissertation employs several psychophysiological assessment instruments intending to facilitate the acquisition of skill level rating in competitive gaming. To do so, an exergame that uses non-conventional inputs, such as body tracking to prevent input biases, was developed. The sample size of this study is ten, and the participants were placed in a round-robin tournament to provide equal intervals between games for each player.
Analysis of the competition outcomes revealed some critical insights into the psychophysiological instruments, especially the significance of Flow for the prolificacy of a player. Although the findings did not provide an alternative to the traditional rating systems, they show the importance of considering other aspects of the competition, such as psychophysiological metrics, to fine-tune the rating. These potentially reveal deeper insight into the competition than the binary outcome alone.
A State-Space Perspective on Modelling and Inference for Online Skill Rating
This paper offers a comprehensive review of the main methodologies used for
skill rating in competitive sports. We advocate for a state-space model
perspective, wherein players' skills are represented as time-varying, and match
results serve as the sole observed quantities. The state-space model
perspective facilitates the decoupling of modeling and inference, enabling a
more focused approach highlighting model assumptions, while also fostering the
development of general-purpose inference tools. We explore the essential steps
involved in constructing a state-space model for skill rating before turning to
a discussion on the three stages of inference: filtering, smoothing and
parameter estimation. Throughout, we examine the computational challenges of
scaling up to high-dimensional scenarios involving numerous players and
matches, highlighting approximations and reductions used to address these
challenges effectively. We provide concise summaries of popular methods
documented in the literature, along with their inferential paradigms and
introduce new approaches to skill rating inference based on sequential Monte
Carlo and finite state-spaces. We close with numerical experiments
demonstrating a practical workflow on real data across different sports.
A Social Network Approach Reveals Associations between Mouse Social Dominance and Brain Gene Expression
Modelling complex social behavior in the laboratory is challenging and requires analyses of dyadic interactions occurring over time in a physically and socially complex environment. In the current study, we approached the analyses of complex social interactions in group-housed male CD1 mice living in a large vivarium. Intensive observations of social interactions during a 3-week period indicated that male mice form a highly linear and steep dominance hierarchy that is maintained by fighting and chasing behaviors. Individual animals were classified as dominant, sub-dominant or subordinate according to their David’s Scores and I&SI ranking. Using a novel dynamic temporal Glicko rating method, we ascertained that the dominance hierarchy was stable across time. Using social network analyses, we characterized the behavior of individuals within 66 unique relationships in the social group. We identified two individual network metrics, Kleinberg’s Hub Centrality and Bonacich’s Power Centrality, as accurate predictors of individual dominance and power. Comparing across behaviors, we establish that agonistic, grooming and sniffing social networks possess their own distinctive characteristics in terms of density, average path length, reciprocity, out-degree centralization and out-closeness centralization. Though grooming ties between individuals were largely independent of other social networks, sniffing relationships were highly predictive of the directionality of agonistic relationships. Individual variation in dominance status was associated with brain gene expression, with more dominant individuals having higher levels of corticotropin releasing factor mRNA in the medial and central nuclei of the amygdala and the medial preoptic area of the hypothalamus, as well as higher levels of hippocampal glucocorticoid receptor and brain-derived neurotrophic factor mRNA.
This study demonstrates the potential and significance of combining complex social housing and intensive behavioral characterization of group-living animals with the utilization of novel statistical methods to further our understanding of the neurobiological basis of social behavior at the individual, relationship and group levels.
Comparing Elo, Glicko, IRT, and Bayesian IRT Statistical Models for Educational and Gaming Data
Statistical models used for estimating skill or ability levels often vary by field; however, their underlying mathematical models can be very similar. Differences in the underlying models can be due to the need to accommodate data with different underlying formats and structure. As the models from varying fields increase in complexity, their applicability to different types of data may also increase. Models that are applied to educational or psychological data have advanced to accommodate a wide range of data formats, including increased estimation accuracy with sparsely populated data matrices. Conversely, the field of online gaming has expanded over the last two decades to include the use of more complex statistical models to provide real-time game matching based on ability estimates. It can be useful to see how statistical models from educational and gaming fields compare, as different datasets may benefit from different ability estimation procedures. This study compared statistical models typically used in game matchmaking systems (Elo, Glicko) to models used in psychometric modeling (item response theory and Bayesian item response theory) using both simulated data and real data under a variety of conditions. Results indicated that conditions with small numbers of items or matches had the most accurate skill estimates using the Bayesian IRT (item response theory) one-parameter logistic (1PL) model, regardless of whether educational or gaming data were used. This held true for all sample sizes with small numbers of items. However, the Elo and the non-Bayesian IRT 1PL models were close to the Bayesian IRT 1PL model’s estimations for both gaming and educational data. While the 2PL models were not shown to be accurate for the gaming study conditions, the IRT 2PL and Bayesian IRT 2PL models outperformed the 1PL models when 2PL educational data were generated with the larger sample size and item condition.
Overall, the Bayesian IRT 1PL model seemed to be the best choice across the smaller sample and match size conditions.
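The remark that skill models from different fields can share very similar mathematics can be made concrete with a small sketch. It assumes the textbook forms of the 1PL (Rasch) response function and the Elo expected score; the function names, the constant `ELO_TO_LOGIT`, and the example ratings are illustrative and not taken from the study.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """1PL (Rasch) model: probability of success for ability theta on difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def elo_expected(r_player: float, r_opponent: float) -> float:
    """Standard Elo expected score (logistic, base 10, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_player) / 400.0))


# Rescaling Elo points to logits: 10 ** (x / 400) == exp(x * ln(10) / 400).
ELO_TO_LOGIT = math.log(10.0) / 400.0

# Hypothetical ratings; the opponent plays the role of the "item".
r_player, r_opponent = 1650.0, 1500.0
p_elo = elo_expected(r_player, r_opponent)
p_rasch = rasch_probability(r_player * ELO_TO_LOGIT, r_opponent * ELO_TO_LOGIT)
print(p_elo, p_rasch)  # the two values agree up to floating-point error
```

Under these assumptions the two models share the same response curve and differ mainly in how parameters are estimated: sequential updates for Elo, likelihood-based or Bayesian fitting for IRT.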
Statistical methods for detecting match-fixing in tennis
Match-fixing is a key problem facing many sports, undermining the integrity and sporting spectacle of events, ruining players’ careers and enabling the criminals behind the fixes to funnel funds into other illicit activities. Although for a long time authorities were reticent to act, more and more sports bodies and betting companies are now taking steps to tackle the issue, though much remains to be done. Tennis in particular has faced past criticism for its approach to combatting match-fixing, culminating in widespread media coverage of a leak of match-fixing related documents in 2016, although the Tennis Integrity Unit has since intensified its efforts to deal with the problem. In this thesis, we develop new statistical methods for identifying tennis matches in which suspicious betting activity occurs. We also make some advancements on existing sports models to enable us to better analyse tennis matches to detect this corrupt activity. Our work is among the first to use both pre-match and in-play odds data to investigate match-fixing, and to also integrate betting volumes. Our pre-match odds are sampled at several intervals during the pre-match market, allowing for more detailed analysis than other work. Our in-play odds data are recorded during every game break along with live scores so that we can explore how the odds vary as the score progresses. In particular, we look for divergences between market odds and predictions coming both from sports models and from direct predictions of odds based on in-play events. Our methods successfully identify past matches that other external sources have found to contain suspicious betting activity, and are able to quantify how unusual this activity was in relation to typical betting behaviour. This suggests that our methods, coupled with other sources of evidence, can provide a valuable quantification of suspicious betting activity in future matches
Point Process Models for Heterogeneous Event Time Data
Interaction event times observed on a social network provide valuable information for social scientists to gain insight into complex social dynamics that are challenging to understand. However, it can be difficult to accurately represent the heterogeneity in the data and to model the dependence structure in the network system. This requires flexible models that can capture the complicated dynamics and complex patterns. Point process models offer an elegant framework for modeling event time data. This dissertation concentrates on developing point process models and related diagnostic tools, with a real data application involving an animal behavior network.
In this dissertation, we first propose a Markov-modulated Hawkes process (MMHP) model to capture the sporadic and bursty patterns often observed in event time data. A Bayesian inference procedure is developed to evaluate the likelihood by using a variational approximation and the forward-backward algorithm. The validity of the proposed model and associated estimation algorithms is demonstrated using synthetic data and the animal behavior data. Facilitated by the power of the MMHP model, we construct network point process models that can capture a social hierarchy structure by embedding nodes in a latent space that can represent the underlying social ranks. Our model provides a ranking method for social hierarchy studies and describes the dynamics of social hierarchy formation from a novel perspective, taking advantage of the detailed information available in event time data. We show that the network point process models appropriately capture the temporal dynamics and heterogeneity in the network event time data, by providing meaningful inferred rankings and by calibrating the accuracy of predictions with relevant measures of uncertainty. In addition to developing a sensible and flexible model for network event time data, the last part of this dissertation provides essential tools for diagnosing lack of fit issues for such models. We develop a systematic set of diagnostic tools and visualizations for point process models fitted to data in the dynamic network setting. By inspecting the structure of the residual process and Pearson residual on the network, we can validate whether a model adequately captures the temporal and network dependence structures in the observed data.
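As background for the bursty behavior mentioned above, here is a minimal sketch of the conditional intensity of a plain univariate Hawkes process with an exponential kernel. It is not the dissertation's Markov-modulated model: the Markov switching between states is omitted, and the parameter names `mu`, `alpha`, `beta` and the event times are assumptions chosen for illustration.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity of a Hawkes process with an exponential kernel.

    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))

    mu is the baseline rate, alpha the jump added by each event,
    and beta the rate at which that excitation decays.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in event_times if t_i < t
    )
    return mu + excitation


# Hypothetical interaction times: a burst around t = 1, then one late event.
events = [1.0, 1.2, 1.3, 5.0]

during_burst = hawkes_intensity(1.4, events, mu=0.1, alpha=0.8, beta=2.0)
after_decay = hawkes_intensity(4.9, events, mu=0.1, alpha=0.8, beta=2.0)
print(during_burst, after_decay)  # intensity is elevated right after the burst
```

Each event temporarily raises the rate of further events, which is what produces clustered event times; once the excitation has decayed, the intensity falls back toward the baseline `mu`.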