
    Evaluating Soccer Match Prediction Models: A Deep Learning Approach and Feature Optimization for Gradient-Boosted Trees

    Machine learning models have become increasingly popular for predicting the results of soccer matches; however, the lack of publicly available benchmark datasets has made model evaluation challenging. The 2023 Soccer Prediction Challenge required the prediction of match results first in terms of the exact goals scored by each team, and second in terms of the probabilities for a win, draw, and loss. The original training set of matches and features, which was provided for the competition, was augmented with additional matches played between 4 April and 13 April 2023, the period after the training set ended but before the first matches to be predicted (on which performance was evaluated). A CatBoost model was employed using pi-ratings as the features, which were initially identified as the optimal choice for calculating the win/draw/loss probabilities. Notably, deep learning models have frequently been disregarded in this particular task. Therefore, in this study, we aimed to assess the performance of a deep learning model and determine the optimal feature set for a gradient-boosted tree model. The model was trained using the most recent five years of data, and three training and validation sets were used in a hyperparameter grid search. The results from the validation sets show that our model had strong performance and stability compared to previously published models from the 2017 Soccer Prediction Challenge for win/draw/loss prediction.
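
The abstract does not say how the three training and validation sets were constructed. As a minimal sketch, assuming each validation set is a single season preceded by a five-season training window, the splits could be generated as below. The function name `rolling_splits` and the season-level granularity are hypothetical, not taken from the paper.

```python
def rolling_splits(years, train_span=5, n_splits=3):
    """Build (training_years, validation_year) pairs from a list of seasons.

    Assumption (not from the paper): each validation set is one season,
    and its training set is the `train_span` seasons immediately before it.
    """
    years = sorted(years)
    if len(years) < train_span + n_splits:
        raise ValueError("not enough seasons for the requested splits")

    splits = []
    for i in range(n_splits):
        # Index of the validation season; the last split validates on the latest season.
        val_idx = len(years) - n_splits + i
        train_years = years[val_idx - train_span:val_idx]
        splits.append((train_years, years[val_idx]))
    return splits


if __name__ == "__main__":
    for train_years, val_year in rolling_splits(range(2015, 2023)):
        print(train_years, "->", val_year)
```

Keeping every validation season strictly later than its training seasons avoids leaking future results into the grid search.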

    Data-driven action-value functions for evaluating players in professional team sports

    As more and larger event stream datasets for professional sports become available, there is growing interest in modeling the complex play dynamics to evaluate player performance. Among these models, a common player evaluation method is assigning values to player actions. Traditional action-value metrics, however, consider very limited game context and player information. Furthermore, they assign value only to actions directly related to goals (e.g., shots), not to all actions. Recent work has shown that reinforcement learning provides powerful methods for quantifying the value of player actions in sports. This dissertation develops deep reinforcement learning (DRL) methods for estimating action values in sports. We make several contributions to DRL for sports. First, we develop neural network architectures that learn an action-value Q-function from sports event logs to estimate each team's expected success given the current match context. Specifically, our architecture models the game history with a recurrent network and predicts the probability that a team scores the next goal. From the learned Q-values, we derive a Goal Impact Metric (GIM) for evaluating a player's performance over a game season. We show that the resulting player rankings are consistent with standard player metrics and temporally consistent within and across seasons. Second, we address the interpretability of the learned Q-values. While neural networks provide accurate estimates, their black-box structure prevents understanding the influence of different game features on the action values. To interpret the Q-function and understand the influence of game features on action values, we design an interpretable mimic learning framework for DRL. The framework is based on a Linear Model U-Tree (LMUT) as a transparent mimic model, which facilitates extracting the function rules and computing the feature importance for action values.
Third, we incorporate information about specific players into the action values by introducing a deep player representation framework. In this framework, each player is assigned a latent feature vector called an embedding, with the property that statistically similar players are mapped to nearby embeddings. To compute embeddings that summarize the statistical information about players, we implement a Variational Recurrent Ladder Agent Encoder (VaRLAE) to learn a contextualized representation of when and how players are likely to act. We learn and evaluate deep Q-functions from event data for both ice hockey and soccer. These are challenging continuous-flow games where game context and medium-term consequences are crucial for properly assessing the impact of a player's actions.
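
The abstract says only that GIM is derived from the learned Q-values. A minimal sketch of one way such a metric could be aggregated follows, assuming an action's impact is the change it causes in the acting team's Q-value and that a player's score is the sum of those changes. The event tuple layout, the Q-values, and the function name `goal_impact` are illustrative assumptions, not the dissertation's definitions.

```python
from collections import defaultdict


def goal_impact(events):
    """Sum per-player changes in the acting team's Q-value.

    `events` is an iterable of (player_id, q_before, q_after) tuples, where
    q_before and q_after are a model's estimates that the acting player's
    team scores the next goal, before and after the action.
    Assumption: impact = q_after - q_before (hypothetical definition).
    """
    totals = defaultdict(float)
    for player_id, q_before, q_after in events:
        totals[player_id] += q_after - q_before
    return dict(totals)


if __name__ == "__main__":
    # Made-up Q-values for illustration only.
    season = [
        ("A", 0.10, 0.25),  # action raises the scoring probability
        ("B", 0.25, 0.20),  # action lowers it
        ("A", 0.20, 0.40),
    ]
    ranking = sorted(goal_impact(season).items(), key=lambda kv: kv[1], reverse=True)
    print(ranking)
```

Because every action changes the Q-value, this kind of aggregation credits passes, tackles, and other non-shot actions as well as shots.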

    TRACKING FORMATION CHANGES AND ITS EFFECTS ON SOCCER USING POSITION DATA

    This study investigated the application of machine learning methods, specifically k-means clustering, k-Nearest Neighbors (kNN), and Support Vector Machines (SVM), to the analysis of player tracking data in soccer. The primary hypothesis is that such data can yield a standalone, in-depth understanding of soccer matches. The study found that while k-means and spatial analysis are promising for analyzing player positions, kNN and SVM show limitations without additional variables. The spatial analysis examined each team's convex hull and studied the correlation between team length, width, and surface area. Team length and surface area showed a strong positive correlation of 0.8954. This suggested that teams with greater team length play a more direct style with players more spread out, which leads to larger surface areas. k-means clustering was performed with k values derived from different approaches. The silhouette method recommended a k value of 2 and the elbow method recommended a k value of 4. The context of the sport suggested additional analysis with a k value of 11. The k-means results suggested natural data partitions, highlighting distinct player roles and field positions. kNN was used to find similar players, with the k = 19 model showing the highest accuracy of 8.61%. The SVM model returned a classification into 55 classes, which indicated a highly granular level of categorization for player roles. The results from kNN and SVM indicated that further contextual data is necessary for more effective analysis and emphasized the need for balanced datasets and careful model evaluation to avoid biases and ensure practical application in real-world scenarios. In conclusion, each algorithm offers a distinct perspective on player positioning and team formations.
These algorithms, when combined with expert knowledge and additional contextual data, can significantly enrich the scope of analysis in soccer. Future work should consider incorporating event data and additional variables to deepen the analysis, enabling a more comprehensive understanding of how formations evolve in response to various in-game situations.
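
The spatial measures named in the abstract can be sketched in plain Python. This is an illustration under stated assumptions, not the study's code: positions are (x, y) pairs in metres, x runs along the pitch so team length is the x-extent and width is the y-extent, and the helper names (`convex_hull`, `hull_area`, `team_shape`, `pearson`) are hypothetical.

```python
from math import sqrt


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hull_area(hull):
    """Polygon area by the shoelace formula."""
    n = len(hull)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def team_shape(positions):
    """Length (x-extent), width (y-extent), and convex hull area for one frame."""
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    return {
        "length": max(xs) - min(xs),
        "width": max(ys) - min(ys),
        "area": hull_area(convex_hull(positions)),
    }


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


if __name__ == "__main__":
    frame = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]  # made-up positions
    print(team_shape(frame))
```

Computing `team_shape` for every tracking frame and passing the resulting length and area series to `pearson` would give a correlation of the kind the abstract reports.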

    Temporal consistency in learning action values for volleyball

    Learning action values is a key idea in sports analytics, with applications such as player ranking, tactical insight, and outcome prediction. We compare two fundamentally different approaches for learning action values on a novel play-by-play volleyball dataset. In the first approach, we employ regression models that implicitly assume statistical independence of data samples. In the second approach, we use a deep reinforcement learning model that explicitly accounts for the sequential nature of the data in the learning process. We find that temporally independent regression can in certain settings outperform the reinforcement learning approach in terms of predictive accuracy, but the latter performs much better when temporal consistency is required. We also consider a mimic regression tree as a way to add interpretability to the deep reinforcement learning approach. Finally, we examine the computed action values and perform a number of example analyses to verify their validity.

    Proceedings of MathSport International 2017 Conference

    Proceedings of MathSport International 2017 Conference, held in the Botanical Garden of the University of Padua, June 26-28, 2017. MathSport International organizes biennial conferences dedicated to all topics where mathematics and sport meet. Topics include: performance measures, optimization of sports performance, statistics and probability models, mathematical and physical models in sports, competitive strategies, statistics and probability match outcome models, optimal tournament design and scheduling, decision support systems, analysis of rules and adjudication, econometrics in sport, analysis of sporting technologies, financial valuation in sport, e-sports (gaming), betting and sports