    Modelling team performance in soccer using tactical features derived from position tracking data

    Decision-makers in soccer routinely assess the tactical behaviour of a team and its opponents both during and after the game to optimize performance. Currently, this assessment is typically driven by notational analysis and observation. Therefore, potentially high-impact decisions are often made based on limited or even biased information. With the current study, we aimed to quantitatively assess tactical performance by abstracting a set of spatiotemporal features from the general offensive principles of play in soccer using position tracking data, and to train a machine learning classifier to predict match outcome based on these features computed over the full game as well as over only parts of the game. Based on the results of these analyses, we describe a proof of concept of a decision support system for coaches and managers. In an analysis of 302 professional Dutch Eredivisie matches, we were able to train a Linear Discriminant Analysis model to predict match outcome with fair-to-good accuracy (74.1%) using features computed over the full match, and with 67.9% accuracy using features computed over only one quarter of the match. We therefore conclude that, using only position tracking data, we can provide valuable feedback to coaches about how their team is executing the various principles of play, and how these principles are contributing to overall performance.
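The classifier named above, Linear Discriminant Analysis (LDA), can be illustrated with a minimal, dependency-free sketch: each outcome class gets its own mean feature vector, all classes share one pooled covariance matrix, and a match is assigned to the class with the largest linear discriminant score. The feature vectors, the `"win"`/`"loss"` labels and the small `ridge` term added to the covariance diagonal are assumptions for illustration only; the abstract does not specify the study's exact features or outcome classes.

```python
import math
from collections import defaultdict


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_lda(X, y, ridge=1e-6):
    """Fit LDA: one mean per class, one covariance matrix pooled over classes."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    d = len(X[0])
    means = {k: [sum(col) / len(rows) for col in zip(*rows)] for k, rows in groups.items()}
    cov = [[0.0] * d for _ in range(d)]
    for k, rows in groups.items():
        for row in rows:
            diff = [row[j] - means[k][j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j]
    dof = len(X) - len(groups)  # unbiased pooled covariance estimate
    for i in range(d):
        for j in range(d):
            cov[i][j] /= dof
        cov[i][i] += ridge  # hypothetical regulariser that keeps cov invertible
    model = {}
    for k, mu in means.items():
        w = solve(cov, mu)  # w_k = Sigma^{-1} mu_k
        prior = len(groups[k]) / len(X)
        b = -0.5 * sum(wj * mj for wj, mj in zip(w, mu)) + math.log(prior)
        model[k] = (w, b)
    return model


def predict_lda(model, x):
    """Return the class with the largest linear discriminant score w_k . x + b_k."""
    return max(model, key=lambda k: sum(wj * xj for wj, xj in zip(model[k][0], x)) + model[k][1])


if __name__ == "__main__":
    # Hypothetical two-feature match summaries with made-up labels.
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
    y = ["loss", "loss", "loss", "win", "win", "win"]
    model = fit_lda(X, y)
    print(predict_lda(model, [0.5, 0.5]))  # loss
```

Computing the same features over only part of a match, as in the one-quarter experiment, would simply change which frames feed into each feature vector; the classifier itself stays the same.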

    Predicting match outcome in professional Dutch football using tactical performance metrics computed from position tracking data

    Both the quality and the quantity of tracking data have increased rapidly in recent years, and multiple leagues have programs for league-wide collection of tracking data. Tracking data enables in-depth performance analysis, especially with regard to tactics. This has already resulted in the development of several Key Performance Indicators (KPIs) related to scoring opportunities, outplaying defenders, numerical balance and territorial advantage. Although some of these KPIs have gained popularity in the analytics community, little research has been conducted to support their link with performance. Therefore, we aim to study the relationship between match outcome and tactical KPIs derived from tracking data. Our dataset contains tracking data for all players and the ball, as well as the match outcome, for 118 Dutch premier league matches. Using tracking data, we identified 72,989 passes. For every pass-reception window, we computed KPIs related to numerical superiority, outplayed defenders, territorial gains and scoring opportunities using position data. These per-pass values were then aggregated over the full match. We then split the dataset into training and test sets, and predicted match outcome using different combinations of features in a logistic regression model. KPIs related to a combination of off-the-ball features seemed to be the best predictor of match outcome (accuracy of 64.0% and a log loss of 0.67), followed by KPIs related to the creation of scoring opportunities (accuracy of 58% and a log loss of 0.69). This indicates that although most (commercially) available KPIs are based on ball events, the most important information seems to lie in off-the-ball activity. We have demonstrated that tactical KPIs computed from tracking data are relatively good predictors of match outcome. As off-the-ball activity seems to be the main predictor of match outcome, tracking data seems to provide much more insight than notational analysis.
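The pipeline described above (per-pass KPIs, aggregation per match, logistic regression, evaluation by accuracy and log loss) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `aggregate_by_match` is a hypothetical helper that sums KPIs per match (the abstract does not say how values were aggregated), the outcome is encoded as a binary label, and the model is fitted by plain batch gradient descent with arbitrary `lr` and `epochs` values.

```python
import math


def aggregate_by_match(pass_kpis):
    """Sum per-pass KPI vectors into one vector per match.

    pass_kpis: iterable of (match_id, [kpi_1, ..., kpi_d]).
    Summation is an assumption; means or per-minute rates would also work.
    """
    totals = {}
    for match_id, kpis in pass_kpis:
        if match_id not in totals:
            totals[match_id] = [0.0] * len(kpis)
        totals[match_id] = [t + k for t, k in zip(totals[match_id], kpis)]
    return totals


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log loss."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, X):
    """Predicted probability of the positive class for each row of X."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def accuracy(y, p):
    """Share of rows where thresholding p at 0.5 reproduces the label."""
    return sum((pi >= 0.5) == bool(yi) for yi, pi in zip(y, p)) / len(y)


def log_loss(y, p, eps=1e-15):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total += -(yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total / len(y)


if __name__ == "__main__":
    # Hypothetical, centred single-feature data: 1 = win, 0 = no win.
    X_train = [[-2.0], [-1.0], [1.0], [2.0]]
    y_train = [0, 0, 1, 1]
    model = fit_logistic(X_train, y_train)
    p = predict_proba(model, X_train)
    print(accuracy(y_train, p), round(log_loss(y_train, p), 3))
```

For a binary outcome, a model that always predicts 0.5 has a log loss of ln 2 (about 0.693), which is a useful reference point when reading log-loss figures such as those reported above.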

    Machine Learning Applications to Predict Road Crash and Soccer Game Outcomes

    Machine learning has become a cutting-edge and widely studied field of data science in recent years across many industries and disciplines. In this thesis, two problems, (1) crash severity prediction and (2) soccer game outcome prediction, were investigated using a set of machine learning approaches: Ridge regression, Lasso regression, Support Vector Machine (SVM), Neural Network (NN) and Random Forest (RF). The first study focuses on investigating the critical factors affecting crash severity using comprehensive, state-wide, time-series traffic crash data. The dataset covers crashes that occurred in the state of Connecticut between 1995 and 2014. Traffic crashes are an increasing cause of death and injury worldwide. The overall purposes of the first study were to propose, develop and implement machine learning approaches for predicting the injury severity of the people involved in crashes, and to investigate the crash predictors that contribute most to injury severity. The predictor variables included road and vehicle conditions, characteristics of drivers and passengers, and environmental conditions. Results indicate that RF provided the best prediction accuracy, correctly classifying 73.85% of crashes by severity: fatal, injury, or property damage only. In addition to the overall comparison of the proposed machine learning approaches in terms of accuracy, the prediction results were combined with the economic loss of each severity level to provide managerial insights for estimating the financial consequences of traffic crashes. RF also provided the importance of each predictor for the injury severity of those involved. The ejection status of the driver or passenger was found to be the most crucial factor leading to the most severe injuries. In addition, a time-series analysis of the 20 years of crash data was conducted. 
The analysis results demonstrated that the prediction accuracy of RF increased over the study period, and that the importance of some predictors also changed. From a policy-making perspective, strict enforcement against drunk driving and drug use could lead to substantial road safety improvements. Ejection status is the essential risk factor for fatal and incapacitating injuries, and the use of seat belts significantly reduces the risk of occupants being ejected from the vehicle in a crash. In the second study, game data from the five most recent seasons of three major leagues were scraped from whoscore.com. The leagues were two top European leagues, the Spanish La Liga and the English Premier League (EPL), and one US league, Major League Soccer (MLS). The purpose of the study was to develop statistically credible machine learning approaches to predict soccer game outcomes and to investigate the significance of the predictors (game statistics). Unlike closely related previous studies, the proposed machine learning models were applied not only to the combined dataset of the three leagues but also to each league separately, in order to compare prediction performance and important predictors. The best prediction performance on the combined dataset was achieved by NN, with an accuracy of 85.71% (+/- 0.73%). For each individual league, RF had the best performance, and RF also provided the importance of each predictor. The results showed that home-field advantage was more evident in MLS games than in the two European leagues: whether a team played at home or away was the most critical predictor for MLS games. Although it was also an important predictor for La Liga and EPL games, the most influential predictor in those leagues was the difference in the number of shots on target between the home team and the away team. Across all three leagues, crosses were the most significant pass type, and the difference in the rate of cards per foul was the most crucial card-related predictor. 
The difference in the rate of cards per foul is primarily determined by the referee. For La Liga and the EPL, the differences in the numbers of counter-attacks and open-play attempts were the consequential attempt types affecting game results, while in the MLS the difference in the number of set pieces was the most crucial predictor variable. Overall, the results of the two studies indicate that the proposed machine learning approaches yielded effective prediction performance for both crash severity and soccer outcome prediction. Among the five machine learning models, RF had slightly superior prediction performance in both studies. Even though the two problems come from different industries and policy-making areas, the proposed machine learning approaches effectively handled the complexity of the data in terms of dimensionality and time-series nature.
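The soccer study above builds home-minus-away difference predictors (for example, the difference in shots on target) and reports accuracy with a +/- spread. A minimal sketch of both ideas follows, assuming that the spread is a standard deviation across cross-validation folds; the abstract does not state how it was computed. The statistic names, the `k_fold_indices` and `cross_val_accuracy` helpers, and the rule-based predictor in the usage example are hypothetical, and the predictor stands in for the RF or NN models only to exercise the evaluation loop.

```python
import statistics


def difference_features(home, away, keys):
    """Home-minus-away difference for each (hypothetical) game statistic."""
    return [home[k] - away[k] for k in keys]


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size.

    Contiguous folds assume the rows were shuffled beforehand.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(X, y, fit, predict, k=5):
    """Return (mean, standard deviation) of per-fold test accuracy; needs k >= 2."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        test_set = set(test_idx)
        train = [i for i in range(len(X)) if i not in test_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    keys = ["shots_on_target", "crosses"]  # hypothetical statistic names
    row = difference_features({"shots_on_target": 6, "crosses": 20},
                              {"shots_on_target": 2, "crosses": 25}, keys)
    print(row)  # [4, -5]
    # Hypothetical rows of [shots-on-target difference]; label 1 = home win.
    X = [[1], [2], [-1], [-2], [3], [-3]]
    y = [1, 1, 0, 0, 1, 0]
    mean, sd = cross_val_accuracy(
        X, y,
        fit=lambda X_tr, y_tr: None,                 # rule needs no training
        predict=lambda m, x: 1 if x[0] > 0 else 0,   # home win if more shots on target
        k=3,
    )
    print(f"{mean:.2%} (+/- {sd:.2%})")
```

Passing `fit` and `predict` as functions keeps the evaluation loop independent of the model, so the same loop could score any of the five model families compared in the thesis.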

    Sports Analytics Algorithms for Performance Prediction

    Proceedings of MathSport International 2017 Conference

    Proceedings of MathSport International 2017 Conference, held in the Botanical Garden of the University of Padua, June 26-28, 2017. MathSport International organizes biennial conferences dedicated to all topics where mathematics and sport meet. Topics include: performance measures, optimization of sports performance, statistics and probability models, mathematical and physical models in sports, competitive strategies, statistics and probability match outcome models, optimal tournament design and scheduling, decision support systems, analysis of rules and adjudication, econometrics in sport, analysis of sporting technologies, financial valuation in sport, e-sports (gaming), betting and sports.

    Using Deep Convolutional Neural Networks to Predict Goal-Scoring Opportunities in Soccer

    Deep learning approaches have been successfully applied to several image recognition tasks, such as face, object, animal and plant classification. However, almost no research has examined how machine learning can be used to predict goal-scoring opportunities in soccer from position data. In this paper, we propose the use of deep convolutional neural networks (DCNNs) for this problem. This aim is achieved in the following steps: 1) development of novel algorithms for finding goal-scoring opportunities and ball possession, which are used to obtain positive and negative examples; the dataset consists of position data from 29 matches played by a German Bundesliga team. 2) These examples are used to create original and enhanced images (which contain object trails of soccer positions) with a resolution of 256×256 pixels. 3) Both the original and the enhanced images are fed independently as input to two DCNN architectures: GoogLeNet and a 3-layer CNN. As a baseline experiment, a k-nearest neighbor classifier was trained and evaluated on ball positions. The results show that the GoogLeNet architecture outperforms all other methods, with an accuracy of 67.1%.
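Steps 2) and 3) above can be illustrated with a small sketch that rasterizes player and ball positions into a 256×256 grayscale grid, draws older frames more faintly so that moving objects leave trails, and classifies ball positions with a k-nearest neighbor baseline. The 105 m × 68 m pitch, the corner origin, the linear fade, Euclidean distance and `k=3` are all assumptions; the abstract does not describe how the enhanced images or the baseline were actually configured.

```python
import math
from collections import Counter

PITCH_LENGTH_M = 105.0  # assumed pitch length (x axis)
PITCH_WIDTH_M = 68.0    # assumed pitch width (y axis)
SIZE = 256              # image resolution stated in the abstract


def to_pixel(x, y, size=SIZE):
    """Map pitch coordinates in metres (origin at one corner) to (row, col)."""
    col = min(max(int(x / PITCH_LENGTH_M * size), 0), size - 1)
    row = min(max(int(y / PITCH_WIDTH_M * size), 0), size - 1)
    return row, col


def render_with_trails(frames, size=SIZE):
    """Draw a size x size grayscale image from a short window of frames.

    frames: list of frames ordered oldest to newest; each frame is a list of
    (x, y) object positions. The newest frame is drawn at intensity 1.0 and
    older frames fade linearly, leaving a trail behind each moving object.
    """
    img = [[0.0] * size for _ in range(size)]
    n = len(frames)
    for rank, frame in enumerate(frames):
        intensity = (rank + 1) / n
        for x, y in frame:
            r, c = to_pixel(x, y, size)
            img[r][c] = max(img[r][c], intensity)
    return img


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training examples closest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Two hypothetical frames, each holding a single object position.
    img = render_with_trails([[(0.0, 0.0)], [(105.0, 68.0)]])
    print(img[0][0], img[255][255])  # 0.5 1.0
    # Hypothetical ball positions labelled 1 = goal-scoring opportunity.
    print(knn_predict([[0, 0], [1, 1], [10, 10], [11, 11]], [0, 0, 1, 1], [0.5, 0.5]))  # 0
```

Taking the maximum intensity per pixel keeps the most recent position visible where trails overlap; summing intensities would be an equally plausible alternative.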