389 research outputs found

    Machine Learning and Multivariate Statistical Tools for Football Analytics

    This doctoral thesis focuses on studying, implementing, and applying machine learning and multivariate statistical techniques in the emerging field of sports analytics, specifically in football. Commonly used procedures and new methods are applied to answer research questions in different areas of football analytics, covering both sports performance and economics. The methodologies used in this thesis enrich the techniques used so far to obtain a global view of the behaviour of football teams and are intended to support decision-making. In addition, the methodology was implemented using the free statistical software R and open data, which allows the results to be reproduced. The thesis aims to contribute to the understanding of machine learning and multivariate models for sports prediction by comparing their predictive capacity and studying the variables that most influence their predictions. Since football is a game of chance where luck plays an important role, the thesis proposes methodologies that help to study, understand, and model the objective part of the sport. The thesis is structured into five blocks, each distinguished by the database used to achieve the proposed objectives. The first block describes the most common areas of study in football analytics and classifies them according to the data used. It contains an exhaustive review of the state of the art in football analytics: part of the existing literature is compiled according to the objectives achieved, together with a review of the statistical methods applied. These methods are the pillars on which the new procedures proposed here are based.
The second block consists of two chapters that study the behaviour of teams according to their ranking at the end of the season: top (qualifying for the Champions League or Europa League), middle, or bottom (relegation to a lower division). Several machine learning and multivariate statistical techniques are proposed to predict each team's position at the end of the season. Once the prediction has been made, the model with the best predictive accuracy is selected to study the game actions that most discriminate between positions. In addition, the advantages of the proposed techniques over the classical methods used so far are analysed. The third block consists of a single chapter in which web scraping code is developed to retrieve a new database with quantitative information on the game actions carried out over time in football matches. This block focuses on predicting match outcomes (win, draw, or loss) and proposes combining a machine learning technique, random forest, with the Skellam regression model, a classical method commonly used to predict goal difference in football. Finally, the predictive accuracy of the classical methods used so far is compared with that of the proposed multivariate methods. The fourth block also comprises a single chapter and belongs to the economic side of football. This chapter applies a novel procedure to develop indicators that help predict transfer fees. Specifically, it shows the importance of popularity when calculating players' market value, and therefore proposes a new methodology for collecting information on players' popularity. The fifth block presents the most relevant aspects of this thesis for research and football analytics, including future lines of work.
    Malagón Selma, MDP. (2023). Machine Learning and Multivariate Statistical Tools for Football Analytics [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/19763
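The link between goal difference and match outcome mentioned in the abstract can be illustrated with a small sketch. It assumes that each side's goals follow independent Poisson distributions, in which case the goal difference follows a Skellam distribution. The function names and the scoring rates are hypothetical, and this is not the thesis's R implementation or its random forest combination.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals under a Poisson(lam) model."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def outcome_probabilities(lam_home: float, lam_away: float, max_goals: int = 15):
    """Return (P(home win), P(draw), P(away win)).

    Assumes independent Poisson goal counts, so the goal difference is
    Skellam distributed. The sums are truncated at max_goals per team.
    """
    win = draw = loss = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                win += p
            elif h == a:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Hypothetical scoring rates: home side 1.6 goals per match, away side 1.1.
p_win, p_draw, p_loss = outcome_probabilities(1.6, 1.1)
```

In a regression setting the two rates would be modelled from covariates rather than fixed by hand, as they are in this sketch.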

    Winning With Chaos in Association Football: Spatiotemporal Event Distribution Randomness Metric for Team Performance Evaluation

    Association football (commonly known as football or soccer) in the modern era places a greater emphasis on collaborating and working together as a team instead of relying solely on individual skills to strategize winning performances. The low-scoring and unpredictable nature of association football makes evaluating team performances challenging. Space creation and space utilization have been discussed in the football world lately. Existing literature evaluates this with on and off-ball runs by players for deceiving defenders to create open spaces. However, the contribution of these team ball movements’ enhanced randomness or chaotic nature to winning performances has yet to be explored. This work proposes a novel entropy-based time-series performance evaluation metric, EDRan, for quantifying this enhanced random nature by analyzing the spatial distribution of game events at regular intervals. Additionally, an unexplored cumulative ball possession matrix is used to quantify randomness. The correlation between the match winner and spatial event distribution randomness at regular intervals was analyzed. The significance of the proposed metric was demonstrated using a generalized linear model (GLM), which achieved an average accuracy of 80% for match-winning performance classification. The GLM p-values and coefficients revealed statistically significant relationships between the extracted temporal features and match-winning performances. Findings further revealed dispersed, highly random event distribution by winning teams during the early phases of the game, implying attacking behavior, followed by a compact, cautious playing style toward the end, suggesting that the game’s first-half performances are more pivotal. 
Despite the unpredictability of actual scores in association football, the proposed approach effectively captured the differences in performances between stronger and weaker teams with temporal relationships, highlighting its significance as a time-series metric for performance evaluation.
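The idea of measuring how randomly events are spread over the pitch at regular intervals can be sketched with Shannon entropy over a grid. This is a generic illustration, not the paper's EDRan definition: the pitch size, grid resolution, window length, and function names are all assumptions, and coordinates are assumed to lie inside the pitch.

```python
import math
from collections import Counter


def spatial_entropy(events, pitch=(105.0, 68.0), grid=(6, 4)) -> float:
    """Shannon entropy (in bits) of event locations binned on a grid.

    events is a list of (x, y) positions in metres. Higher values mean
    the events are spread more evenly across the grid cells.
    """
    if not events:
        return 0.0
    nx, ny = grid
    counts = Counter()
    for x, y in events:
        # Clamp to the last cell so points on the far boundary stay in range.
        col = min(int(x / pitch[0] * nx), nx - 1)
        row = min(int(y / pitch[1] * ny), ny - 1)
        counts[(col, row)] += 1
    total = len(events)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def entropy_series(timed_events, window=900.0):
    """Entropy per fixed-length time window.

    timed_events is a list of (t_seconds, x, y); windows without events
    are skipped in this sketch.
    """
    buckets = {}
    for t, x, y in timed_events:
        buckets.setdefault(int(t // window), []).append((x, y))
    return [spatial_entropy(buckets[k]) for k in sorted(buckets)]
```

A series like the one returned by `entropy_series` could then serve as the temporal features fed to a classifier such as a GLM.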

    A New Web Search Engine with Learning Hierarchy

    Most existing web search engines (such as Google and Bing) are keyword-based. Typically, after the user issues a query with keywords, the search engine returns a flat list of results. When the query is related to a topic, keyword matching alone may not accurately retrieve the whole set of webpages on that topic. On the other hand, there is another type of search system, found particularly on e-commerce websites, where the user can search within the categories of different faceted hierarchies (e.g., product types and price ranges). Is it possible to integrate the two types of search systems and build a web search engine with a topic hierarchy? The main difficulty is how to classify the vast number of webpages on the Internet into the topic hierarchy. In this thesis, we leverage machine learning techniques to automatically classify webpages into the categories in our hierarchy, and then use the classification results to build the new search engine SEE. The experimental results demonstrate that SEE achieves better search results than a traditional keyword-based search engine for most queries, particularly when the query is related to a topic. We also conduct a small-scale usability study, which further verifies that SEE is a promising search engine. To further improve SEE, we also propose a new active learning framework with several novel strategies for hierarchical classification.

    Correlation Between VO2 Max, Speed, and Limb Muscle Explosive Power with Agility in Soccer Players

    Background: Agility is one of football's most critical anaerobic capacities. One component is the ability to change direction; players change direction (change of direction speed, CODS) every 2-4 seconds. Agility is influenced by several factors, including speed and limb muscle explosive power, and is related to aerobic capacity, namely VO2 Max (maximum oxygen consumption), through running economy. The relationship between VO2 Max, speed, and limb muscle explosive power and agility in soccer players therefore needs to be studied. Objective: To determine the correlation between VO2 Max, speed, and limb muscle explosive power and agility in soccer players. Methods: Twenty-seven male players (Diponegoro Muda PS Undip) were involved in this study (mean ± SD; age 13.22 ± 0.18 years, weight 46.78 ± 1.67 kg, height 158 ± 1.88 cm). This is a correlational study with a cross-sectional design. For each player, VO2 Max was measured with the Multistage Fitness Test, speed with the 20-meter sprint test, limb muscle explosive power with the Sargent jump test, and agility (CODS) with the Illinois agility test. Hypotheses were tested with Spearman's correlation test and linear regression. Results: The Spearman correlation test found a strong relationship between VO2 Max and agility (r = -0.743; p < 0.001), a moderate relationship between speed and agility (r = 0.556; p = 0.003), and a strong relationship between limb muscle explosive power and agility (r = -0.766; p < 0.001). Linear regression of VO2 Max, speed, and limb muscle explosive power on agility showed a strong relationship (r = 0.806; r2 = 0.650; p < 0.001). Conclusion: VO2 Max correlates with agility; speed correlates with agility; limb muscle explosive power correlates with agility; and VO2 Max, speed, and limb muscle explosive power together have a relationship with agility.
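Spearman's correlation, as used in the abstract above, is the Pearson correlation of the ranks of the two variables. The sketch below shows the computation on made-up numbers; the helper names and the data are hypothetical and are not taken from the study. A negative coefficient between VO2 Max and agility is plausible when agility is recorded as a test time, because a lower time means better agility.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y) -> float:
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y) -> float:
    """Spearman rank correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: VO2 Max (ml/kg/min) and agility test time (seconds).
vo2_max = [42.0, 45.5, 47.0, 50.5, 52.0]
agility_time = [18.9, 18.2, 18.4, 17.5, 17.1]
rho = spearman(vo2_max, agility_time)
```

The reported regression figures are internally consistent in the same arithmetic sense: 0.806 squared is approximately 0.650.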

    External load profile during different sport-specific activities in semi-professional soccer players

    Background: Global Positioning System (GPS) devices are widely used in soccer for monitoring external load (EL) indicators with the aim of maximizing sports performance. The aim of this study was to investigate differences in EL indicators among players in different playing positions (i.e., central backs, external strikers, fullbacks, midfielders, strikers, wide midfielders) between and within different sport-specific tasks and official matches. Methods: 1932 observations from 28 semi-professional soccer players (age: 25 ± 6 years, height: 183 ± 6 cm, weight: 75.2 ± 7 kg) were collected through GPS devices (Qstarz BT-Q1000EX, 10 Hz) during the 2019-2020 season. Participants were monitored during Official Matches (OM), Friendly Matches (FM), Small Sided Games (SSG), and Match-Based Exercises (MBE). Metabolic indicators (i.e., metabolic power, percentage of metabolic power > 35w, number of intense actions per minute, distance per minute, passive recovery time per minute) and neuromuscular indicators (i.e., percentage of intense accelerations, percentage of intense decelerations, changes of direction per minute > 30 degrees) were recorded during each task. Results: Statistically significant differences were detected in EL indicators between playing positions within each task and between tasks. In particular, results from the two-way ANOVA tests showed significant interaction, but with small effect size, in all the EL indicators between playing positions for each task and within tasks. Moreover, statistical differences, but with small effect size, between playing positions were detected in each task and for each EL indicator. Finally, the strongest statistical differences (with large effect size) were detected between tasks for each EL indicator.
Details of the Tukey post-hoc analysis reporting the pairwise comparisons within and between tasks with playing positions are also provided. Conclusion: In semi-professional soccer players, different metabolic and neuromuscular performances were detected in different playing positions between and within different tasks and official matches. Coaches should consider the different physical responses related to different physical tasks and playing positions to design the most appropriate training program.

    Graph-Based Multi-Camera Soccer Player Tracker

    The paper presents a multi-camera tracking method intended for tracking soccer players in long shot video recordings from multiple calibrated cameras installed around the playing field. The large distance to the camera makes it difficult to visually distinguish individual players, which adversely affects the performance of traditional solutions relying on the appearance of tracked objects. Our method focuses on individual player dynamics and interactions between neighboring players to improve tracking performance. To overcome the difficulty of reliably merging detections from multiple cameras in the presence of calibration errors, we propose a novel tracking approach in which the tracker operates directly on raw detection heat maps from multiple cameras. Our model is trained on a large synthetic dataset generated using Google Research Football Environment and fine-tuned using real-world data to reduce the costs involved in ground truth preparation.

    DribbleBot: Dynamic Legged Manipulation in the Wild

    DribbleBot (Dexterous Ball Manipulation with a Legged Robot) is a legged robotic system that can dribble a soccer ball under the same real-world conditions as humans (i.e., in-the-wild). We adopt the paradigm of training policies in simulation using reinforcement learning and transferring them into the real world. We overcome critical challenges of accounting for variable ball motion dynamics on different terrains and perceiving the ball using body-mounted cameras under the constraints of onboard computing. Our results provide evidence that current quadruped platforms are well-suited for studying dynamic whole-body control problems involving simultaneous locomotion and manipulation directly from sensory observations. Comment: To appear at the IEEE Conference on Robotics and Automation (ICRA), 2023. Video is available at https://gmargo11.github.io/dribblebot

    Analysis of team success based on technical match-play performance in the Australian Football League Women’s (AFLW) competition

    An understanding of the effect contextual data may have on key match-play technical performance indicators in the Australian Football League Women’s (AFLW) competition is warranted due to its rapid evolution. To address this, predictive models were fit to determine which technical match-play data, including new contextual information, more accurately predict AFLW match outcomes (win/loss, margin), and which contexts and technical predictors of team performance are most important. Thirteen random forest models were fit, each incorporating progressively more contextual data, including performance relative to the opposition, harder-to-attain match-play variables, field location, and individual player contributions. Models were assessed by prediction performance on match outcome in a holdout sample and by variable importance through Mean Decrease in Gini Index. Effective kicks and entries into attacking locations were important in the models. Territory gained, contexts of performance relative to the opposition, and locational information around actions improved prediction. This methodology represents the most in-depth analysis of women’s Australian football technical match-play performance to date. The commentary presented addresses issues of using aggregated datasets and of prediction with match-play success as a dependent variable, and argues that detailed, process-oriented approaches are needed in future to avoid large assumptions.
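Mean Decrease in Gini ranks a variable by how much the splits made on it reduce Gini impurity, averaged over the trees of a random forest. The sketch below shows only the building block, the impurity decrease of a single split, on hypothetical win/loss labels. It is not the study's code, and the function names are illustrative.

```python
from collections import Counter


def gini(labels) -> float:
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right) -> float:
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Hypothetical node with four matches, split perfectly by some predictor.
parent = ["win", "win", "loss", "loss"]
decrease = gini_decrease(parent, ["win", "win"], ["loss", "loss"])
```

Summing these decreases over every split that uses a given variable, then averaging across trees, gives that variable's importance score.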

    A multi-season machine learning approach to examine the training load and injury relationship in professional soccer

    OBJECTIVES: The purpose of this study was to use machine learning to examine the relationship between training load and soccer injury with a multi-season dataset from one English Premier League club. METHODS: Participants were 35 male professional soccer players (aged 25.79±3.75 years, range 18–37 years; height 1.80±0.07 m, range 1.63–1.95 m; weight 80.70±6.78 kg, range 66.03–93.70 kg), with data collected from the 2014–2015 season until the 2018–2019 season. A total of 106 training load variables (40 GPS data, 6 personal information, 14 physical data, 4 psychological data and 14 ACWR, 14 MSWR and 14 EWMA data) were examined in relation to 133 non-contact injuries, with a high imbalance ratio of 0.013. RESULTS: XGBoost and Artificial Neural Network were implemented to train the machine learning models using four and a half seasons’ data, with the developed models subsequently tested on the following half season’s data. During the first four and a half seasons, there were 341 injuries; during the next half season there were 37 injuries. To interpret and visualize the output of each model and the contribution of each feature (i.e., training load) towards the model, we used the Shapley Additive Explanations (SHAP) approach. Of 37 injuries, XGBoost correctly predicted 26 injuries, with recall and precision of 73% and 10% respectively. Artificial Neural Network correctly predicted 28 injuries, with recall and precision of 77% and 13% respectively. In the model using Artificial Neural Network (the relatively more accurate model), last injury area and weight appeared to be the most important features contributing to the prediction of injury. CONCLUSIONS: This was the first study of its kind to use Artificial Neural Network and a multi-season dataset for injury prediction. Our results demonstrate the potential to predict injuries with high recall, thereby identifying most of the injury cases, although precision suffered because of the high class imbalance.
This approach to using machine learning provides potentially valuable insights for soccer organizations and practitioners when monitoring load injuries.
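The trade-off between high recall and low precision under class imbalance can be made concrete with a small sketch. The counts below are hypothetical and chosen only for illustration; they are not the study's confusion matrix, and the function name is an assumption.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Return (precision, recall) from confusion-matrix counts.

    precision = tp / (tp + fp): share of predicted injuries that were real.
    recall    = tp / (tp + fn): share of real injuries that were predicted.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 30 injuries flagged correctly, 10 missed,
# and 270 false alarms among the far more numerous non-injury sessions.
precision, recall = precision_recall(tp=30, fp=270, fn=10)
```

With rare positives, even a modest false-alarm rate on the majority class produces many false positives, which pulls precision down while recall stays high.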