688 research outputs found

    A data mining approach to predict probabilities of football matches

    With the increasing amount of money wagered on sports betting markets, it is important to verify how far machine learning techniques can bring value to this area. The performance of state-of-the-art algorithms is evaluated according to several metrics within the CRISP-DM methodology, which is followed from data acquisition via web scraping through to feature generation and selection. The universe of ensemble techniques is also explored in an attempt to improve the models from the point of view of the bias-variance trade-off, with a special focus on neural network ensembles.
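
A minimal sketch of the simplest neural network ensemble, soft voting: several networks (for example, trained with different random seeds) each output home-win/draw/away-win probabilities, and the ensemble averages them. When the members' errors are not perfectly correlated, averaging tends to reduce variance without increasing bias. The function name, the H/D/A column layout and the member outputs are hypothetical; the abstract does not say which ensemble schemes were used.

```python
import numpy as np

def soft_vote(member_probs):
    """Average class-probability predictions from several ensemble members.

    member_probs: sequence of arrays of shape (n_matches, n_classes),
    here assumed to be columns for home win / draw / away win.
    """
    stacked = np.stack([np.asarray(p, dtype=float) for p in member_probs])
    mean = stacked.mean(axis=0)                    # (n_matches, n_classes)
    return mean / mean.sum(axis=1, keepdims=True)  # guard against rounding drift

# Hypothetical outputs of three networks trained with different seeds
net_a = [[0.60, 0.25, 0.15], [0.30, 0.30, 0.40]]
net_b = [[0.50, 0.30, 0.20], [0.20, 0.35, 0.45]]
net_c = [[0.55, 0.20, 0.25], [0.25, 0.25, 0.50]]

ensemble = soft_vote([net_a, net_b, net_c])
labels = np.array(["H", "D", "A"])[ensemble.argmax(axis=1)]
print(ensemble.round(3))
print(labels)
```

Weighted averaging or stacking a meta-model on the member outputs are common alternatives when some members are consistently stronger than others.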

    Using Supervised Learning to Predict English Premier League Match Results From Starting Line-up Player Data

    Soccer is one of the most popular sports around the world. Many people, whether they are fans of a soccer team, players of online soccer games or even professional coaches of a soccer team, attempt to use relevant data to predict the result of a match. Many such prediction models are built from data about the match itself, such as the overall number of shots, yellow or red cards, and fouls committed by the home and away teams. However, this research attempted to predict soccer game results (win, draw or loss) based on data from players in the starting line-up during the first 12 weeks of the 2018-2019 season of the English Premier League.

    When Moneyball Meets the Beautiful Game: A Predictive Analytics Approach to Exploring Key Drivers for Soccer Player Valuation

    Measuring the market value of a professional soccer (i.e., association football) player is of great interest to soccer clubs. Several gaps emerge from the existing research on the soccer transfer market. The economics literature only tests the underlying hypotheses linking a player's market value or wage to a few economic factors. The finance literature provides highly theoretical pricing frameworks. The sports science literature uncovers numerous pertinent attributes and skills but gives limited insight into valuation practice. The overarching research question of this work is: what are the key drivers of player valuation in the soccer transfer market? To lay the theoretical foundations of player valuation, this work synthesizes the literature on market efficiency and equilibrium conditions, pricing theories and risk premia, and sports science. Predictive analytics, in conjunction with open-source data and exploratory analysis, is the primary methodology. Several machine learning algorithms are evaluated on the trade-off between predictive accuracy and model interpretability. XGBoost, the best model for player valuation, yields the lowest RMSE and the highest adjusted R². SHAP values identify the most important features in the best model at both the collective and the individual level. This work shows that a handful of fundamental economic and risk factors have a more substantial effect on player valuation than a large number of sports science factors. Among sports science factors, general physiological and psychological attributes appear to be more important than soccer-specific skills. Theoretically, this work proposes a conceptual framework for soccer player valuation that unifies sports business research and sports science research. Empirically, the predictive analytics methodology deepens our understanding of the value drivers of soccer players.
Practically, this work enhances transparency and interpretability in the valuation process and could be extended into a player recommender framework for talent scouting. In summary, this work demonstrates that the application of analytics can improve decision-making efficiency in player acquisition and the profitability of soccer clubs.
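
The two selection criteria named above can be written out directly. With n observations and p features, RMSE = sqrt(mean((y - ŷ)²)) and adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), which penalises R² for every extra feature. The sketch below computes both for any regressor's predictions (XGBoost itself is not reproduced); the market values and the choice of p = 2 are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def adjusted_r2(y_true, y_pred, n_features):
    """R² corrected for the number of features used by the model."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    n = y_true.size
    if n <= n_features + 1:
        raise ValueError("need more observations than features + 1")
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1))

# Hypothetical market values (in millions) and model predictions
actual = [10, 20, 30, 40, 50]
predicted = [12, 18, 33, 39, 48]
print(rmse(actual, predicted), adjusted_r2(actual, predicted, n_features=2))
```

Lower RMSE and higher adjusted R² both point to the better model, but only adjusted R² discourages adding features that do not pull their weight.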

    pi-football: A Bayesian network model for forecasting Association Football match outcomes

    A Bayesian network is a graphical probabilistic belief network that represents the conditional dependencies among uncertain variables, which can be both objective and subjective. We present a Bayesian network model for forecasting Association Football matches in which the subjective variables represent the factors that are important for prediction but that historical data fail to capture. The model (pi-football) was used to generate forecasts for the outcomes of English Premier League (EPL) matches during the 2010/11 season (but is easily extended to any football league). Forecasts were published online at www.pi-football.com prior to the start of each match. In this paper, we demonstrate that a) using an appropriate measure of forecast accuracy, the subjective information improved the model such that posterior forecasts were on par with bookmakers' performance; and b) using a standard profitability measure with discrepancy levels at ≥ 5%, the model generates profit under maximum, mean, and common bookmakers' odds, even allowing for the bookmakers' built-in profit margin. Hence, compared with other published football forecast models, pi-football not only appears to be exceptionally accurate, but can also be used to 'beat the bookies'.
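
A minimal sketch of a discrepancy-based betting rule, assuming the discrepancy is the model's probability minus the bookmaker's implied probability 1/odds for the same outcome; the paper's exact definition may differ. Taking raw 1/odds leaves the bookmaker's margin (the amount by which implied probabilities sum above 1) inside the comparison, which is the conservative reading of "even allowing for the bookmakers' built-in profit margin". The function names, odds and forecast are hypothetical.

```python
def bookmaker_margin(decimal_odds):
    """Overround: how far the implied probabilities (1/odds) sum above 1."""
    return sum(1.0 / o for o in decimal_odds) - 1.0

def value_bets(model_probs, decimal_odds, threshold=0.05,
               outcomes=("H", "D", "A")):
    """Return (outcome, discrepancy, expected profit per unit stake) for every
    outcome whose model probability exceeds 1/odds by at least `threshold`.
    Raw 1/odds is used, so the bookmaker's margin stays in the comparison."""
    bets = []
    for outcome, p, odds in zip(outcomes, model_probs, decimal_odds):
        discrepancy = p - 1.0 / odds
        if discrepancy >= threshold:
            bets.append((outcome, discrepancy, p * odds - 1.0))
    return bets

odds = (2.10, 3.40, 3.60)    # hypothetical decimal odds: home, draw, away
model = (0.55, 0.25, 0.20)   # hypothetical model forecast
print(f"margin = {bookmaker_margin(odds):.3f}")
print(value_bets(model, odds))
```

Here only the home win clears the 5% threshold (0.55 against an implied 0.476), with an expected profit of about 0.155 units per unit staked if the model's probability is correct.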

    Predicting the risk of injury of professional football players with machine learning

    Project Work presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Information Analysis and Management. Sports analytics is quickly changing the way sports are played. With the rise of sensor data and new tracking technologies, data is collected on an unprecedented scale, which opens a plethora of innovative analytics possibilities aimed at uncovering hidden trends and developing new knowledge from data sources. This project builds a model that predicts muscular injuries of players in a professional football team from GPS and self-rated training data, following a Data Mining methodology and applying machine learning algorithms. Different sampling techniques for imbalanced data are described and used. The quality of the results of the different sampling techniques and machine learning algorithms is analysed and discussed.
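
Injury weeks are rare compared with injury-free weeks, so a classifier can reach high accuracy by never predicting an injury. Random oversampling is one of the simplest sampling techniques for this problem; the abstract does not list which techniques were compared, so the sketch below is an illustrative assumption. It duplicates randomly chosen minority-class rows until the classes are balanced; the function name and data are hypothetical.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class has as
    many rows as the largest class. Apply to the training split only."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, cnt in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # sample with replacement to make up the shortfall (0 for the majority class)
        extra = rng.choice(idx, size=target - cnt, replace=True)
        keep.append(np.concatenate([idx, extra]))
    order = rng.permutation(np.concatenate(keep))  # shuffle so classes are interleaved
    return X[order], y[order]

# Hypothetical data: 8 injury-free weeks (0) and 2 injury weeks (1)
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_res, y_res = random_oversample(X, y)
print(np.bincount(y_res))
```

Resampling only the training folds matters: oversampling before the train/test split would place copies of the same injury rows on both sides and inflate the evaluation.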

    Predictive modelling of football injuries

    The goal of this thesis is to investigate the potential of predictive modelling for football injuries. This work was conducted in close collaboration with Tottenham Hotspur FC (THFC) and the PGA European Tour, with the participation of Wolverhampton Wanderers (WW). Three investigations were conducted:
    1. Predicting the recovery time of football injuries using the UEFA injury recordings. The UEFA injury recording is a common standard for recording injuries in professional football. For this investigation, three datasets of UEFA injury recordings were available: one from THFC, one from WW, and one constructed by merging both. Poisson, negative binomial and ordinal regression were used to model the recovery time after an injury and to assess the significance of various injury-related covariates. Then, different machine learning algorithms (support vector machines, Gaussian processes, neural networks, random forests, naïve Bayes and k-nearest neighbours) were used to build a predictive model. The performance of the machine learning models was then improved by feature selection using correlation-based feature subset selection and random forests.
    2. Predicting injuries in professional football using exposure records. The relationship between exposure (in training hours and match hours) and injury incidence in professional football athletes was studied. A common problem in football is understanding how an athlete's training schedule can affect his chance of getting injured. The task was to predict the number of days a player can train before he gets injured. The dataset consisted of the exposure records of professional footballers at Tottenham Hotspur Football Club from the 2012-2013 season. The problem was approached with a Gaussian process model equipped with a dynamic time warping kernel, which allowed the similarity of exposure records of different lengths to be calculated.
    3. Predicting intrinsic injury incidence using in-training GPS measurements. A significant percentage of football injuries can be attributed to overtraining and fatigue. GPS data collected during training sessions might provide indicators of fatigue, or might be used to detect very intense training sessions that can lead to overtraining. This research used GPS data gathered during training sessions of the THFC first team to predict whether an injury would take place during a given week. The data consisted of 69 variables in total. Two different binary classification approaches were followed and a variety of algorithms were applied (supervised principal component analysis, random forests, naïve Bayes, support vector machines, Gaussian processes, neural networks, ridge logistic regression and k-nearest neighbours). Supervised principal component analysis showed the best results, and it also allowed the extraction of 3 or 4 components that correlate with injury incidence, greatly reducing the total number of variables.
    The first investigation contributes the following to the field:
    • It provides models based on the UEFA injury recordings, a standard used by many clubs, which makes the results easier to replicate and apply.
    • It investigates which variables are most strongly related to the prediction of recovery after an injury.
    • It provides a comparison of models for predicting the time to return to play after injury.
    The second investigation contributes the following to the field:
    • It provides a model that can be used to predict when the first injury of the season will take place.
    • It provides a kernel that a Gaussian process can use to measure the similarity of training and match schedules, even if the time series involved are of different lengths.
    The third investigation contributes the following to the field:
    • It provides a model to predict injury in a given week based on GPS data gathered from training sessions.
    • It provides components, extracted through supervised principal component analysis, that correlate with injury incidence and can be used to summarize the large number of GPS variables in a parsimonious way.
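
The second investigation hinges on comparing exposure records of different lengths. A minimal sketch of that idea: dynamic time warping (DTW) aligns two sequences by allowing one element to match several elements of the other, and an exponential transform turns the DTW distance into a similarity. The abstract does not give the thesis's kernel, so the absolute-difference cost, the form exp(-DTW/ℓ), the `length_scale` parameter and the weekly training-hour series are all assumptions. Kernels of this form are not guaranteed to be positive semi-definite, which a Gaussian process requires, so the thesis's construction may differ.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def dtw_kernel_matrix(series, length_scale=1.0):
    """Gram matrix K[i, j] = exp(-DTW(s_i, s_j) / length_scale).
    Not guaranteed positive semi-definite; check eigenvalues before GP use."""
    k = len(series)
    K = np.eye(k)  # DTW(s, s) = 0, so the diagonal is exp(0) = 1
    for i in range(k):
        for j in range(i + 1, k):
            K[i, j] = K[j, i] = np.exp(-dtw_distance(series[i], series[j]) / length_scale)
    return K

# Hypothetical weekly training hours for three players, of different lengths
exposures = [[1, 2, 3], [1, 2, 2, 3], [5, 5]]
K = dtw_kernel_matrix(exposures, length_scale=5.0)
print(K.round(3))
```

The first two records get DTW distance 0 because the repeated week of 2 hours is absorbed by the warping, which is exactly the length-invariance the abstract describes.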

    Bayesian networks for prediction, risk assessment and decision making in an inefficient Association Football gambling market.

    PhD. Researchers have witnessed great success in deterministic, perfect-information domains, where intelligent pruning and evaluation techniques have proven sufficient to deliver outstanding decision-making performance. However, processes that model uncertainty and risk in real-life situations have not met with the same success. Association Football has been identified as an ideal and exciting application in this respect: it is the world's most popular sport and constitutes the fastest-growing gambling market at the international level. As a result, summarising the risk and uncertainty surrounding the outcomes of football match events has grown dramatically in both importance and difficulty. A gambling market is described as inefficient if one or more betting procedures generate profit at a consistent rate by exploiting market flaws. This study exhibits evidence of an (intended) inefficient football gambling market and demonstrates how a Bayesian network model can be employed against market odds for the gambler's benefit. A Bayesian network is a graphical probabilistic model that represents the conditional dependencies among uncertain variables, which can be both objective and subjective. We have proposed such a model, which we call pi-football, and used it to generate forecasts for English Premier League matches during the 2010/11 and 2011/12 seasons. The proposed subjective variables represent the factors that are important for prediction but that historical data fail to capture, and forecasts were published online at www.pi-football.com prior to the start of each match. To assess the performance of our model we considered both profitability and accuracy measures, and we demonstrate that subjective information improved the forecasting capability of our model significantly.
Resulting match forecasts are sufficiently more accurate than market odds that the model demonstrates profitable returns at a consistent rate.
Engineering and Physical Sciences Research Council (EPSRC); Agena Ltd for software support.
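
A minimal sketch of how a subjective variable can shift a Bayesian network forecast, using exact inference by enumeration over a three-node network: an unobserved "form" node and a subjective "key player out" node both point to the match result. The node names, structure and every probability are hypothetical and are not taken from pi-football, whose network is considerably richer.

```python
# Hypothetical conditional probability tables (illustrative numbers only)
P_FORM = {"good": 0.6, "poor": 0.4}  # prior over the unobserved form node
P_RESULT = {  # P(result | form, key_player_out)
    ("good", False): {"W": 0.55, "D": 0.27, "L": 0.18},
    ("good", True):  {"W": 0.45, "D": 0.30, "L": 0.25},
    ("poor", False): {"W": 0.35, "D": 0.30, "L": 0.35},
    ("poor", True):  {"W": 0.25, "D": 0.30, "L": 0.45},
}

def forecast(key_player_out):
    """P(result | key_player_out), summing out the unobserved form node.
    Form and key_player_out are independent root nodes in this toy network."""
    out = {"W": 0.0, "D": 0.0, "L": 0.0}
    for form, p_form in P_FORM.items():
        for result, p in P_RESULT[(form, key_player_out)].items():
            out[result] += p_form * p
    return out

print(forecast(key_player_out=False))  # forecast from "historical" data alone
print(forecast(key_player_out=True))   # after entering the subjective evidence
```

Entering the subjective evidence moves the win probability from 0.47 to 0.37 in this toy network; comparing such revised probabilities with market odds is where a betting rule would then apply.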

    Annual Report of Undergraduate Research Fellows, August 2008 to May 2009

    Annual Report of Undergraduate Research Fellows from August 2008 to May 2009

    USING MACHINE LEARNING TO OPTIMIZE PREDICTIVE MODELS USED FOR BIG DATA ANALYTICS IN VARIOUS SPORTS EVENTS

    In today's world, data is growing in volume and variety day by day. Historical data can therefore be leveraged to predict the likelihood of future events. This process of using statistical or any other form of data to predict future outcomes is commonly termed predictive modelling. Predictive modelling is becoming increasingly important for several reasons, but mainly because it enables businesses and individual users to gain accurate insights and to decide on suitable actions for a profitable outcome. Machine learning techniques are generally used to build these predictive models. Examples of machine learning models range from time-series-based regression models, which can be used to predict the volume of airline traffic, to linear regression-based models, which can be used to predict fuel efficiency. Many domains can gain a competitive advantage by using predictive modelling with machine learning, including, but not limited to, banking and financial services, retail, insurance, fraud detection, stock market analysis and sentiment analysis. In this research project, predictive analysis is applied to the sports domain, an emerging domain where machine learning can help make better predictions. Numerous sports events take place around the globe every day, and the data gathered from these events can be used both to predict and to improve future events. In this project, machine learning and statistics are used to perform quantitative and predictive analysis of a soccer dataset, and a comparison of how effective the models are is also presented. In addition, a few big data tools and techniques are used to optimize these predictive models and increase their accuracy to over 90%.

    Pacific Review Winter 2014
