112 research outputs found

    Sports analytics: maximizing precision in predicting MLB base hits

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
    As the world of sports expanded to unprecedented levels, so did the need for tools that provide material advantages to organizations and other stakeholders. This resulted in increased use of data and analytics across a multitude of sports-related topics, which led to more precise and quicker judgements by decision makers in sports. In this line of thought, the main objective of this paper is to build a predictive model capable of estimating the odds of an MLB player getting a base hit on a given day, with the intention of both winning the game Beat the Streak and providing valuable information to the coaching staff. CRISP-DM was the architecture chosen as the main guideline and was applied to a dataset built entirely for this paper from publicly available data. To achieve these objectives, Excel was used to collect and structure the data and Python for the remaining steps, with a strong emphasis on the SKlearn library. Several models were tested, and the main factors that separate them from each other are balancing, outliers, dimensionality reduction, variable selection and the type of algorithm: Logistic Regression, Multi-layer Perceptron, Random Forest and Stochastic Gradient Descent. The results obtained were positive: one of the Multi-layer Perceptron models achieved an 85% correct pick ratio on the test set, an improvement of 5 percentage points over the best model found during the literature review. Nevertheless, there is room for improvement in the final models and for other models with similar intentions, since the results achieved do not yet provide a good chance of Beating the Streak.

    Sports Data Analysis – Application of Sports Data In Athletics


    Predicting Outcomes of Horse Racing using Machine Learning

    Machine learning, with its vast framework, is making its way into every aspect of modern society. Sports betting, and horse racing in particular, calls for attention from a broad research community owing to its value to stakeholders and the amount of money involved. Horse racing prediction is a complex problem because there are a large number of influencing variables. The present study aims to contribute to this domain by training machine learning algorithms to predict horse racing outcomes. For this, data from races conducted by the Turf Club of India over 2017 to 2019, amounting to over 14,700 races, were considered. Six algorithms, including Logistic Regression, Random Forest, Naive Bayes and k-Nearest Neighbors (k-NN), were used to predict the winning horse for each race. The Synthetic Minority Oversampling Technique (SMOTE) was applied to the imbalanced horse racing data set, and the attributes of the horse race repository were analyzed. The results were compared with other sampling methods to evaluate the relative effectiveness of this method. The proposed framework achieves an accuracy of 97.6%, which is substantially higher than that reported in other similar studies. The research can benefit stakeholders as well as researchers in the same area who wish to carry out further analysis and experiments.

    The Impact of Consumer Perceptions of Tanking on National Basketball Association Attendance

    This dissertation studies the impact of consumer perceptions of tanking on National Basketball Association (NBA) attendance. The prevalence of tanking in the NBA raised concerns that some teams were purposely avoiding winning games in order to improve their draft position. The majority of previous studies on tanking have focused on developing empirical evidence of the existence of tanking in sport. Yet no study has systematically explored the impact of perceived tanking behavior on consumer demand for sport. As tanking teams rarely reveal their tanking strategy to the public, fans may not correctly identify tanking behavior in sport, and thus are likely to rely on their perceptions of tanking to make attendance decisions. The current dissertation employs tanking discussions on the social media platform Twitter along with data mining tools to quantify consumer perceptions of tanking. Econometric models are then utilized to analyze the effect of the perceived tanking behavior on demand for NBA games. The estimation results provide robust evidence that increasing awareness of tanking by home teams hurts NBA attendance in both the short and long term. This dissertation also reveals that more negative attitudes toward visiting teams' tanking behavior can undermine consumer interest in attending NBA games. These findings offer important managerial implications on the urgency of restraining tanking behavior as well as the importance of maintaining integrity in sports competitions.

    Modeling the Risk of Team Sport Injuries: A Narrative Review of Different Statistical Approaches

    Injuries are a common occurrence in team sports and can have significant financial, physical and psychological consequences for athletes and their sporting organizations. As such, an abundance of research has attempted to identify factors associated with the risk of injury, which is important when developing injury prevention and risk mitigation strategies. There are a number of methods that can be used to identify injury risk factors. However, difficulty in understanding the nuances between different statistical approaches can lead to incorrect inferences and decisions being made from data. Accordingly, this narrative review aims to (1) outline commonly implemented methods for determining injury risk, (2) highlight the differences between association and prediction as it relates to injury and (3) describe advances in statistical modeling and the current evidence relating to predicting injuries in sport. Based on the points that are discussed throughout this narrative review, both researchers and practitioners alike need to carefully consider the different types of variables that are examined in relation to injury risk and how the analyses pertaining to these different variables are interpreted. There are a number of other important considerations when modeling the risk of injury, such as the method of data transformation, model validation and performance assessment. With these technical considerations in mind, researchers and practitioners should consider shifting their perspective of injury etiology from one of reductionism to one of complexity. Concurrently, research implementing reductionist approaches should be used to inform and implement complex approaches to identifying injury risk. However, the ability to capture large injury numbers is a current limitation of sports injury research and there has been a call to make data available to researchers, so that analyses and results can be replicated and verified. 
Collaborative efforts such as this will help prevent incorrect inferences being made from spurious data and will assist in developing interventions that are underpinned by sound scientific rationale. Such efforts will be a step in the right direction of improving the ability to identify injury risk, which in turn will help improve risk mitigation and ultimately the prevention of injuries

    The Gettysburg Economic Review, Volume 3, Spring 2009
