10 research outputs found

    Predicting college basketball match outcomes using machine learning techniques: some results and lessons learned

    Most existing work on predicting NCAAB matches has been developed in a statistical context. Trusting the capabilities of ML techniques, particularly classification learners, to uncover the importance of features and learn their relationships, we evaluated a number of different paradigms on this task. In this paper, we summarize our work, pointing out that attributes seem to be more important than models, and that there seems to be an upper limit to predictive quality

    March madness prediction using machine learning techniques

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. March Madness describes the final tournament of the college basketball championship, considered by many to be the biggest sporting event in the United States, moving large sums of money every year in both betting and television. Besides that, there are 60 million Americans who fill out their tournament bracket every year, and almost anything is more likely than correctly picking all 68 games. After collecting and transforming data from Sports-Reference.com, the experimental part consists of preprocessing the data, evaluating the features to consider in the models, and training the models. In this study, based on tournament data over the last 20 years, Machine Learning algorithms such as the Decision Tree Classifier, K-Nearest Neighbors Classifier, Stochastic Gradient Descent Classifier and others were applied to measure the accuracy of the predictions and to compare them with some benchmarks. Although the most important variables seemed to be those related to seeds, shooting and the number of participations in the tournament, it was not possible to define exactly which ones should be used in the modeling, and all ended up being used. Regarding the results, when training on the entire dataset, the accuracy ranges from 65 to 70%, with Support Vector Classification yielding the best results. Compared with picking the highest seed, these results are slightly lower. On the other hand, when predicting the 2017 tournament, Support Vector Classification and the Multi-Layer Perceptron Classifier reach 85% and 79% accuracy, respectively. In this sense, they surpass the previous benchmark and the most respected websites and statistics in the field. Given some existing constraints, it is quite possible that these results could be improved and deepened in other ways. Meanwhile, this project can be referenced and serve as a basis for future work
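
    The "pick the highest seed" benchmark mentioned in this abstract can be sketched as follows. This is a minimal illustration in Python, not the project's code: the `games` list is made-up toy data, the function names are hypothetical, and it assumes "highest seed" means the team with the lower seed number.

```python
# Hypothetical toy data, NOT from the study: (seed_a, seed_b, team_a_won).
games = [
    (1, 16, True),
    (8, 9, False),
    (5, 12, False),
    (2, 15, True),
    (3, 14, True),
    (4, 13, True),
    (6, 11, False),
    (7, 10, True),
]


def pick_better_seed(seed_a, seed_b):
    """Predict that team A wins when it has the lower (better) seed number.

    Equal seeds default to predicting team B, an arbitrary choice made
    only for this sketch.
    """
    return seed_a < seed_b


def accuracy(predict, games):
    """Fraction of games for which predict(seed_a, seed_b) matches the result."""
    correct = sum(predict(a, b) == a_won for a, b, a_won in games)
    return correct / len(games)


baseline_acc = accuracy(pick_better_seed, games)
print(f"Seed baseline accuracy on toy data: {baseline_acc:.3f}")
```

    Any trained classifier can be scored with the same `accuracy` helper, so that its result is directly comparable with the seed baseline on the same set of games.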

    Music March Madness: Predicting the Winner of Locura de Marzo

    Each Spring, thousands of middle and high school students enrolled in Spanish classes vote for their favorite songs in the annual Locura De Marzo competition. This alternative March Madness competition gives us an opportunity to build and test models to predict which songs will win, which furthers the Hit Song Science literature. Using decision trees and support vector machine (SVM) models we find similarities with the challenge of predicting the popular NCAA Basketball bracket, including the importance of seed and the difficulty in predicting a “perfect” bracket

    Predictive Analytics for College Basketball: Using Logistic Regression for Determining the Outcome of a Game

    An extended regularized adjusted plus-minus analysis for lineup management in basketball using play-by-play data

    In this work we analyse basketball play-by-play data in order to evaluate the efficiency of different five-man lineups employed by teams. Starting from the adjusted plus-minus framework, we present a model-based strategy for the analysis of the result of partial match outcomes, extending the current literature in two main directions. The first extension replaces the classical response variable (scored points) with a comprehensive score that combines a set of box score statistics. This allows various aspects of the game to be separated. The second extension focuses on entire lineups rather than individual players, using a suitable extended model specification. The model fitting procedure is Bayesian and provides the necessary regularization. An advantage of this approach is the use of posterior distributions to rank players and lineups, providing an effective tool for team managers. For the empirical analysis, we use the 2018/2019 regular season of the Turkish Airlines Euroleague Championship, with play-by-play and box scores for 240 matches, which are made available by the league website. The results of the model fitting can be used for several investigations such as, for instance, the comparative analysis of the effects of single players and the estimation of total and synergic effects of lineups monitoring. Moreover, the behaviour of players and lineups during the season, updating the estimation results after each gameday, can represent a rather useful tool

    SUCCESSFUL SHOT LOCATIONS AND SHOT TYPES USED IN NCAA MEN’S DIVISION I BASKETBALL

    The primary purpose of the current study was to investigate the effect of court location (distance and angle from basket) and shot types used on shot success in NCAA Men’s DI basketball during the 2017-18 season. A secondary purpose was to further expand the analysis based on two additional factors: player position (guard, forward, or center) and team ranking. All statistical analyses were completed in RStudio and three binomial logistic regression analyses were performed to evaluate factors that influence shot success; one for all two and three point shot attempts, one for only two point attempts, and one for only three point attempts. Results indicated that guards are most likely to score as distance increases, when compared to forwards and centers. In addition, jump shots are most likely to be utilized successfully for every one-meter increase in distance, when compared to hook shots, tip shots, lay ups, and dunks. Results also indicated that, for further distances, the probability of shot success increases as angle decreases. The probability of shot success was also shown to be significantly influenced by team rank, with higher ranking teams having higher probabilities of shot success, although the magnitude of this effect was small and not practically relevant
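
    To make the "for every one-meter increase in distance" reading of a binomial logistic regression concrete, here is a minimal sketch. The study itself used RStudio; Python is used here only for illustration, the intercept and coefficient are invented values rather than the study's estimates, and the function name is hypothetical.

```python
import math


def shot_success_probability(intercept, coef_distance, distance_m):
    """Logistic model: P(success) = 1 / (1 + exp(-(intercept + coef * distance)))."""
    log_odds = intercept + coef_distance * distance_m
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients for illustration only (not taken from the study).
INTERCEPT = 0.5
COEF_DISTANCE = -0.15  # change in log-odds per additional meter from the basket

# exp(coefficient) is the multiplicative change in the odds per one-meter increase.
odds_ratio_per_meter = math.exp(COEF_DISTANCE)
print(f"Odds ratio per meter: {odds_ratio_per_meter:.3f}")

for distance in (1.0, 4.0, 7.0):
    p = shot_success_probability(INTERCEPT, COEF_DISTANCE, distance)
    print(f"distance = {distance:.0f} m -> P(success) = {p:.3f}")
```

    A coefficient below zero gives an odds ratio below one, meaning the odds of a made shot shrink by that factor with each additional meter. Comparing such coefficients across shot types or player positions is the kind of contrast the abstract reports.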

    The Final Four Formula: A Binary Choice Logit Model to Predict the SemiFinalists of the NCAA Division I Men’s Basketball Tournament

    The NCAA Division I men’s basketball tournament is one of the most popular sporting events in America. This paper dissects the tournament and attempts to accurately predict the four semi-finalists (“the final four”) using a binary choice logit model. The model does better than any current rating system at predicting these four teams. This paper also examines some common issues about predicting college basketball as a whole. Overall, this paper provides insights for selection committees, participants in office pools, and coaches to help them achieve their own individual goals
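
    A binary choice logit model of this kind can be sketched as below. The abstract does not list the paper's covariates, so using tournament seed as the only feature is an assumption made purely for illustration, the data are invented, and the plain gradient-ascent fit stands in for whatever estimation routine the authors used.

```python
import math


def sigmoid(z):
    """Logistic function mapping a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ys, lr=0.05, steps=20000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # residual on the probability scale
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical toy data, NOT from the paper:
# a team's seed and whether it reached the Final Four (1) or not (0).
seeds = [1, 1, 2, 2, 3, 4, 5, 6, 8, 10, 12, 16]
reached = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

b0, b1 = fit_logit(seeds, reached)
for seed in (1, 4, 8):
    print(f"seed {seed}: P(Final Four) = {sigmoid(b0 + b1 * seed):.3f}")
```

    Ranking teams by the fitted probability and keeping the most likely candidates is one simple way to turn such a model into a Final Four prediction.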

    Predictive modelling of football injuries

    The goal of this thesis is to investigate the potential of predictive modelling for football injuries. This work was conducted in close collaboration with Tottenham Hotspur FC (THFC), the PGA European Tour and with the participation of Wolverhampton Wanderers (WW). Three investigations were conducted:

    1. Predicting the recovery time of football injuries using the UEFA injury recordings: The UEFA recordings are a common standard for recording injuries in professional football. For this investigation, three datasets of UEFA injury recordings were available: one from THFC, one from WW and one that was constructed by merging both. Poisson, negative binomial and ordinal regression were used to model the recovery time after an injury and assess the significance of various injury-related covariates. Then, different machine learning algorithms (support vector machines, Gaussian processes, neural networks, random forests, naïve Bayes and k-nearest neighbours) were used in order to build a predictive model. The performance of the machine learning models is then improved by using feature selection conducted through correlation-based subset feature selection and random forests.

    2. Predicting injuries in professional football using exposure records: The relationship between exposure (in training hours and match hours) in professional football athletes and injury incidence was studied. A common problem in football is understanding how the training schedule of an athlete can affect the chance of him getting injured. The task was to predict the number of days a player can train before he gets injured. The dataset consisted of the exposure records of professional footballers in Tottenham Hotspur Football Club from the season 2012-2013. The problem was approached by a Gaussian process model equipped with a dynamic time warping kernel that allowed the calculation of the similarity of exposure records of different lengths.

    3. Predicting intrinsic injury incidence using in-training GPS measurements: A significant percentage of football injuries can be attributed to overtraining and fatigue. GPS data collected during training sessions might provide indicators of fatigue, or might be used to detect very intense training sessions which can lead to overtraining. This research used GPS data gathered during training sessions of the first team of THFC, in order to predict whether an injury would take place during a week. The data consisted of 69 variables in total. Two different binary classification approaches were followed and a variety of algorithms were applied (supervised principal component analysis, random forests, naïve Bayes, support vector machines, Gaussian process, neural networks, ridge logistic regression and k-nearest neighbours). Supervised principal component analysis shows the best results, while it also allows the extraction of components that reduce the total number of variables to 3 or 4 components which correlate with injury incidence.

    The first investigation contributes the following to the field:
    • It provides models based on the UEFA injury recordings, a standard used by many clubs, which makes it easier to replicate and apply the results.
    • It investigates which variables seem to be more highly related to the prediction of recovery after an injury.
    • It provides a comparison of models for predicting the time to return to play after injury.

    The second investigation contributes the following to the field:
    • It provides a model that can be used to predict the time when the first injury of the season will take place.
    • It provides a kernel that can be utilized by a Gaussian process in order to measure the similarity of training and match schedules, even if the time series involved are of different lengths.

    The third investigation contributes the following to the field:
    • It provides a model to predict injury on a given week based on GPS data gathered from training sessions.
    • It provides components, extracted through supervised principal component analysis, that correlate with injury incidence and can be used to summarize the large number of GPS variables in a parsimonious way
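
    The idea of comparing exposure records of different lengths through dynamic time warping (DTW) can be sketched as follows. The abstract does not give the thesis's kernel, so the `exp(-gamma * distance)` form, the function names and the exposure values below are all assumptions made for illustration. A kernel built this way is not guaranteed to be positive semi-definite, which a Gaussian process normally requires.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences,
    using absolute difference as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligned with an earlier b
                cost[i][j - 1],      # b[j-1] also aligned with an earlier a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def dtw_kernel(a, b, gamma=0.1):
    """Similarity in (0, 1]; equals 1 when the DTW distance is zero.
    Assumed form, not the thesis's kernel."""
    return math.exp(-gamma * dtw_distance(a, b))


# Hypothetical daily exposure hours for three players (invented values).
player_a = [2.0, 1.5, 0.0, 2.0, 1.5]
player_b = [2.0, 2.0, 1.5, 0.0, 0.0, 2.0, 1.5]  # same pattern, stretched in time
player_c = [0.5, 0.5, 3.0, 3.0]

print(dtw_kernel(player_a, player_b))
print(dtw_kernel(player_a, player_c))
```

    Because `player_b` repeats the pattern of `player_a` with some days duplicated, warping aligns the two records at no cost even though their lengths differ, while `player_c` receives a lower similarity.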

    2013 Oklahoma Research Day Full Program

    This document contains all abstracts from the 2013 Oklahoma Research Day held at the University of Central Oklahoma