16 research outputs found

    Missing data patterns in runners' careers: do they matter?

    Predicting the future performance of young runners is an important research issue in experimental sports science and performance analysis. We analyse a data set of annual seasonal best performances of male middle-distance runners over a period of 14 years and provide a modelling framework that accounts both for the fact that each runner has typically run in three distance events (800, 1500 and 5000 meters) and for the presence of periods of no running activity. We propose a latent class matrix-variate state space model and demonstrate empirically that accounting for missing data patterns in runners' careers improves the out-of-sample prediction of their performances over time. In particular, we demonstrate that, for this analysis, the missing data patterns provide valuable information for predicting runners' performance.

    Analisi delle performance sportive con i modelli state space

    The study of sports performance is a topic of considerable importance in sports science, where data have always played a fundamental role. The evaluation of an athlete's competition, for example, is based on quantitative measurements of performance, from which rankings are then drawn up. While various methods and approaches were initially developed over the years by those directly involved in the field, technological progress has attracted researchers from other domains to this topic. Mathematicians, engineers, computer scientists, and statisticians are involved in different aspects of the discipline, both in developing technological tools for collecting and using data and in answering research questions of varying complexity. The aim of this thesis is to provide statistical tools for the analysis of sports performance, with particular reference to state space models and time series analysis. The thesis is composed of four chapters: the first two provide an overview of the topics treated; the remaining chapters present the main contributions of this work. In particular, the first chapter gives a general overview of sports performance analysis, discusses its objectives and the tools used, and outlines some research opportunities in statistics. The second chapter presents a selection of tools and models for time series analysis. The third chapter presents a Bayesian clustering model for describing the annual best performances of Italian middle-distance athletes. In more detail, the chapter proposes a matrix state space model in which multivariate trajectories of different athletes are grouped on the basis of the trend of their performances and the patterns of missing data observed in the sample, the latter taken as indicators of the athletes' personal history and attitudes. Inference is conducted through a Markov Chain Monte Carlo simulation algorithm. The application to real data shows the benefits and limitations of the proposed approach and indicates which factors are relevant for obtaining better sports performances. The fourth chapter describes a model for monitoring health status during sports activities. The proposed model combines state space modelling with changepoint detection models in order to identify distributional changes in a sequence of sports activities. Inference is carried out with an online Expectation-Maximization algorithm that requires an approximation of the changepoint predicted probabilities, obtained through a sequential Monte Carlo method. As a byproduct of the model assumptions, the proposed algorithm processes sequences of time series in a doubly-online framework. While changepoint models identify changes between subsequent activities, the state space formulation of the model, together with the proposed algorithm, provides the additional benefit of estimating the changepoint probability in real time.

    Mattia Stival and Lorenzo Schiavon’s contribution to the Discussion of ‘Flexible marked spatio-temporal point processes with applications to event sequences from association football’ by Narayanan, Kosmidis and Dellaportas

    In association football there are two types of in-game data: event-sequence data provide qualitative information on the succession of ball-related events in time and space; tracking data report, with fine temporal granularity, the positions of the ball and of every player, so that the ball is just one of many interacting objects. By using event-sequence data, the authors adopt a local perspective, with the possible undesirable consequence of missing part of the relevant information. Several solutions can mitigate the impact of this shortcoming. One option consists in enriching the event-sequence data with additional qualitative knowledge regarding the game situation. If player tracking data are also available, an alternative would be to merge them with the observed events in the event-sequence data. When this extra information is missing, a careful model specification is required. The process defined by the authors is a brilliant answer to this challenge. Since any sequence of ball-related events is partially determined by the players' locations on the pitch, the observation of a certain sequence carries additional implicit information about team positioning. With this model, predictive probability density functions of the occurrence of any marked event can be derived. Hence, one could reconstruct via simulation the distribution of the number of any event combination observable in a limited amount of time. This is made possible by jointly modelling the event sequence and the time between subsequent events, with temporal modelling standing as a crucial feature for formulating in-game forecasts and representing a key difference with respect to other frameworks based on a discrete-time game-state representation. A final remark concerns how model complexity is addressed. It would be interesting to compare the association rule learning method with alternative strategies in which the modelling assumptions or prior distributions directly account for sparsity.

    Doubly-online changepoint detection for monitoring health status during sport activities

    We provide an online framework for analyzing data recorded by smart watches during running activities. In particular, we focus on identifying variations in the behavior of one or more measurements caused by changes in physical condition, such as physical discomfort, periods of prolonged detraining, or even the malfunction of measuring devices. Our framework considers data as a sequence of running activities represented by multivariate time series of physical and biometric data. We combine classical changepoint detection models with an unknown number of components with Gaussian state space models to detect distributional changes between a sequence of activities. The model considers multiple sources of dependence due to the sequential nature of subsequent activities, the autocorrelation structure within each activity, and the contemporaneous dependence between different variables. We provide an online Expectation-Maximization (EM) algorithm involving a sequential Monte Carlo (SMC) approximation of changepoint predicted probabilities. As a byproduct of our model assumptions, our proposed approach processes sequences of multivariate time series in a doubly-online framework. While classical changepoint models detect changes between subsequent activities, the state space framework coupled with the online EM algorithm provides the additional benefit of estimating the real-time probability that a current activity is a changepoint.