Distance construction and clustering of football player performance data

Abstract

I present a new approach to mapping football player information by multidimensional scaling and to clustering football players. The underlying goal is to define a proper distance measure between players. The data were assembled from whoscored.com. The variables are of mixed type, containing nominal, ordinal, count and continuous information. In the data pre-processing stage, four steps are applied to the continuous and count variables: 1) representation (i.e., considerations regarding how the relevant information is most appropriately represented, e.g., relative to minutes played), 2) transformation (football knowledge as well as the skewness of the distribution of some count variables indicate that a transformation should be used to decrease the effective distance between higher values compared to the distances between lower values), 3) standardisation (in order to make within-variable variations comparable), and 4) variable weighting, including variable selection. In a final phase, the different types of distance measures are combined using the principle of the Gower dissimilarity (Gower, 1971).

The aim of the second part of this thesis is to choose a suitable clustering technique and to estimate the best number of clusters for the dissimilarity measure obtained from the football player data set. To this end, different clustering quality indexes are introduced, and a new concept for calibrating them, first proposed by Hennig (2017), is presented. Hennig (2017) proposed two random clustering algorithms, which generate random clustering points from which standardised clustering quality index values can be calculated and aggregated in an appropriate way. In this thesis, two additional random clustering algorithms are proposed, and the aggregation of clustering quality indexes is examined on different types of simulated and real data sets. Finally, this new concept is applied to the dissimilarity measure of the football players.
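The pre-processing steps and the Gower-type combination described above can be illustrated with a minimal sketch. It is not the construction used in the thesis: the per-90-minutes representation, the square-root transformation, the range standardisation, the equal weights, the toy values and the function name `gower_dissimilarity` are all assumptions chosen for illustration only.

```python
import numpy as np

def gower_dissimilarity(numeric, nominal, w_numeric=None, w_nominal=None):
    """Weighted Gower-type dissimilarity for mixed data (illustrative sketch).

    numeric: (n, p) array of already represented and transformed values.
    nominal: (n, q) array of category labels.
    Each variable contributes a distance in [0, 1]; the result is their
    weighted average.
    """
    numeric = np.asarray(numeric, dtype=float)
    nominal = np.asarray(nominal)
    n, p = numeric.shape
    q = nominal.shape[1]
    w_num = np.ones(p) if w_numeric is None else np.asarray(w_numeric, dtype=float)
    w_nom = np.ones(q) if w_nominal is None else np.asarray(w_nominal, dtype=float)

    # Standardisation by range (an assumption; other choices are possible).
    ranges = numeric.max(axis=0) - numeric.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant variables contribute zero distance

    total = np.zeros((n, n))
    for k in range(p):
        col = numeric[:, k]
        total += w_num[k] * np.abs(col[:, None] - col[None, :]) / ranges[k]
    for k in range(q):
        col = nominal[:, k]
        # Simple matching: 0 if the categories agree, 1 otherwise.
        total += w_nom[k] * (col[:, None] != col[None, :])
    return total / (w_num.sum() + w_nom.sum())

# Toy data for three hypothetical players (values invented for illustration).
shots = np.array([30.0, 5.0, 12.0])
minutes = np.array([2700.0, 900.0, 1800.0])
position = np.array([["FW"], ["DF"], ["FW"]])

shots_per_90 = shots / minutes * 90.0   # 1) representation
shots_transformed = np.sqrt(shots_per_90)  # 2) transformation (assumed sqrt)

# 3) standardisation and 4) weighting happen inside the function.
D = gower_dissimilarity(shots_transformed[:, None], position)
print(np.round(D, 3))
```

The resulting matrix `D` is symmetric with a zero diagonal and could be passed to a multidimensional scaling routine or a dissimilarity-based clustering method.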
