Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science

This project evaluates whether football clubs should change their coach
in order to improve their performance in the national league. For this analysis I selected
three of the most important European football leagues: La Liga (Spain), Serie A (Italy)
and the Premier League (England).
The data used in this project were taken from the Transfermarkt website, a large football
platform. The data cover seasons 2005-06 to 2019-20 and contain information
about individual game results and squad value by player.
The steps before the analysis were cleaning and consolidating the data,
creating new features to measure performance, and selecting cases of interest
based on club and coach profile. Numeric variables were standardized to
put them on the same scale and make different seasons comparable. K-means clustering was applied
to group clubs according to their investment, which is positively correlated
with performance.
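
The standardization and clustering steps can be illustrated with a minimal sketch. It assumes standardization means a z-score computed within each season and that clubs are clustered on a single investment variable (squad value); the function names, the deterministic centroid initialization, the club labels and the figures are all hypothetical and are not taken from the project's data or code.

```python
from statistics import mean, pstdev

def standardize_by_season(rows):
    """Z-score squad values within each season.

    rows: list of (season, club, squad_value) tuples.
    Returns {(season, club): z_score}.
    """
    by_season = {}
    for season, _club, value in rows:
        by_season.setdefault(season, []).append(value)
    stats = {s: (mean(v), pstdev(v)) for s, v in by_season.items()}
    scores = {}
    for season, club, value in rows:
        mu, sigma = stats[season]
        scores[(season, club)] = (value - mu) / sigma if sigma > 0 else 0.0
    return scores

def assign(values, centroids):
    """Index of the nearest centroid for each value."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values]

def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on one-dimensional data.

    Assumption: centroids start at evenly spaced positions of the sorted
    values so the example is deterministic.
    """
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        labels = assign(values, centroids)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(values, centroids), centroids

# Hypothetical squad values (millions); market values inflate across seasons.
rows = [
    ("2005-06", "A", 100), ("2005-06", "B", 20),
    ("2005-06", "C", 30), ("2005-06", "D", 110),
    ("2019-20", "A", 400), ("2019-20", "B", 80),
    ("2019-20", "C", 120), ("2019-20", "D", 440),
]
z = standardize_by_season(rows)
keys = sorted(z)
labels, centroids = kmeans_1d([z[key] for key in keys], k=2)
cluster = dict(zip(keys, labels))
```

In this toy data every squad value quadruples between the two seasons, yet each club keeps the same z-score and therefore the same cluster, which is the sense in which standardization makes seasons comparable.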
Finally, a difference-in-differences analysis was applied to evaluate whether a club would
obtain a performance gain if it sacked its coach between games twelve and
twenty-six of the season after performing poorly relative to its squad value.
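
As a rough illustration of the difference-in-differences idea, the sketch below computes the basic two-group, two-period estimator: the change in the treated group minus the change in the comparison group. The function name, the choice of points per game as the outcome, and all numbers are invented for illustration; they are not results from this project, and the actual specification used may differ (for example, a regression with covariates).

```python
from statistics import mean

def did_estimate(observations):
    """Two-by-two difference-in-differences estimate.

    observations: iterable of (treated, post, outcome), where `treated`
    marks clubs that sacked their coach and `post` marks the period
    after the (actual or hypothetical) change.
    """
    cells = {}
    for treated, post, outcome in observations:
        cells.setdefault((treated, post), []).append(outcome)
    m = {key: mean(vals) for key, vals in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    comparison_change = m[(False, True)] - m[(False, False)]
    return treated_change - comparison_change

# Hypothetical points per game; NOT data from the project.
observations = [
    (True, False, 0.9), (True, False, 1.1),    # sacked coach, before
    (True, True, 1.2), (True, True, 1.4),      # sacked coach, after
    (False, False, 1.0), (False, False, 1.0),  # kept coach, before
    (False, True, 1.5), (False, True, 1.5),    # kept coach, after
]
effect = did_estimate(observations)
```

With these made-up numbers both groups improve, so a negative `effect` would mean the clubs that changed coach improved by less than the clubs that did not.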
As a general conclusion, clubs in both the
treatment group and the comparison group recover, on average, after a period of
underperformance, but the recovery of the clubs that sack their coach is smaller than
that of the clubs that keep theirs.